Finally, based on the constructed graphs, the multi-view information is propagated through the nodes methodically, which equips the learned features with multi-view reasoning ability. Experiments on two benchmarks reveal that AGN significantly exceeds state-of-the-art performance. Visualization results show that AGN provides interpretable visual cues for clinical diagnosis.

We present the first systematic study of concealed object detection (COD), which aims to identify objects that are "perfectly" embedded in their background. The high intrinsic similarity between concealed objects and their background makes COD far more challenging than traditional object detection/segmentation. To better understand this task, we collect a large-scale dataset, called COD10K, which consists of 10,000 images covering concealed objects in diverse real-world scenarios from 78 object categories. Further, we provide rich annotations, including object categories, object boundaries, challenging attributes, object-level labels, and instance-level annotations. Our COD10K enables comprehensive concealed object understanding and can even be used to help progress several other vision tasks, such as detection, segmentation, and classification. We also design a simple but strong baseline for COD, termed the Search Identification Network (SINet). Without any bells and whistles, SINet outperforms 12 cutting-edge baselines on all datasets tested, making it a robust, general architecture that could serve as a catalyst for future research in COD. Finally, we provide some interesting findings and highlight several potential applications and future directions. To spark research in this new field, our code, dataset, and online demo are available at our project page: http://mmcheng.net/cod.

Visual dialog is a challenging task that requires comprehension of the semantic dependencies among implicit visual and textual contexts. The task can be cast as relational inference in a graphical model with sparse contextual subjects (nodes) and unknown graph structure (relation descriptor), so modeling the underlying context-aware relational inference is critical. To this end, we propose a novel Context-Aware Graph (CAG) neural network. We focus on fine-grained relational reasoning with object-level visual-historical co-reference nodes. The graph structure (the relations in the dialog) is iteratively updated using an adaptive top-K message passing mechanism. To eliminate sparse, useless relations, each node has dynamic relations in the graph (a different set of K related neighbor nodes), and only the most relevant nodes contribute to the context-aware relational graph inference. In addition, to avoid the negative effect of the linguistic bias of history, we propose a purely visual-aware knowledge distillation mechanism named CAG-Distill, in which image-only visual clues are used to regularize the joint visual-historical contextual awareness. Experimental results on the VisDial v0.9 and v1.0 datasets show that both CAG and CAG-Distill outperform competing methods. Visualization results further validate the remarkable interpretability of our graph inference solution.
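The adaptive top-K message passing used by CAG can be sketched in a few lines. The following NumPy code is a minimal illustration of the idea, not the authors' implementation: the dot-product scoring, the residual aggregation, and all dimensions are assumptions made for exposition.

```python
import numpy as np

def topk_message_passing(H, K=4):
    """One round of adaptive top-K message passing over node features H (N x d).

    Each node scores every other node, keeps only its K most relevant
    neighbors, and aggregates their features with softmax weights. The
    dot-product score stands in for the learned relevance function.
    """
    N, d = H.shape
    scores = H @ H.T / np.sqrt(d)            # pairwise relevance scores
    np.fill_diagonal(scores, -np.inf)        # exclude self-loops
    updated = np.empty_like(H)
    for i in range(N):
        nbrs = np.argsort(scores[i])[-K:]    # K most relevant neighbor nodes
        s = scores[i, nbrs]
        w = np.exp(s - s.max())
        w /= w.sum()                         # softmax over selected neighbors
        updated[i] = H[i] + w @ H[nbrs]      # residual message aggregation
    return updated

H = np.random.randn(10, 64).astype(np.float32)   # 10 visual/history nodes
H = topk_message_passing(H, K=3)                 # one graph update step
```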
The original k-means method, using Lloyd's algorithm, partitions a data set by minimizing a sum-of-squares cost function to find local minima; it is widely used for data analysis and machine learning and shows promising performance. However, Lloyd's algorithm often gets stuck in bad local minima. In this paper, we use a coordinate descent (CD) method to solve this problem. First, we show that the k-means minimization problem can be reformulated as a trace maximization problem; we then propose a simple and very efficient coordinate descent scheme to solve the reformulated problem. The effectiveness of our method is illustrated on several real-world data sets with varying numbers of clusters, samples, and dimensionalities. Extensive experiments show that CD performs better than Lloyd's algorithm, i.e., it attains lower objective values and better local minima. Moreover, the results show that CD is more robust to initialization than Lloyd's method, whether the initialization strategy is random or k-means++. In addition, a computational complexity analysis verifies that CD has the same time complexity as the original k-means method.
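To make the coordinate descent scheme concrete, here is a minimal runnable sketch that performs CD over the cluster assignments: it sweeps over the points and greedily moves each one to the cluster that most reduces the sum-of-squares cost, updating cluster sums incrementally (Hartigan-style single-point moves). This illustrates the CD idea only; it is not the paper's exact trace-maximization formulation, and the initialization and stopping rule are assumptions.

```python
import numpy as np

def cd_kmeans(X, k, n_passes=20, seed=0):
    """Coordinate descent for k-means via greedy single-point reassignments.

    Each sweep visits every point and moves it to the cluster that most
    reduces the total sum-of-squares cost, with cluster sums and counts
    maintained incrementally. Converges to a local minimum when a full
    sweep makes no move.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    labels = rng.integers(0, k, size=n)
    labels[:k] = np.arange(k)                  # guarantee non-empty clusters
    sums = np.stack([X[labels == j].sum(axis=0) for j in range(k)])
    counts = np.bincount(labels, minlength=k).astype(float)
    for _ in range(n_passes):
        moved = 0
        for i in range(n):
            a, x = labels[i], X[i]
            if counts[a] <= 1:
                continue                       # never empty a cluster
            means = sums / counts[:, None]
            d2 = ((x - means) ** 2).sum(axis=1)
            # exact cost increase of inserting x into each cluster j ...
            delta = counts / (counts + 1.0) * d2
            # ... and the cost decrease of removing x from its cluster a
            delta_out = counts[a] / (counts[a] - 1.0) * d2[a]
            delta[a] = delta_out               # staying put is break-even
            b = int(np.argmin(delta))
            if b != a and delta[b] < delta_out:
                sums[a] -= x; counts[a] -= 1
                sums[b] += x; counts[b] += 1
                labels[i] = b
                moved += 1
        if moved == 0:                         # local minimum reached
            break
    return labels

X = np.random.default_rng(1).normal(size=(300, 2))
labels = cd_kmeans(X, k=3)
```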
In this paper, we propose a novel deep Efficient Relational Sentence Ordering Network (referred to as ERSON), which leverages a pre-trained language model in both the encoder and decoder architectures to strengthen the coherence modeling of the entire model. Specifically, we first introduce a divide-and-fuse BERT (referred to as DF-BERT), a new refactoring of the BERT network in which the lower layers encode each sentence of the paragraph independently and are shared across sentence pairs, while the higher layers learn the cross-attention between sentence pairs jointly. This design captures the semantic concepts and contextual information between the sentences of a paragraph while significantly reducing runtime and memory consumption without sacrificing model performance. Besides, a Relational Pointer Decoder (referred to as RPD) is developed, which utilizes the pre-trained Next Sentence Prediction (NSP) task of BERT to capture useful relative ordering information between sentences and thereby enhance the order predictions. In addition, a variety of knowledge distillation based losses are added as auxiliary supervision to further improve the ordering performance. Extensive evaluations on Sentence Ordering, Order Discrimination, and Multi-Document Summarization tasks show the superiority of ERSON over state-of-the-art ordering methods.

We present DistillFlow, a knowledge distillation approach to learning optical flow. DistillFlow trains multiple teacher models and a student model, where challenging transformations are applied to the input of the student model to generate hallucinated occlusions as well as less confident predictions. Then, a self-supervised learning framework is constructed: confident predictions from the teacher models serve as annotations to guide the student model to learn optical flow for those less confident predictions. This self-supervised framework enables us to effectively learn optical flow from unlabeled data, not only for non-occluded pixels but also for occluded pixels. DistillFlow achieves state-of-the-art unsupervised learning performance on both the KITTI and Sintel datasets. Our self-supervised pre-trained model also provides an excellent initialization for supervised fine-tuning, suggesting an alternative training paradigm to current supervised methods that rely heavily on pre-training on synthetic data. At the time of writing, our fine-tuned models rank 1st among all monocular methods on the KITTI 2015 benchmark and outperform all published methods on the Sintel Final benchmark. More importantly, we demonstrate the generalization capability of DistillFlow in three aspects: framework generalization, correspondence generalization, and cross-dataset generalization.

Text is a new way to guide human image manipulation. Albeit natural and flexible, text usually suffers from inaccuracy in spatial description, ambiguity in the description of appearance, and incompleteness. In this paper, we address these issues. To overcome inaccuracy, we use structured information (e.g., poses) to help identify the correct location to manipulate, by disentangling the control of appearance and spatial structure. Moreover, we learn an image-text shared space with this disentanglement to improve the accuracy and quality of manipulation, by separating relevant and irrelevant editing directions for the textual instructions in this space. Our model generates a series of manipulation results by moving the source image in this space with different degrees of editing strength. Thus, to reduce the ambiguity in text, our model generates sequential outputs for manual selection. In addition, we propose an efficient pseudo-label loss to enhance editing performance when the text is incomplete. We evaluate our method on various datasets and show its precision and interactivity for manipulating human images.

In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features. Differentiating multiple potential instances within a single PoI feature is challenging, because learning a high-dimensional mask feature for each instance with vanilla convolution demands a heavy computational burden. To address this challenge, we propose an instance-aware convolution that decomposes the mask representation learning task into two tractable modules: instance-aware weights and instance-agnostic features. The former parametrizes the convolution to produce mask features corresponding to different instances, improving mask learning efficiency by avoiding several independent convolutions, while the latter serves as mask templates at a single point. Together, instance-aware mask features are computed by convolving the template with dynamic weights and are used for mask prediction. Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach built upon dense one-stage detectors. Through extensive experiments, we evaluate the effectiveness of our framework built upon RetinaNet and FCOS. PointINS with a ResNet-101 backbone achieves 38.3 mask mean average precision (mAP) on the COCO dataset, outperforming existing point-based methods by a large margin. It gives comparable performance to the region-based Mask R-CNN with faster inference.
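Stepping back to DistillFlow, its core self-supervision signal can be sketched as a confidence-masked distillation loss: teacher predictions serve as pseudo-labels for the student only at pixels where the teacher is confident. The code below is a toy illustration; the confidence measure (e.g., a forward-backward consistency check) and the threshold are assumptions.

```python
import numpy as np

def distillation_loss(student_flow, teacher_flow, teacher_conf, thresh=0.8):
    """Confidence-guided distillation for optical flow.

    The teacher's flow field acts as a pseudo-label for the student, but the
    per-pixel L1 error is counted only where the teacher is confident.
    """
    mask = (teacher_conf > thresh).astype(np.float32)         # reliable pixels
    diff = np.abs(student_flow - teacher_flow).sum(axis=-1)   # per-pixel L1
    return (mask * diff).sum() / np.maximum(mask.sum(), 1.0)

# toy example: 2-channel (u, v) flow fields on a 4x4 grid
student = np.random.randn(4, 4, 2).astype(np.float32)
teacher = np.random.randn(4, 4, 2).astype(np.float32)
conf = np.random.rand(4, 4).astype(np.float32)
loss = distillation_loss(student, teacher, conf)
```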
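Similarly, the instance-aware convolution described in the PointINS abstract reduces, in its simplest 1x1 form, to a matrix product between per-instance dynamic weights and a shared, instance-agnostic template. The sketch below illustrates that decomposition; the channel count, spatial size, and kernel size are assumptions rather than the paper's configuration.

```python
import numpy as np

def instance_aware_conv(template, inst_weights):
    """Dynamic 1x1 convolution: a shared, instance-agnostic mask template
    (C x H x W) is convolved with per-instance weights (N x C) to produce
    one mask feature map per instance (N x H x W).
    """
    C, H, W = template.shape
    N = inst_weights.shape[0]
    # a 1x1 convolution over channels is a matrix product with the
    # spatially flattened template
    masks = inst_weights @ template.reshape(C, H * W)   # (N, H*W)
    return masks.reshape(N, H, W)

template = np.random.randn(64, 56, 56)    # instance-agnostic features at one PoI
weights = np.random.randn(3, 64)          # dynamic weights for 3 instances
masks = instance_aware_conv(template, weights)   # (3, 56, 56) mask logits
```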
Identifying mild-to-critical COVID-19 patients is important for early prevention and personalized treatment planning.

It is well known that expanding glioblastomas typically induce significant deformations of the surrounding parenchyma (the so-called "mass effect"). In this study, we evaluate the performance of three mathematical models of tumor growth: 1) a reaction-diffusion-advection model that accounts for the mass effect (RDAM), 2) a reaction-diffusion model with a mass effect that is consistent only in the case of small deformations (RDM), and 3) a reaction-diffusion model that does not include the mass effect (RD). The models were calibrated with magnetic resonance imaging (MRI) data obtained during tumor development in a murine model of glioma (n = 9). We obtained T2-weighted and contrast-enhanced T1-weighted MRI at 6 time points over 10 days to determine the spatiotemporal variation in the mass effect and tumor concentration, respectively. We calibrated the three models using data 1) at the first four, 2) only at the first and fourth, and 3) only at the third and fourth time points. Each of these calibrations was run forward in time to predict the volume fraction of tumor cells at the conclusion of the experiment. The diffusion coefficient for the RDAM model (median of 10.65 × 10⁻³ mm² d⁻¹) is significantly less than those for the RD and RDM models (17.46 × 10⁻³ mm² d⁻¹ and 19.38 × 10⁻³ mm² d⁻¹, respectively). The tumor concentrations for the RD, RDM, and RDAM models have medians of 40.2%, 32.1%, and 44.7%, respectively, for the calibration using data from the first four time points. The RDM model most accurately predicts tumor growth, while the RDAM model shows the least variation in its estimates of the diffusion coefficient and proliferation rate. This study demonstrates that the mathematical models capture both the tumor development and the mass effect observed in experiments.
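The RD model referenced above is commonly written as a Fisher-KPP reaction-diffusion equation, ∂c/∂t = ∇·(D∇c) + k c(1 − c), where c is the tumor cell volume fraction, D the diffusion coefficient, and k the proliferation rate. The sketch below integrates a 1-D version with explicit finite differences over the 10-day window of the experiment; the parameter values, grid, and boundary handling are illustrative assumptions, not the paper's calibrated values.

```python
import numpy as np

# 1-D reaction-diffusion (Fisher-KPP) model of tumor cell volume fraction c:
#   dc/dt = D * d^2c/dx^2 + k * c * (1 - c)
D = 15e-3        # diffusion coefficient, mm^2/day (order of the reported medians)
k = 0.5          # proliferation rate, 1/day (assumed for illustration)
L, nx = 10.0, 201                  # 10 mm domain, 201 grid points
dx = L / (nx - 1)
dt = 0.4 * dx**2 / D               # stable explicit time step (CFL condition)

c = np.zeros(nx)
c[nx // 2] = 0.5                   # small tumor seed at the domain center

t = 0.0
while t < 10.0:                    # simulate 10 days, as in the experiment
    lap = (np.roll(c, 1) - 2 * c + np.roll(c, -1)) / dx**2
    lap[0] = lap[-1] = 0.0         # zero-flux boundaries (crude approximation)
    c += dt * (D * lap + k * c * (1 - c))
    t += dt

print(f"integrated tumor burden after 10 days: {c.sum() * dx:.3f} mm")
```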