Briggsbork4429

In addition, we propose a novel method to fit bounding ellipses of arbitrary orientation using object detection networks and apply it to an omni-directional real-world human detection dataset.Current NRSfM algorithms are limited from two perspectives (i) the number of images, and (ii) the type of shape variability they can handle. In this paper we propose a novel hierarchical sparse coding model for NRSFM which can overcome (i) and (ii) to such an extent, that NRSFM can be applied to problems in vision previously thought too ill posed. Our approach is realized in practice as the training of an unsupervised deep neural network (DNN) auto-encoder with a unique architecture that is able to disentangle pose from 3D structure. Using modern deep learning computational platforms allows us to solve NRSfM problems at an unprecedented scale and shape complexity. Our approach has no 3D supervision, relying solely on 2D point correspondences. Further, our approach is also able to handle missing/occluded 2D points without the need for matrix completion. Extensive experiments demonstrate the impressive performance of our approach where we exhibit superior precision and robustness against all available state-of-the-art works in some instances by an order of magnitude. We further propose a new quality measure (based on the network weights) which circumvents the need for 3D ground-truth to ascertain the confidence we have in the reconstructability.The ability of camera arrays to efficiently capture higher space-bandwidth product than single cameras has led to various multiscale and hybrid systems. These systems play vital roles in computational photography, including light field imaging, 360 VR camera, gigapixel videography, etc. One of the critical tasks in multiscale hybrid imaging is matching and fusing cross-resolution images from different cameras under perspective parallax. In this paper, we investigate the reference-based super-resolution (RefSR) problem associated with dual-camera or multi-camera systems, with a significant resolution gap (8x) and large parallax (10%pixel displacement). We present CrossNet++, an end-to-end network containing novel two-stage cross-scale warping modules. The stage I learns to narrow down the parallax distinctively with the strong guidance of landmarks and intensity distribution consensus. Then the stage II operates more fine-grained alignment and aggregation in feature domain to synthesize the final super-resolved image. To further address the large parallax, new hybrid loss functions comprising warping loss, landmark loss and super-resolution loss are proposed to regularize training and enable better convergence. CrossNet++ significantly outperforms the state-of-art on light field datasets as well as real dual-camera data. We further demonstrate the generalization of our framework by transferring it to video super-resolution and video denoising.Multi-view stereopsis (MVS) tries to recover the 3D model from 2D images. As the observations become sparser, the significant 3D information loss makes the MVS problem more challenging. Instead of only focusing on densely sampled conditions, we investigate sparse-MVS with large baseline angles since sparser sampling is always more favorable inpractice. By investigating various observation sparsities, we show that the classical depth-fusion pipeline becomes powerless for thecase with larger baseline angle that worsens the photo-consistency check. As another line of solution, we present SurfaceNet+, a volumetric method to handle the 'incompleteness' and 'inaccuracy' problems induced by very sparse MVS setup. Specifically, the former problem is handled by a novel volume-wise view selection approach. It owns superiority in selecting valid views while discarding invalid occluded views by considering the geometric prior. Furthermore, the latter problem is handled via a multi-scale strategy that consequently refines the recovered geometry around the region with repeating pattern. The experiments demonstrate the tremendous performance gap between SurfaceNet+ and the state-of-the-art methods in terms of precision and recall. Under the extreme sparse-MVS settings in two datasets, where existing methods can only return very few points, SurfaceNet+ still works as well as in the dense MVS setting.What is the current state-of-the-art for image restoration and enhancement applied to degraded images acquired under less than ideal circumstances? Can the application of such algorithms as a pre-processing step improve image interpretability for manual analysis or automatic visual recognition to classify scene content? While there have been important advances in the area of computational photography to restore or enhance the visual quality of an image, the capabilities of such techniques have not always translated in a useful way to visual recognition tasks. To address this, we introduce the UG 2 dataset as a large-scale benchmark composed of video imagery captured under challenging conditions, and two enhancement tasks designed to test algorithmic impact on visual quality and automatic object recognition. Furthermore, we propose a set of metrics to evaluate the joint improvement of such tasks as well as individual algorithmic advances, including a novel psychophysics-based evaluation regime for human assessment and a realistic set of quantitative measures for object recognition performance. We introduce six new algorithms for image restoration or enhancement, which were created as part of the IARPA sponsored UG 2 Challenge workshop held at CVPR 2018.This work presents a novel method of exploring human brain-visual representations, with a view towards replicating these processes in machines. The core idea is to learn plausible computational and biological representations by correlating human neural activity and natural images. Thus, we first propose a model, EEG-ChannelNet, to learn a brain manifold for EEG classification. After verifying that visual information can be extracted from EEG data, we introduce a multimodal approach that uses deep image and EEG encoders, trained in a siamese configuration, for learning a joint manifold that maximizes a compatibility measure between visual features and brain representations. We then carry out image classification and saliency detection on the learned manifold. Performance analyses show that our approach satisfactorily decodes visual information from neural signals. This, in turn, can be used to effectively supervise the training of deep learning models, as demonstrated by the high performance of image classification and saliency detection on out-of-training classes. The obtained results show that the learned brain-visual features lead to improved performance and simultaneously bring deep models more in line with cognitive neuroscience work related to visual perception and attention.Convolutional networks have achieved great success in various vision tasks. This is mainly due to a considerable amount of research on network structure. In this study, instead of focusing on architectures, we focused on the convolution unit itself. The existing convolution unit has a fixed shape and is limited to observing restricted receptive fields. In earlier work, we proposed the active convolution unit (ACU), which can freely define its shape and learn by itself. In this paper, we provide a detailed analysis of the previously proposed unit and show that it is an efficient representation of a sparse weight convolution. buy CM272 Furthermore, we extend an ACU to a grouped ACU, which can observe multiple receptive fields in one layer. We found that the performance of a naive grouped convolution is degraded by increasing the number of groups; however, the proposed unit retains the accuracy even though the number of parameters decreases. Based on this result, we suggest a depthwise ACU, and various experiments have shown that our unit is efficient and can replace the existing convolutions.The goal of single-image deraining is to restore the rain-free background scenes of an image degraded by rain streaks and rain accumulation. The early single-image deraining methods employ a cost function, where various priors are developed to represent the properties of rain and background layers. Since 2017, single-image deraining methods step into a deep-learning era, and exploit various types of networks, i.e. convolutional neural networks, recurrent neural networks, generative adversarial networks, etc., demonstrating impressive performance. Given the current rapid development, in this paper, we provide a comprehensive survey of deraining methods over the last decade. We summarize the rain appearance models, and discuss two categories of deraining approaches model-based and data-driven approaches. For the former, we organize the literature based on their basic models and priors. For the latter, we discuss developed ideas related to architectures, constraints, loss functions, and training datasets. We present milestones of single-image deraining methods, review a broad selection of previous works in different categories, and provide insights on the historical development route from the model-based to data-driven methods. We also summarize performance comparisons quantitatively and qualitatively. Beyond discussing the technicality of deraining methods, we also discuss the future directions.One key challenge in the point cloud segmentation is the detection and split of overlapping regions between different planes. The existing methods depend on the similarity and the dissimilarity in neighbor regions without a global constraint, which brings the 'over- ' and 'under- ' segmentation in the results. Hence, this paper presents a pipeline of the accurate plane segmentation for point clouds to address the shortcoming in the local optimization. There are two phases included in the proposed segmentation process. One is a local phase to calculate connectivity scores between different planes based on local variations of surface normals. In this phase, a new optimal-vector-field is formulated to detect the plane intersections. The optimal-vector-field is large in magnitude at plane intersections and vanishing at other regions. The other one is a global phase to smooth local segmentation cues to mimic leading eigenvector computation in the graph-cut. Evaluation of two datasets shows that the achieved precision and recall is 94.50% and 90.81% on the collected mobile LiDAR data and obtains an average accuracy of 75.4% on an open benchmark, which outperforms the state-of-the-art methods in terms of completeness and correctness.There still involve lots of challenges when applying machine learning algorithms in unknown environments, especially those with limited training data. To handle the data insufficiency and make a further step towards robust learning, we adopt the learnware notion which equips a model with an essential reusable property---the model learned in a related task could be easily adapted to the current data-scarce environment without data sharing. To this end, we propose the REctiFy via heterOgeneous pRedictor Mapping (ReForm) framework enabling the current model to take advantage of a related model from two kinds of heterogeneous environment, i.e., either with different sets of features or labels. By Encoding Meta InformaTion (EMIT) of features and labels as the model specification, we utilize an optimal transported semantic mapping to characterize and bridge the environment changes. After fine-tuning over a few labeled examples through a biased regularization objective, the transformed heterogeneous model adapts to the current task efficiently.

Autoři článku: Briggsbork4429 (Marcussen Goode)

Práce s článkem

Osobní nástroje

Navigace

Nástroje

Briggsbork4429