Refsgaardhardy5582

Infrared and visible image fusion has gained ever-increasing attention in recent years due to its great significance in a variety of vision-based applications. However, existing fusion methods suffer from limitations in the spatial resolutions of both the input source images and the output fused image, which greatly restricts their practical use. In this paper, we propose a meta learning-based deep framework for the fusion of infrared and visible images. Unlike most existing methods, the proposed framework can accept source images of different resolutions and generate a fused image of arbitrary resolution with a single learned model. In the proposed framework, the features of each source image are first extracted by a convolutional network and upscaled by a meta-upscale module with an arbitrary factor chosen according to practical requirements. Then, a dual attention mechanism-based feature fusion module (sketched below) is developed to combine features from the different source images. Finally, a residual compensation module, which can be applied iteratively in the proposed framework, is designed to enhance the capability of our method in detail extraction. In addition, the loss function is formulated in a multi-task learning manner via simultaneous fusion and super-resolution, aiming to improve the effect of feature learning, and a new contrast loss inspired by a perceptual contrast enhancement approach is proposed to further improve the contrast of the fused image. Extensive experiments on widely-used fusion datasets demonstrate the effectiveness and superiority of the proposed method. The code of the proposed method is publicly available at https://github.com/yuliu316316/MetaLearning-Fusion.

Smoke is semi-transparent, which leads to a highly complicated mixture of background and smoke. Sparse or small smoke is visually inconspicuous, and its boundary is often ambiguous. For these reasons, separating smoke from a single image is a very challenging task. To solve these problems, we propose a Classification-assisted Gated Recurrent Network (CGRNet) for smoke semantic segmentation. To discriminate smoke from smoke-like objects, we present a smoke segmentation strategy with dual classification assistance. Our classification module outputs two prediction probabilities for smoke. The first assistance uses one probability to explicitly regulate the segmentation module for accuracy improvement by supervising a cross-entropy classification loss. The second multiplies the segmentation result by the other probability for further refinement (a toy sketch of this gating appears below). This dual classification assistance greatly improves performance at the image level. In the segmentation module, we design an Attention Convolutional GRU module (Att-ConvGRU) to learn the long-range context dependence of features. To perceive small or inconspicuous smoke, we design a Multi-scale Context Contrasted Local Feature structure (MCCL) and a Dense Pyramid Pooling Module (DPPM) to improve the representation ability of our network. Extensive experiments validate that our method significantly outperforms existing state-of-the-art algorithms on smoke datasets and also obtains satisfactory results on challenging images with inconspicuous smoke and smoke-like objects.
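As a concrete illustration of the dual attention fusion idea from the first paper above, the following PyTorch sketch applies channel attention and then spatial attention to the concatenated infrared and visible features before merging them. The module structure, layer sizes, and pooling choices are assumptions made for illustration; they are not taken from the authors' released code.

```python
import torch
import torch.nn as nn

# Hypothetical dual attention fusion block: channel attention followed by
# spatial attention over concatenated infrared/visible features.
class DualAttentionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = nn.Sequential(          # per-channel weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2 * channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(          # per-location weights
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, feat_ir, feat_vis):
        x = torch.cat([feat_ir, feat_vis], dim=1)          # (B, 2C, H, W)
        x = x * self.channel_att(x)                        # channel attention
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        x = x * self.spatial_att(pooled)                   # spatial attention
        return self.merge(x)                               # fused features
```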
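For the CGRNet abstract above, the dual classification assistance can be summarized in a few lines: one predicted smoke probability is supervised by a cross-entropy loss, while the other rescales the segmentation map before its own loss. The shapes, loss choices, and function name below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dual_assist_loss(seg_logits, cls_logits, seg_mask, has_smoke):
    """Hypothetical sketch. seg_logits: (B, 1, H, W); cls_logits: (B, 2), two
    smoke probabilities before the sigmoid; seg_mask: (B, 1, H, W) binary
    ground truth; has_smoke: (B,) image-level smoke label in {0, 1}."""
    p_sup, p_gate = torch.sigmoid(cls_logits).unbind(dim=1)
    # Assistance 1: explicit image-level supervision of the first probability.
    cls_loss = F.binary_cross_entropy(p_sup, has_smoke.float())
    # Assistance 2: gate the segmentation map with the second probability.
    gated = torch.sigmoid(seg_logits) * p_gate.view(-1, 1, 1, 1)
    seg_loss = F.binary_cross_entropy(gated, seg_mask.float())
    return seg_loss + cls_loss
```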
Recently, the residual learning strategy has been integrated into convolutional neural networks (CNNs) for single image super-resolution (SISR), where the CNN is trained to estimate the residual image. Recognizing that a residual image usually consists of high-frequency details and exhibits cartoon-like characteristics, in this paper we propose a deep shearlet residual learning network (DSRLN) to estimate residual images based on the shearlet transform. The proposed network is trained in the shearlet transform domain, which provides an optimal sparse approximation of the cartoon-like image. Specifically, to address the large statistical variation among the shearlet coefficients, a dual-path training strategy and a data weighting technique are proposed (see the schematic training step below). Extensive evaluations on general natural image datasets as well as remote sensing image datasets show that the proposed DSRLN scheme achieves PSNR results close to those of state-of-the-art deep learning methods while using far fewer network parameters.

Deep unfolding methods design deep neural networks as learned variations of optimization algorithms by unrolling their iterations. These networks have been shown to achieve faster convergence and higher accuracy than the original optimization methods. In this line of research, this paper presents novel interpretable deep recurrent neural networks (RNNs), designed by unfolding iterative algorithms that solve the task of sequential signal reconstruction (in particular, video reconstruction). The proposed networks are designed by exploiting the facts that patches of video frames have sparse representations and that the temporal difference between consecutive representations is also sparse. Specifically, we design an interpretable deep RNN (coined reweighted-RNN) by unrolling the iterations of a proximal method that solves a reweighted version of the l1-l1 minimization problem. Due to the underlying minimization model, our reweighted-RNN has a different thresholding function (that is, a different activation function) for each hidden unit in each layer; in this way, it has higher network expressivity than existing deep unfolding RNN models (an unrolled-cell sketch follows below). We also present the derived l1-l1-RNN model, obtained by unfolding a proximal method for the l1-l1 minimization problem. We apply the proposed interpretable RNNs to the task of video frame reconstruction from low-dimensional measurements, that is, sequential video frame reconstruction. Experimental results on various datasets demonstrate that the proposed deep RNNs outperform various RNN models.

A novel light field super-resolution algorithm to improve the spatial and angular resolutions of light field images is proposed in this work. We develop spatial and angular super-resolution (SR) networks, which can faithfully interpolate images in the spatial and angular domains regardless of the angular coordinates. For each input image, we feed adjacent images into the SR networks to extract multi-view features using a trainable disparity estimator. We concatenate the multi-view features and remix them through the proposed adaptive feature remixing (AFR) module, which performs channel-wise pooling (sketched below). Finally, the remixed feature is used to augment the spatial or angular resolution. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art algorithms on various light field datasets. The source code and pre-trained models are available at https://github.com/keunsoo-ko/LFSR-AFR.
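To make the transform-domain residual learning idea in the DSRLN abstract concrete, here is a schematic training step in which a network predicts the shearlet coefficients of the residual between the high-resolution image and a bicubically upscaled input. `shearlet` and `inverse_shearlet` are hypothetical placeholders for a real shearlet toolbox, and the subband weighting is only a stand-in for the paper's data weighting technique.

```python
import torch
import torch.nn.functional as F

def dsrln_style_step(net, optimizer, lr_img, hr_img,
                     shearlet, inverse_shearlet, subband_weights):
    """Hypothetical sketch of residual learning in a transform domain."""
    upscaled = F.interpolate(lr_img, size=hr_img.shape[-2:],
                             mode='bicubic', align_corners=False)
    target = shearlet(hr_img - upscaled)        # cartoon-like residual coeffs
    pred = net(shearlet(upscaled))              # network works on coefficients
    # Weighted L1 loss to balance the large variation across subbands.
    loss = (subband_weights * (pred - target).abs()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    sr_img = upscaled + inverse_shearlet(pred.detach())   # reconstructed SR
    return loss.item(), sr_img
```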
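The core mechanism of the unfolded RNNs described above can be sketched as an unrolled proximal iteration whose activation is a soft-thresholding operator with a separate learnable threshold for each hidden unit in each layer. The cell below is a simplified assumption in that spirit; the exact proximal operator of the reweighted l1-l1 problem in the paper is more involved.

```python
import torch
import torch.nn as nn

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm, used as the activation function.
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

class UnfoldedCell(nn.Module):
    """Hypothetical unrolled cell: n_layers proximal steps per time step,
    warm-started from the previous frame's sparse code."""
    def __init__(self, n_layers, hidden_dim, meas_dim):
        super().__init__()
        self.W = nn.Linear(meas_dim, hidden_dim, bias=False)   # data branch
        self.S = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim, bias=False)
                                for _ in range(n_layers)])
        # A distinct threshold per hidden unit in each layer.
        self.lam = nn.Parameter(torch.full((n_layers, hidden_dim), 0.1))

    def forward(self, y, h_prev):
        h = h_prev                       # temporal warm start (l1-l1 prior)
        for k, S_k in enumerate(self.S):
            h = soft_threshold(self.W(y) + S_k(h), self.lam[k])
        return h                         # sparse code for the current frame
```

Running the cell over a measurement sequence, with each frame's code initializing the next, gives the sequential reconstruction behavior the paper targets.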
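Finally, the adaptive feature remixing step of the light field paper can be pictured as stacking per-view features along the channel axis and pooling them back down with learned channel-wise weights. The layer choices and shapes below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class AFRBlock(nn.Module):
    """Hypothetical remixing block: concatenated multi-view features are
    pooled channel-wise by a learned 1x1 convolution, then refined."""
    def __init__(self, n_views: int, channels: int):
        super().__init__()
        self.remix = nn.Conv2d(n_views * channels, channels, kernel_size=1)
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, view_feats):
        # view_feats: list of n_views tensors, each (B, C, H, W).
        stacked = torch.cat(view_feats, dim=1)    # (B, n_views*C, H, W)
        return self.refine(self.remix(stacked))   # remixed feature (B, C, H, W)
```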

Article authors: Refsgaardhardy5582 (Holck Sherwood)