Dennisbyrne6798


In recent years, Salient Object Detection (SOD) has shown great success, driven by large-scale benchmarks and deep learning techniques. However, existing SOD methods mainly focus on natural images at low resolutions, e.g., 400×400 or less. This drawback hinders their use in advanced practical applications, which require high-resolution, detail-aware results. Besides, the lack of boundary detail and semantic context of salient objects is also a key concern for accurate SOD. To address these issues, in this work we focus on the High-Resolution Salient Object Detection (HRSOD) task. Technically, we propose the first end-to-end learnable framework, named Dual ReFinement Network (DRFNet), for fully automatic HRSOD. More specifically, the proposed DRFNet consists of a shared feature extractor and two effective refinement heads. By decoupling detail and context information, one refinement head adopts a global-aware feature pyramid. Without adding much computational burden, it boosts spatial detail information, narrowing the gap between high-level semantics and low-level details. In parallel, the other refinement head adopts hybrid dilated convolutional blocks and group-wise upsampling, which are very efficient at extracting contextual information. Based on the dual refinements, our approach can enlarge receptive fields and obtain more discriminative features from high-resolution images. Experimental results on high-resolution benchmarks (the public DUT-HRSOD and the proposed DAVIS-SOD) demonstrate that our method is not only efficient but also more accurate than other state-of-the-art methods. Besides, our method generalizes well to typical low-resolution benchmarks.

Deblurring images captured in dynamic scenes is challenging because the motion blur is spatially varying, caused by camera shake and object movement. In this paper, we propose a spatially varying neural network to deblur dynamic scenes. The proposed model is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN). The RNN is used as a deconvolution operator on feature maps extracted from the input image by one of the CNNs. Another CNN is used to learn the spatially varying weights for the RNN. As a result, the RNN is spatially aware and can implicitly model the deblurring process with spatially varying kernels. To better exploit the properties of the spatially varying RNN, we develop both one-dimensional and two-dimensional RNNs for deblurring. The third component, a CNN, reconstructs the final deblurred feature maps into a restored image. In addition, the whole network is end-to-end trainable. Quantitative and qualitative evaluations on benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art deblurring algorithms.
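To make the spatially varying RNN idea above more concrete, the following is a minimal sketch, not the authors' implementation: a small CNN predicts a per-pixel recurrence weight map, and a one-dimensional recurrent scan uses those weights to filter the feature maps. The layer sizes, channel count, and the single left-to-right scan direction are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatiallyVaryingRNN1D(nn.Module):
    """Illustrative sketch: a 1-D recurrent filter on feature maps whose
    per-pixel recurrence weights are predicted by a small CNN.
    Layer sizes and the single scan direction are assumptions."""

    def __init__(self, channels=32):
        super().__init__()
        # Hypothetical weight-prediction CNN; outputs per-pixel weights in (0, 1).
        self.weight_net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat):                      # feat: (B, C, H, W)
        w = self.weight_net(feat)                 # spatially varying weights
        h = torch.zeros_like(feat[..., 0])        # hidden state per column, (B, C, H)
        columns = []
        for x in range(feat.shape[-1]):           # left-to-right horizontal scan
            h = w[..., x] * h + (1.0 - w[..., x]) * feat[..., x]
            columns.append(h)
        return torch.stack(columns, dim=-1)       # filtered features, (B, C, H, W)
```

A full model, as the abstract describes, would run such scans in several directions (and a two-dimensional variant), with CNNs for feature extraction and image reconstruction around the recurrent filtering.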
Human-designed stochastic optimization algorithms are popular tools for deep neural network training. Recently, a new approach of learning to optimize network parameters has emerged and achieved promising results. However, these learning-based black-box optimizers do not fully take advantage of the experience embodied in human-designed optimizers and therefore have limited generalization ability. In this paper, we propose a novel optimizer, dubbed Variational HyperAdam, which learns to optimize network parameters based on a parametric generalized Adam algorithm, i.e., HyperAdam, in a variational framework. Different from current network optimizers, the network parameter update at each step is considered a random variable whose approximate posterior distribution, given the training data, is inferred by variational inference at every training step. The parameter update vector is sampled from this distribution. The expectation of the approximate posterior is modeled as a combination of multiple adaptive moments associated with different adaptive weights. These adaptive moments are generated by Adam with varying exponential decay rates. Both the combination weights and the exponential decay rates are adaptively learned based on the states during optimization. Experiments show that Variational HyperAdam is effective for training various networks, such as multilayer perceptrons, CNNs, LSTMs and ResNets.

For egocentric vision tasks such as action recognition, there is a relative scarcity of labeled data, which increases the risk of overfitting during training. In this paper, we address this issue by introducing a multitask learning scheme that employs related tasks as well as related datasets in the training process. Related tasks are indicative of the performed action, such as the presence of objects and the position of the hands. By including related tasks as additional outputs to be optimized, action recognition performance typically increases because the network focuses on relevant aspects in the video. Still, the training data is usually limited to a single dataset because the set of action labels differs across datasets. To mitigate this issue, we extend the multitask paradigm to include datasets with different label sets. During training, we effectively mix batches with samples from multiple datasets. Our experiments on egocentric action recognition with the EPIC-Kitchens, EGTEA Gaze+, ADL and Charades-EGO datasets demonstrate the improvements of our approach over single-dataset baselines. On EGTEA we surpass the current state of the art by 2.47%. We further illustrate the cross-dataset task correlations that emerge automatically with our novel training scheme.

In neural networks, developing regularization algorithms to address overfitting is a major area of study. We propose a new approach to the regularization of neural networks based on the local Rademacher complexity, called LocalDrop. A new regularization function for both fully connected networks (FCNs) and convolutional neural networks (CNNs), involving drop rates and weight matrices, is developed from the proposed upper bound on the local Rademacher complexity through strict mathematical deduction. The complexity analyses also cover dropout in FCNs and DropBlock in CNNs with keep-rate matrices in different layers. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep-rate matrix and weight matrix for the whole training model. Extensive experiments demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and by examining the effects of different hyperparameters on the final performance.
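As a loose illustration of the HyperAdam idea described earlier on this page, an update formed by combining several Adam-style adaptive moments computed with different exponential decay rates, here is a minimal, deterministic sketch. It omits the variational inference entirely, and the decay rates and uniform combination weights are placeholder assumptions; in the paper both are learned from the optimizer state.

```python
import torch

def hyperadam_like_step(param, grad, state, lr=1e-3,
                        betas=((0.9, 0.999), (0.8, 0.99), (0.5, 0.9)),
                        eps=1e-8):
    """Combine several Adam-style moment estimates with different decay rates.
    The decay rates and uniform weights below are illustrative; the paper
    learns both adaptively during optimization."""
    updates = []
    for i, (beta1, beta2) in enumerate(betas):
        m, v, t = state.setdefault(i, [torch.zeros_like(param),
                                       torch.zeros_like(param), 0])
        t += 1
        m = beta1 * m + (1 - beta1) * grad           # first moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2      # second moment estimate
        state[i] = [m, v, t]
        m_hat = m / (1 - beta1 ** t)                  # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(m_hat / (v_hat.sqrt() + eps))
    weights = [1.0 / len(updates)] * len(updates)     # placeholder for learned weights
    combined = sum(w * u for w, u in zip(weights, updates))
    return param - lr * combined
```

In Variational HyperAdam the update is additionally treated as a random variable: the mean of its approximate posterior is this kind of weighted combination of adaptive moments, and the applied update is sampled from the inferred distribution.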

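For the egocentric multitask scheme described above, the core training trick is mixing batches drawn from datasets with different label sets and letting each sample contribute loss only to the classification head of its own dataset. Below is a minimal sketch under assumed dataset names, feature dimension, and label counts; it is not the authors' code.

```python
import torch
import torch.nn as nn

class MultiDatasetHeads(nn.Module):
    """Shared backbone with one classification head per dataset.
    Dataset names and label counts are hypothetical examples."""

    def __init__(self, backbone, feat_dim, num_classes_per_dataset):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleDict({
            name: nn.Linear(feat_dim, n)
            for name, n in num_classes_per_dataset.items()
        })

    def forward(self, x):
        feats = self.backbone(x)
        return {name: head(feats) for name, head in self.heads.items()}


def mixed_batch_loss(model, clips, labels, dataset_ids,
                     criterion=nn.CrossEntropyLoss()):
    """Each sample in the mixed batch contributes loss only to the head
    of the dataset it came from."""
    logits = model(clips)
    loss = clips.new_zeros(())
    for name in logits:
        mask = torch.tensor([d == name for d in dataset_ids], device=clips.device)
        if mask.any():
            loss = loss + criterion(logits[name][mask], labels[mask])
    return loss
```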
Article authors: Dennisbyrne6798 (Mcguire Benjamin)