Rahbekvalenzuela3602

From Iurium Wiki

In Virtual Reality, having a virtual body opens a wide range of possibilities, as the participant's avatar can appear quite different from the participant for the sake of the targeted application (e.g. for perspective-taking). In addition, the system can partially manipulate the displayed avatar movement through some distortion to make the overall experience more enjoyable and effective (e.g. training, exercising, rehabilitation). Despite this potential, an excessive distortion may become noticeable and break the feeling of being embodied in the avatar. Past research has shown that individuals have a relatively high tolerance to movement distortions, as well as great variability in individual sensitivity to distortions. In this paper, we propose a method that takes advantage of Reinforcement Learning (RL) to efficiently identify the magnitude of the maximum distortion that goes unnoticed by an individual (hereafter referred to as the detection threshold). We show through a controlled experiment with subjects that the RL method finds a more robust detection threshold than the adaptive staircase method, i.e. it is better at preventing subjects from detecting the distortion when its amplitude is at or below the threshold. Finally, the associated majority voting system enables the RL method to handle more noise in the forced-choice input than the adaptive staircase. This last feature is essential for future use with physiological signals, as these signals are even more susceptible to noise. It would then allow embodiment to be calibrated individually to increase the effectiveness of the proposed interactions.

To convey neural network architectures in publications, appropriate visualizations are of great importance. While most current deep learning papers contain such visualizations, these are usually handcrafted just before publication, which results in a lack of a common visual grammar, significant time investment, errors, and ambiguities. Current automatic network visualization tools focus on debugging the network itself and are not well suited to generating publication visualizations. Therefore, we present an approach to automate this process by translating network architectures specified in Keras into visualizations that can be embedded directly into any publication. To do so, we propose a visual grammar for convolutional neural networks (CNNs), derived from an analysis of such figures extracted from all ICCV and CVPR papers published between 2013 and 2019. The proposed grammar incorporates visual encoding, network layout, layer aggregation, and legend generation. We have further realized our approach in an online system available to the community, which we have evaluated through expert feedback and a quantitative study. It not only reduces the time needed to generate network visualizations for publications, but also enables a unified and unambiguous visualization design.

In recent years, supervised person re-identification (re-ID) models have received increasing attention. However, models trained on a source domain often suffer a dramatic performance drop when tested on an unseen domain. Existing methods primarily use pseudo labels to alleviate this problem. One of the most successful approaches predicts neighbors of each unlabeled image and then uses them to train the model. Although the predicted neighbors are credible, they tend to miss some hard positive samples, which may hinder the model from discovering important discriminative information of the unlabeled domain. In this paper, to complement these low-recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high-precision neighbor pseudo labels and high-recall group pseudo labels. The group pseudo labels are generated by transitively merging the neighbors of different samples into a group to achieve higher recall. However, the merging operation may create subgroups within a group due to imperfect neighbor predictions. To utilize these group pseudo labels properly, we propose a similarity-aggregating loss that mitigates the influence of these subgroups by pulling the input sample towards the most similar embeddings. Extensive experiments on three large-scale datasets demonstrate that our method achieves state-of-the-art performance under the unsupervised domain adaptation re-ID setting.
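
To make the transitive merging of neighbor predictions into group pseudo labels concrete, a minimal Python sketch is given below. It assumes the neighbor lists are already predicted and merges them with a union-find structure; the function name and data layout are illustrative assumptions, not the authors' implementation.

    # Minimal sketch: transitively merge predicted neighbor sets into group
    # pseudo labels with union-find. Neighbor lists are assumed inputs; the
    # paper's actual merging and loss details may differ.
    def make_group_pseudo_labels(neighbors):
        """neighbors: dict mapping sample index -> list of predicted neighbor indices."""
        parent = {i: i for i in neighbors}

        def find(i):
            # Root lookup with path compression.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        def union(i, j):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

        # Merging each sample with its neighbors makes group membership transitive.
        for i, nbrs in neighbors.items():
            for j in nbrs:
                if j in parent:
                    union(i, j)

        # Relabel the roots as consecutive group ids (the group pseudo labels).
        roots, groups = {}, {}
        for i in neighbors:
            groups[i] = roots.setdefault(find(i), len(roots))
        return groups

    # Samples 0-2 become one group through the chain 0-1-2; 3-4 form another.
    print(make_group_pseudo_labels({0: [1], 1: [2], 2: [], 3: [4], 4: []}))
    # -> {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
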
Classifying the sub-categories of an object within the same super-category (e.g., bird species or cars) in fine-grained visual classification (FGVC) relies heavily on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this article, by contrast, we show that by integrating low-level information (e.g., color, edge junctions, texture patterns), performance can be improved through enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of 1) a dual-pathway hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, hence learning both high-level semantic and low-level detailed feature representations, and 2) an ROI-guided refinement strategy with an ROI-guided dropblock and an ROI-guided zoom-in operation, which refines features so that discriminative local regions are enhanced and background noise is suppressed. The proposed AP-CNN can be trained end-to-end, without the need for any additional bounding-box or part annotations. Extensive experiments on three widely used FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Models and code are available at https://github.com/PRIS-CV/AP-CNN_Pytorch-master.

Tracking moving objects in space-borne satellite videos is a new and challenging task. The main difficulty stems from the extremely small size of the target of interest. First, because the target usually occupies only a few pixels, it is hard to obtain discriminative appearance features. Second, the small object can easily suffer from occlusion and illumination variation, making its features less distinguishable from those of surrounding regions. Current state-of-the-art tracking approaches mainly consider high-level deep features of a single frame with low spatial resolution, and hardly benefit from the inter-frame motion information inherent in videos. Thus, they fail to accurately locate such small objects and to handle challenging scenarios in satellite videos. In this article, we design a lightweight parallel network with high spatial resolution to locate small objects in satellite videos. This architecture guarantees real-time and precise localization when applied to Siamese trackers. Moreover, a pixel-level refining model based on online moving-object detection and adaptive fusion is proposed to enhance tracking robustness in satellite videos. It models the video sequence over time to detect moving targets at the pixel level and is able to take full advantage of both tracking and detection. We conduct quantitative experiments on real satellite video datasets, and the results show that the proposed high-resolution Siamese network (HRSiam) achieves state-of-the-art tracking performance while running at over 30 FPS.
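
As a rough picture of how a Siamese tracker produces a response map and how such a map could be combined with a motion cue, the following Python/PyTorch sketch uses randomly initialized tensors in place of real HRSiam features; the tensor shapes, fixed fusion weights, and motion mask are assumptions for illustration, not the paper's pixel-level refining model.

    # Sketch of a Siamese response map via cross-correlation, followed by a
    # naive weighted fusion with a moving-object mask. Shapes and fusion
    # weights are illustrative assumptions, not the HRSiam design.
    import torch
    import torch.nn.functional as F

    template_feat = torch.randn(1, 64, 8, 8)    # features of the target template
    search_feat = torch.randn(1, 64, 32, 32)    # features of the search region

    # Treat the template as a convolution kernel: high responses mark likely target locations.
    response = F.conv2d(search_feat, template_feat)                      # shape (1, 1, 25, 25)
    response = (response - response.min()) / (response.max() - response.min() + 1e-8)

    # Stand-in for a pixel-level motion-detection map at the response resolution.
    motion_mask = torch.rand(1, 1, 25, 25)
    fused = 0.7 * response + 0.3 * motion_mask                           # placeholder fusion rule

    row, col = divmod(int(fused.flatten().argmax()), fused.shape[-1])
    print("estimated target position in the response map:", (row, col))
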
Ultrasound brain stimulation is a promising modality for probing brain function and treating brain diseases. However, its mechanism is as yet unclear, and its in vivo effects are not well understood. Here, we present a top-down strategy for assessing ultrasound bioeffects in vivo, using Caenorhabditis elegans. Behavioral and functional changes of single worms and of large populations upon ultrasound stimulation were studied. Worms were observed to significantly increase their average speed upon ultrasound stimulation, adapting to it upon continued treatment. Worms also generated more reversal turns while the ultrasound was on, and within a minute post-stimulation they performed significantly more reversal and omega turns than prior to ultrasound. In addition, in vivo calcium imaging showed that the neural activity in the worms' heads and tails was increased significantly by ultrasound stimulation. In all, we conclude that ultrasound can directly activate the neurons of worms in vivo, in both of their major neuronal ganglia, and modify their behavior.

Producing manual, pixel-accurate image segmentation labels is tedious and time-consuming. This is often a rate-limiting factor when large amounts of labeled images are required, such as for training deep convolutional networks for instrument-background segmentation in surgical scenes. No large datasets comparable to industry standards in the computer vision community are available for this task. To circumvent this problem, we propose to automate the creation of a realistic training dataset by exploiting techniques stemming from special effects and harnessing them to target training performance rather than visual appeal. Foreground data is captured by placing sample surgical instruments over a chroma key (a.k.a. green screen) in a controlled environment, thereby making extraction of the relevant image segment straightforward. Multiple lighting conditions and viewpoints can be captured and introduced into the simulation by moving the instruments and camera and modulating the light source. Background data is captured by collecting videos that do not contain instruments. In the absence of pre-existing instrument-free background videos, minimal labeling effort is required: it suffices to select frames that do not contain surgical instruments from videos of surgical interventions freely available online. We compare different methods of blending instruments over tissue and propose a novel data augmentation approach that takes advantage of the plurality of options. We show that by training a vanilla U-Net on semi-synthetic data only and applying simple post-processing, we are able to match the results of the same network trained on a publicly available, manually labeled real dataset.
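
The chroma-key compositing step can be pictured with a few lines of array code. The Python sketch below uses random arrays in place of real captured frames and a naive green-dominance threshold, so the keying rule, threshold value, and hard alpha matte are assumptions rather than the authors' pipeline (which also compares several blending strategies).

    # Sketch of chroma-key foreground extraction and compositing over a
    # background frame (RGB, uint8). Threshold and hard matte are illustrative;
    # real pipelines typically key in HSV and use soft mattes.
    import numpy as np

    rng = np.random.default_rng(0)
    foreground = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)  # instrument over green screen
    background = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)  # instrument-free tissue frame

    fg = foreground.astype(np.float32)
    # Pixels where green strongly dominates red and blue are treated as screen.
    green_dominance = fg[..., 1] - np.maximum(fg[..., 0], fg[..., 2])
    alpha = (green_dominance < 30).astype(np.float32)[..., None]   # 1 = keep instrument pixel

    # Composite instrument pixels over the background; the alpha map doubles
    # as the pixel-accurate segmentation label.
    composite = (alpha * fg + (1.0 - alpha) * background).astype(np.uint8)
    label = alpha[..., 0].astype(np.uint8)
    print(composite.shape, int(label.sum()), "foreground pixels")
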
Fluorescence molecular tomography (FMT) is a new type of medical imaging technology that can quantitatively reconstruct the three-dimensional distribution of fluorescent probes in vivo. Traditional Lp-norm regularization techniques used in FMT reconstruction often face problems such as over-sparseness, over-smoothness, spatial discontinuity, and poor robustness. To address these problems, this paper proposes an adaptive parameter search elastic net (APSEN) method based on elastic net regularization, which uses weight parameters to combine the L1 and L2 norms. For the selection of the elastic net weight parameters, the approach introduces the L0 norm of valid reconstruction results and the L2 norm of the residual vector, which are used to adjust the weight parameters adaptively. To verify the proposed method, a series of numerical simulation experiments was performed using digital mice with tumors as experimental subjects, and in vivo experiments on liver tumors were also conducted. The results showed that, compared with state-of-the-art methods under different light-source sizes and distances and Gaussian noise of 5%-25%, as well as with the brute-force parameter search method, the APSEN method offers better location accuracy, spatial resolution, fluorescence yield recovery, morphological characteristics, and robustness. Furthermore, the in vivo experiments demonstrated the applicability of APSEN for FMT.
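
To make the elastic-net weighting concrete, the toy Python sketch below scans the L1/L2 mixing weight on a small synthetic linear system and scores each candidate using the number of nonzero coefficients (an L0 measure) and the residual L2 norm. The scoring rule, problem sizes, and use of scikit-learn's ElasticNet are assumptions for illustration; APSEN's actual adaptive parameter search differs.

    # Toy sketch: choose the elastic-net mixing weight from the sparsity of the
    # solution (L0) and the data-fit residual (L2). Criterion and sizes are
    # assumptions; this is not the APSEN update rule.
    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 200))                      # forward (system) matrix
    x_true = np.zeros(200)
    x_true[rng.choice(200, size=5, replace=False)] = 1.0     # sparse fluorescent sources
    y = A @ x_true + 0.01 * rng.standard_normal(100)         # noisy measurements

    best = None
    for l1_ratio in np.linspace(0.1, 0.9, 9):
        model = ElasticNet(alpha=0.01, l1_ratio=l1_ratio, fit_intercept=False, max_iter=10000)
        x = model.fit(A, y).coef_
        l0 = int(np.count_nonzero(np.abs(x) > 1e-6))         # sparsity of the reconstruction
        resid = float(np.linalg.norm(A @ x - y))             # residual norm
        score = resid + 0.05 * l0                            # assumed trade-off criterion
        if best is None or score < best[0]:
            best = (score, l1_ratio, l0, resid)

    print("selected l1_ratio=%.1f (L0=%d, residual=%.3f)" % (best[1], best[2], best[3]))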

Article authors: Rahbekvalenzuela3602 (Kendall McIntyre)