Hamptonkaae7547

From Iurium Wiki

At inference time, only the student network is deployed for processing low-quality skeletons. In the proposed network, a graph matching loss is introduced to distill the graph structural knowledge at an intermediate representation level. We also propose a new gradient revision strategy to seek a balance between mimicking the teacher model and directly improving the student model's accuracy. Experiments are conducted on the Kinetics400, NTU RGB+D and Penn action recognition datasets, and the comparison results demonstrate the effectiveness of our scheme.

Unsupervised cross domain (UCD) person re-identification (re-ID) aims to apply a model trained on a labeled source domain to an unlabeled target domain. The task is challenging because the identities in the two domains do not overlap. At present, most UCD person re-ID methods perform "supervised learning" by assigning pseudo labels to the target domain, which leads to poor re-ID performance due to pseudo label noise. To address this problem, a multi-loss optimization learning (MLOL) model is proposed for UCD person re-ID. In addition to using the clustering pseudo labels from the perspective of supervised learning, two losses are designed from the viewpoints of similarity exploration and adversarial learning to optimize the model. Specifically, to alleviate the erroneous guidance that clustering errors bring to the model, a ranking-average-based triplet loss and a neighbor-consistency-based loss are developed. Combining these losses to optimize the model results in a deep exploration of the intra-domain relations within the target domain. The proposed model is evaluated on three popular person re-ID datasets: Market-1501, DukeMTMC-reID, and MSMT17.
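The triplet objective named in the MLOL description can be illustrated with a minimal sketch. This is the standard triplet margin loss, not the ranking-average-based variant, whose details the abstract does not give. The function names, the margin of 0.3, and the toy embeddings are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumption, not taken from the source.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-D embeddings (made-up values, for illustration only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # shares the anchor's pseudo label
negative = [3.0, 4.0]   # carries a different pseudo label

# The negative is already much farther away than the positive: zero loss.
easy = triplet_loss(anchor, positive, negative)

# Swapping the roles simulates a wrong pseudo label: large loss.
hard = triplet_loss(anchor, negative, positive)
```

The second call hints at why noisy pseudo labels matter: a single mislabeled sample turns a satisfied triplet into one that produces a large, misleading gradient.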
Experimental results show that our model outperforms the state-of-the-art UCD re-ID methods by a clear margin.

Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) frame from both its corresponding low-resolution (LR) frame (the reference frame) and multiple neighboring frames (the supporting frames). An important step in VSR is to fuse the feature of the reference frame with the features of the supporting frames. The major issue with existing VSR methods is that the fusion is conducted in a one-stage manner, and the fused feature may deviate greatly from the visual information in the original LR reference frame. In this paper, we propose an end-to-end Multi-Stage Feature Fusion Network that fuses the temporally aligned features of the supporting frames and the spatial feature of the original reference frame at different stages of a feed-forward neural network architecture. In our network, the Temporal Alignment Branch is designed as an inter-frame temporal alignment module used to mitigate the misalignment between the supporting frames and the reference frame. Specifically, we apply multi-scale dilated deformable convolution as the basic operation to generate temporally aligned features of the supporting frames. Afterwards, the Modulative Feature Fusion Branch, the other branch of our network, accepts the temporally aligned feature map as a conditional input and modulates the feature of the reference frame at different stages of the branch backbone. This enables the feature of the reference frame to be referenced at each stage of the feature fusion process, leading to an enhanced feature from LR to HR. Experimental results on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art performance on the VSR task.

Despite the remarkable progress in recent years, person Re-Identification (ReID) approaches frequently fail in cases where the semantic body parts are misaligned between the detected human boxes.
To mitigate such cases, we propose a novel High-Order ReID (HOReID) framework that enables semantic pose alignment by aggregating the fine-grained part details of multilevel feature maps. HOReID adopts a high-order mapping of multilevel feature similarities in order to emphasize the differences between the similarities of aligned and misaligned part pairs in two person images. Since the similarities of misaligned part pairs are reduced, HOReID enhances pose-robustness within the learned features. We show that our method derives from an intuitive and interpretable motivation and reduces the misalignment problem without using any prior knowledge from human pose annotations or pose estimation networks. This paper theoretically and experimentally demonstrates the effectiveness of the proposed HOReID, which achieves superior performance over the state-of-the-art methods on four large-scale person ReID datasets.

With the current exponential growth of video-based social networks, video retrieval using natural language is receiving ever-increasing attention. Most existing approaches tackle this task by extracting individual frame-level spatial features to represent the whole video, while ignoring visual pattern consistencies and intrinsic temporal relationships across different frames. Furthermore, the semantic correspondence between natural language queries and person-centric actions in videos has not been fully explored. To address these problems, we propose a novel binary representation learning framework, named Semantics-aware Spatial-temporal Binaries ([Formula see text]Bin), which simultaneously considers spatial-temporal context and semantic relationships for cross-modal video retrieval. By exploiting the semantic relationships between the two modalities, [Formula see text]Bin can efficiently and effectively generate binary codes for both videos and texts.
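To make the role of binary codes in cross-modal retrieval concrete, here is a minimal sketch that binarizes real-valued embeddings by sign and ranks videos by Hamming distance to a text query. It illustrates generic hashing-based retrieval only. It is not the framework's learned encoding functions, and every name and value in it is hypothetical.

```python
def binarize(vec):
    """Map a real-valued embedding to a binary code by thresholding at zero."""
    return [1 if x > 0 else 0 for x in vec]

def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def rank_videos(query_emb, video_embs):
    """Return video ids sorted by Hamming distance to the query code.

    Ties are broken by id so that the ordering is deterministic.
    """
    q = binarize(query_emb)
    dists = {vid: hamming(q, binarize(emb)) for vid, emb in video_embs.items()}
    return sorted(dists, key=lambda vid: (dists[vid], vid))

# Made-up embeddings, assumed to come from hypothetical text and video
# encoders that share one embedding space.
text_query = [0.9, -0.2, 0.4, -0.7]
videos = {
    "clip_a": [0.8, -0.1, 0.3, -0.5],   # same sign pattern as the query
    "clip_b": [-0.6, 0.2, 0.5, -0.1],   # differs in two positions
    "clip_c": [-0.3, 0.7, -0.9, 0.4],   # differs in all four positions
}
ranking = rank_videos(text_query, videos)
```

Binary codes are attractive for retrieval because comparing them needs only bit operations and a few bytes per item, which is the efficiency the abstract alludes to.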
In addition, we adopt an iterative optimization scheme to learn deep encoding functions with attribute-guided stochastic training. We evaluate our model on three video datasets, and the experimental results demonstrate that [Formula see text]Bin outperforms the state-of-the-art methods on various cross-modal video retrieval tasks.

Among the tracking techniques applied in 3-D freehand ultrasound (US), the camera-based tracking method is relatively mature and reliable. However, constrained by manufactured marker rigid bodies, the US probe can usually operate only within a narrow rotational range before occlusion degrades tracking accuracy and robustness. This study therefore proposes a hemispherical marker rigid body that holds passive noncoplanar markers so that the camera can still identify the markers, mitigating self-occlusion. The enlarged rotational range gives sonographers greater freedom while performing examinations. The single-axis rotational and translational tracking performance of the system, equipped with the newly designed marker rigid body, was investigated and evaluated. Tracking with the designed marker rigid body achieved high accuracy: 0.57° for single-axis rotation and 0.01 mm for single-axis translation at sensor distances between 1.5 and 2 m. In addition to maintaining high accuracy, the system also possessed an enhanced ability to capture over 99.

Article authors: Hamptonkaae7547 (Nielsen Wang)