Holstfriis0586
Moreover, we prove that ASVRG-ADMM converges linearly for SC problems. In particular, ASVRG-ADMM improves the convergence rate from O(1/T) to O(1/T2) for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, and show that ASVRG-ADMM consistently converges faster than the state-of-the-art methods.Both weakly supervised single object localization and semantic segmentation techniques learn an object's location using only image-level labels. However, these techniques are limited to cover only the most discriminative part of the object and not the entire object. To address this problem, we propose an attention-based dropout layer, which utilizes the attention mechanism to locate the entire object efficiently. To achieve this, we devise two key components; 1) hiding the most discriminative part from the model to capture the entire object, and 2) highlighting the informative region to improve the classification accuracy of the model. These allow the classifier to be maintained with a reasonable accuracy while the entire object is covered. Through extensive experiments, we demonstrate that the proposed method improves the weakly supervised single object localization accuracy, thereby achieving a new state-of-the-art localization accuracy on the CUB-200-2011 and a comparable accuracy to existing state-of-the-arts on the ImageNet-1k. The proposed method is also effective in improving the weakly supervised semantic segmentation performance on the Pascal VOC and MS COCO. Furthermore, the proposed method is more efficient than existing techniques in terms of parameter and computation overheads. Additionally, the proposed method can be easily applied in various backbone networks.Graph neural networks have achieved great success in learning node representations for graph tasks such as node classification and link prediction. Graph representation learning requires graph pooling to obtain graph representations from node representations. It is challenging to develop graph pooling methods due to the variable sizes and isomorphic structures of graphs. In this work, we propose to use second-order pooling as graph pooling, which naturally solves the above challenges. In addition, compared to existing graph pooling methods, second-order pooling is able to use information from all nodes and collect second-order statistics, making it more powerful. We show that direct use of second-order pooling with graph neural networks leads to practical problems. To overcome these problems, we propose two novel global graph pooling methods based on second-order pooling; namely, bilinear mapping and attentional second-order pooling. In addition, we extend attentional second-order pooling to hierarchical graph pooling for more flexible use in GNNs. We perform thorough experiments on graph classification tasks to demonstrate the effectiveness and superiority of our proposed methods. Experimental results show that our methods improve the performance significantly and consistently.Gait, the walking pattern of individuals, is one of the important biometrics modalities. Most of the existing gait recognition methods take silhouettes or articulated body models as gait features. Manogepix solubility dmso These methods suffer from degraded recognition performance when handling confounding variables, such as clothing, carrying and view angle. To remedy this issue, we propose a novel AutoEncoder framework, GaitNet, to explicitly disentangle appearance, canonical and pose features from RGB imagery. The LSTM integrates pose features over time as dynamic gait feature while canonical features are averaged as static gait feature. Both of them are utilized as classification features. In addition, we collect a Frontal-View Gait (FVG) dataset to focus on gait recognition from frontal-view walking, which is a challenging problem since it contains minimal gait cues compared to other views. FVG also includes other important variations, e.g., walking speed, carrying, and clothing. With extensive experiments on CASIA-B, USF, and FVG datasets, our method demonstrates superior performance to the SOTA quantitatively, the ability of feature disentanglement qualitatively, and promising computational efficiency. We further compare our GaitNet with state of the art face recognition to demonstrate the advantages of gait biometrics identification under certain scenarios, e.g., long distance/ lower resolutions, cross view angles.Energy statistics was proposed by Sz\' ekely in the 80's inspired by Newton's gravitational potential in classical mechanics and it provides a model-free hypothesis test for equality of distributions. In its original form, energy statistics was formulated in Euclidean spaces. More recently, it was generalized to metric spaces of negative type. In this paper, we consider a formulation for the clustering problem using a weighted version of energy statistics in spaces of negative type. We show that this approach leads to a quadratically constrained quadratic program in the associated kernel space, establishing connections with graph partitioning problems and kernel methods in machine learning. To find local solutions of such an optimization problem, we propose kernel k-groups, which is an extension of Hartigan's method to kernel spaces. Kernel k-groups is cheaper than spectral clustering and has the same computational cost as kernel k-means (which is based on Lloyd's heuristic) but our numerical results show an improved performance, especially in higher dimensions. Moreover, we verify the efficiency of kernel k-groups in community detection in sparse stochastic block models which has fascinating applications in several areas of science.Spatio-temporal action localization consists of three levels of tasks spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework that improves all three tasks above. The basic idea is to utilize both spatial region (resp., temporal segment proposals) and features from one stream (i.e. Flow/RGB) to help another stream (i.e. RGB/Flow) to iteratively generate better bounding boxes in the spatial domain (resp., temporal segments in the temporal domain). Specifically, we first combine the latest region proposals (for spatial detection) or segment proposals (for temporal segmentation) from both streams to form a larger set of labelled training samples to help learn better action detection or segment detection models. Second, to learn better representations, we also propose a new message passing approach to pass information from one stream to another stream, which also leads to better action detection and segment detection models.