Skovgaardfields1735

In recent years, large-scale datasets of paired images and sentences have enabled remarkable success in automatically generating descriptions for images, namely image captioning. However, it is labour-intensive and time-consuming to collect a sufficient number of paired images and sentences in each domain. It may therefore be beneficial to transfer an image captioning model trained in an existing domain with pairs of images and sentences (i.e., the source domain) to a new domain with only unpaired data (i.e., the target domain). In this paper, we propose a cross-modal retrieval-aided approach to cross-domain image captioning that leverages a cross-modal retrieval model to generate pseudo pairs of images and sentences in the target domain, thereby facilitating the adaptation of the captioning model. To learn the correlation between images and sentences in the target domain, we propose an iterative cross-modal retrieval process in which a cross-modal retrieval model is first pre-trained on the source-domain data and then applied to the target domain. Experiments across several domains further demonstrate the effectiveness of our method.

Despite the remarkable advances in visual saliency analysis for natural scene images (NSIs), salient object detection (SOD) for optical remote sensing images (RSIs) remains an open and challenging problem. In this paper, we propose an end-to-end Dense Attention Fluid Network (DAFNet) for SOD in optical RSIs. A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships, and it is further embedded in a Dense Attention Fluid (DAF) structure that enables shallow attention cues to flow into deep layers and guide the generation of high-level feature attention maps. Specifically, the GCA module is composed of two key components: the global feature aggregation module achieves mutual reinforcement of salient feature embeddings from any two spatial locations, and the cascaded pyramid attention module tackles the scale-variation issue by building a cascaded pyramid framework that progressively refines the attention map in a coarse-to-fine manner. In addition, we construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations, which is currently the largest publicly available benchmark. Extensive experiments demonstrate that the proposed DAFNet significantly outperforms existing state-of-the-art SOD competitors. The code is available at https://github.com/rmcong/DAFNet_TIP20.
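The GCA module is only described at a high level above. As a rough, non-authoritative sketch of its first component (mutual reinforcement of feature embeddings between any two spatial locations), a non-local-style attention block in PyTorch might look like the following; the class name, reduction factor, and residual design are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GlobalContextAttention(nn.Module):
    """Illustrative non-local-style attention: every spatial location
    attends to every other location, approximating the 'mutual
    reinforcement of salient feature embeddings' described above."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        inner = channels // reduction
        self.query = nn.Conv2d(channels, inner, kernel_size=1)
        self.key = nn.Conv2d(channels, inner, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, h*w, inner)
        k = self.key(x).flatten(2)                     # (b, inner, h*w)
        attn = self.softmax(q @ k)                     # (b, h*w, h*w) affinities
        v = self.value(x).flatten(2).transpose(1, 2)   # (b, h*w, c)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                 # residual connection

feats = torch.randn(1, 64, 32, 32)
print(GlobalContextAttention(64)(feats).shape)  # torch.Size([1, 64, 32, 32])
```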
The demand for applying semantic segmentation models on mobile devices has been increasing rapidly. Current state-of-the-art networks have an enormous number of parameters and are hence unsuitable for mobile devices, while other small-memory-footprint models follow the spirit of classification networks and ignore the inherent characteristics of semantic segmentation. To tackle this problem, we propose the Context Guided Network (CGNet), a light-weight and efficient network for semantic segmentation. We first propose the Context Guided (CG) block, which effectively and efficiently learns a joint representation of the local feature and its surrounding context, and further improves this joint feature with the global context. Based on the CG block, we develop CGNet, which captures contextual information in all stages of the network. CGNet is specially tailored to exploit the inherent properties of semantic segmentation and to increase segmentation accuracy, and it is elaborately designed to reduce the number of parameters and save memory. Under an equivalent number of parameters, the proposed CGNet significantly outperforms existing light-weight segmentation networks. Extensive experiments on the Cityscapes and CamVid datasets verify the effectiveness of the proposed approach. Specifically, without any post-processing or multi-scale testing, CGNet achieves 64.8% mean IoU on Cityscapes with fewer than 0.5 M parameters.
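The CG block above is likewise described only in outline. A minimal, hypothetical PyTorch sketch of the stated idea, a local branch plus a dilated surrounding-context branch, concatenated and then reweighted by a squeeze-and-excitation-style global-context gate, could read as follows; the channel split, dilation rate, and gating design are assumptions for illustration, not CGNet's actual layers.

```python
import torch
import torch.nn as nn

class ContextGuidedBlock(nn.Module):
    """Sketch of the CG idea: local feature + surrounding context,
    fused, then refined by a global-context channel reweighting."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        half = channels // 2
        # local branch: standard receptive field
        self.local = nn.Conv2d(channels, half, 3, padding=1, groups=half)
        # surrounding-context branch: enlarged receptive field via dilation
        self.surround = nn.Conv2d(channels, half, 3, padding=dilation,
                                  dilation=dilation, groups=half)
        self.bn_act = nn.Sequential(nn.BatchNorm2d(channels), nn.PReLU())
        # global context: squeeze-and-excitation style channel gate
        self.glo = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())

    def forward(self, x):
        joint = torch.cat([self.local(x), self.surround(x)], dim=1)
        joint = self.bn_act(joint)
        return joint * self.glo(joint)   # reweight by global context

x = torch.randn(1, 64, 32, 32)
print(ContextGuidedBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```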
Scale invariance, good localization, and robustness to noise and distortions are the main properties that a local feature detector should possess. Most existing local feature detectors find an excessive number of unstable feature points, which increases both the number of keypoints to be matched and the computational time of the matching step. In this paper, we show that robust and accurate keypoints exist in a specific scale-space domain. To this end, we first formulate the superimposition problem as a mathematical model and then derive a closed-form solution for multiscale analysis. The model is formulated via difference-of-Gaussian (DoG) kernels in the continuous scale-space domain, and it is proved that setting the scale-space pyramid's blurring ratio and smoothness to 2 and 0.627, respectively, facilitates the detection of reliable keypoints (a toy illustration of this setting appears after the next paragraph). To apply the proposed model to discrete images, we discretize it using the undecimated wavelet transform and the cubic spline function. Theoretically, the complexity of our method is less than 5% of that of the popular baseline Scale Invariant Feature Transform (SIFT). Extensive experimental results show the superiority of the proposed feature detector over representative hand-crafted and learning-based techniques in both accuracy and computational time. The code and supplementary materials can be found at https://github.com/mogvision/FFD.

Transcranial Doppler (TCD) ultrasound uses a handheld, low-frequency (2-2.5 MHz), pulsed Doppler phased-array probe to measure blood velocity within the arteries inside the brain. The difficulty with TCD lies in the low ultrasonic energy that penetrates through the skull into the brain, which leads to a low signal-to-noise ratio. This is due to several effects, including phase aberration, variations in the speed of sound in the skull, scattering, the acoustic impedance mismatch, and absorption in the three-layer medium constituted by the soft tissues, the skull, and the brain. The goal of this article is to study the effect of transmission losses due to the acoustic impedance mismatch on the transmitted energy as a function of frequency. To do so, wave propagation was modeled from the ultrasonic transducer into the brain. The model calculates transmission coefficients inside the brain, leading to a frequency-dependent transmission coefficient for a given skin and bone thickness. This approach was validated experimentally by comparing the analytical results with measurements obtained from a bone-phantom plate mimicking the skull.
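Returning to the feature-detector paragraph above: its concrete recommendation is a DoG scale space with blurring ratio 2 and smoothness 0.627. The toy NumPy/SciPy snippet below builds such a pyramid and picks local DoG maxima; it is a simplified stand-in (plain Gaussian blurring rather than the paper's undecimated-wavelet discretization), and the contrast threshold is an arbitrary illustrative value.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_keypoints(img, sigma0=0.627, ratio=2.0, levels=4, thresh=0.02):
    """Toy DoG detector: blur at sigma0 * ratio**i, take adjacent
    differences, and keep local maxima above a contrast threshold.
    The 0.627 / 2.0 constants follow the paper's recommendation;
    everything else here is a simplification for illustration."""
    sigmas = [sigma0 * ratio**i for i in range(levels)]
    blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
    dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
    keypoints = []
    for level, d in enumerate(dogs):
        # a pixel is a keypoint if it is a 3x3 local maximum above thresh
        peaks = (d == maximum_filter(d, size=3)) & (d > thresh)
        ys, xs = np.nonzero(peaks)
        keypoints += [(x, y, sigmas[level]) for x, y in zip(xs, ys)]
    return keypoints

img = np.zeros((64, 64)); img[28:36, 28:36] = 1.0   # synthetic blob
print(len(dog_keypoints(img)))
```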
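At normal incidence, the transmission losses discussed in the last paragraph reduce to the standard impedance-mismatch formula T = 4*Z1*Z2/(Z1+Z2)^2 per interface. The sketch below chains this across the tissue-skull and skull-brain interfaces; the impedance values are typical textbook numbers, not the article's measurements, and the estimate ignores absorption and the within-skull reflections that make the article's coefficient frequency-dependent.

```python
def interface_transmission(z1, z2):
    """Intensity transmission coefficient at normal incidence:
    T = 4*z1*z2 / (z1 + z2)**2."""
    return 4 * z1 * z2 / (z1 + z2) ** 2

# Characteristic acoustic impedances in MRayl (typical textbook values,
# NOT the article's measurements): soft tissue, skull bone, brain.
Z_TISSUE, Z_SKULL, Z_BRAIN = 1.63, 7.8, 1.6

# One-way transmission through the two interfaces, ignoring absorption
# and multiple reflections within the skull layer.
t = (interface_transmission(Z_TISSUE, Z_SKULL)
     * interface_transmission(Z_SKULL, Z_BRAIN))
print(f"fraction of intensity transmitted: {t:.2f}")   # ~0.32
```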
Article authors: Skovgaardfields1735 (Crouch Hahn)