Harbodale1818

As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well.

We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering and structure classification across proteins from different superfamilies as well as within the same family.
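The counting step described above can be illustrated with a minimal sketch. It assumes the moment invariants for each fragment have already been computed; the invariant values, the log-scaled rounding rule, the `resolution` parameter, and all function names below are hypothetical illustrations and are not taken from the Geometricus code base.

```python
import math
from collections import Counter


def to_shapemer(invariants, resolution=2.0):
    """Discretize a tuple of non-negative invariants into a tuple of integers.

    Assumption: log-scaling followed by flooring; the real discretization
    used by Geometricus may differ.
    """
    return tuple(int(math.floor(resolution * math.log1p(v))) for v in invariants)


def count_shapemers(fragments, resolution=2.0):
    """Count how often each shape-mer occurs among a structure's fragments."""
    return Counter(to_shapemer(f, resolution) for f in fragments)


def embed(counts, vocabulary):
    """Turn shape-mer counts into a fixed-length vector over a shared vocabulary."""
    return [counts.get(shapemer, 0) for shapemer in vocabulary]


# Hypothetical invariants for the fragments of two toy structures.
protein_a = [(1.0, 0.5), (1.1, 0.5), (7.0, 3.0)]
protein_b = [(7.2, 3.1), (7.0, 3.0), (0.1, 0.1)]

counts_a = count_shapemers(protein_a)
counts_b = count_shapemers(protein_b)

# A shared, sorted vocabulary gives both structures the same vector length.
vocabulary = sorted(set(counts_a) | set(counts_b))
vec_a = embed(counts_a, vocabulary)
vec_b = embed(counts_b, vocabulary)
print(vocabulary, vec_a, vec_b)
```

Because both vectors are indexed by the same vocabulary, structures of different sizes become directly comparable, which is what makes the count vectors usable as input to standard machine learning methods.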

Python code available at https://git.wur.nl/durai001/geometricus.

Advances in automation and imaging have made it possible to capture a large image dataset that spans multiple experimental batches of data. However, accurate biological comparison across the batches is challenged by batch-to-batch variation (i.e. batch effect) due to uncontrollable experimental noise (e.g. varying stain intensity or cell density). Previous approaches to minimize the batch effect have commonly focused on normalizing the low-dimensional image measurements such as an embedding generated by a neural network. However, normalization of the embedding could suffer from over-correction and alter true biological features (e.g. cell size) due to our limited ability to interpret the effect of the normalization on the embedding space. Although techniques like flat-field correction can be applied to normalize the image values directly, they are limited transformations that handle only simple artifacts due to batch effect.

We present a neural network-based batch equalization method that can transfer images from one batch to another while preserving the biological phenotype. The equalization method is trained as a generative adversarial network (GAN), using the StarGAN architecture that has shown considerable ability in style transfer. After incorporating new objectives that disentangle batch effect from biological features, we show that the equalized images have less batch information and preserve the biological information. We also demonstrate that the same model training parameters can generalize to two dramatically different types of cells, indicating this approach could be broadly applicable.

https://github.com/tensorflow/gan/tree/master/tensorflow_gan/examples/stargan.

Supplementary data are available at Bioinformatics online.

Identifying cancer driver genes is a key task in cancer informatics. Most existing methods are focused on individual cancer drivers which regulate biological processes leading to cancer. However, the effect of a single gene may not be sufficient to drive cancer progression. Here, we hypothesize that there are driver gene groups that work in concert to regulate cancer, and we develop a novel computational method to detect those driver gene groups.

We develop a novel method named DriverGroup to detect driver gene groups by using gene expression and gene interaction data. The proposed method has three stages: (i) constructing the gene network, (ii) discovering critical nodes of the constructed network and (iii) identifying driver gene groups based on the discovered critical nodes. Before evaluating the performance of DriverGroup in detecting cancer driver groups, we first assess its performance in detecting the influence of gene groups, a key step of DriverGroup. The application of DriverGroup to DREAM4 data demonstrates that it is more effective than other methods in detecting the regulation of gene groups. We then apply DriverGroup to the BRCA dataset to identify driver groups for breast cancer. The identified driver groups are promising, as several group members are confirmed to be related to cancer in the literature. We further use the predicted driver groups in survival analysis, and the results show that the survival curves of patient subpopulations classified using the predicted driver groups are significantly differentiated, indicating the usefulness of DriverGroup.

DriverGroup is available at https://github.com/pvvhoang/DriverGroup.

Supplementary data are available at Bioinformatics online.

Temporal biomarker discovery in longitudinal data is based on detecting reoccurring trajectories, the so-called shapelets. The search for shapelets requires considering all subsequences in the data. While the accompanying issue of multiple testing has been mitigated in previous work, the redundancy and overlap of the detected shapelets result in an a priori unbounded number of highly similar and structurally meaningless shapelets. As a consequence, current temporal biomarker discovery methods are impractical and underpowered.

We find that pre- or post-processing of shapelets does not sufficiently increase power and practical utility. Consequently, we present a novel method for temporal biomarker discovery, Statistically Significant Submodular Subset Shapelet Mining (S5M), which retrieves short subsequences that (i) occur in the data, (ii) are statistically significantly associated with the phenotype and (iii) are of manageable quantity while maximizing structural diversity. Structural diversity is achieved by pruning non-representative shapelets via submodular optimization. This increases the statistical power and utility of S5M compared to state-of-the-art approaches on simulated and real-world datasets. For patients admitted to the intensive care unit (ICU) showing signs of severe organ failure, we find temporal patterns in the sequential organ failure assessment score that are associated with in-ICU mortality.
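To make the idea of pruning by submodular optimization concrete, here is a minimal sketch of greedy subset selection. The abstract does not state which submodular objective S5M uses, so the facility-location objective, the similarity measure, and all names below are assumptions chosen for illustration; the statistical significance testing that S5M also performs is not shown.

```python
def similarity(a, b):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance) of equal-length sequences."""
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def facility_location(selected, candidates):
    """Assumed objective: credit each candidate with its best match in the selected set."""
    if not selected:
        return 0.0
    return sum(max(similarity(c, s) for s in selected) for c in candidates)


def greedy_select(candidates, k):
    """Greedily add the candidate with the largest marginal gain until k are chosen."""
    selected = []
    while len(selected) < min(k, len(candidates)):
        base = facility_location([candidates[i] for i in selected], candidates)
        best, best_gain = None, float("-inf")
        for j in range(len(candidates)):
            if j in selected:
                continue
            trial = [candidates[i] for i in selected + [j]]
            gain = facility_location(trial, candidates) - base
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected


# Hypothetical shapelets: two near-duplicate rising and two near-duplicate falling patterns.
shapelets = [
    (0.0, 1.0, 2.0),
    (0.1, 1.0, 2.1),
    (2.0, 1.0, 0.0),
    (2.0, 1.1, 0.1),
]
chosen = greedy_select(shapelets, k=2)
print(chosen)
```

With a budget of two, the greedy procedure keeps one rising and one falling pattern rather than two near-duplicates, since a second copy of an already covered shape adds almost no marginal gain.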

S5M is available as an option in the Python package of S3M: github.com/BorgwardtLab/S3M.

Using a gene-regulatory-network-based approach for single-cell expression profiles can reveal unprecedented details about the effects of external and internal factors. However, noise and batch effects in sparse single-cell expression profiles can hamper correct estimation of dependencies among genes and regulatory changes. Here, we devise a conceptually different method using graph-wavelet filters for improving gene network (GWNet)-based analysis of the transcriptome. Our approach improved the performance of several gene network-inference methods. Most importantly, GWNet improved consistency in the prediction of gene regulatory networks from single-cell transcriptomes, even in the presence of batch effects. The consistency of the predicted gene networks enabled reliable estimates of changes in the influence of genes not highlighted by differential-expression analysis. Applying GWNet to the single-cell transcriptome profile of lung cells revealed biologically relevant changes in the influence of pathways and master regulators due to ageing. Surprisingly, the regulatory influence of ageing on type II pneumocytes showed noticeable similarity to patterns caused by novel coronavirus infection in the human lung.

Since 2003-2004, the United States has seen large declines in sugar-sweetened beverage (SSB) intake overall, especially among non-Hispanic white (NHW) subpopulations. However, obesity prevalence has not shown comparable declines in the 2 highest SSB-consuming groups, adolescents and young adults. Little is understood about the quality of the diet excluding SSBs (non-SSB diet).

The objective of this study was to evaluate differences in non-SSB diet quality in SSB consumers and nonconsumers in adolescents and young adults and in the 3 major race/ethnic subgroups.

This study utilized data from the NHANES, a cross-sectional, nationally representative survey of the US population. Data from 6426 participants aged 12-29 y from the NHANES (2009-2014) were included. Quality of the non-SSB diet was measured using the 2015 Healthy Eating Index (HEI). Multivariate linear regressions controlled for sociodemographic characteristics and included interactions by race/ethnicity [NHWs, non-Hispanic blacks (NHBs), Hispanics].

[...] consumption alone will not be a sufficient strategy for improving dietary quality in adolescents and young adults. Future policies must also consider improving the non-SSB diet.

Although adherence to healthful dietary patterns has been associated with a lower risk of kidney function decline in Western populations, evidence in Asian populations remains scanty.

We examined predefined dietary patterns, namely, the Alternate Healthy Eating Index-2010 (AHEI-2010), the Dietary Approaches to Stop Hypertension (DASH), and the alternate Mediterranean diet (aMED), in relation to risk of end-stage kidney disease (ESKD).

We included 56,985 Chinese adults (aged 45-74 y) in the Singapore Chinese Health Study who were free of cancer, stroke, coronary artery disease, and ESKD at recruitment (1993-1998). Dietary pattern scores were calculated based on a validated 165-item FFQ. AHEI-2010 and aMED scores were modified by excluding the alcohol intake component because daily drinking has been associated with a higher risk of ESKD in our study population. We identified 1026 ESKD cases over a median follow-up of 17.5 y via linkage with the nationwide Singapore Renal Registry. Multivariable Cox regres [...] y in overweight or obese individuals.

Seven entomopathogenic fungal strains (M1-7) were isolated from field-collected dead coconut hispine beetles, Brontispa longissima (Gestro), identified to species, and bioassayed for their pathogenicity. According to ITS sequences, all isolates belong to the genus Metarhizium, mainly M. flavoviride and M. anisopliae. Median lethal times (LT50) of M1-7 at 1×10^7 conidia/ml against fourth-instar B. longissima larvae within 15 d of exposure were 5.43, 10.64, 11.26, 10.93, 6.62, 4.73, and 5.95 d, respectively. Isolate M6 caused the highest mortality in fourth-instar larvae and was therefore selected for testing against other larval instars and adults of B. longissima using Time-Dose-Mortality (TDM) models. M6 proved more pathogenic against larvae than adults. The bioassay data fit the TDM models well, yielding estimated LC50 and LT50 values for each of the tested developmental stages of B. longissima. Both the dose (β) and time effect (ri) parameters from the TDM models suggest that first-instar larvae are the most susceptible life stage of the pest insect, while adults are more resistant to M6 infection. Calculated LC50 values on the 15th day following M6 inoculation were 1.23×10^3 and 1.15×10^6 conidia/ml for first-instar larvae and adults, respectively. Estimated LT50 values at 1×10^8 conidia/ml were 3.3 and 5.9 d for first-instar larvae and adults, respectively. Taken together, these results suggest Metarhizium M6 as an option for the biological control of B. longissima in the field.

Article authors: Harbodale1818 (Baun Humphries)
