TSETA can not only be applied to most sexual eukaryotes for genome-wide tetrad analysis, it also outperforms most currently used methods for calling single nucleotide polymorphisms between two or more intraspecies strains or isolates.

Bacterial resistance to antibiotics is a global public health problem. It is even more serious when associated with bloodstream infections, which may readily progress to sepsis. To improve our response to these bacteria, it is essential to gather thorough knowledge of the main pathogens and of the main resistance mechanisms they carry. In this paper, we performed a large meta-analysis of 3872 bacterial genomes isolated from blood samples, in which we identified 71 745 antibiotic resistance genes (ARGs). Taxonomic analysis showed that the phyla Proteobacteria and Firmicutes, and the species Klebsiella pneumoniae and Staphylococcus aureus, were the most represented. Comparison of the ARGs with the Resfams database showed that the main mechanism of antibiotic resistance is mediated by efflux pumps. Clustering analysis of the resistomes of blood- and soil-isolated bacteria showed low identity between the transport and efflux proteins of bacteria from these environments. Furthermore, a correlation analysis among all features showed that K. pneumoniae and S. aureus formed two well-defined clusters related to resistance mechanisms, proteins and antibiotics. A retrospective analysis showed that the average number of ARGs per genome has gradually increased. The results demonstrate the importance of comprehensive studies for understanding the antibiotic resistance phenomenon.

As reference genome assemblies are updated, there is a need to convert epigenome sequence data from older genome assemblies to newer versions, to facilitate data integration and visualization on the same coordinate system.
Conversion can be done by re-aligning the original sequence data to the new assembly or by converting the coordinates of the data between assemblies using a mapping file, an approach referred to as 'liftover'. Compared to re-alignment approaches, liftover is a more rapid and cost-effective solution. Here, we benchmark six liftover tools commonly used for conversion between genome assemblies by coordinates (UCSC liftOver, rtracklayer::liftOver, CrossMap, NCBI Remap, flo and segment_liftover) to determine how they perform on whole genome bisulphite sequencing (WGBS) and ChIP-seq data. Our results show high correlation between the six tools for conversion of 43 WGBS paired samples. For the chromatin sequencing data, interval conversion of 366 ChIP-seq datasets showed that segment_liftover generates more reliable results than UCSC liftOver. However, we found that some regions do not always remain the same after liftover. To further increase the accuracy of liftover and avoid misleading results, we developed a three-step guideline that removes aberrant regions to ensure more robust genome conversion between reference assemblies.

Advances in single-cell RNA sequencing over the past decade have shifted the discussion of cell identity toward the transcriptional state of the cell. While the incredible resolution provided by single-cell RNA sequencing has led to great advances in unraveling tissue heterogeneity and inferring cell differentiation dynamics, it raises the question of which sources of variation are important for determining cellular identity. Here we show that confounding biological sources of variation, most notably the cell cycle, can distort the inference of differentiation trajectories. We show that by factorizing single-cell data into distinct sources of variation, we can select a relevant set of factors that constitute the core regulators for trajectory inference, while filtering out confounding sources of variation (e.g.
cell cycle) which can perturb the inferred trajectory. Scripts are publicly available at https://github.com/mochar/cell_variation.

Characterizing genes that are critical for the survival of an organism (i.e. essential) is important for gaining a deep understanding of the fundamental cellular and molecular mechanisms that sustain life. Functional genomic investigations of the vinegar fly, Drosophila melanogaster, have unravelled the functions of numerous genes of this model species, but results from phenomic experiments can sometimes be ambiguous. Moreover, the features underlying gene essentiality are poorly understood, posing challenges for computational prediction. Here, we harnessed comprehensive genomic-phenomic datasets publicly available for D. melanogaster and a machine-learning-based workflow to predict essential genes of this fly. We discovered strong predictors of such genes, paving the way for computational predictions of essentiality in less-studied arthropod pests and vectors of infectious diseases.

The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce COMBI, a flexible model-based approach to data integration that addresses these current limitations. We combine concepts such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate our method and compare it with other data integration techniques. Our algorithm is available in the R package combi.

Plants respond to their environment by dynamically modulating gene expression.
A powerful approach for understanding how these responses are regulated is to integrate information about cis-regulatory elements (CREs) into models called cis-regulatory codes. Transcriptional response to combined stress is typically not the sum of the responses to the individual stresses. However, cis-regulatory codes underlying combined stress response have not been established. Here we modeled transcriptional response to single and combined heat and drought stress in Arabidopsis thaliana. We grouped genes by their pattern of response (independent, antagonistic and synergistic) and trained machine learning models to predict their response using putative CREs (pCREs) as features (median F-measure = 0.64). We then developed a deep learning approach to integrate additional omics information (sequence conservation, chromatin accessibility and histone modification) into our models, improving performance by 6.2%. While pCREs important for predicting independent and antagonistic responses tended to resemble binding motifs of transcription factors associated with heat and/or drought stress, important synergistic pCREs resembled binding motifs of transcription factors not known to be associated with stress.
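
As a minimal illustration of the F-measure reported above (the harmonic mean of precision and recall), the following Python sketch computes it for binary labels and then takes the median across several models. The function name, the toy labels and the per-model scores are hypothetical and are not taken from the study; the authors' actual evaluation pipeline is not described here.

```python
from statistics import median


def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = gene shows the response pattern, 0 = it does not.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f_measure(y_true, y_pred)

# Hypothetical per-model scores, summarized by their median.
model_scores = [0.5, 0.6, 0.8]
median_score = median(model_scores)
```

Reporting the median rather than the mean keeps a single unusually good or poor model from dominating the summary.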