GENOME COMPARISONS AND ANALYSIS

Sam Karlin
Stanford University
Probability

I highlight three categories of sequence features that are of interest in comparative genomics and proteomics studies. First, I discuss comparisons of unusual protein sequence features among eukaryotic species. Sequence structures such as charge clusters, hypercharge runs, amino acid runs, alternating patterns of charged residues, and periodic patterns of histidine residues may be important in highly specific interactions relevant to protein active sites, protein-protein and protein-DNA/RNA interactions, and substrate docking sites. Comparisons between eukaryotic genomes may reveal contrasts that illuminate differences in replication, information processing systems, repair mechanisms, DNA modification processes, and mutational biases. Second, I review methods for the computational identification of predicted highly expressed (PHX) genes across prokaryotic genomes. Codon usage offers a way to evaluate prokaryotic gene expression. Qualitatively, a gene is PHX if its codon usage is quite similar to that of the ribosomal protein genes, the general translation/transcription factor genes, or the major chaperone/degradation genes, but deviates strongly from the average codon usage of the genome. The presence or absence of genes in a pathway, together with their expression levels, can indicate whether and how the pathway is utilized in each organism, reflecting covariates such as habitat, lifestyle, and energy sources. I will explicitly examine the expression levels of TCA cycle genes, glycolysis pathway genes, and detoxification genes for a diverse group of prokaryotic genomes. Third, I will comment on methods for the study of evolutionary relationships pertaining to the partition of species into three domains of life.
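The qualitative PHX criterion described above can be illustrated with a small Python sketch. It is not the speaker's exact procedure: the function names, the amino-acid-weighted absolute difference used as the distance, and the `ratio` cutoff are all assumptions chosen for illustration. The sketch measures how far a gene's codon usage lies from a reference gene class and from the genome average, and flags the gene when it is closer to the reference class.

```python
from collections import Counter

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(seqs):
    """Count sense codons over in-frame coding sequences (stops and unknowns skipped)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def usage_difference(gene_counts, ref_counts):
    """Illustrative distance: per-amino-acid codon frequency differences,
    weighted by the amino acid's share of the gene (an assumed form)."""
    total = sum(gene_counts.values())
    diff = 0.0
    for aa in set(CODON_TABLE.values()) - {"*"}:
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        n_gene = sum(gene_counts[c] for c in codons)
        n_ref = sum(ref_counts[c] for c in codons)
        if n_gene == 0 or n_ref == 0:
            continue  # amino acid absent from one side; nothing to compare
        inner = sum(
            abs(gene_counts[c] / n_gene - ref_counts[c] / n_ref) for c in codons
        )
        diff += (n_gene / total) * inner
    return diff


def looks_phx(gene, genome_genes, reference_genes, ratio=1.0):
    """Hypothetical check: the gene deviates more from the genome average
    than from the reference class, by at least the assumed factor `ratio`."""
    g = codon_counts([gene])
    d_all = usage_difference(g, codon_counts(genome_genes))
    d_ref = usage_difference(g, codon_counts(reference_genes))
    return d_all > ratio * d_ref
```

In practice the reference set would be the organism's ribosomal protein, translation/transcription factor, or chaperone/degradation genes, and any cutoff would need to be calibrated rather than taken from this sketch.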

