Strategy for visualizing function-related genes from genomic information using phylogeny
Feb 9, 2026: Research_Highlights
Paper Overview
Extraction and Visualization of Gene Sets Involved in Target Gene Function Using Phylogenetic Trees and Conserved Genomic Neighborhood Analysis: Cases of [NiFe]-Hydrogenase and Succinate Dehydrogenase
Tomoyuki Kosaka (Yamaguchi University), Minenosuke Matsutani (Tokyo University of Agriculture)
Key Highlights
- Novel Strategy: Proposed a genomic-based analysis strategy to visualize gene clusters essential for the functionalization of complex enzymes.
- Three Core Insights: Leveraged (1) genomic proximity of related genes, (2) the link between protein phylogeny and function, and (3) the mandatory coexistence of functionalization genes.
- Proof of Concept: Demonstrated using [NiFe]-hydrogenase that specific functional genes are conserved according to the target gene’s molecular phylogeny.
- Automation & Versatility: Developed an automated workflow applicable to various enzymes, successfully validated with succinate dehydrogenase.
Summary
Certain enzymes require intricate processes to become functional, including complex assembly, the binding of prosthetic groups, and the insertion of cofactors. While core proteins carry out catalysis, they often depend on auxiliary proteins known as “maturation factors.”
When performing heterologous expression, researchers must co-express these functionalization genes or ensure that the host provides them. Traditionally, identifying these genes has relied on literature searches or exhaustive trial and error.
We developed a strategy to visualize these gene groups by integrating three biological principles:
1. Gene Clustering: Related genes often cluster in the genome.
2. Phylogenetic Correlation: Evolutionary lineage reflects function.
3. Co-occurrence: Functionalization genes must be present in the genome for the enzyme to work.
Using the well-documented [NiFe]-hydrogenase as a model, we analyzed genes in the genomic neighborhood. Our results revealed that functional gene clusters are highly conserved along the molecular phylogeny of the target gene. To ensure objectivity and scalability, we introduced an automated clustering method based on phylogenetic clades.
Furthermore, applying this method to succinate dehydrogenase yielded reproducible results, confirming its effectiveness for enzymes requiring maturation. This approach not only streamlines heterologous host design but also enhances the functional analysis of complex enzymes.
Detailed Research Content
The Analysis Strategy
We first selected the [NiFe]-hydrogenase active subunit and 12 related genes as targets. Using blastp, we identified homologs across a library of 1,969 taxonomically diverse bacterial and archaeal genomes.
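As an illustration of the homolog search step, the following is a minimal sketch of filtering blastp hits, assuming the search was saved in the standard BLAST+ tabular format (-outfmt 6). The E-value cutoff, the query name target_subunit, and the subject identifiers are hypothetical placeholders, not values from the paper.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-10):
    """Return {query_id: [subject_id, ...]} for hits at or below max_evalue.

    The default cutoff is an assumed example value, not the paper's setting.
    """
    hits = {}
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) <= max_evalue:
            subjects = hits.setdefault(row["qseqid"], [])
            if row["sseqid"] not in subjects:  # keep each subject once
                subjects.append(row["sseqid"])
    return hits


# Two made-up hits: one strong match and one weak match.
example = (
    "target_subunit\tgenomeA_0102\t62.1\t540\t200\t4\t1\t538\t3\t540\t1e-180\t610\n"
    "target_subunit\tgenomeB_0877\t25.0\t80\t60\t0\t10\t89\t5\t84\t0.5\t30.1\n"
)
homologs = filter_hits(io.StringIO(example))
print(homologs)
```

In practice the handle would be an opened results file rather than an in-memory string, and the cutoff would be chosen to suit the protein family under study.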
The genes located in the immediate genomic neighborhood (5 genes upstream/downstream) of each homolog were collected and clustered using the Markov Cluster (MCL) algorithm. These were categorized into “functional clusters” based on gene frequency.
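The neighborhood collection can be sketched as follows, assuming each contig is available as a list of gene identifiers in chromosomal order. The function names, gene identifiers, and family labels are hypothetical. The gene-to-family mapping stands in for the output of MCL clustering, which is not reproduced here, and the sketch ignores strand and circular chromosomes.

```python
from collections import Counter


def neighborhood(ordered_genes, target, flank=5):
    """Return up to `flank` genes on each side of `target`, excluding it."""
    i = ordered_genes.index(target)
    upstream = ordered_genes[max(0, i - flank):i]
    downstream = ordered_genes[i + 1:i + 1 + flank]
    return upstream + downstream


def family_frequency(neighborhoods, family_of):
    """Count in how many neighborhoods each gene family appears at least once."""
    counts = Counter()
    for genes in neighborhoods:
        # A set ensures a family is counted once per neighborhood.
        counts.update({family_of[g] for g in genes if g in family_of})
    return counts


# Toy contig with genes g01..g15; g08 plays the role of the homolog.
contig = [f"g{n:02d}" for n in range(1, 16)]
neighbors = neighborhood(contig, "g08")

# Hypothetical family assignments, as a clustering step might provide them.
family_of = {"g07": "famA", "g09": "famB", "g10": "famA"}
freq = family_frequency([neighbors], family_of)
print(neighbors, dict(freq))
```

Families that recur across many neighborhoods would then be the candidates for "functional clusters"; the frequency threshold for that call is not specified here.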
Simultaneously, we analyzed the molecular phylogeny of the hydrogenase subunit. By comparing the conservation of functional clusters across phylogenetic clades, we found that the neighborhood composition is closely tied to the subunit’s evolutionary lineage. This highlights that target gene phylogeny, rather than just host species taxonomy, is crucial for understanding auxiliary gene requirements.
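One way to picture the clade-by-cluster comparison is as a table giving, for each clade, the fraction of homologs whose neighborhood contains a given gene family. The sketch below assumes a clade label per homolog and a list of neighboring families per homolog are already available. All names and data are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def clade_conservation(clade_of, families_near):
    """Fraction of homologs in each clade whose neighborhood has each family."""
    members = defaultdict(list)
    for homolog, clade in clade_of.items():
        members[clade].append(homolog)

    table = {}
    for clade, homologs in members.items():
        counts = defaultdict(int)
        for h in homologs:
            # A set ensures a family is counted once per homolog.
            for fam in set(families_near.get(h, ())):
                counts[fam] += 1
        table[clade] = {fam: n / len(homologs) for fam, n in counts.items()}
    return table


# Toy data: four homologs in two clades.
clade_of = {"h1": "A", "h2": "A", "h3": "B", "h4": "B"}
families_near = {
    "h1": ["fam1", "fam2"],
    "h2": ["fam1"],
    "h3": ["fam3"],
    "h4": ["fam3", "fam1"],
}
table = clade_conservation(clade_of, families_near)
print(table)
```

A family with a fraction near 1.0 in one clade and near 0 in another is the kind of lineage-specific pattern described above.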
Automation and Validation
To eliminate subjectivity, we implemented TreeCluster to automate clade classification. We found that the avg_clade mode, which uses average evolutionary distance, provided the most accurate results. This automation allows for the rapid, objective analysis of numerous related genes.
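The idea behind an average-distance criterion can be illustrated with a small sketch that tests whether a candidate clade's mean pairwise leaf distance stays within a threshold. This is not TreeCluster's implementation. The distances, leaf names, and threshold below are made up, and a real run would take the distances from the phylogenetic tree.

```python
from itertools import combinations


def average_pairwise_distance(leaves, dist):
    """Mean distance over all unordered pairs of leaves (0.0 for a single leaf)."""
    pairs = list(combinations(sorted(leaves), 2))
    if not pairs:
        return 0.0
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def within_threshold(leaves, dist, threshold):
    """True if the candidate clade satisfies the average-distance criterion."""
    return average_pairwise_distance(leaves, dist) <= threshold


# Hypothetical pairwise distances between three leaves.
dist = {
    frozenset(("a", "b")): 0.10,
    frozenset(("a", "c")): 0.30,
    frozenset(("b", "c")): 0.32,
}

threshold = 0.2  # assumed example value
small_ok = within_threshold({"a", "b"}, dist, threshold)
large_ok = within_threshold({"a", "b", "c"}, dist, threshold)
print(small_ok, large_ok)
```

Here the pair a, b would be kept as one clade, while adding c raises the average above the threshold, so the larger clade would be split.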
The strategy was further validated with succinate dehydrogenase, a complex requiring covalent attachment of flavin adenine dinucleotide (FAD) via specific chaperones. Our analysis successfully identified the conservation of these chaperones within specific phylogenetic clades, supporting the method’s applicability across different enzyme systems.
Publication Information
- Journal: Microbes and Environments
- Title: Using Phylogeny and a Conserved Genomic Neighborhood Analysis to Extract and Visualize Gene Sets Involved in Target Gene Function: The Case of [NiFe]-hydrogenase and Succinate Dehydrogenase
- DOI: 10.1264/jsme2.ME25018
Research Support
This work was supported by JSPS KAKENHI Grant Number JP21K05343, and the Institute for Fermentation, Osaka (IFO) Grant Number LA-2023-018. Computations were performed on the NIG supercomputer at the ROIS National Institute of Genetics.
Glossary
- [NiFe]-Hydrogenase: An enzyme catalyzing the reversible reaction H₂ ⇌ 2H⁺ + 2e⁻. It has a complex active center containing nickel and iron.
- Maturation Factor: An auxiliary protein that assists in protein folding or in the insertion of prosthetic groups or metals into the enzyme.
- Heterologous Expression: Introducing a gene into a different host organism (e.g., E. coli) to produce the target protein.
- Succinate Dehydrogenase: An enzyme in the citric acid cycle. To be functional, it requires a chaperone for the covalent binding of FAD.
- Genomic Neighborhood: The genes located physically near a target gene on a chromosome, often sharing related functions.
- Molecular Phylogeny: The study of evolutionary relationships among genes or proteins, often visualized as a “phylogenetic tree.”
- Markov Cluster (MCL): A mathematical algorithm used to group large datasets into clusters based on the strength of their connections.
- TreeCluster: Software that automatically divides phylogenetic trees into groups (clades) based on objective numerical criteria.