publications
publications by categories in reversed chronological order. generated by jekyll-scholar.
2025
- Unified Genomic and Chemical Representations Enable Bidirectional Bio-synthetic Gene Cluster and Natural Product Retrieval
  Guimei Liu, Yiting Li, Gabriel Ong, and 5 more authors
  bioRxiv, May 2025
Natural product discovery is increasingly driven by the ability to analyze microbial genomes for biosynthetic gene clusters (BGCs) that encode secondary metabolites. While existing approaches have successfully linked BGCs to broad classes of chemical products, they typically operate in a single modality (genomic or chemical), limiting the scope of bidirectional prediction. In this work, we propose a multimodal framework that integrates genomic and chemical information by projecting embeddings derived from pretrained language models into a common representation space. We embed genomic sequences using a BGC foundation model and represent molecules through a chemical language model, then use a metric learning model to co-embed BGCs and their associated chemical structures. This co-embedding space allows us to quantify the similarity between BGCs and compounds, enabling both forward and inverse retrieval tasks. Beyond retrieval, we show that the shared space can guide strain selection for targeted compounds. By identifying BGCs closest to a query compound in the embedding space, we prioritize microbial strains that encode similar clusters, thereby streamlining genome mining and retrobiosynthetic design efforts. This approach represents a generalizable, scalable strategy to bridge biological and chemical modalities in natural product discovery.
Competing Interest Statement: The authors have declared no competing interest.
Agency for Science, Technology and Research, https://ror.org/036wvzt09, C2333017001
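The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes BGCs and compounds have already been projected into a shared space and that cosine similarity is the measure used; the abstract does not name the measure, the models, or the embedding size, so the function names and the toy 3-dimensional vectors below are hypothetical and are not the authors' implementation.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def retrieve(query_emb, candidate_embs, top_k=3):
    """Return (index, cosine similarity) for the top_k candidates nearest the query."""
    q = l2_normalize(query_emb[None, :])
    c = l2_normalize(candidate_embs)
    scores = (c @ q.T).ravel()
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy embeddings (assumption: both modalities already live in the same space).
bgc_embs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.7, 0.7, 0.0]])
compound_embs = np.array([[0.9, 0.1, 0.0],
                          [0.0, 0.2, 1.0]])

# Forward retrieval: given a BGC, rank compounds.
forward = retrieve(bgc_embs[0], compound_embs, top_k=1)
# Inverse retrieval: given a compound, rank BGCs (e.g. to prioritize strains).
inverse = retrieve(compound_embs[0], bgc_embs, top_k=2)
```

The same `retrieve` function serves both directions because the two modalities share one space; only the roles of query and candidates swap.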
- Paradigms of convergent evolution in enzymes
  Ioannis G. Riziotis, Jenny C. Kafas, Gabriel Ong, and 3 more authors
  The FEBS Journal, May 2025
There are many occurrences of enzymes catalysing the same reaction but having significantly different structures. Leveraging the comprehensive information on enzymes stored in the Mechanism and Catalytic Site Atlas (M-CSA), we present a collection of 34 cases for which there is sufficient evidence of functional convergence without an evolutionary link. For each case, we compare enzymes which have identical Enzyme Commission numbers (i.e. catalyse the same reaction), but different identifiers in the CATH data resource (i.e. different folds). We focus on similarities between their sequences, structures, active site geometries, cofactors and catalytic mechanisms. These features are then assessed to evaluate whether all the evidence for these structurally diverse proteins supports their independent evolution to catalyse the same chemical reaction. Our approach combines published literature information with knowledge-based computational resources from, amongst others, M-CSA, PDBe and PDBsum, supported by tailor-made software to explore active site structures and assess similarities in mechanism. We find that there are multiple types of convergent functional evolution observed to date, and it is necessary to investigate sequence, structure, active site geometry and enzyme mechanisms to describe such convergence accurately.
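The first filter described in this abstract (same Enzyme Commission number, different CATH identifier) can be sketched as a simple grouping step. This is an illustration only: the record layout, the function name, and the placeholder labels such as `EC-A` and `fold-1` are hypothetical, not real M-CSA or CATH entries, and the pairs it returns would still need the sequence, active site, and mechanism comparisons the abstract describes.

```python
from collections import defaultdict
from itertools import combinations

def candidate_convergent_pairs(records):
    """Pair enzymes that share an EC label but carry different fold labels.

    records: iterable of (name, ec_label, fold_label) tuples (hypothetical layout).
    """
    by_ec = defaultdict(list)
    for name, ec, fold in records:
        by_ec[ec].append((name, fold))

    pairs = []
    for ec, members in by_ec.items():
        for (name_a, fold_a), (name_b, fold_b) in combinations(members, 2):
            if fold_a != fold_b:  # same reaction, different fold
                pairs.append((ec, name_a, name_b))
    return pairs

# Placeholder data, not real database entries.
records = [
    ("enzyme_1", "EC-A", "fold-1"),
    ("enzyme_2", "EC-A", "fold-2"),
    ("enzyme_3", "EC-A", "fold-1"),
    ("enzyme_4", "EC-B", "fold-3"),
]
pairs = candidate_convergent_pairs(records)
```

Here `enzyme_1` and `enzyme_3` are not paired because they share a fold label, and `enzyme_4` has no partner with the same EC label.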