ICDSUPL1-T023

Volume: 1, 2022
1st International PhD Student’s Conference at the University of Life Sciences in Lublin, Poland: ENVIRONMENT – PLANT – ANIMAL – PRODUCT

Abstract number: T023

DOI: https://doi.org/10.24326/ICDSUPL1.T023

Published online: 26 April 2022

ICDSUPL, 1, T023 (2022)


Lyse with Class – classification of endolysins through machine learning

Sophia Bałdysz1*, Jakub Barylski1

1 Department of Molecular Virology, Faculty of Biology, Adam Mickiewicz University, Wieniawskiego 1, 61-712 Poznań, Poland

* Corresponding author: sophiabaldysz@gmail.com

Abstract


The past few decades have seen a surge of bioinformatic tools based on machine learning approaches, but several niches remain relatively unexplored. Our research focuses on developing a pipeline that tackles the classification of viral lytic proteins, also called endolysins. These proteins break down bacterial cell walls at the end of a viral replication cycle. Disruption of the wall kills the cell, so lysins display genus-, species- or strain-specific antimicrobial properties. Hence, these proteins have been studied as alternatives to currently used antibiotics. Their specificity decreases the chances of resistance being triggered in the target bacterium and reduces potential harm to the natural human microbiota. Thus, our lytic protein classification pipeline may help to fill the niche in the antimicrobial market left by the dwindling efficacy of conventional antibiotics. The search for these proteins through traditional “wet lab” methods can often be tedious and fruitless, since some viruses that carry these enzymes are difficult to culture under laboratory conditions. However, supporting experiments with machine learning may vastly increase both the throughput and the sensitivity of such analyses. To find novel lysins, we trained a number of basic scikit-learn classifiers on simple representations of endolysin sequences. After an initial evaluation performed on three independent test datasets, we found that by using just straightforward derivatives of the amino acid composition, we could train estimators with F1 scores of up to 0.81, corresponding to an accuracy of 0.81 and a sensitivity of 0.79. To sum up, our approach may facilitate the search for lytic enzymes that can be used as antimicrobial substances in various branches of industry, medicine and veterinary science.
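As an illustration only, the kind of "simple representation" and the metrics mentioned above can be sketched in plain Python. This is not the authors' pipeline: the function names `aa_composition` and `binary_metrics`, the 20-letter alphabet ordering and the label convention (1 = lysin, 0 = non-lysin) are assumptions made for this example. The resulting fixed-length vectors are the sort of input a scikit-learn estimator could be trained on.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every vector has the same layout.
# (The ordering is an arbitrary choice made for this sketch.)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (e.g. X, *, gaps) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall) and F1 for binary labels.

    Assumed label convention: 1 = lysin (positive), 0 = non-lysin.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}
```

Calling `aa_composition` on each sequence yields one 20-element row per protein; stacking those rows gives a feature matrix, and `binary_metrics` applied to held-out labels and predictions reports the same three quantities quoted in the abstract.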

S.B.’s work was funded by the project “Inicjatywa Doskonałości – Uczelnia Badawcza”, PhD minigrant, grant number 017/02/SNP/0025.

J.B.’s work was supported by the National Center for Research and Development (NCBR, Poland), grant number LIDER/5/0023/L-10/18/NCBR/2019.

How to cite

S. Bałdysz, J. Barylski, 2022. Lyse with Class – classification of endolysins through machine learning. In: 1st International PhD Student’s Conference at the University of Life Sciences in Lublin, Poland: Environment – Plant – Animal – Product. https://doi.org/10.24326/ICDSUPL1/T023
