Machine learning based prediction of phase separation of protein molecules*

Pratik Mullick (Institut National de Recherche en Informatique et en Automatique (INRIA), Rainbow Team, Rennes, France)

Several proteins that are responsible for neurodegenerative disorders (Alzheimer's, Parkinson's, etc.) have been shown to undergo a biophysical mechanism known as liquid-liquid phase separation (LLPS). The phenomenon of LLPS generally refers to the coexistence of liquid phases with different densities. Identifying such protein molecules is important from a pharmaceutical point of view. Proteins are polymers composed of amino-acid monomers, which are organic compounds, so a protein molecule can be seen as a chain or sequence of numerous amino acids. In this research we take a data-driven approach to answer whether a protein chain would undergo LLPS or not. We used protein chains for which the answer was already known. Based on knowledge of the amino-acid sequences, we identified some variables relevant in the context of LLPS. We considered a total of 43636 protein sequences, of which only 121 were phase separating. We constructed a number of scoring functions to build support vector classifiers that classify a protein chain as phase separating or not. In the cross-validation strategy, 75% of the data were used as the training set and performance was tested on the remaining 25% of the data. In the training process, we used the simplex algorithm to maximize the area under the curve (AUC) in receiver operating characteristic (ROC) space for each of the scores we defined. The optimised parameters were then used to evaluate the AUC on the test set to check the accuracy. The best-performing score was selected as the predictive model to answer the question of whether a protein chain would undergo phase separation or not.
Even after using a larger data set, we have been able to achieve a prediction accuracy of about 85%.

* Joint work with Antonio Trovato (Department of Physics and Astronomy "Galileo Galilei", University of Padova, Italy)

Working Paper: https://www.biorxiv.org/content/10.1101/2021.12.13.472521v1
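
The training procedure described in the abstract (split the data 75/25, tune the parameters of a score by maximising the AUC on the training part, then report the AUC on the held-out part) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the working paper: the data are synthetic, the three features and the linear form of `score` are hypothetical, the split is stratified by class, and "simplex algorithm" is read as a basic Nelder-Mead (downhill simplex) search.

```python
import random

def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_split(X, y, train_frac=0.75, seed=0):
    """Split each class separately so the rare positives appear in both parts (assumption)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    pick = lambda ids: ([X[i] for i in ids], [y[i] for i in ids])
    return pick(train), pick(test)

def nelder_mead(f, x0, step=0.5, iters=200):
    """Minimise f with a basic downhill-simplex search (reflect, expand, contract, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        move = lambda t: [c + t * (c - w) for c, w in zip(centroid, worst)]
        reflected = move(1.0)
        if f(reflected) < f(best):
            expanded = move(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = move(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every point halfway towards the best one
                simplex = [best] + [[b + 0.5 * (p - b) for b, p in zip(best, pt)]
                                    for pt in simplex[1:]]
    return min(simplex, key=f)

def score(w, x):
    """Hypothetical scoring function: a weighted sum of sequence-derived features."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Synthetic, imbalanced data: 20 "phase separating" chains among 220 (illustrative only).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(220)]
y = [1] * 20 + [0] * 200
for row in X[:20]:
    row[0] += 1.5   # positives are shifted in the first two features
    row[1] -= 1.0

(X_tr, y_tr), (X_te, y_te) = stratified_split(X, y)

def neg_train_auc(w):
    # The search minimises, so return the negated training AUC.
    return -auc([score(w, x) for x in X_tr], y_tr)

w0 = [1.0, 1.0, 1.0]
w_opt = nelder_mead(neg_train_auc, w0, iters=60)
train_auc_init = -neg_train_auc(w0)
train_auc_opt = -neg_train_auc(w_opt)
test_auc = auc([score(w_opt, x) for x in X_te], y_te)
print(f"train AUC: {train_auc_init:.3f} -> {train_auc_opt:.3f}, test AUC: {test_auc:.3f}")
```

A derivative-free search is a natural fit here because the AUC depends only on the ranking of the scores and is therefore piecewise constant in the parameters, so gradient-based optimisers have nothing to follow. The same property means the search can stall on a plateau, so in practice one would likely restart it from several initial points.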