Hello everyone! Today I would like to recommend an article published in ACS Central Science: "Deep Learning for Prediction and Optimization of Fast-Flow Peptide Synthesis". The corresponding authors are Bradley L. Pentelute and Rafael Gómez-Bombarelli of MIT. Professor Pentelute focuses on protein engineering and drug delivery, while Professor Gómez-Bombarelli works on computer-assisted synthesis methodology.

Solid-phase peptide synthesis (SPPS) is an important method for making synthetic peptides. Compared with recombinant expression, peptides made by SPPS are not limited by sequence or amino acid type, so the method is widely used. However, SPPS involves many repeated reaction steps, which costs a lot of time and effort. In recent years, the development of flow chemistry has led to the design and use of automated synthesis platforms. The authors' laboratory has designed an automated fast-flow peptide synthesizer (AFPS), which can carry out SPPS efficiently and automatically. However, side reactions and problems such as aggregation have not been solved. To improve the yield, the authors propose optimizing the automated synthesis in real time with an algorithm.

To realize this idea, the algorithm has to establish an accurate relationship between the synthesis conditions and the yield, which can be done with deep learning. However, effective deep learning requires a large amount of high-quality, standardized data. Such data cannot simply be collected from the published literature, because published results follow different standards and may not be reproducible. To obtain highly reproducible data of a uniform standard, the authors ran a large number of synthesis steps on the AFPS under the same optimized parameters and collected 35,427 individual UV-vis deprotection traces. For each reaction, the precursor sequence already on the resin and the incoming amino acid were encoded as barcode-like bit vectors using molecular fingerprints. These barcodes capture all the key substructures of the amino acids, such as side chains, amide bonds and protecting groups. The barcodes and the corresponding synthesis parameters (reaction temperature, flow rate, coupling reagent, etc.) were used as the input, and the integral, height and width of the Fmoc deprotection UV-vis trace were used as the output. These three variables are important measures of whether a reaction step succeeded. After the deep neural network was trained on the collected data, the UV-vis traces predicted by the model matched the experimental data within the allowable error.
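To make the three output variables more concrete, here is a minimal sketch of how the integral, height and width of a single deprotection peak could be computed from a sampled trace. It is only an illustration and not the authors' code: the function name `trace_descriptors`, the use of the trapezoidal rule, the choice of full width at half maximum as the "width", and the example numbers are all my own assumptions.

```python
from typing import List, Tuple


def trace_descriptors(times: List[float], absorbance: List[float]) -> Tuple[float, float, float]:
    """Return (integral, height, width) for one sampled UV-vis peak.

    Assumption for illustration: "width" is the full width at half maximum,
    measured between the first and last samples at or above half the height.
    """
    if len(times) != len(absorbance) or len(times) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    # Area under the trace by the trapezoidal rule.
    integral = sum(
        0.5 * (absorbance[i] + absorbance[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

    # Peak height is the largest sampled absorbance.
    height = max(absorbance)

    # Samples at or above half the peak height define the width.
    half = height / 2.0
    above = [i for i, value in enumerate(absorbance) if value >= half]
    width = times[above[-1]] - times[above[0]]

    return integral, height, width


# Made-up triangular peak, not experimental data.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
sharp = [0.0, 1.0, 2.0, 1.0, 0.0]
integral, height, width = trace_descriptors(times, sharp)
print(integral, height, width)
```

A real trace would have many more samples and a baseline to subtract, so this sketch only shows what kind of numbers the model is asked to predict.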

The model was then used to predict the relationship between aggregation and sequence. Aggregation is one of the most important side processes in SPPS, but how it depends on the peptide sequence is not well understood. In the synthesis of GLP-1, aggregation shows up as a UV-vis peak that becomes flatter and wider. The authors quantify this feature by the aspect ratio of the peak, and the trained model correctly predicted the onset of aggregation in GLP-1 after the addition of Ala18. To further understand the relationship between aggregation and sequence, the authors used the model to predict the aggregation behavior of more than 8000 proteins from the PDB. They found that aromatic amino acids and amino acids with large side chains were the most likely to cause aggregation. As a verification, the authors took a selected peptide, generated all of its single-point mutants, and used the model to estimate how likely each mutant was to aggregate. Several of these peptides were then synthesized experimentally, and the measured results were consistent with the model predictions.
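The single-point mutation scan described above is easy to picture in code. The sketch below enumerates every single-residue substitution of a peptide and ranks the mutants with a scoring function. Everything here is hypothetical: the helper names, the example peptide `GAFV`, and especially `toy_aggregation_score`, which simply counts aromatic residues as a stand-in and has nothing to do with the trained network from the paper.

```python
from typing import Callable, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues, one-letter codes


def single_point_mutants(sequence: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, mutant) for every single-residue substitution.

    Labels use the usual notation, e.g. "F3A" means Phe at position 3 -> Ala.
    """
    for pos, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                label = f"{original}{pos + 1}{replacement}"
                yield label, sequence[:pos] + replacement + sequence[pos + 1:]


def rank_mutants(sequence: str, score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score every mutant and sort from lowest to highest predicted aggregation."""
    scored = [(label, score(mutant)) for label, mutant in single_point_mutants(sequence)]
    return sorted(scored, key=lambda item: item[1])


def toy_aggregation_score(sequence: str) -> float:
    """Placeholder scorer: fraction of aromatic residues. NOT the trained model."""
    return sum(residue in "FWY" for residue in sequence) / len(sequence)


peptide = "GAFV"  # hypothetical four-residue example
mutants = list(single_point_mutants(peptide))
ranking = rank_mutants(peptide, toy_aggregation_score)
print(len(mutants), ranking[0], ranking[-1])
```

In practice the `score` argument would be a call into the trained model, and only a handful of mutants from the top and bottom of the ranking would be synthesized for comparison.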

Finally, the model predictions were used as a guide to optimize the parameters of the automated AFPS synthesis, which gave a new coupling recipe for all amino acids except Trp. The results showed that the coupling yield of most amino acids improved under the optimized conditions, but there was still room for further optimization for several amino acids, including Trp.
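One simple way to picture model-guided optimization is an exhaustive search: ask the model for a predicted outcome under every combination of candidate conditions and keep the best one. The sketch below shows that idea only. The article does not say which search strategy the authors used, so the grid search is my assumption, and the parameter names, candidate values and `toy_predictor` are all invented for illustration.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Condition = Dict[str, Any]


def best_condition(
    predict: Callable[[Condition], float],
    grid: Dict[str, List[Any]],
) -> Tuple[Condition, float]:
    """Try every combination in the grid and return the highest-scoring one."""
    names = list(grid)
    best: Condition = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        condition = dict(zip(names, values))
        score = predict(condition)
        if score > best_score:
            best, best_score = condition, score
    if not best:
        raise ValueError("the grid contains no combinations")
    return best, best_score


def toy_predictor(condition: Condition) -> int:
    """Stand-in for the trained model; the numbers are arbitrary."""
    score = -abs(condition["temperature_c"] - 75)
    score -= abs(condition["flow_rate_ml_min"] - 40)
    if condition["coupling_reagent"] == "reagent_B":
        score += 5
    return score


# Hypothetical candidate values for the parameters named in the article.
grid = {
    "temperature_c": [60, 75, 90],
    "flow_rate_ml_min": [20, 40],
    "coupling_reagent": ["reagent_A", "reagent_B"],
}
best, best_score = best_condition(toy_predictor, grid)
print(best, best_score)
```

Running such a search once per amino acid would give a per-residue recipe, which is the kind of result described above. With a differentiable model, a gradient-based search would be another option.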

In summary, the authors trained a deep learning model on standardized AFPS data and used it to predict aggregation and to optimize the synthesis conditions. They hope that in the future the model can provide real-time control during synthesis: based on the characterization data from the previous step, it would automatically propose the optimal conditions for the next step.