{
    "componentChunkName": "component---src-templates-article-page-js",
    "path": "/journals/biology/micropub-biology-001796",
    "result": {"data":{"article":{"manuscript":{"id":"7b28f659-47c2-41d4-b7f2-31f8b6c9f5b8","submissionTypes":["methodology"],"citations":[],"doi":"10.17912/micropub.biology.001796","dbReferenceId":"WBPaper00069433","pmcId":"","pmId":"","proteopedia":"","reviewPanel":"","species":["zebrafish","arabidopsis","o. sativa","s. cerevisiae","escherichia coli","zea mays","human","rat","drosophila"],"integrations":[],"corrections":null,"history":{"received":"2025-08-12T07:13:43.367Z","revisionReceived":"2026-02-23T16:08:19.638Z","accepted":"2026-04-10T04:19:17.507Z","published":"2026-04-10T17:32:00.610Z","indexed":"2026-04-24T17:32:00.610Z"},"versions":[{"id":"4a97190a-3bd4-453e-b1ff-c7fd2be8150f","decision":"revise","abstract":"<p>Protein–protein interactions (PPIs) govern essential cellular processes but remain challenging to characterize experimentally due to high cost and labor intensity. We present gPPIpred, a scalable computational framework leveraging graph neural networks (GNNs) and attention mechanisms to predict PPIs at residue-level resolution. Proteins are encoded as spatially informed molecular graphs integrating physicochemical features. Using curated structural datasets for training and validation, gPPIpred was fine-tuned to reliably predict positive interactions and actual interacting sites. Attention scores highlight key residues mediating interactions, offering interpretable insights to guide experimental design. 
gPPIpred combines high predictive performance with explainability, providing a user-friendly pipeline for large-scale PPI discovery.</p>","acknowledgements":"<p>The authors thank the BioGRID, IntAct, and PDB teams for providing invaluable datasets, and the developers of ProtBERT for enabling advanced feature extraction.</p>","authors":[{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","writing_reviewEditing","supervision"],"email":"matiolli@itqb.unl.pt","firstName":"Cleverson C.","lastName":"Matiolli","submittingAuthor":false,"correspondingAuthor":null,"equalContribution":null,"WBId":"","orcid":"0000-0001-8185-7628"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["formalAnalysis","writing_originalDraft","writing_reviewEditing","methodology"],"email":"joana.marques@itqb.unl.pt","firstName":"Joana ","lastName":"Marques","submittingAuthor":null,"correspondingAuthor":null,"equalContribution":null,"WBId":"","orcid":"0000-0002-8922-3969"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","supervision","writing_reviewEditing","fundingAcquisition"],"email":"abreu@itqb.unl.pt","firstName":"Isabel A.","lastName":"Abreu","submittingAuthor":true,"correspondingAuthor":true,"equalContribution":null,"WBId":"","orcid":"0000-0002-5566-2146"}],"awards":[],"conflictsOfInterest":null,"dataTable":null,"extendedData":[],"funding":"<p>We acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for a PhD fellowship for JM (<a 
href=\"https://doi.org/10.54499/2020.06917.BD\">https://doi.org/10.54499/2020.06917.BD</a>) and project <a href=\"https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021\">https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021</a>, which also supported CM contract. We also acknowledge funding by GREEN-it ‘Bioresources4sustainability’ (https://doi.org/10.54499/UIDB/04551/2020 ). The funding sources were not involved in analyses, interpretation of data, writing, or in the decision to submit this paper.</p>","image":{"url":"https://portal.micropublication.org/uploads/62feda4aab23eec8269cc4c3a725d4a6.png"},"imageCaption":"<p><b>gPPipred: a user friendly, protein-protein interaction predictor. </b>The gPPipred app, built in gradio, is available at: https://encr.pw/ESWd9. Here, we show an example of an output generated by gPPIpred. a) Input area. Users can input proteins by their UniProt Accession. When the app is done processing the input information, it displays an interaction heat map, the positive interactors list and the residue plot. b) The interaction heat map, showing interaction probability with each Prey. The interaction is considered if probability ≥ 0.5. c) The positive interactors list, showing the interaction probability as well as, Swiss Prot ID, Preys gene ontology terms, entry ID, and full sequence. d) Residue plot input area and display. Users can select which Prey to display on a residue plot in the drop-down menu.</p>","imageTitle":"<p><b>gPPipred: a user friendly, protein-protein interaction predictor.</b></p>","methods":"<p><b>Dataset Preparation</b></p><p>The dataset for this study was primarily obtained from the BioGRID repository, focusing on multi-validated PPIs (Oughtred et al., 2021). This selection ensures reliability by incorporating only experimentally validated interactions across varied experimental conditions (Chatr-Aryamontri et al., 2015). Positive examples represent pairs with strong experimental evidence of interaction. 
Negative examples were generated by randomly pairing proteins without documented interactions, ensuring subcellular localization incompatibility to minimize the likelihood of false negatives (Sun et al., 2017).</p><p>The proteins were mapped to UniProt identifiers using the PDBsws server, which facilitated alignment between sequence and structural data (Martin, 2005). This alignment enhances reproducibility by ensuring that structural features, such as residue positioning and chain orientation, correspond directly to sequence data. The PDBsws mapping process minimizes inconsistencies and provides a standardized framework for integrating experimental and computational datasets, crucial for accurate structural graph construction. Only interactions where both proteins had high-resolution structures were included. Homodimers were removed to avoid overrepresentation and potential biases in training.</p><p><b>Protein Graph Construction</b></p><p>Protein graphs were constructed using the atomic coordinates provided in PDB files. Residues were represented as nodes, with edges denoting spatial proximities (≤9.5 Å), enabling accurate depiction of molecular structures. The preprocessing included extraction of residues’ Cartesian coordinates (x, y, z) and conversion of three-letter amino acid codes into their single-letter counterparts for simplicity and standardization (Berman et al., 2000). Structural data preprocessing relied on the integration of AlphaFold2 and PDB datasets, which have been widely validated for their accuracy in structural predictions (Jumper et al., 2021; Varadi et al., 2022). The chain mapping process ensured only valid chains were included in the analysis. Missing PDB entries were retrieved using AlphaFold API integrations, providing near-complete structural data (Varadi et al., 2022). Node features combined positional embeddings with sequence-derived properties, capturing both the structural and functional nuances. 
Edges were constructed using Euclidean distances between residues, ensuring spatial relevance in graph representation.</p><p><b>Feature Extraction</b></p><p>Fingerprints were generated using residue-level information extracted from protein structural data (Berman et al., 2000). These fingerprints captured local structural motifs and inter-residue interactions, enabling the model to effectively identify spatial dependencies critical for predicting PPIs. The residue adjacency matrices, normalized by degree, ensured the preservation of the graph structure, emphasizing meaningful connections between residues (Kipf &amp; Welling, 2016).</p><p>A Weisfeiler-Lehman-like subgraph extraction algorithm was employed to derive graph-level fingerprints (Shervashidze et al., 2011). Key steps included mapping residue types to atomic features using an amino acid dictionary, calculating adjacency matrices based on spatial thresholds, and iteratively refining residue embeddings through graph convolution operations. These processes collectively ensured the fingerprints were robust representations of protein structures, forming the foundation for accurate training and inference.</p><p><b>Graph Neural Network Architecture</b></p><p>gPPIpred employs a multi-layer Graph Neural Network (GNN) to predict PPIs using residue-level fingerprints as input features. Residue-level fingerprints are first encoded into high-dimensional embeddings through an embedding layer, capturing local structural motifs and residue-specific interactions within the protein. These embeddings are then processed through multiple graph convolutional layers, where information propagates across residues, enabling the model to aggregate and enhance representations based on neighboring residues’ features.</p><p>To identify interaction-relevant residues, a mutual attention mechanism calculates attention scores between residue pairs across the two proteins. 
This mechanism allows the model to focus on biologically significant regions most likely involved in interactions. Finally, the concatenated embeddings of the two proteins are passed through a dense output layer, which predicts the probability of interaction between the protein pair. The output includes both a binary interaction prediction and residue-level probabilities, ensuring interpretability and alignment with biological evidence.</p><p><b>Training and Validation</b></p><p>The model is trained using cross-entropy loss, with the adjacency matrices of the residue graphs ensuring structural integrity in the learning process. Regularization techniques, such as dropout, mitigate overfitting and enhance generalization capabilities. Each protein pair was processed independently, and their graph-level embeddings were concatenated for final interaction prediction. Training was conducted using a 5-fold cross-validation strategy. The dataset was divided into training (80%) and validation (20%) subsets. Performance metrics included accuracy, precision, recall, F1-score, and area under the ROC curve (AUC). Regularization techniques, including dropout and early stopping, were applied to prevent overfitting.</p><p><b>Interaction Site Analysis</b></p><p>Interaction sites were determined by analyzing residue-level interaction probabilities generated during model inference. These probabilities identified critical residues likely involved in protein-protein interactions, forming the foundation for further visualization and interpretation. Interaction probabilities for individual residues were computed during the mutual attention step of the model, highlighting regions with significant contributions to the overall interaction.</p><p>To provide interpretable insights, heatmaps of residue interaction probabilities were generated using tools such as matplotlib and plotly, offering clear and reproducible visualizations of interaction hotspots. 
Additionally, residues with high probabilities were mapped onto the protein 3D structure through scatter plots, facilitating spatial localization of key sites. These visualizations were then compared against known experimental binding sites, enabling validation of predictions and refinement of interaction models for enhanced biological relevance.</p>","reagents":"<p></p>","patternDescription":"<p>Protein–protein interactions (PPIs) are central to key biological processes, including signal transduction, immune response, and metabolic regulation. Traditionally, PPIs have been studied using experimental methods such as yeast two-hybrid systems and co-immunoprecipitation followed by mass spectrometry (Fields &amp; Song, 1989; Ho et al., 2002). However, these techniques are often labor-intensive, time-consuming, expensive, and susceptible to high rates of false positives and false negatives (Ito et al., 2001; Mrowka et al., 2001). Consequently, computational approaches have emerged as essential tools for efficient and scalable PPI prediction.​</p><p>Traditional machine learning techniques, such as support vector machines (SVM) and random forests, have been extensively employed for PPI prediction (You et al., 2014, 2015). These methods typically utilize sequence-based features, including position-specific scoring matrices (PSSM), which capture evolutionary conservation, and physicochemical properties of amino acids to represent proteins (Guo et al., 2008; Shen et al., 2007). While effective, these methods depend on manual feature engineering, which restricts scalability and hampers structural context integration. Advances in deep learning, particularly convolutional and recurrent neural networks, improved performance by learning high-level features directly from protein sequences (Hashemifar et al., 2018; Sun et al., 2017). 
However, these models frequently underutilize the structural information crucial for capturing the spatial relationships in protein interactions.</p><p>To address these challenges, we introduce gPPIpred, a novel framework that leverages Graph Neural Networks (GNNs) with integrated attention mechanisms to simultaneously consider physicochemical properties and structural information to predict PPIs. In gPPIpred, proteins are represented as residue-level graphs, where nodes correspond to amino acid residues, and edges are established based on spatial proximity within three-dimensional protein structures. Each protein is modeled as a graph, with nodes representing residues of significant structural and functional relevance. Nodes encode residue-level physicochemical properties, such as hydrophobicity, charge, and polarity. Edges in the graph are defined using a spatial threshold (e.g., ≤9.5 Å). By structuring the graph around these substructures rather than individual residues, the model effectively captures biologically relevant interaction motifs. For each predicted interacting pair, we extracted the residue attention scores from the model and identified the top-ranking residues involved in the interaction on each protein. These tend to form clusters on the protein surface, suggestive of binding patches. In our analysis, we found that in many cases these predicted patches correspond closely to known interaction sites.</p><p>Despite its strengths, gPPIpred has limitations. For instance, it relies on the availability of protein structural data, such as high-quality 3D structures, experimental or predicted, which are required to construct accurate graphs. This means that gPPIpred cannot be used to predict interactions with proteins that are intrinsically disordered or have no reliable structural model. Another constraint lies in the construction of negative training datasets. 
Although the strategy for generating such datasets is widely accepted, the risk of false negatives could be further reduced by using experimentally validated negative examples, which are typically rare or unavailable due to underreporting.</p><p>Overall, by leveraging structural representations through graph-based learning and integrating state-of-the-art embedding techniques, gPPIpred not only reduces research costs associated with large-scale interaction screenings but also enhances our understanding of the structural determinants of protein interactions. The granular insights provided by residue-level predictions have important implications for studying the biological mechanisms underlying these interactions, potentially guiding experimental validation and therapeutic targeting. Additionally, the gPPIpred pipeline facilitates PPI predictions by providing a simplified interface that automates the download and processing of protein and compound structural files, generates interactive 3D plots to visualize putative interaction sites, and creates detailed reports (Figure 1). The app can be accessed here: https://encr.pw/ESWd9. The training data, testing data, and code can be accessed here: doi.org/10.57967/hf/6092.</p>","references":[{"reference":"<p>Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. 2000. The Protein Data Bank and the challenge of structural genomics. Nature Structural Biology 7: 957-959.</p>","pubmedId":"","doi":"10.1038/80734"},{"reference":"<p>Chatr-aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, et al., Tyers. 2014. The BioGRID interaction database: 2015 update. Nucleic Acids Research 43: D470-D478.</p>","pubmedId":"","doi":"10.1093/nar/gku1204"},{"reference":"<p>Fields S, Song O. 1989. A novel genetic system to detect protein–protein interactions. Nature 340: 245-246.</p>","pubmedId":"","doi":"10.1038/340245a0"},{"reference":"<p>Guo Y, Yu L, Wen Z, Li M. 2008. 
Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Research 36: 3025-3030.</p>","pubmedId":"","doi":"10.1093/nar/gkn159"},{"reference":"<p>Hashemifar S, Neyshabur B, Khan AA, Xu J. 2018. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34: i802-i810.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bty573"},{"reference":"<p>Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al., Tyers. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180-183.</p>","pubmedId":"","doi":"10.1038/415180a"},{"reference":"<p>Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences 98: 4569-4574.</p>","pubmedId":"","doi":"10.1073/pnas.061034498"},{"reference":"<p>Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al., Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589.</p>","pubmedId":"","doi":"10.1038/s41586-021-03819-2"},{"reference":"<p>Kipf TN, Welling M. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. 2016 Nov 21.</p>","pubmedId":"","doi":""},{"reference":"<p>Martin ACR. 2005. Mapping PDB chains to UniProtKB entries. Bioinformatics 21: 4297-4301.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bti694"},{"reference":"<p>Mrowka R, Patzak A, Herzel H. 2001. Is There a Bias in Proteome Research? Genome Research 11: 1971-1973.</p>","pubmedId":"","doi":"10.1101/gr.206701"},{"reference":"<p>Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, et al., Tyers. 2020. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. 
Protein Science 30: 187-200.</p>","pubmedId":"","doi":"10.1002/pro.3978"},{"reference":"<p>Shen J, Zhang J, Luo X, Zhu W, Yu K, Chen K, Li Y, Jiang H. 2007. Predicting protein–protein interactions based only on sequences information. Proceedings of the National Academy of Sciences 104: 4337-4341.</p>","pubmedId":"","doi":"10.1073/pnas.0607879104"},{"reference":"<p>Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM. 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12: 2539-2561.</p>","pubmedId":"","doi":""},{"reference":"<p>Sun T, Zhou B, Lai L, Pei J. 2017. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 18: 10.1186/s12859-017-1700-2.</p>","pubmedId":"","doi":"10.1186/s12859-017-1700-2"},{"reference":"<p>Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al., Velankar. 2021. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50: D439-D444.</p>","pubmedId":"","doi":"10.1093/nar/gkab1061"},{"reference":"<p>You ZH, Chan KCC, Hu P. 2015. Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest. PLOS ONE 10: e0125811.</p>","pubmedId":"","doi":"10.1371/journal.pone.0125811"},{"reference":"<p>You ZH, Yu JZ, Zhu L, Li S, Wen ZK. 2014. A MapReduce based parallel SVM for large-scale predicting protein–protein interactions. 
Neurocomputing 145: 37-43.</p>","pubmedId":"","doi":"10.1016/j.neucom.2014.05.072"}],"title":"<p>gPPIpred: A User-Friendly PPI Predictor Based on Protein Molecular Graphs</p>","reviews":[{"reviewer":{"displayName":"Herlander Azevedo"},"openAcknowledgement":false,"status":{"submitted":true}}],"curatorReviews":[]},{"id":"58f39d7e-d44e-4384-bb89-531299968920","decision":"revise","abstract":"<p>Protein–protein interactions (PPIs) govern essential cellular processes but remain challenging to characterize experimentally due to high cost and labor intensity. We present gPPIpred, a scalable computational framework leveraging graph neural networks (GNNs) and attention mechanisms to predict PPIs at residue-level resolution. Proteins are encoded as spatially informed molecular graphs integrating physicochemical features. Using curated structural datasets for training and validation, gPPIpred was fine-tuned to reliably predict positive interactions and actual interacting sites. Attention scores highlight key residues mediating interactions, offering interpretable insights to guide experimental design. 
gPPIpred combines high predictive performance with explainability, providing a user-friendly pipeline for large-scale PPI discovery.</p>","acknowledgements":"<p>The authors thank the BioGRID, IntAct, and PDB teams for providing invaluable datasets, and the developers of ProtBERT for enabling advanced feature extraction.</p>","authors":[{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","writing_reviewEditing","supervision"],"email":"matiolli@itqb.unl.pt","firstName":"Cleverson C.","lastName":"Matiolli","submittingAuthor":false,"correspondingAuthor":null,"equalContribution":true,"WBId":"","orcid":"0000-0001-8185-7628"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["formalAnalysis","writing_originalDraft","writing_reviewEditing","methodology"],"email":"joana.marques@itqb.unl.pt","firstName":"Joana ","lastName":"Marques","submittingAuthor":null,"correspondingAuthor":null,"equalContribution":true,"WBId":"","orcid":"0000-0002-8922-3969"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","supervision","writing_reviewEditing","fundingAcquisition"],"email":"abreu@itqb.unl.pt","firstName":"Isabel A.","lastName":"Abreu","submittingAuthor":true,"correspondingAuthor":true,"equalContribution":null,"WBId":"","orcid":"0000-0002-5566-2146"}],"awards":[],"conflictsOfInterest":"<p>The authors declare that there are no conflicts of interest present.</p>","dataTable":null,"extendedData":[{"description":"gPPIpred – High-Throughput Protein-Protein Interaction Predictor. 
App files and Datasets","doi":null,"resourceType":"Software","name":"-gPPIpredv2_2-main (1).zip","url":"https://portal.micropublication.org/uploads/520218f87ef06b38e53c17de4374792b.zip"}],"funding":"<p>We acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for a PhD fellowship for JM (<a href=\"https://doi.org/10.54499/2020.06917.BD\">https://doi.org/10.54499/2020.06917.BD</a>) and project <a href=\"https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021\">https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021</a>, which also supported CM's contract. We also acknowledge funding by GREEN-it ‘Bioresources4sustainability’ (https://doi.org/10.54499/UIDB/04551/2020). The funding sources were not involved in analyses, interpretation of data, writing, or in the decision to submit this paper.</p>","image":{"url":"https://portal.micropublication.org/uploads/8cd051d448e005e72ff0e70b88daa998.png"},"imageCaption":"<p>The gPPIpred app, built in Gradio, is available at: https://encr.pw/ESWd9. <b>a)</b> Model convergence plot showing cross-entropy loss over 20 epochs. <b>b)</b> Model performance measured in MCC over 20 epochs of training. To calculate the MCC during training, a balanced test dataset (10,000 samples) was used. MCC – Matthews correlation coefficient. <b>c)</b> Precision and sensitivity as a function of the interaction threshold. <b>d)</b> Receiver Operating Characteristic curve. Final AUC-ROC: 0.9109. Optimal threshold: 0.85. ROC – Receiver Operating Characteristic. AUC – Area Under the ROC Curve. <b>e)</b> Confusion matrix. MCC – Matthews correlation coefficient. <b>f)</b> gPPIpred web app. Users can input proteins by their UniProt Accession IDs. Additionally, users can define their desired threshold. <b>g)</b> Here, we show an example of an output generated by gPPIpred. Once the program is finished analysing, the status area indicates how many preys were analysed. 
The chart shows the probability of each interaction as well as the line for the defined threshold. The interaction is considered positive if its probability is ≥ the defined threshold. <b>h)</b> Data area. Here, users can find and download the list of the tested interactors. This list shows bait and prey IDs, the interaction probability, Binds (Yes if the probability is ≥ the defined threshold; No if the probability is &lt; the defined threshold), Bait and Prey Hotspots (hotspot residues are those at the interface of either the bait or the prey protein that provide the bulk of the binding free energy (ΔG) for the specific interaction pair), and the full sequences of Bait and Preys.</p>","imageTitle":"<p><b>gPPIpred: a user-friendly protein-protein interaction predictor.</b></p>","methods":"<p><b>Dataset Preparation</b></p><p>The training and validation datasets are available in the Extended Data section. Positive interaction data were curated from the Gold Standard Dataset and multi-validated experiments from BioGRID (Bernett, 2022; Chatr-Aryamontri et al., 2015; Oughtred et al., 2021; Szklarczyk et al., 2019). A common strategy for generating negative datasets is Subcellular Localization Filtering, which involves selecting proteins from different subcellular locations and labeling them as non-interacting. Although this strategy is widely accepted, the risk of false negatives can be further reduced by using experimentally validated negative examples. Therefore, we used a negative dataset curated by the Russell Lab, dataset Stelzl (2005) (Stelzl et al., 2005; Trabuco et al., 2012). For negative interactions, the shortest path between the two proteins in the underlying two-hybrid interactome serves as a confidence score, reported in the following format: shortestPath:2, shortestPath:3, etc., or shortestPath:NA if there is no path connecting the two proteins. 
We created two separate datasets, ensuring that no individual protein sequence appeared in both datasets to prevent data leakage (Table 3). Only interactions where both proteins had high-resolution structures were included. When possible, we ensured the model had examples of both positive and negative interactions for the same protein.</p><p><b>Table 3. Number of samples contained in each dataset</b>. The number of unique protein IDs in each dataset is also given. No protein was used in both datasets to prevent data leakage. The percentage of species represented in each dataset is also listed.</p><table><tbody><tr><td><p>Dataset</p></td><td><p>Total&nbsp;</p></td><td data-colwidth=\"82\"><p>Positive&nbsp;</p><p>interactions</p></td><td><p>Negative&nbsp;interactions</p></td><td><p>Unique Proteins</p></td></tr><tr><td rowspan=\"4\"><p>Training</p></td><td rowspan=\"4\"><p>159,655 interactions:</p><p>77.87 % <i>Homo sapiens</i>,</p><p>22.05 % <i>Arabidopsis thaliana</i>,</p><p>0.05 % <i>Oryza sativa</i>,</p><p>0.02 % <i>Saccharomyces cerevisiae</i></p></td><td rowspan=\"4\" data-colwidth=\"82\"><p>77,508</p><p>(48%)</p></td><td rowspan=\"4\"><p>82,147</p><p>(52%)</p></td><td rowspan=\"4\"><p>10,349 proteins:</p><p>64 % <i>Arabidopsis thaliana,</i></p><p>35 % <i>Homo sapiens</i>,</p><p>0.8 % <i>Oryza sativa</i>,</p><p>0.2 % <i>Saccharomyces cerevisiae</i></p></td></tr><tr></tr><tr></tr><tr></tr><tr><td rowspan=\"2\"><p>Validation</p></td><td rowspan=\"2\"><p>72,358 interactions:&nbsp;</p><p>50.42 % <i>Homo sapiens</i>,</p><p>49.54 % <i>Saccharomyces cerevisiae</i></p></td><td rowspan=\"2\" data-colwidth=\"82\"><p>36,179</p><p>(50%)</p></td><td rowspan=\"2\"><p>36,179</p><p>(50%)</p></td><td rowspan=\"2\"><p>4,192 proteins:</p><p>67.6 % <i>Homo sapiens</i>,</p><p>32.4 % <i>Saccharomyces cerevisiae</i></p></td></tr><tr></tr></tbody></table><p></p><p><b>Protein Graph Construction</b></p><p>Protein graphs were constructed by representing individual residues as nodes, with edges 
indicating spatial proximity based on a distance threshold of less than 9.5 Å between alpha carbon atoms. This threshold provides an accurate representation of the protein's tertiary structure and local chemical environment. Preprocessing included extracting Cartesian coordinates (x, y, z) from structural files and standardizing amino acid nomenclature into single-letter codes (Berman et al., 2000). To maximize structural coverage, we integrated experimentally determined structures from the PDB with high-confidence predicted models from AlphaFold2 (Jumper et al., 2021; Varadi et al., 2022). Missing entries were retrieved via the AlphaFold API, resulting in a nearly complete structural dataset. Node features were defined by a five-dimensional vector of physicochemical properties (hydrophobicity, volume, polarizability, pI, and pKa), while edges were defined by Euclidean distances, capturing the spatial constraints essential for predicting protein-protein interactions.</p><p><b>Feature Extraction</b></p><p>Node-level embeddings were generated by assigning relative values (0 to 1) for five physicochemical properties to each amino acid (Table 1). Thus, a value of 1.0 represents the maximum expression of that specific property. Each physicochemical property provides the model with different information. Hydrophobicity indicates the tendency of an amino acid to repel water, with 1 being the most hydrophobic and 0 the least. Volume is calculated as the van der Waals volume, where 1 corresponds to the largest and 0 to the smallest amino acid. Polarizability measures how well an amino acid can engage in van der Waals or London dispersion forces. Mathematically, polarizability (α) is defined as α = p/E, where p is the induced dipole moment and E is the electric field. Colloquially, polarizability reflects how \"sticky\" an amino acid is. The isoelectric point (pI) is calculated as pI = (pKa1 + pKa2)/2, indicating how basic (1) or acidic (0) an amino acid is. 
The acid dissociation constant (expressed as pKa) is determined empirically and informs the model whether a residue will be protonated or deprotonated at pH 7.4. These features capture local structural motifs and inter-residue interactions, enabling the model to identify spatial dependencies critical for predicting PPIs. The residue adjacency matrices, constructed with a 9.5 Å spatial threshold, preserved topological structures and emphasized meaningful connections between residues (Kipf &amp; Welling, 2016).</p><p><b>Graph Neural Network Architecture</b></p><p>A Graph Attention Network (GATv2) architecture was employed to iteratively refine residue embeddings. Through eight layers of recursive message passing and multi-head attention, the model aggregated neighborhood information into a global graph-level representation (Brody et al., 2021). This process ensured that the final representations were robust descriptors of protein geometry and chemistry, forming the foundation for accurate interaction prediction.</p><p><b>Training and Validation</b></p><p>The gPPIpred model was implemented as a Siamese neural network, a dual-stream architecture designed to learn relationships between pairs of entities: the two networks operate in parallel, are trained by error back-propagation, and have their outputs compared at the end, usually using cosine distance. The gPPIpred GATv2 was trained for 20 epochs in batches of 128 shuffled graphs (Figure 1a and b). The training and validation scripts can be found in the Extended Data section.</p><p><b>Interaction Site Analysis</b></p><p>To move beyond \"black-box\" predictions, we used saliency mapping to identify interaction \"hotspots.\" Saliency mapping calculates the gradient of the output probability with respect to the input node features. By visualizing these gradients, we can identify specific residues that contribute most to the predicted binding event. 
These residues typically correspond to interface regions that provide most of the binding free energy (ΔG), offering actionable targets for site-directed mutagenesis or therapeutic intervention.</p>","reagents":"<p></p>","patternDescription":"<p>Protein–protein interactions (PPIs) are fundamental to essential biological processes, including signal transduction, immune response, and metabolic regulation. Traditionally, PPIs have been characterized using low-throughput experimental methods such as Fluorescence Resonance Energy Transfer (FRET), Bimolecular Fluorescence Complementation (BiFC), and yeast two-hybrid (Y2H) systems. High-throughput approaches often use mass spectrometry-coupled techniques, including co-immunoprecipitation (Co-IP) and tandem-affinity purification (TAP) (Bajar et al., 2016; Fields &amp; Song, 1989; Gavin et al., 2002; Ho et al., 2002; Low et al., 2021; Miller et al., 2015). However, these experimental techniques are often labor-intensive, time-consuming, costly, and prone to high rates of false positives and false negatives (Ito et al., 2001; Low et al., 2021; Mrowka et al., 2001). Recently, AlphaFold has provided a crucial advance in protein structural information by greatly increasing the number of high-accuracy protein structure models (Jumper et al., 2021; Varadi et al., 2022). As a result, computational approaches have emerged as efficient, scalable alternatives for PPI prediction.</p><p>PPI prediction has relied on scalable computing techniques like machine learning (ML) and deep learning (DL) frameworks. Traditional ML techniques, such as support vector machines (SVM) and Random Forests, have been extensively used for PPI prediction (You et al., 2014, 2015). These methods typically use sequence-based features, including position-specific scoring matrices (PSSM), which capture evolutionary conservation, and physicochemical properties of amino acids to represent proteins (Guo et al., 2008; Shen et al., 2007). 
While effective, these methods depend on manual feature engineering, which limits scalability and hinders the integration of structural context. Advances in deep learning, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have improved performance by learning high-level features directly from protein sequences (Hashemifar et al., 2018; Sun et al., 2017). However, these models often underutilize the structural information crucial for capturing the spatial relationships in protein interactions.</p><p>Graph Neural Networks (GNNs) are a type of deep learning method that can infer information from graph-structured data. GNNs are now being used to integrate protein structural information with convolutional and recurrent networks to increase prediction robustness (Réau et al., 2021). Zhou et al. (2022) and Lee (2023) have summarized and compared GNN-based methods. AlphaFold represents a landmark in the application of GNNs to biological problems (Jumper et al., 2021; Varadi et al., 2022). While AlphaFold was initially used to predict the 3D structure of proteins, it can now be applied to predict complexes between proteins. Its latest update, AlphaFold3, can test interactions between a range of molecules, including DNA, RNA, ligands, ions, and proteins (Abramson et al., 2024). Notably, the model can now account for post-translational modifications and chemical modifications of nucleic acids. All of this is presented in a user-friendly environment, the AlphaFold Server, where users only need to input the sequences of their molecules. 
However, it does not support high-throughput analysis, as users are limited to 30 jobs per day, and each job cannot exceed 5,000 tokens (1 amino acid = 1 token).</p><p>Here, we introduce gPPIpred, a novel framework that leverages GNNs with integrated attention mechanisms to simultaneously consider physicochemical properties and structural information to predict PPIs. In gPPIpred, proteins are represented as residue-level graphs, where nodes correspond to amino acid residues and edges are established based on spatial proximity within three-dimensional protein structures. Each protein is modeled as a graph, with nodes representing residues of significant structural and functional relevance. Nodes encode the following residue-level physicochemical properties: hydrophobicity, volume, polarizability, pI, and pKa (see Feature Extraction section). The specific values for each amino acid are listed in Table 1 (Kyte &amp; Doolittle, 1982). Edges in the graph are defined using a spatial threshold. By structuring the graph around both residue properties and a spatial threshold, rather than treating residues in isolation, the model can capture biologically relevant interaction motifs. For each predicted interacting pair, interaction hotspots are calculated via saliency mapping (see Methods). 
Here, the saliency mapping is based on the final interaction probability, providing information on which residues in both prey and bait are critical for that specific interaction.</p><p><b>Table 1.</b> List of each amino acid hydrophobicity, Volume, Polarizability, pI and pKa values.</p><table><tbody><tr><td><p>Code</p></td><td><p>Amino Acid</p></td><td><p>Hydrophobicity</p></td><td><p>Volume</p></td><td><p>Polarizability</p></td><td><p>pI</p></td><td><p>pKa</p></td></tr><tr><td><p>A</p></td><td><p>Alanine</p></td><td><p>0.70</p></td><td><p>0.17</p></td><td><p>0.11</p></td><td><p>0.40</p></td><td><p>0.00</p></td></tr><tr><td><p>R</p></td><td><p>Arginine</p></td><td><p>0.00</p></td><td><p>0.67</p></td><td><p>0.71</p></td><td><p>1.00</p></td><td><p>1.00</p></td></tr><tr><td><p>N</p></td><td><p>Asparagine</p></td><td><p>0.11</p></td><td><p>0.32</p></td><td><p>0.33</p></td><td><p>0.33</p></td><td><p>0.00</p></td></tr><tr><td><p>D</p></td><td><p>Aspartic Acid</p></td><td><p>0.11</p></td><td><p>0.30</p></td><td><p>0.26</p></td><td><p>0.00</p></td><td><p>0.31</p></td></tr><tr><td><p>C</p></td><td><p>Cysteine</p></td><td><p>0.78</p></td><td><p>0.29</p></td><td><p>0.31</p></td><td><p>0.29</p></td><td><p>0.67</p></td></tr><tr><td><p>Q</p></td><td><p>Glutamine</p></td><td><p>0.11</p></td><td><p>0.50</p></td><td><p>0.44</p></td><td><p>0.36</p></td><td><p>0.00</p></td></tr><tr><td><p>E</p></td><td><p>Glutamic 
Acid</p></td><td><p>0.11</p></td><td><p>0.47</p></td><td><p>0.37</p></td><td><p>0.06</p></td><td><p>0.34</p></td></tr><tr><td><p>G</p></td><td><p>Glycine</p></td><td><p>0.46</p></td><td><p>0.00</p></td><td><p>0.00</p></td><td><p>0.40</p></td><td><p>0.00</p></td></tr><tr><td><p>H</p></td><td><p>Histidine</p></td><td><p>0.14</p></td><td><p>0.55</p></td><td><p>0.56</p></td><td><p>0.60</p></td><td><p>0.48</p></td></tr><tr><td><p>I</p></td><td><p>Isoleucine</p></td><td><p>1.00</p></td><td><p>0.64</p></td><td><p>0.45</p></td><td><p>0.41</p></td><td><p>0.00</p></td></tr><tr><td><p>L</p></td><td><p>Leucine</p></td><td><p>0.92</p></td><td><p>0.64</p></td><td><p>0.45</p></td><td><p>0.40</p></td><td><p>0.00</p></td></tr><tr><td><p>K</p></td><td><p>Lysine</p></td><td><p>0.07</p></td><td><p>0.65</p></td><td><p>0.54</p></td><td><p>0.87</p></td><td><p>0.84</p></td></tr><tr><td><p>M</p></td><td><p>Methionine</p></td><td><p>0.66</p></td><td><p>0.61</p></td><td><p>0.54</p></td><td><p>0.37</p></td><td><p>0.00</p></td></tr><tr><td><p>F</p></td><td><p>Phenylalanine</p></td><td><p>0.81</p></td><td><p>0.77</p></td><td><p>0.72</p></td><td><p>0.34</p></td><td><p>0.00</p></td></tr><tr><td><p>P</p></td><td><p>Proline</p></td><td><p>0.32</p></td><td><p>0.31</p></td><td><p>0.32</p></td><td><p>0.44</p></td><td><p>0.00</p></td></tr><tr><td><p>S</p></td><td><p>Serine</p></td><td><p>0.41</p></td><td><p>0.17</p></td><td><p>0.15</p></td><td><p>0.36</p></td><td><p>0.00</p></td></tr><tr><td><p>T</p></td><td><p>Threonine</p></td><td><p>0.42</p></td><td><p>0.33</p></td><td><p>0.26</p></td><td><p>0.35</p></td><td><p>0.00</p></td></tr><tr><td><p>W</p></td><td><p>Tryptophan</p></td><td><p>0.40</p></td><td><p>1.00</p></td><td><p>1.00</p></td><td><p>0.39</p></td><td><p>0.00</p></td></tr><tr><td><p>Y</p></td><td><p>Tyrosine</p></td><td><p>0.36</p></td><td><p>0.79</p></td><td><p>0.73</p></td><td><p>0.36</p></td><td><p>0.81</p></td></tr><tr><td><p>V</p></td><td><p>Valine</p></td><td><p>0.97</p></td><td><p>0.48</
p></td><td><p>0.34</p></td><td><p>0.40</p></td><td><p>0.00</p></td></tr><tr><td><p>X</p></td><td><p>Unknown/Gap</p></td><td><p>0.00</p></td><td><p>0.00</p></td><td><p>0.00</p></td><td><p>0.00</p></td><td><p>0.00</p></td></tr></tbody></table><p></p><p>To iteratively refine residue embeddings, we employed a Graph Attention Network (GATv2) architecture. gPPIpred is a GATv2 model with eight layers of recursive message passing and multi-head attention. The Siamese neural network was trained for 20 epochs in batches of 128 shuffled graphs. Shuffled batches and an independent validation set were used to assess the model's ability to generalize across different protein families. Cross-entropy losses observed during training are shown in Figure 1a. During training, the model exhibited a consistent decrease in cross-entropy loss, reaching convergence within 20 epochs. The Matthews correlation coefficient (MCC) values obtained during each epoch are shown in Figure 1b. The GATv2-based Siamese network achieved an MCC of 0.57 during training. To verify the model’s ability to generalize to unseen biological data, we tested gPPIpred on an independent validation dataset of 72,358 protein pairs. The model achieved an MCC of 0.4641, suggesting that the Siamese architecture and graph-based representations are not merely overfitting to specific protein families. The optimal threshold of 0.85 was determined at the point where precision is maximized without sacrificing the model’s sensitivity of 96% (Figure 1c). This threshold keeps the model useful as a screening tool, capturing nearly all true interactions while maintaining an acceptable precision level. The resulting metrics (ROC-AUC: 0.8451, MCC: 0.4641, accuracy: 0.6992) summarize the model’s classification ability (Figure 1d, Table 2).</p><p><b>Table 2.</b> Validation Performance Summary. The predictive performance results of the gPPIpred model across 72,980 protein pairs of the validation dataset (set threshold 0.85). 
<i>N</i> = number of samples.</p><table><tbody><tr><td><p><b>Target Class</b></p></td><td><p><b>Precision</b></p></td><td><p><b>Recall</b></p></td><td><p><b>F1-Score</b></p></td><td><p><b>Support (N)</b></p></td></tr><tr><td><p><b>Non-Binder</b></p></td><td><p>0.91</p></td><td><p>0.43</p></td><td><p>0.59</p></td><td><p>36,179</p></td></tr><tr><td><p><b>Binder</b></p></td><td><p>0.63</p></td><td><p>0.96</p></td><td><p>0.76</p></td><td><p>36,801</p></td></tr><tr><td><p><b>Macro Average</b></p></td><td><p>0.77</p></td><td><p>0.70</p></td><td><p>0.68</p></td><td><p>72,980</p></td></tr><tr><td><p><b>Weighted Average</b></p></td><td><p>0.77</p></td><td><p>0.70</p></td><td><p>0.68</p></td><td><p>72,980</p></td></tr></tbody></table><p></p><p>The macro averages of 0.68–0.77 across these metrics indicate that the model performs reasonably for both classes, although recall for non-binders is low (0.43). The high recall for binders (96%) is particularly significant biologically, as it means nearly all true biological interactions are captured. The binder precision of 0.63 indicates a moderate rate of false positives. The results in the confusion matrix (Figure 1e) and Table 2 show that the model effectively prioritizes true positive interactions, making it a viable computational filter for large-scale protein-protein interaction screening. This level of performance is especially notable given that the training dataset included proteins from diverse species, representing a broad spectrum of the proteome (Table 3). We offer gPPIpred in two ways: as a ready-to-use app accessible here (new link here), or by installing gPPIpred (see Extended Data section). To use the app, users only need to provide the UniProt IDs to run a query and define a threshold (Figure 1f). Once the prediction is complete, a bar chart will appear, providing the full list of preys and their interaction probabilities (Figure 1g). Additionally, in the data window, users can find and download the list of tested interactors. 
This list shows bait and prey IDs, interaction probability, Binds (Yes, if probability ≥ the defined threshold; No, if probability &lt; the defined threshold), Bait and Prey Hotspots (hotspot residues are at the interface of either bait or prey proteins and provide the bulk of the binding free energy (ΔG) for the specific interaction pair), and the full sequences of the bait and prey proteins (Figure 1h).</p><p>To install gPPIpred, the following dependencies are required: Python 3.8+, PyTorch, PyTorch Geometric (PyG), and Biopython. We recommend using an NVIDIA GPU with CUDA support due to the high computational demand of graph attention mechanisms; however, execution on a CPU is possible. For the full list of requirements, see the Extended Data section. When using the manually installed version, gPPIpred will generate a link redirecting to an interface like the one shown in Figure 1, making it easy to use once installed.</p><p>Despite its strengths, gPPIpred has limitations. For example, it depends on the availability of protein structural data, such as high-quality 3D structures, whether experimental or predicted, which are required to construct accurate graphs. Therefore, gPPIpred should not be used to predict interactions involving proteins that are intrinsically disordered or lack a reliable structural model, as this will yield low-confidence results.</p><p>Overall, by leveraging structural representations through graph-based learning and integrating advanced embedding techniques, gPPIpred reduces research costs associated with large-scale interaction screenings and enhances our understanding of the structural determinants of protein interactions. The detailed insights provided by residue-level predictions have important implications for studying the biological mechanisms underlying these interactions, potentially guiding experimental validation and therapeutic targeting. 
The practical utility of gPPIpred is further enhanced by its interpretability via Saliency Mapping. For the predicted interactions, the model successfully identified residue-level hotspots at the interaction interfaces. The residues assigned the highest saliency scores correspond to those with significant functional relevance, such as those involved in hydrogen bonding or hydrophobic packing at the interface. These \"saliency-identified\" residues provide a direct roadmap for experimentalists looking to validate predictions through point mutations. Additionally, the gPPIpred pipeline facilitates PPI predictions by offering a simplified interface that automates the download and processing of protein and compound structural files, generates interactive 3D plots to visualize putative interaction sites, and creates detailed reports.</p>","references":[{"reference":"<p>Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al., Jumper. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630: 493-500.</p>","pubmedId":"","doi":"10.1038/s41586-024-07487-w"},{"reference":"<p>Bajar B, Wang E, Zhang S, Lin M, Chu J. 2016. A Guide to Fluorescent Protein FRET Pairs. Sensors 16: 1488.</p>","pubmedId":"","doi":"10.3390/s16091488"},{"reference":"<p>Bernett, Judith (2022). PPI prediction from sequence, gold standard dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.21591618.v3</p>","pubmedId":"","doi":"https://doi.org/10.6084/m9.figshare.21591618.v3"},{"reference":"<p>Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. 2000. . 
Nature Structural Biology 7: 957-959.</p>","pubmedId":"","doi":"doi.org/10.1038/80734"},{"reference":"<p>Brody, S., Alon, U., &amp; Yahav, E.&nbsp;(2021).&nbsp;How Attentive are Graph Attention Networks?&nbsp;<i>ArXiv</i>.&nbsp;https://arxiv.org/abs/2105.14491</p>","pubmedId":"","doi":"https://arxiv.org/abs/2105.14491"},{"reference":"<p>Chatr-aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, et al., Tyers. 2014. The BioGRID interaction database: 2015 update. Nucleic Acids Research 43: D470-D478.</p>","pubmedId":"","doi":"10.1093/nar/gku1204"},{"reference":"<p>Fields S, Song Ok. 1989. A novel genetic system to detect protein–protein interactions. Nature 340: 245-246.</p>","pubmedId":"","doi":"10.1038/340245a0"},{"reference":"<p>Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, et al., Superti-Furga. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415: 141-147.</p>","pubmedId":"","doi":"10.1038/415141a"},{"reference":"<p>Guo Y, Yu L, Wen Z, Li M. 2008. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Research 36: 3025-3030.</p>","pubmedId":"","doi":"10.1093/nar/gkn159"},{"reference":"<p>Hashemifar S, Neyshabur B, Khan AA, Xu J. 2018. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34: i802-i810.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bty573"},{"reference":"<p>Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al., Tyers. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180-183.</p>","pubmedId":"","doi":"10.1038/415180a"},{"reference":"<p>Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. 
Proceedings of the National Academy of Sciences 98: 4569-4574.</p>","pubmedId":"","doi":"10.1073/pnas.061034498"},{"reference":"<p>Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al., Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589.</p>","pubmedId":"","doi":"10.1038/s41586-021-03819-2"},{"reference":"<p>Kipf TN, Welling M. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. 2016 Nov 21.</p>","pubmedId":"","doi":""},{"reference":"<p>Kyte, J., &amp; Doolittle, R. F. (1982). A Simple Method for Displaying the Hydropathic Character of a Protein. In <i>J. Mol. Biol</i> (Vol. 157).</p>","pubmedId":"","doi":""},{"reference":"<p>Lee M. 2023. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 28: 5169.</p>","pubmedId":"","doi":"10.3390/molecules28135169"},{"reference":"<p>Low TY, Syafruddin SE, Mohtar MA, Vellaichamy A, A Rahman NS, Pung YF, Tan CSH. 2021. Recent progress in mass spectrometry-based strategies for elucidating protein–protein interactions. Cellular and Molecular Life Sciences 78: 5325-5339.</p>","pubmedId":"","doi":"10.1007/s00018-021-03856-0"},{"reference":"<p>Martin ACR. 2005. Mapping PDB chains to UniProtKB entries. Bioinformatics 21: 4297-4301.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bti694"},{"reference":"<p>Miller KE, Kim Y, Huh WK, Park HO. 2015. Bimolecular Fluorescence Complementation (BiFC) Analysis: Advances and Recent Applications for Genome-Wide Interaction Studies. Journal of Molecular Biology 427: 2039-2055.</p>","pubmedId":"","doi":"10.1016/j.jmb.2015.03.005"},{"reference":"<p>Mrowka R, Patzak A, Herzel H. 2001. Is There a Bias in Proteome Research?. Genome Research 11: 1971-1973.</p>","pubmedId":"","doi":"doi.org/10.1101/gr.206701"},{"reference":"<p>Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, et al., Tyers. 2020. 
The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Science 30: 187-200.</p>","pubmedId":"","doi":"doi.org/10.1002/pro.3978"},{"reference":"<p>Réau M, Renaud N, Xue LC, Bonvin AMJJ. 2021. DeepRank-GNN: A Graph Neural Network Framework to Learn Patterns in Protein-Protein Interfaces. bioRxiv: 10.1101/2021.12.08.471762.</p>","pubmedId":"","doi":"10.1101/2021.12.08.471762"},{"reference":"<p>Shen J, Zhang J, Luo X, Zhu W, Yu K, Chen K, Li Y, Jiang H. 2007. Predicting protein–protein interactions based only on sequences information. Proceedings of the National Academy of Sciences 104: 4337-4341.</p>","pubmedId":"","doi":"10.1073/pnas.0607879104"},{"reference":"<p>Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM. 2011. Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12: 2539-2561.</p>","pubmedId":"","doi":""},{"reference":"<p>Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, et al., Wanker. 2005. A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome. Cell 122: 957-968.</p>","pubmedId":"","doi":"10.1016/j.cell.2005.08.029"},{"reference":"<p>Sun T, Zhou B, Lai L, Pei J. 2017. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 18: 10.1186/s12859-017-1700-2.</p>","pubmedId":"","doi":"10.1186/s12859-017-1700-2"},{"reference":"<p>Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al., Mering. 2018. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research 47: D607-D613.</p>","pubmedId":"","doi":"10.1093/nar/gky1131"},{"reference":"<p>Trabuco LG, Betts MJ, Russell RB. 2012. Negative protein–protein interaction datasets derived from large-scale two-hybrid experiments. 
Methods 58: 343-348.</p>","pubmedId":"","doi":"10.1016/j.ymeth.2012.07.028"},{"reference":"<p>Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al., Velankar. 2021. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50: D439-D444.</p>","pubmedId":"","doi":"10.1093/nar/gkab1061"},{"reference":"<p>You ZH, Chan KCC, Hu P. 2015. Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest. PLOS ONE 10: e0125811.</p>","pubmedId":"","doi":"10.1371/journal.pone.0125811"},{"reference":"<p>You ZH, Yu JZ, Zhu L, Li S, Wen ZK. 2014. A MapReduce based parallel SVM for large-scale predicting protein–protein interactions. Neurocomputing 145: 37-43.</p>","pubmedId":"","doi":"10.1016/j.neucom.2014.05.072"},{"reference":"<p>Zhou H, Wang W, Jin J, Zheng Z, Zhou B. 2022. Graph Neural Network for Protein–Protein Interaction Prediction: A Comparative Study. Molecules 27: 6135.</p>","pubmedId":"","doi":"10.3390/molecules27186135"}],"title":"<p>gPPIpred: A User-Friendly PPI Predictor Based on Protein Molecular Graphs</p>","reviews":[{"reviewer":{"displayName":"Herlander Azevedo"},"openAcknowledgement":null,"status":{"submitted":false}}],"curatorReviews":[]},{"id":"61d89c62-7ec6-4a31-9ed0-a8fe5dc9f52f","decision":"accept","abstract":"<p>Protein–protein interactions (PPIs) govern essential cellular processes but remain challenging to characterize experimentally due to high cost and labor intensity. We present gPPIpred, a scalable computational framework leveraging graph neural networks (GNNs) and attention mechanisms to predict PPIs at residue-level resolution. Proteins are encoded as spatially informed molecular graphs integrating physicochemical features. 
Using curated structural datasets for training and validation, gPPIpred was fine-tuned to reliably predict positive interactions and actual interacting sites. Attention scores highlight key residues mediating interactions, offering interpretable insights to guide experimental design. gPPIpred combines high predictive performance with explainability, providing a user-friendly pipeline for large-scale PPI discovery.</p>","acknowledgements":"<p>The authors thank the BioGRID, IntAct, and PDB teams for providing invaluable datasets, and the developers of ProtBERT for enabling advanced feature extraction.</p>","authors":[{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","writing_reviewEditing","supervision"],"email":"matiolli@itqb.unl.pt","firstName":"Cleverson C.","lastName":"Matiolli","submittingAuthor":false,"correspondingAuthor":null,"equalContribution":true,"WBId":"","orcid":"0000-0001-8185-7628"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["formalAnalysis","writing_originalDraft","writing_reviewEditing","methodology"],"email":"joana.marques@itqb.unl.pt","firstName":"Joana ","lastName":"Marques","submittingAuthor":null,"correspondingAuthor":null,"equalContribution":true,"WBId":"","orcid":"0000-0002-8922-3969"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","supervision","writing_reviewEditing","fundingAcquisition"],"email":"abreu@itqb.unl.pt","firstName":"Isabel 
A.","lastName":"Abreu","submittingAuthor":true,"correspondingAuthor":true,"equalContribution":null,"WBId":"","orcid":"0000-0002-5566-2146"}],"awards":[],"conflictsOfInterest":"<p>The authors declare that there are no conflicts of interest present.</p>","dataTable":{"url":null},"extendedData":[{"description":"gPPIpred: A User-Friendly PPI Predictor Based on Protein Molecular Graphs","doi":"10.22002/9axzx-2y293","resourceType":"Software","name":"-gPPIpredv2_2-main 2.zip","url":"https://portal.micropublication.org/uploads/3618671af77f6426a88cda81cf677763.zip"}],"funding":"<p>We acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for a PhD fellowship for JM (<a href=\"https://doi.org/10.54499/2020.06917.BD\">https://doi.org/10.54499/2020.06917.BD</a>) and project <a href=\"https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021\">https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021</a>, which also supported CM's contract. We also acknowledge funding by GREEN-it ‘Bioresources4sustainability’ (https://doi.org/10.54499/UIDB/04551/2020). The funding sources were not involved in analyses, interpretation of data, writing, or in the decision to submit this paper.</p>","image":{"url":"https://portal.micropublication.org/uploads/3c211266f5ca85f7665cd42338fca9a2.png"},"imageCaption":"<p>The gPPIpred app, built with Gradio, is available at: https://encr.pw/ESWd9. <b>a)</b> Model Convergence plot showing cross-entropy loss over 20 epochs. <b>b)</b> Model Performance measured in MCC over 20 epochs of training. To calculate the MCC during training, a balanced test dataset (10,000 samples) was used. MCC – Matthews correlation coefficient. <b>c)</b> Precision and sensitivity as a function of the interaction threshold. <b>d)</b> Receiver Operating Characteristic Curve. Final AUC-ROC: 0.9109. Optimal Threshold: 0.85. ROC - Receiver Operating Characteristic. AUC - Area Under the ROC Curve. <b>e)</b> Confusion Matrix. MCC – Matthews correlation coefficient. 
<b>f)</b> gPPIpred web app. Users can input proteins by their UniProt Accession IDs. Additionally, users can define their desired threshold. <b>g)</b> Here, we show an example of an output generated by gPPIpred. Once the program has finished analysing, the status area indicates how many preys were analysed. The chart shows the probability of each interaction as well as the line for the defined threshold. The interaction is considered positive if probability ≥ the defined threshold. <b>h)</b> Data area. Here, users can find and download the list of the tested interactors. This list shows bait and prey IDs, the interaction probability, Binds (Yes, if probability ≥ the defined threshold; No, if probability &lt; the defined threshold), Bait and Prey Hotspots (hotspot residues are at the interface of either bait or prey proteins and provide the bulk of the binding free energy (ΔG) for the specific interaction pair), and the full sequences of the bait and prey proteins.</p>","imageTitle":"<p><b>gPPIpred: a user-friendly protein-protein interaction predictor.</b></p>","methods":"<p><b>Dataset Preparation</b></p><p>The training and validation datasets are available in the Extended Data section. Positive interaction data were curated from the Gold Standard Dataset and multi-validated experiments from BioGRID (Bernett, 2022; Chatr-Aryamontri et al., 2015; Oughtred et al., 2021; Szklarczyk et al., 2019). A common strategy for generating negative datasets is Subcellular Localization Filtering, which involves selecting proteins from different subcellular locations and labeling them as non-interacting. Although this strategy is widely accepted, the risk of false negatives can be further reduced by using experimentally validated negative examples. Therefore, we used the Stelzl (2005) negative dataset curated by the Russell Lab (Stelzl et al., 2005; Trabuco et al., 2012). 
For negative interactions, the shortest path between the two proteins in the underlying two-hybrid interactome is assigned a confidence score in the following format: shortestPath:2, shortestPath:3, etc., or shortestPath:NA if there is no path connecting the two proteins. We created two separate datasets, ensuring that no individual protein sequence appeared in both datasets to prevent data leakage (Table 1). Only interactions where both proteins had high-resolution structures were included. When possible, we ensured the model had examples of both positive and negative interactions for the same protein.</p><p><b>Protein Graph Construction</b></p><p>Protein graphs were constructed by representing individual residues as nodes, with edges indicating spatial proximity based on a distance threshold of less than 9.5 Å between alpha carbon atoms. This threshold provides an accurate representation of the protein's tertiary structure and local chemical environment. Preprocessing included extracting Cartesian coordinates (x, y, z) from structural files and standardizing amino acid nomenclature into single-letter codes (Berman et al., 2000). To maximize structural coverage, we integrated experimentally determined structures from the PDB with high-confidence predicted models from AlphaFold2 (Jumper et al., 2021; Varadi et al., 2022). Missing entries were retrieved via the AlphaFold API, resulting in a nearly complete structural dataset. Node features were defined by a five-dimensional vector of physicochemical properties (hydrophobicity, volume, polarizability, pI, and pKa), while edges were defined by Euclidean distances, capturing the spatial constraints essential for predicting protein-protein interactions.</p><p><b>Feature Extraction</b></p><p>Node-level embeddings were generated by assigning relative values (0 to 1) for five physicochemical properties to each amino acid (Table 1). 
A value of 1.0 therefore represents the maximum of that property among the amino acids. Each physicochemical property provides the model with different information. Hydrophobicity indicates the tendency of an amino acid to repel water, with 1 being the most hydrophobic and 0 the least. Volume is calculated as the van der Waals volume, where 1 corresponds to the largest and 0 to the smallest amino acid. Polarizability measures how well an amino acid can engage in van der Waals forces, in particular London dispersion forces. Mathematically, polarizability (α) is defined as α = p/E, where p is the induced dipole moment and E is the electric field. Colloquially, polarizability reflects how \"sticky\" an amino acid is. The isoelectric point (pI) is calculated as pI = (pKa1 + pKa2)/2, indicating how basic (1) or acidic (0) an amino acid is. The pKa (the negative logarithm of the acid dissociation constant) is determined empirically and informs the model whether a residue will be protonated or deprotonated at pH 7.4. These features capture local structural motifs and inter-residue interactions, enabling the model to identify spatial dependencies critical for predicting PPIs. The residue adjacency matrices, constructed with a 9.5 Å spatial threshold, preserved topological structures and emphasized meaningful connections between residues (Kipf &amp; Welling, 2016).</p><p><b>Graph Neural Network Architecture</b></p><p>A Graph Attention Network (GATv2) architecture was employed to iteratively refine residue embeddings. Through eight layers of recursive message passing and multi-head attention, the model aggregated neighborhood information into a global graph-level representation (Brody et al., 2021). 
This process ensured that the final representations were robust descriptors of protein geometry and chemistry, forming the foundation for accurate interaction prediction.</p><p><b>Training and Validation</b></p><p>The gPPIpred model was implemented as a Siamese Neural Network, a dual-stream architecture designed to learn relationships between pairs of entities. The gPPIpred GATv2 was trained for 20 epochs in batches of 128 shuffled graphs (Figure 1a and b). Here, we use a Siamese neural network that employs error back-propagation during training; the networks operate in parallel and compare their outputs at the end, usually using cosine distance. The training and validation scripts can be found in the Extended Data section.</p><p><b>Interaction Site Analysis</b></p><p>To move beyond \"black-box\" predictions, we used Saliency Mapping to identify interaction \"hotspots.\" Saliency mapping calculates the gradient of the output probability with respect to the input node features. By visualizing these gradients, we can identify specific residues that contribute most to the predicted binding event. These residues typically correspond to interface regions that provide most of the binding free energy (ΔG), offering actionable targets for site-directed mutagenesis or therapeutic intervention.</p>","reagents":"<p></p>","patternDescription":"<p>Protein–protein interactions (PPIs) are fundamental to essential biological processes, including signal transduction, immune response, and metabolic regulation. Traditionally, PPIs have been characterized using low-throughput experimental methods such as Fluorescence Resonance Energy Transfer (FRET), Bimolecular Fluorescence Complementation (BiFC), and yeast two-hybrid (Y2H) systems. 
High-throughput approaches often use mass spectrometry-coupled techniques, including co-immunoprecipitation (Co-IP) and tandem-affinity purification (TAP) (Bajar et al., 2016; Fields &amp; Song, 1989; Gavin et al., 2002; Ho et al., 2002; Low et al., 2021; Miller et al., 2015). However, these experimental techniques are often labor-intensive, time-consuming, costly, and prone to high rates of false positives and false negatives (Ito et al., 2001; Low et al., 2021; Mrowka et al., 2001). Recently, AlphaFold has provided a crucial advance in protein structural information by greatly increasing the number of high-accuracy protein structure models (Jumper et al., 2021; Varadi et al., 2022). As a result, computational approaches have emerged as efficient, scalable alternatives for PPI prediction.</p><p>PPI prediction has relied on scalable computing techniques like machine learning (ML) and deep learning (DL) frameworks. Traditional ML techniques, such as support vector machines (SVM) and Random Forests, have been extensively used for PPI prediction (You et al., 2014, 2015). These methods typically use sequence-based features, including position-specific scoring matrices (PSSM), which capture evolutionary conservation, and physicochemical properties of amino acids to represent proteins (Guo et al., 2008; Shen et al., 2007). While effective, these methods depend on manual feature engineering, which limits scalability and hinders the integration of structural context. Advances in deep learning, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have improved performance by learning high-level features directly from protein sequences (Hashemifar et al., 2018; Sun et al., 2017). 
However, these models often underutilize the structural information crucial for capturing the spatial relationships in protein interactions.</p><p>Graph Neural Networks (GNNs) are a type of deep learning method that can infer information from graph-structured data. GNNs are now being used to integrate protein structural information with convolutional and recurrent networks to increase prediction robustness (Réau et al., 2021). Zhou et al. (2022) and Lee (2023) have summarized and compared methods that use GNN-based strategies. AlphaFold represents a landmark in the application of GNNs to biological problems (Jumper et al., 2021; Varadi et al., 2022). While AlphaFold was initially used to predict the 3D structure of proteins, it can now be applied to predict complexes between proteins. Its latest update, AlphaFold3, can test interactions between a range of molecules, including DNA, RNA, ligands, ions, and proteins (Abramson et al., 2024). Notably, the model can now account for post-translational modifications and chemical modifications of nucleic acids. All of this is presented in a user-friendly environment, the AlphaFold server, where users only need to input the sequences of their molecules. However, it does not support high-throughput analysis, as users are limited to 30 jobs per day, and each job cannot exceed 5,000 tokens (1 amino acid = 1 token).</p><p>Here, we introduce gPPIpred, a novel framework that leverages Graph Neural Networks (GNNs) with integrated attention mechanisms to simultaneously consider physicochemical properties and structural information to predict PPIs. In gPPIpred, proteins are represented as residue-level graphs, where nodes correspond to amino acid residues and edges are established based on spatial proximity within three-dimensional protein structures. Each protein is modeled as a graph, with nodes representing residues of significant structural and functional relevance. 
Nodes encode the following residue-level physicochemical properties: hydrophobicity, volume, polarizability, pI, and pKa (see Feature Extraction section). The specific values for each amino acid are listed in Supplemental Table 1 (Kyte &amp; Doolittle, 1982). Edges in the graph are defined using a spatial threshold. By structuring the graph around both residue properties and spatial threshold rather than individual residues, the model effectively captures biologically relevant interaction motifs. For each predicted interacting pair, interaction hotspots are calculated via saliency mapping (see Methods). Here, the saliency mapping is based on the final interaction probability, providing information on which residues in both prey and bait are critical for that specific interaction.</p><p>To iteratively refine residue embeddings, we employed a Graph Attention Network (GATv2) architecture. gPPIpred is a GATv2 model with eight layers of recursive message passing and multi-head attention. Two independent PPI datasets were created to train and validate this model. The breakdown of these datasets is in Table 1. The Siamese neural network was trained for 20 epochs using 128 shuffled graph batches. The use of shuffled batches and independent validation sets ensures the model's ability to generalize across different protein families. Cross-entropy losses observed during training are shown in Figure 1a. During training, the model exhibited a consistent decrease in cross-entropy loss, reaching convergence within 20 epochs.</p><p><b>Table 1. Number of samples contained in each dataset</b>. Number of Unique Proteins IDs in each dataset. No protein was used in both datasets to prevent data leakage. 
The percentage of species represented in each dataset is also listed.</p><table><tbody><tr><td><p>Dataset</p></td><td><p>Total&nbsp;</p></td><td data-colwidth=\"82\"><p>Positive&nbsp;</p><p>interactions</p></td><td><p>Negative&nbsp;interactions</p></td><td><p>Unique Proteins</p></td></tr><tr><td rowspan=\"4\"><p>Training</p></td><td rowspan=\"4\"><p>159,655 interactions:</p><p>77.87 % <i>Homo sapiens</i>,</p><p>22.05 % <i>Arabidopsis thaliana</i>,</p><p>0.05 % <i>Oryza sativa</i>,</p><p>0.02 % <i>Saccharomyces cerevisiae</i></p></td><td rowspan=\"4\" data-colwidth=\"82\"><p>77,508</p><p>(48%)</p></td><td rowspan=\"4\"><p>82,147</p><p>(52%)</p></td><td rowspan=\"4\"><p>10,349 proteins:</p><p>64 % <i>Arabidopsis thaliana,</i></p><p>35 % <i>Homo sapiens</i>,</p><p>0.8 % <i>Oryza sativa</i>,</p><p>0.2 % <i>Saccharomyces cerevisiae</i></p></td></tr><tr></tr><tr></tr><tr></tr><tr><td><p>Validation</p></td><td><p>72,358 interactions:&nbsp;</p><p>50.42 % <i>Homo sapiens</i>,</p><p>49.54&nbsp; % <i>Saccharomyces cerevisiae</i></p></td><td data-colwidth=\"82\"><p>36,179</p><p>(50%)</p></td><td><p>36,179</p><p>(50%)</p></td><td><p>4,192 proteins:</p><p>67.6 % <i>Homo sapiens</i>,</p><p>32.4 % <i>Saccharomyces cerevisiae</i></p></td></tr></tbody></table><p></p><p>The Matthews correlation coefficient (MCC) values obtained during each epoch are shown in Figure 1b. The GATv2-based Siamese network achieved an MCC of 0.57 during training. To verify the model’s ability to generalize to unseen biological data, we tested gPPIpred on an independent validation dataset of 72,358 protein pairs. The model achieved an MCC of 0.4641, suggesting that the Siamese architecture and graph-based representations are not strongly overfitting to specific protein families. The optimal threshold of 0.85 was chosen as the point where precision is maximized without sacrificing the model’s sensitivity of 96% (Figure 1c). 
This threshold ensures the model remains a robust screening tool, capturing nearly all true interactions while maintaining an acceptable precision level. The performance metrics (ROC-AUC: 0.8451, MCC: 0.4641, and accuracy: 0.6992) reflect the model’s robust classification ability (Figure 1d). At a 0.85 threshold, performance differed between classes: non-binders (N = 36,179) showed high precision (0.91) but lower recall (0.43; F1 = 0.59), whereas binders (N = 36,801) showed high recall (0.96) with moderate precision (0.63; F1 = 0.76). Overall performance was balanced, with macro and weighted averages of 0.77 (precision), 0.70 (recall), and 0.68 (F1-score). Macro averages of 0.68–0.77 across these metrics indicate that the model performs reasonably well for both classes, despite the distinct challenges each class presents. The high recall (96%) is particularly significant biologically, as it means nearly all true biological interactions are captured. The precision of 0.65 indicates a moderate rate of false positives. The confusion matrix (Figure 1e) shows that the model effectively prioritizes true-positive interactions, making it a viable computational filter for large-scale protein-protein interaction screening.</p><p>This level of performance is especially notable given that the training dataset included proteins from diverse species, representing a broad spectrum of the proteome (Table 1). We offer gPPIpred in two ways: as a ready-to-use app accessible here (new link here), or by installing gPPIpred (see Extended Data section). To use the app, users only need to provide the UniProt IDs to run a query and define a threshold (Figure 1f). Once the prediction is complete, a bar chart will appear, providing the full list of preys and their interaction probabilities (Figure 1g). Additionally, in the data window, users can find and download the list of tested interactors. 
This list shows bait and prey IDs, interaction probability, Binds (Yes, if probability ≥ the defined threshold; No, if probability &lt; the defined threshold), Bait and Prey Hotspots (hotspot residues are at the interface of either bait or prey proteins and provide the bulk of the binding free energy (ΔG) for the specific interaction pair), and the full sequences of the bait and prey proteins (Figure 1h).</p><p>To install gPPIpred, the following dependencies are required: Python 3.8+, PyTorch, PyTorch Geometric (PyG), and BioPython. We recommend using an NVIDIA GPU with CUDA support due to the high computational demand of graph attention mechanisms; however, execution on a CPU is possible. For the full list of requirements, see the Extended Data section. When using the manually installed version, gPPIpred will generate a link redirecting to an interface like the one shown in Figure 1, making it easy to use once installed.</p><p>Despite its strengths, gPPIpred has limitations. For example, it depends on the availability of protein structural data, such as high-quality 3D structures, whether experimental or predicted, which are required to construct accurate graphs. Therefore, gPPIpred should not be used to predict interactions involving proteins that are intrinsically disordered or lack a reliable structural model, as this will yield low-confidence results.</p><p>Overall, by leveraging structural representations through graph-based learning and integrating advanced embedding techniques, gPPIpred reduces research costs associated with large-scale interaction screenings and enhances our understanding of the structural determinants of protein interactions. The detailed insights provided by residue-level predictions have important implications for studying the biological mechanisms underlying these interactions, potentially guiding experimental validation and therapeutic targeting. 
The practical utility of gPPIpred is further enhanced by its interpretability via Saliency Mapping. For the predicted interactions, the model successfully identified residue-level hotspots at the interaction interfaces. The residues assigned the highest saliency scores correspond to those with significant functional relevance, such as those involved in hydrogen bonding or hydrophobic packing at the interface. These \"saliency-identified\" residues provide a direct roadmap for experimentalists looking to validate predictions through point mutations. Additionally, the gPPIpred pipeline facilitates PPI predictions by offering a simplified interface that automates the download and processing of protein and compound structural files, generates interactive 3D plots to visualize putative interaction sites, and creates detailed reports.</p>","references":[{"reference":"<p>Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al., Jumper. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630: 493-500.</p>","pubmedId":"","doi":"10.1038/s41586-024-07487-w"},{"reference":"<p>Bajar B, Wang E, Zhang S, Lin M, Chu J. 2016. A Guide to Fluorescent Protein FRET Pairs. Sensors 16: 1488.</p>","pubmedId":"","doi":"10.3390/s16091488"},{"reference":"<p>Bernett, Judith (2022). PPI prediction from sequence, gold standard dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.21591618.v3</p>","pubmedId":"","doi":"https://doi.org/10.6084/m9.figshare.21591618.v3"},{"reference":"<p>Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. 2000. . 
Nature Structural Biology 7: 957-959.</p>","pubmedId":"","doi":"doi.org/10.1038/80734"},{"reference":"<p>Brody, S., Alon, U., &amp; Yahav, E.&nbsp;(2021).&nbsp;How Attentive are Graph Attention Networks?&nbsp;<i>ArXiv</i>.&nbsp;https://arxiv.org/abs/2105.14491</p>","pubmedId":"","doi":"https://arxiv.org/abs/2105.14491"},{"reference":"<p>Chatr-aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, et al., Tyers. 2014. The BioGRID interaction database: 2015 update. Nucleic Acids Research 43: D470-D478.</p>","pubmedId":"","doi":"10.1093/nar/gku1204"},{"reference":"<p>Fields S, Song Ok. 1989. A novel genetic system to detect protein–protein interactions. Nature 340: 245-246.</p>","pubmedId":"","doi":"10.1038/340245a0"},{"reference":"<p>Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, et al., Superti-Furga. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415: 141-147.</p>","pubmedId":"","doi":"10.1038/415141a"},{"reference":"<p>Guo Y, Yu L, Wen Z, Li M. 2008. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Research 36: 3025-3030.</p>","pubmedId":"","doi":"10.1093/nar/gkn159"},{"reference":"<p>Hashemifar S, Neyshabur B, Khan AA, Xu J. 2018. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34: i802-i810.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bty573"},{"reference":"<p>Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al., Tyers. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180-183.</p>","pubmedId":"","doi":"10.1038/415180a"},{"reference":"<p>Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. 
Proceedings of the National Academy of Sciences 98: 4569-4574.</p>","pubmedId":"","doi":"10.1073/pnas.061034498"},{"reference":"<p>Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al., Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589.</p>","pubmedId":"","doi":"10.1038/s41586-021-03819-2"},{"reference":"<p>Kipf TN, Welling M. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. 2016 Nov 21.</p>","pubmedId":"","doi":"10.48550/arXiv.1609.02907"},{"reference":"<p>Kyte J, Doolittle RF. 1982. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157: 105-132.</p>","pubmedId":"","doi":"10.1016/0022-2836(82)90515-0"},{"reference":"<p>Lee M. 2023. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 28: 5169.</p>","pubmedId":"","doi":"10.3390/molecules28135169"},{"reference":"<p>Low TY, Syafruddin SE, Mohtar MA, Vellaichamy A, A Rahman NS, Pung YF, Tan CSH. 2021. Recent progress in mass spectrometry-based strategies for elucidating protein–protein interactions. Cellular and Molecular Life Sciences 78: 5325-5339.</p>","pubmedId":"","doi":"10.1007/s00018-021-03856-0"},{"reference":"<p>Martin ACR. 2005. Mapping PDB chains to UniProtKB entries. Bioinformatics 21: 4297-4301.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bti694"},{"reference":"<p>Miller KE, Kim Y, Huh WK, Park HO. 2015. Bimolecular Fluorescence Complementation (BiFC) Analysis: Advances and Recent Applications for Genome-Wide Interaction Studies. Journal of Molecular Biology 427: 2039-2055.</p>","pubmedId":"","doi":"10.1016/j.jmb.2015.03.005"},{"reference":"<p>Mrowka R, Patzak A, Herzel H. 2001. Is There a Bias in Proteome Research?. Genome Research 11: 1971-1973.</p>","pubmedId":"","doi":"doi.org/10.1101/gr.206701"},{"reference":"<p>Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, et al., Tyers. 2020. 
The <scp>BioGRID</scp> database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Science 30: 187-200.</p>","pubmedId":"","doi":"doi.org/10.1002/pro.3978"},{"reference":"<p>Réau M, Renaud N, Xue LC, Bonvin AMJJ. 2021. DeepRank-GNN: A Graph Neural Network Framework to Learn Patterns in Protein-Protein Interfaces.  : 10.1101/2021.12.08.471762.</p>","pubmedId":"","doi":"10.1101/2021.12.08.471762"},{"reference":"<p>Shen J, Zhang J, Luo X, Zhu W, Yu K, Chen K, Li Y, Jiang H. 2007. Predicting protein–protein interactions based only on sequences information. Proceedings of the National Academy of Sciences 104: 4337-4341.</p>","pubmedId":"","doi":"10.1073/pnas.0607879104"},{"reference":"<p>Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM. 2011. Weisfeiler-lehman graph kernels. Journal of Machine Learning. 12(9).</p>","pubmedId":"","doi":""},{"reference":"<p>Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, et al., Wanker. 2005. A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome. Cell 122: 957-968.</p>","pubmedId":"","doi":"10.1016/j.cell.2005.08.029"},{"reference":"<p>Sun T, Zhou B, Lai L, Pei J. 2017. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 18: 10.1186/s12859-017-1700-2.</p>","pubmedId":"","doi":"10.1186/s12859-017-1700-2"},{"reference":"<p>Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al., Mering. 2018. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research 47: D607-D613.</p>","pubmedId":"","doi":"10.1093/nar/gky1131"},{"reference":"<p>Trabuco LG, Betts MJ, Russell RB. 2012. Negative protein–protein interaction datasets derived from large-scale two-hybrid experiments. 
Methods 58: 343-348.</p>","pubmedId":"","doi":"10.1016/j.ymeth.2012.07.028"},{"reference":"<p>Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al., Velankar. 2021. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50: D439-D444.</p>","pubmedId":"","doi":"10.1093/nar/gkab1061"},{"reference":"<p>You ZH, Chan KCC, Hu P. 2015. Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest. PLOS ONE 10: e0125811.</p>","pubmedId":"","doi":"10.1371/journal.pone.0125811"},{"reference":"<p>You ZH, Yu JZ, Zhu L, Li S, Wen ZK. 2014. A MapReduce based parallel SVM for large-scale predicting protein–protein interactions. Neurocomputing 145: 37-43.</p>","pubmedId":"","doi":"10.1016/j.neucom.2014.05.072"},{"reference":"<p>Zhou H, Wang W, Jin J, Zheng Z, Zhou B. 2022. Graph Neural Network for Protein–Protein Interaction Prediction: A Comparative Study. Molecules 27: 6135.</p>","pubmedId":"","doi":"10.3390/molecules27186135"}],"title":"<p>gPPIpred: A User-Friendly PPI Predictor Based on Protein Molecular Graphs</p>","reviews":[],"curatorReviews":[]},{"id":"4fcc11c5-6946-4504-a1f0-b95e4e8d82d0","decision":"publish","abstract":"<p>Protein–protein interactions (PPIs) govern essential cellular processes but remain challenging to characterize experimentally due to high cost and labor intensity. We present gPPIpred, a scalable computational framework leveraging graph neural networks (GNNs) and attention mechanisms to predict PPIs at residue-level resolution. Proteins are encoded as spatially informed molecular graphs integrating physicochemical features. Using curated structural datasets for training and validation, gPPIpred was fine-tuned to reliably predict positive interactions and actual interacting sites. 
Attention scores highlight key residues mediating interactions, offering interpretable insights to guide experimental design. gPPIpred combines high predictive performance with explainability, providing a user-friendly pipeline for large-scale PPI discovery.</p>","acknowledgements":"<p>The authors thank the BioGRID, IntAct, and PDB teams for providing invaluable datasets, and the developers of ProtBERT for enabling advanced feature extraction.</p>","authors":[{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","writing_reviewEditing","supervision"],"email":"matiolli@itqb.unl.pt","firstName":"Cleverson C.","lastName":"Matiolli","submittingAuthor":false,"correspondingAuthor":null,"equalContribution":true,"WBId":"","orcid":"0000-0001-8185-7628"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["formalAnalysis","writing_originalDraft","writing_reviewEditing","methodology"],"email":"joana.marques@itqb.unl.pt","firstName":"Joana ","lastName":"Marques","submittingAuthor":null,"correspondingAuthor":null,"equalContribution":true,"WBId":"","orcid":"0000-0002-8922-3969"},{"affiliations":["Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA), Avenida da República, 2780-157 Oeiras, Portugal"],"departments":[""],"credit":["conceptualization","formalAnalysis","supervision","writing_reviewEditing","fundingAcquisition"],"email":"abreu@itqb.unl.pt","firstName":"Isabel A.","lastName":"Abreu","submittingAuthor":true,"correspondingAuthor":true,"equalContribution":null,"WBId":"","orcid":"0000-0002-5566-2146"}],"awards":[],"conflictsOfInterest":"<p>The authors declare that there are no conflicts of interest 
present.</p>","dataTable":{"url":null},"extendedData":[{"description":"<p>This folder contains the following files: the model, Dockerfile, requirements list, app file, the gppipred notebook, training and validation datasets, and Extended data Table 1.</p>","doi":null,"resourceType":"Software","name":"-gPPIpredv2_2-main.zip","url":"https://portal.micropublication.org/uploads/4603d03a3625fb173d0d89cc6b8409a5.zip"}],"funding":"<p>We acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for a PhD fellowship for JM (<a href=\"https://doi.org/10.54499/2020.06917.BD\">https://doi.org/10.54499/2020.06917.BD</a>) and project <a href=\"https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021\">https://doi.org/10.54499/PTDC/ASP-PLA/1920/2021</a>, which also supported CM's contract. We also acknowledge funding by GREEN-it ‘Bioresources4sustainability’ (https://doi.org/10.54499/UIDB/04551/2020). The funding sources were not involved in analyses, interpretation of data, writing, or in the decision to submit this paper.</p>","image":{"url":"https://portal.micropublication.org/uploads/3c211266f5ca85f7665cd42338fca9a2.png"},"imageCaption":"<p>The gPPIpred app, built in Gradio, is available at: https://huggingface.co/spaces/1143Joana/gPPIpredv2_2 . <b>a)</b> Model convergence plot showing cross-entropy loss over 20 epochs. <b>b)</b> Model performance measured as MCC over 20 epochs of training. To calculate the MCC during training, a balanced test dataset (10,000 samples) was used. MCC – Matthews correlation coefficient. <b>c)</b> Precision and sensitivity as a function of the interaction threshold. <b>d)</b> Receiver Operating Characteristic curve. Final AUC-ROC: 0.9109. Optimal Threshold: 0.85. ROC - Receiver Operating Characteristic. AUC - Area Under the ROC Curve. <b>e)</b> Confusion Matrix. MCC – Matthews correlation coefficient. <b>f)</b> gPPIpred web app. Users can input proteins by their UniProt Accession IDs. 
Additionally, users can define their desired threshold. <b>g)</b> Here, we show an example of an output generated by gPPIpred. Once the program has finished analysing, the status area indicates how many preys were analysed. The chart shows the probability of each interaction as well as the line for the defined threshold. The interaction is considered positive if the probability is ≥ the defined threshold. <b>h)</b> Data area. Here, the users can find and download the list of the tested interactors. This list shows bait and prey IDs, the interaction probability, Binds (Yes, if probability ≥ the defined threshold; No, if probability &lt; the defined threshold), Bait and Prey Hotspots (hotspot residues are at the interface of either bait or prey proteins and provide the bulk of the binding free energy (ΔG) for the specific interaction pair), and the full sequences of the bait and prey proteins.</p>","imageTitle":"<p>gPPIpred: a user-friendly protein-protein interaction predictor</p>","methods":"<p><b>Dataset Preparation</b></p><p>The training and validation datasets are available in the Extended Data section. Positive interaction data were curated from the Gold Standard Dataset and multi-validated experiments from BioGRID (Bernett, 2022; Chatr-Aryamontri et al., 2015; Oughtred et al., 2021; Szklarczyk et al., 2019). A common strategy for generating negative datasets is subcellular localization filtering, which involves selecting proteins from different subcellular locations and labeling them as non-interacting. Although this strategy is widely accepted, the risk of false negatives can be further reduced by using experimentally validated negative examples. Therefore, we used a negative dataset curated by the Russell Lab from the Stelzl (2005) two-hybrid dataset (Stelzl et al., 2005; Trabuco et al., 2012). 
For negative interactions, the shortest path between the two proteins in the underlying two-hybrid interactome is assigned a confidence score in the following format: shortestPath:2, shortestPath:3, etc., or shortestPath:NA if there is no path connecting the two proteins. We created two separate datasets, ensuring that no individual protein sequence appeared in both datasets to prevent data leakage (Table 1). Only interactions where both proteins had high-resolution structures were included. When possible, we ensured the model had examples of both positive and negative interactions for the same protein.</p><p><b>Protein Graph Construction</b></p><p>Protein graphs were constructed by representing individual residues as nodes, with edges indicating spatial proximity based on a distance threshold of less than 9.5 Å between alpha carbon atoms. This threshold provides an accurate representation of the protein's tertiary structure and local chemical environment. Preprocessing included extracting Cartesian coordinates (x, y, z) from structural files and standardizing amino acid nomenclature into single-letter codes (Berman et al., 2000). To maximize structural coverage, we integrated experimentally determined structures from the PDB with high-confidence predicted models from AlphaFold2 (Jumper et al., 2021; Varadi et al., 2022). Missing entries were retrieved via the AlphaFold API, resulting in a nearly complete structural dataset. Node features were defined by a five-dimensional vector of physicochemical properties (hydrophobicity, volume, polarizability, pI, and pKa), while edges were defined by Euclidean distances, capturing the spatial constraints essential for predicting protein-protein interactions.</p><p><b>Feature Extraction</b></p><p>Node-level embeddings were generated by assigning relative values (0 to 1) for five physicochemical properties to each amino acid (Extended data Table 1). 
Thus, the model recognizes that 1.0 represents the maximum expression of that specific property. Each physicochemical property provides the model with different information. Hydrophobicity indicates the tendency of an amino acid to repel water, with 1 being the most hydrophobic and 0 the least. Volume is calculated as the Van der Waals volume, where 1 corresponds to the largest and 0 to the smallest amino acid. Polarizability measures how well an amino acid can engage in Van der Waals or London dispersion forces. Mathematically, polarizability (α) is defined as α = p/E, where p is the induced dipole moment and E is the electric field. Colloquially, polarizability reflects how \"sticky\" an amino acid is. The isoelectric point (pI) is calculated as pI = (pKa1 + pKa2)/2, indicating how basic (1) or acidic (0) an amino acid is. The dissociation constant (pKa) is determined empirically and informs the model whether a residue will be protonated or deprotonated at pH 7.4. These features capture local structural motifs and inter-residue interactions, enabling the model to identify spatial dependencies critical for predicting PPIs. The residue adjacency matrices, constructed with a 9.5 Å spatial threshold, preserved topological structures and emphasized meaningful connections between residues (Kipf &amp; Welling, 2016).</p><p><b>Graph Neural Network Architecture</b></p><p>A Graph Attention Network (GATv2) architecture was employed to iteratively refine residue embeddings. Through eight layers of recursive message passing and multi-head attention, the model aggregated neighborhood information into a global graph-level representation (Brody et al., 2021). 
This process ensured that the final representations were robust descriptors of protein geometry and chemistry, forming the foundation for accurate interaction prediction.</p><p><b>Training and Validation</b></p><p>The gPPIpred model was implemented as a Siamese Neural Network, a dual-stream architecture designed to learn relationships between pairs of entities. The gPPIpred GATv2 was trained for 20 epochs in batches of 128 shuffled graphs (Figure 1a and b). Here, we use a Siamese neural network that employs error back-propagation during training; the networks operate in parallel and compare their outputs at the end, usually using cosine distance. The training and validation scripts can be found in the Extended Data section.</p><p><b>Interaction Site Analysis</b></p><p>To move beyond \"black-box\" predictions, we used Saliency Mapping to identify interaction \"hotspots.\" Saliency mapping calculates the gradient of the output probability with respect to the input node features. By visualizing these gradients, we can identify specific residues that contribute most to the predicted binding event. These residues typically correspond to interface regions that provide most of the binding free energy (ΔG), offering actionable targets for site-directed mutagenesis or therapeutic intervention.</p>","reagents":"<p></p>","patternDescription":"<p>Protein–protein interactions (PPIs) are fundamental to essential biological processes, including signal transduction, immune response, and metabolic regulation. Traditionally, PPIs have been characterized using low-throughput experimental methods such as Fluorescence Resonance Energy Transfer (FRET), Bimolecular Fluorescence Complementation (BiFC), and yeast two-hybrid (Y2H) systems. 
High-throughput approaches often use mass spectrometry-coupled techniques, including co-immunoprecipitation (Co-IP) and tandem-affinity purification (TAP) (Bajar et al., 2016; Fields &amp; Song, 1989; Gavin et al., 2002; Ho et al., 2002; Low et al., 2021; Miller et al., 2015). However, these experimental techniques are often labor-intensive, time-consuming, costly, and prone to high rates of false positives and false negatives (Ito et al., 2001; Low et al., 2021; Mrowka et al., 2001). Recently, AlphaFold has provided a crucial advance in protein structural information by greatly increasing the number of high-accuracy protein structure models (Jumper et al., 2021; Varadi et al., 2022). As a result, computational approaches have emerged as efficient, scalable alternatives for PPI prediction.</p><p>PPI prediction has relied on scalable computing techniques like machine learning (ML) and deep learning (DL) frameworks. Traditional ML techniques, such as support vector machines (SVM) and Random Forests, have been extensively used for PPI prediction (You et al., 2014, 2015). These methods typically use sequence-based features, including position-specific scoring matrices (PSSM), which capture evolutionary conservation, and physicochemical properties of amino acids to represent proteins (Guo et al., 2008; Shen et al., 2007). While effective, these methods depend on manual feature engineering, which limits scalability and hinders the integration of structural context. Advances in deep learning, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have improved performance by learning high-level features directly from protein sequences (Hashemifar et al., 2018; Sun et al., 2017). 
However, these models often underutilize the structural information crucial for capturing the spatial relationships in protein interactions.</p><p>Graph Neural Networks (GNNs) are a class of deep learning methods that learn from graph-structured data. GNNs are now being used to integrate protein structural information with convolutional and recurrent networks to increase prediction robustness (Réau et al., 2021). Zhou et al. (2022) and Lee (2023) have summarized and compared methods that use GNN-based strategies. AlphaFold represents a landmark in the application of GNNs to biological problems (Jumper et al., 2021; Varadi et al., 2022). While AlphaFold was initially used to predict the 3D structure of proteins, it can now be applied to predict complexes between proteins. Its latest update, AlphaFold3, can test interactions between a range of molecules, including DNA, RNA, ligands, ions, and proteins (Abramson et al., 2024). Notably, the model can now account for post-translational modifications and chemical modifications of nucleic acids. All of this is presented in a user-friendly environment, the AlphaFold server, where users only need to input the sequences of their molecules. However, it does not support high-throughput analysis, as users are limited to 30 jobs per day, and each job cannot exceed 5,000 tokens (1 amino acid = 1 token).</p><p>Here, we introduce gPPIpred, a novel framework that leverages GNNs with integrated attention mechanisms to simultaneously consider physicochemical properties and structural information to predict PPIs. In gPPIpred, proteins are represented as residue-level graphs, where nodes correspond to amino acid residues and edges are established based on spatial proximity within three-dimensional protein structures. Each protein is modeled as a graph, with nodes representing residues of significant structural and functional relevance. 
Nodes encode the following residue-level physicochemical properties: hydrophobicity, volume, polarizability, pI, and pKa (see Feature Extraction section). The specific values for each amino acid are listed in Extended Data Table 1 (Kyte &amp; Doolittle, 1982). Edges in the graph are defined using a spatial threshold of 9.5 Å (see Methods). By structuring the graph around both residue properties and spatial proximity rather than individual residues alone, the model captures biologically relevant interaction motifs. For each predicted interacting pair, interaction hotspots are calculated via saliency mapping (see Methods). Here, the saliency mapping is based on the final interaction probability, providing information on which residues in both prey and bait are critical for that specific interaction.</p><p>To iteratively refine residue embeddings, we employed a Graph Attention Network (GATv2) architecture. gPPIpred is a GATv2 model with eight layers of recursive message passing and multi-head attention. Two independent PPI datasets were created to train and validate this model. The breakdown of these datasets is given in Table 1. The Siamese neural network was trained for 20 epochs in batches of 128 shuffled graphs. The use of shuffled batches and independent validation sets helps the model generalize across different protein families. Cross-entropy losses observed during training are shown in Figure 1a. During training, the model exhibited a consistent decrease in cross-entropy loss, reaching convergence within 20 epochs.</p><p><b>Table 1. Number of samples contained in each dataset</b>. Number of unique protein IDs in each dataset. No protein was used in both datasets to prevent data leakage. 
The percentage of species represented in each dataset is also listed.</p><table><tbody><tr><td><p>Dataset</p></td><td><p>Total&nbsp;</p></td><td data-colwidth=\"82\"><p>Positive&nbsp;</p><p>interactions</p></td><td><p>Negative&nbsp;interactions</p></td><td><p>Unique Proteins</p></td></tr><tr><td rowspan=\"4\"><p>Training</p></td><td rowspan=\"4\"><p>159,655 interactions:</p><p>77.87 % <i>Homo sapiens</i>,</p><p>22.05 % <i>Arabidopsis thaliana</i>,</p><p>0.05 % <i>Oryza sativa</i>,</p><p>0.02 % <i>Saccharomyces cerevisiae</i></p></td><td rowspan=\"4\" data-colwidth=\"82\"><p>77,508</p><p>(48%)</p></td><td rowspan=\"4\"><p>82,147</p><p>(52%)</p></td><td rowspan=\"4\"><p>10,349 proteins:</p><p>64 % <i>Arabidopsis thaliana</i>,</p><p>35 % <i>Homo sapiens</i>,</p><p>0.8 % <i>Oryza sativa</i>,</p><p>0.2 % <i>Saccharomyces cerevisiae</i></p></td></tr><tr></tr><tr></tr><tr></tr><tr><td><p>Validation</p></td><td><p>72,358 interactions:&nbsp;</p><p>50.42 % <i>Homo sapiens</i>,</p><p>49.54 % <i>Saccharomyces cerevisiae</i></p></td><td data-colwidth=\"82\"><p>36,179</p><p>(50%)</p></td><td><p>36,179</p><p>(50%)</p></td><td><p>4,192 proteins:</p><p>67.6 % <i>Homo sapiens</i>,</p><p>32.4 % <i>Saccharomyces cerevisiae</i></p></td></tr></tbody></table><p></p><p>The Matthews correlation coefficient (MCC) values obtained during each epoch are shown in Figure 1b. The GATv2-based Siamese network achieved an MCC of 0.57 during training. To verify the model’s ability to generalize to unseen biological data, we tested gPPIpred on an independent validation dataset of 72,358 protein pairs. The model achieved an MCC of 0.4641, indicating that the Siamese architecture and graph-based representations are not overfitting to specific protein families. The optimal threshold of 0.85 corresponds to the point where precision is maximized while retaining the model’s sensitivity of 96% (Figure 1c). 
This threshold ensures the model remains a robust screening tool, capturing nearly all true interactions while maintaining an acceptable precision level. The performance metrics (ROC-AUC: 0.8451, MCC: 0.4641, and accuracy: 0.6992) reflect the model’s robust classification ability (Figure 1d). At a 0.85 threshold, performance differed between classes: non-binders (N = 36,179) showed high precision (0.91) but lower recall (0.43; F1 = 0.59), whereas binders (N = 36,801) showed high recall (0.96) with moderate precision (0.63; F1 = 0.76). Overall performance was balanced, with macro and weighted averages of 0.77 (precision), 0.70 (recall), and 0.68 (F1-score). Macro averages of 0.68–0.77 across these metrics indicate that the model performs robustly for both classes, despite the distinct challenges each class presents. The high recall (96%) is particularly significant biologically, as it means nearly all true biological interactions are captured. The precision of 0.65 indicates a moderate rate of false positives. The confusion matrix (Figure 1e) shows that the model effectively prioritizes true-positive interactions, making it a viable computational filter for large-scale protein-protein interaction screening.</p><p>This level of performance is especially notable given that the training dataset included proteins from diverse species, representing a broad spectrum of the proteome (Table 1). We offer gPPIpred in two ways: as a ready-to-use app (https://huggingface.co/spaces/1143Joana/gPPIpredv2_2), or as a local installation (see Extended Data section). To use the app, users only need to provide the UniProt IDs to run a query and define a threshold (Figure 1f). Once the prediction is complete, a bar chart will appear, providing the full list of preys and their interaction probabilities (Figure 1g). Additionally, in the data window, users can find and download the list of tested interactors. 
This list shows bait and prey IDs, interaction probability, Binds (Yes if the probability is ≥ the defined threshold; No if it is &lt; the defined threshold), Bait and Prey Hotspots (hotspot residues are at the interface of either bait or prey proteins and provide the bulk of the binding free energy (ΔG) for the specific interaction pair), and the full sequences of bait and preys (Figure 1h).</p><p>To install gPPIpred, the following dependencies are required: Python 3.8+, PyTorch, PyTorch Geometric (PyG), and BioPython. We recommend using an NVIDIA GPU with CUDA support due to the high computational demand of graph attention mechanisms; however, execution on a CPU is possible. For the full list of requirements, see the Extended Data section. When using the manually installed version, gPPIpred will generate a link redirecting to an interface like the one shown in Figure 1, making it easy to use once installed.</p><p>Despite its strengths, gPPIpred has limitations. For example, it depends on the availability of protein structural data, such as high-quality 3D structures, whether experimental or predicted, which are required to construct accurate graphs. Therefore, gPPIpred should not be used to predict interactions involving proteins that are intrinsically disordered or lack a reliable structural model, as this will yield low-confidence results.</p><p>Overall, by leveraging structural representations through graph-based learning and integrating advanced embedding techniques, gPPIpred reduces research costs associated with large-scale interaction screenings and enhances our understanding of the structural determinants of protein interactions. The detailed insights provided by residue-level predictions have important implications for studying the biological mechanisms underlying these interactions, potentially guiding experimental validation and therapeutic targeting. 
The practical utility of gPPIpred is further enhanced by its interpretability via Saliency Mapping. For the predicted interactions, the model successfully identified residue-level hotspots at the interaction interfaces. The residues assigned the highest saliency scores correspond to those with significant functional relevance, such as those involved in hydrogen bonding or hydrophobic packing at the interface. These \"saliency-identified\" residues provide a direct roadmap for experimentalists looking to validate predictions through point mutations. Additionally, the gPPIpred pipeline facilitates PPI predictions by offering a simplified interface that automates the download and processing of protein and compound structural files, generates interactive 3D plots to visualize putative interaction sites, and creates detailed reports.</p>","references":[{"reference":"<p>Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al., Jumper. 2024. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630: 493-500.</p>","pubmedId":"","doi":"10.1038/s41586-024-07487-w"},{"reference":"<p>Bajar B, Wang E, Zhang S, Lin M, Chu J. 2016. A Guide to Fluorescent Protein FRET Pairs. Sensors 16: 1488.</p>","pubmedId":"","doi":"10.3390/s16091488"},{"reference":"<p>Bernett, Judith (2022). PPI prediction from sequence, gold standard dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.21591618.v3</p>","pubmedId":"","doi":"https://doi.org/10.6084/m9.figshare.21591618.v3"},{"reference":"<p>Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. 2000. . 
Nature Structural Biology 7: 957-959.</p>","pubmedId":"","doi":"doi.org/10.1038/80734"},{"reference":"<p>Brody, S., Alon, U., &amp; Yahav, E.&nbsp;(2021).&nbsp;How Attentive are Graph Attention Networks?&nbsp;<i>ArXiv</i>.&nbsp;https://arxiv.org/abs/2105.14491</p>","pubmedId":"","doi":"https://arxiv.org/abs/2105.14491"},{"reference":"<p>Chatr-aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, et al., Tyers. 2014. The BioGRID interaction database: 2015 update. Nucleic Acids Research 43: D470-D478.</p>","pubmedId":"","doi":"10.1093/nar/gku1204"},{"reference":"<p>Fields S, Song Ok. 1989. A novel genetic system to detect protein–protein interactions. Nature 340: 245-246.</p>","pubmedId":"","doi":"10.1038/340245a0"},{"reference":"<p>Gavin AC, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, et al., Superti-Furga. 2002. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415: 141-147.</p>","pubmedId":"","doi":"10.1038/415141a"},{"reference":"<p>Guo Y, Yu L, Wen Z, Li M. 2008. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Research 36: 3025-3030.</p>","pubmedId":"","doi":"10.1093/nar/gkn159"},{"reference":"<p>Hashemifar S, Neyshabur B, Khan AA, Xu J. 2018. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 34: i802-i810.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bty573"},{"reference":"<p>Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, et al., Tyers. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180-183.</p>","pubmedId":"","doi":"10.1038/415180a"},{"reference":"<p>Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y. 2001. A comprehensive two-hybrid analysis to explore the yeast protein interactome. 
Proceedings of the National Academy of Sciences 98: 4569-4574.</p>","pubmedId":"","doi":"10.1073/pnas.061034498"},{"reference":"<p>Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al., Hassabis. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589.</p>","pubmedId":"","doi":"10.1038/s41586-021-03819-2"},{"reference":"<p>Kipf TN, Welling M. 2016. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308. 2016 Nov 21.</p>","pubmedId":"","doi":"10.48550/arXiv.1609.02907"},{"reference":"<p>Kyte J, Doolittle RF. 1982. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157: 105-132.</p>","pubmedId":"","doi":"10.1016/0022-2836(82)90515-0"},{"reference":"<p>Lee M. 2023. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 28: 5169.</p>","pubmedId":"","doi":"10.3390/molecules28135169"},{"reference":"<p>Low TY, Syafruddin SE, Mohtar MA, Vellaichamy A, A Rahman NS, Pung YF, Tan CSH. 2021. Recent progress in mass spectrometry-based strategies for elucidating protein–protein interactions. Cellular and Molecular Life Sciences 78: 5325-5339.</p>","pubmedId":"","doi":"10.1007/s00018-021-03856-0"},{"reference":"<p>Martin ACR. 2005. Mapping PDB chains to UniProtKB entries. Bioinformatics 21: 4297-4301.</p>","pubmedId":"","doi":"10.1093/bioinformatics/bti694"},{"reference":"<p>Miller KE, Kim Y, Huh WK, Park HO. 2015. Bimolecular Fluorescence Complementation (BiFC) Analysis: Advances and Recent Applications for Genome-Wide Interaction Studies. Journal of Molecular Biology 427: 2039-2055.</p>","pubmedId":"","doi":"10.1016/j.jmb.2015.03.005"},{"reference":"<p>Mrowka R, Patzak A, Herzel H. 2001. Is There a Bias in Proteome Research?. Genome Research 11: 1971-1973.</p>","pubmedId":"","doi":"doi.org/10.1101/gr.206701"},{"reference":"<p>Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, et al., Tyers. 2020. 
The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Science 30: 187-200.</p>","pubmedId":"","doi":"doi.org/10.1002/pro.3978"},{"reference":"<p>Réau M, Renaud N, Xue LC, Bonvin AMJJ. 2021. DeepRank-GNN: A Graph Neural Network Framework to Learn Patterns in Protein-Protein Interfaces.  : 10.1101/2021.12.08.471762.</p>","pubmedId":"","doi":"10.1101/2021.12.08.471762"},{"reference":"<p>Shen J, Zhang J, Luo X, Zhu W, Yu K, Chen K, Li Y, Jiang H. 2007. Predicting protein–protein interactions based only on sequences information. Proceedings of the National Academy of Sciences 104: 4337-4341.</p>","pubmedId":"","doi":"10.1073/pnas.0607879104"},{"reference":"<p>Shervashidze N, Schweitzer P, Van Leeuwen EJ, Mehlhorn K, Borgwardt KM. 2011. Weisfeiler-lehman graph kernels. Journal of Machine Learning. 12(9).</p>","pubmedId":"","doi":""},{"reference":"<p>Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, et al., Wanker. 2005. A Human Protein-Protein Interaction Network: A Resource for Annotating the Proteome. Cell 122: 957-968.</p>","pubmedId":"","doi":"10.1016/j.cell.2005.08.029"},{"reference":"<p>Sun T, Zhou B, Lai L, Pei J. 2017. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 18: 10.1186/s12859-017-1700-2.</p>","pubmedId":"","doi":"10.1186/s12859-017-1700-2"},{"reference":"<p>Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al., Mering. 2018. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research 47: D607-D613.</p>","pubmedId":"","doi":"10.1093/nar/gky1131"},{"reference":"<p>Trabuco LG, Betts MJ, Russell RB. 2012. Negative protein–protein interaction datasets derived from large-scale two-hybrid experiments. 
Methods 58: 343-348.</p>","pubmedId":"","doi":"10.1016/j.ymeth.2012.07.028"},{"reference":"<p>Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, et al., Velankar. 2021. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50: D439-D444.</p>","pubmedId":"","doi":"10.1093/nar/gkab1061"},{"reference":"<p>You ZH, Chan KCC, Hu P. 2015. Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest. PLOS ONE 10: e0125811.</p>","pubmedId":"","doi":"10.1371/journal.pone.0125811"},{"reference":"<p>You ZH, Yu JZ, Zhu L, Li S, Wen ZK. 2014. A MapReduce based parallel SVM for large-scale predicting protein–protein interactions. Neurocomputing 145: 37-43.</p>","pubmedId":"","doi":"10.1016/j.neucom.2014.05.072"},{"reference":"<p>Zhou H, Wang W, Jin J, Zheng Z, Zhou B. 2022. Graph Neural Network for Protein–Protein Interaction Prediction: A Comparative Study. 
Molecules 27: 6135.</p>","pubmedId":"","doi":"10.3390/molecules27186135"}],"title":"<p>gPPIpred: A User-Friendly PPI Predictor Based on Protein Molecular Graphs</p>","reviews":[],"curatorReviews":[]}]}},"species":{"species":[{"value":"acer saccharum","label":"Acer saccharum","imageSrc":"","imageAlt":"","mod":"TreeGenes","modLink":"https://treegenesdb.org","linkVariable":""},{"value":"achillea millefolium","label":"Achillea millefolium","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"acinetobacter baylyi","label":"Acinetobacter baylyi","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"actinobacteria bacterium","label":"Actinobacteria bacterium","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"adelges tsugae","label":"Adelges tsugae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"adenocaulon chilense","label":"Adenocaulon chilense","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"aedes japonicus","label":"Aedes japonicus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"aegorhinus vitulus","label":"Aegorhinus vitulus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"alaimidae","label":"Alaimidae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"allobates femoralis","label":"Allobates femoralis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"alnus glutinosa","label":"Alnus glutinosa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"alosa aestivalis","label":"Alosa aestivalis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"alosa pseudoharengus","label":"Alosa pseudoharengus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"alternaria alternata","label":"Alternaria 
alternata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"amynthas agrestis","label":"Amynthas Agrestis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ancylostoma caninum","label":"Ancylostoma caninum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ancylostoma ceylanicum","label":"Ancylostoma ceylanicum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"anemone multifida","label":"Anemone multifida","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"anguilla rostrata","label":"Anguilla rostrata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"anisakis simplex","label":"Anisakis simplex","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"anomala albopilosa","label":"Anomala albopilosa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"anthomyiidae sp","label":"Anthomyiidae sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"anthomyiidae sp","label":"Anthomyiidae sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"arabidopsis","label":"Arabidopsis","imageSrc":"arabidopsis.png","imageAlt":"Arabidopsis graphic by Zoe Zorn CC BY 4.0","mod":"TAIR","modLink":"https://arabidopsis.org","linkVariable":""},{"value":"architeuthis dux","label":"Architeuthis dux","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"arion vulgaris","label":"Arion vulgaris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"armeria","label":"Armeria","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"artemia","label":"Artemia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"arthrobacter sp.","label":"Arthrobacter 
sp.","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ascaridia","label":"Ascaridia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ascaridia galli","label":"Ascaridia galli","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"asparagopsis taxiformis","label":"Asparagopsis taxiformis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"astatotilapia burtoni","label":"Astatotilapia burtoni","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"avena sativa","label":"Avena sativa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"aves","label":"Aves","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus","label":"Bacillus (firmicutes)","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus cereus","label":"Bacillus cereus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus mycoides","label":"Bacillus mycoides","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus subtilis","label":"Bacillus subtilis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus thuringiensis","label":"Bacillus thuringiensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus toyonensis","label":"Bacillus toyonensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacillus wiedmannii","label":"Bacillus wiedmannii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacteria","label":"Bacteria","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bacteriophage","label":"Bacteriophage","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bactrocera","label":"Bactrocera 
sp.","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"batrachospermum gelatinosum","label":"Batrachospermum gelatinosum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"betula lenta","label":"Betula lenta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"betula nigra","label":"Betula nigra","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bombus dahlbohmii","label":"Bombus dahlbohmii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bombus terrestris","label":"Bombus terrestris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bombyx mori","label":"Bombyx mori","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bos taurus","label":"Bos Taurus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"brachygobius doriae","label":"Brachygobius doriae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"brassica oleracea","label":"Brassica oleracea","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"brassica rapa","label":"Brassica rapa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"brugia malayi","label":"Brugia malayi","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"burkholderia thailandensis","label":"Burkholderia thailandensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"buttiauxella","label":"Buttiauxella","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"caenorhabditis brenneri","label":"Caenorhabditis brenneri","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"caenorhabditis briggsae","label":"Caenorhabditis 
briggsae","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"c. elegans","label":"Caenorhabditis elegans","imageSrc":"c-elegans.jpg","imageAlt":"C. elegans graphic by Zoe Zorn CC BY 4.0","mod":"WormBase","modLink":"https://wormbase.org","linkVariable":""},{"value":"caenorhabditis inopinata","label":"Caenorhabditis inopinata","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"caenorhabditis japonica","label":"Caenorhabditis japonica","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"caenorhabditis nigoni","label":"Caenorhabditis nigoni","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"caenorhabditis remanei","label":"Caenorhabditis remanei","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"caenorhabditis tropicalis","label":"Caenorhabditis tropicalis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"calidifontibacillus","label":"Calidifontibacillus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"calidifontibacillus erzuremensis","label":"Calidifontibacillus erzuremensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"calliphora sp","label":"Calliphora sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"caltha sagittata","label":"Caltha sagittata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cambarus latimanus","label":"Cambarus latimanus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"candida albicans","label":"Candida albicans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"canis familiaris","label":"Canis familiaris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cannabis 
sativa","label":"Cannabis sativa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"caretta caretta","label":"Caretta caretta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cassiopea xamachana","label":"Cassiopea xamachana","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"caulobacter vibrioides","label":"Caulobacter vibrioides","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cephalopods","label":"Cephalopoda","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cerastium arvense","label":"Cerastium arvense","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ceriodaphnia","label":"Ceriodaphnia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ceroglossus suturalis","label":"Ceroglossus suturalis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chaetoceros","label":"Chaetoceros","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chamaecrista fasciculata","label":"Chamaecrista fasciculata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chilicola chalcidiformis","label":"Chilicola chalcidiformis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chitinimonas","label":"Chitinimonas","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chlamydomonas reinhardtii","label":"Chlamydomonas reinhardtii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chromobacterium","label":"Chromobacterium","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chrysemys picta","label":"Chrysemys picta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"chrysoperla rufilabris","label":"Chrysoperla 
rufilabris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"citrus","label":"Citrus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"clavibacter sp.","label":"Clavibacter sp.","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"colinus virginianus","label":"Colinus virginianus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"crassostrea virginica","label":"Crassostrea virginica","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"crithidia fasciculata","label":"Crithidia fasciculata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cutibacterium acnes","label":"Cutibacterium acnes","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"cyanobacteria","label":"Cyanobacteria","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"daphnia","label":"Daphnia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"daphnia pulex","label":"Daphnia pulex","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"diabrotica virgifera","label":"Diabrotica virgifera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"diabrotica virgifera virgifera virus 1","label":"Diabrotica virgifera virgifera virus 1","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"d. discoideum","label":"Dictyostelium discoideum","imageSrc":"dicty.png","imageAlt":"D. 
discoideum","mod":"dictyBase","modLink":"http://dictybase.org","linkVariable":""},{"value":"diptera","label":"Diptera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"dotocryptus bellicosus","label":"Dotocryptus bellicosus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"drechmeria coniospora","label":"Drechmeria coniospora","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"drosophila","label":"Drosophila","imageSrc":"drosophila.png","imageAlt":"Drosophila graphic by Zoe Zorn CC BY 4.0","mod":"FlyBase","modLink":"https://flybase.org/doi/","linkVariable":"doi"},{"value":"dryopteris campyloptera","label":"Dryopteris campyloptera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"dryopteris expansa","label":"Dryopteris expansa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"dryopteris intermedia","label":"Dryopteris intermedia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"dugesia dorotocephala","label":"Dugesia dorotocephala","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"elasmobranchii","label":"Elasmobranchii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"embryophyta","label":"Embryophyta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"enoploteuthis chunii","label":"Enoploteuthis chunii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"enterobacter aerogenes","label":"Enterobacter aerogenes","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"enterococcus raffinosus","label":"Enterococcus raffinosus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"epichloë coenophiala","label":"Epichloë coenophiala","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"equus 
caballus","label":"Equus caballus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"erigeron sp","label":"Erigeron sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"eristalis","label":"Eristalis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"eruca vesicaria","label":"Eruca vesicaria","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"erwinia carotovora","label":"Erwinia carotovora","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"erythronium americanum","label":"Erythronium americanum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"escherichia coli","label":"Escherichia coli","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"eukaryota","label":"Eukaryotes","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"felis catus","label":"Felis catus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"francisella novicida","label":"Francisella novicida","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"francisella tularensis","label":"Francisella tularensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"fraxinus americana","label":"Fraxinus americana","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"fucus distichus","label":"Fucus distichus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"fungi","label":"Fungi","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"gasteropelecus sp.","label":"Gasteropelecus sp.","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"geranium sp","label":"Geranium 
sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"girardia","label":"Girardia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"glaucomys volans","label":"Glaucomys volans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"glycine max","label":"Glycine max","imageSrc":"","imageAlt":"","mod":"Soybase","modLink":"https://soybase.org","linkVariable":""},{"value":"glyptemys insculpta","label":"Glyptemys insculpta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"gossypium hirsutum","label":"Gossypium hirsutum","imageSrc":"","imageAlt":"","mod":"CottonGen","modLink":"https://www.cottongen.org/","linkVariable":""},{"value":"gromphadorhina portentosa","label":"Gromphadorhina portentosa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"gryllodes sigillatus","label":"Gryllodes sigillatus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"haliotis rufescens","label":"Haliotis rufescens","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"hepacivirus hominis","label":"Hepatitis C Virus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"herpes simplex virus type 1","label":"Herpes simplex virus type 1","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"human","label":"Human","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"human coronavirus oc43","label":"Human coronavirus OC43","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"hydra vulgaris","label":"Hydra vulgaris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"hydropsyche sp","label":"Hydropsyche sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"hymenoptera","label":"Hymenoptera","imageSrc":"","imageAlt":"","mod":"Hymenoptera Genome 
Database","modLink":"https://hymenoptera.elsiklab.missouri.edu/","linkVariable":""},{"value":"hypochaeris radicata","label":"Hypochaeris radicata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"hypodynerus vespiformis","label":"Hypodynerus vespiformis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"iflaviridae","label":"Iflaviridae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"iflavuris","label":"Iflavirus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ipomoea hederacea","label":"Ipomoea hederacea","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ischnomera","label":"Ischnomera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ischnomera ruficollis","label":"Ischnomera ruficollis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"julidochromis marlieri","label":"Julidochromis marlieri","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"juniperus virginiana","label":"Juniperus virginiana","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"kluyveromyces marxianus","label":"Kluyveromyces marxianus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"l. casei","label":"L. 
casei","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lacticaseibacillus casei","label":"Lacticaseibacillus casei","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"larentiinae sp","label":"Larentiinae sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"laurus nobilis","label":"Laurus nobilis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lepidoptera","label":"Lepidoptera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"leucanthemum vulgare","label":"Leucanthemum vulgare","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"linepithema humile","label":"Linepithema humile","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"liometopum occidentale","label":"Liometopum occidentale","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lolium arundinaceum","label":"Lolium arundinaceum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lumbriculus variegatus","label":"Lumbriculus variegatus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lumbricus terrestris","label":"Lumbricus terrestris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lupinus polyphyllus","label":"Lupinus polyphyllus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lycorma delicatula","label":"Lycorma delicatula","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"lynx rufus","label":"Lynx rufus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"magnaporthe oryzae","label":"Magnaporthe oryzae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"mammalia","label":"Mammalia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"manihot 
esculenta","label":"Manihot esculenta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"medicago lupulina","label":"Medicago lupulina","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"meloidogyne","label":"Meloidogyne","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"mimus polyglottos","label":"Mimus polyglottos","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"bryophyta","label":"Mosses","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"mouse","label":"Mouse","imageSrc":"","imageAlt":"","mod":"MGI","modLink":"https://informatics.jax.org","linkVariable":""},{"value":"m. minutoides","label":"Mus minutoides","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"mycobacterium smegmatis","label":"Mycobacterium smegmatis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"nakaseomyces glabratus","label":"Nakaseomyces glabratus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"nauphoeta cinerea","label":"Nauphoeta cinerea","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"neurospora","label":"Neurospora","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"n. 
benthamiana","label":"Nicotiana benthamiana","imageSrc":"","imageAlt":"","mod":"Solgenomics Network","modLink":"https://solgenomics.net/organism/Nicotiana_benthamiana/genome","linkVariable":""},{"value":"nicotiana tabacum","label":"Nicotiana tabacum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"noctuidae","label":"Noctuidae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"noctuidae sp","label":"Noctuidae sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"nothobranchius furzeri","label":"Nothobranchius furzeri","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"onchocerca volvulus","label":"Onchocerca volvulus","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"orconectes virilis","label":"Orconectes virilis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ormia ochracea","label":"Ormia ochracea","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"o. 
sativa","label":"Oryza sativa","imageSrc":"","imageAlt":"","mod":"Gramene","modLink":"https://www.gramene.org/","linkVariable":""},{"value":"other","label":"Other","imageSrc":"","imageAlt":"","mod":null,"modLink":null,"linkVariable":null},{"value":"oxalis enneaphylla","label":"Oxalis enneaphylla","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"paenarthrobacter nicotinovorans","label":"Paenarthrobacter nicotinovorans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"paenarthrobacter nicotinovorans","label":"Paenarthrobacter nicotinovorans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pantoea","label":"Pantoea","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pantoea agglomerans","label":"Pantoea agglomerans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"papaver sp","label":"Papaver sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"paramecium bursaria","label":"Paramecium bursaria","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"partitiviridae","label":"Partitiviridae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pelodiscus sinensis","label":"Pelodiscus sinensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"perezia recurvata","label":"Perezia recurvata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"petromyzon marinus","label":"Petromyzon marinus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"photinus pyralis","label":"Photinus pyralis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"photinus pyralis associated partiti-like virus","label":"Photinus pyralis associated partiti-like virus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"photinus pyralis 
iflavirus 1","label":"Photinus pyralis iflavirus 1","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"physcomitrium patens","label":"Physcomitrium patens","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pinus strobus","label":"Pinus strobus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pinus taeda","label":"Pinus taeda","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"platycheirus","label":"Platycheirus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"plectus sambesii","label":"Plectus sambesii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pogonomyrmex occidentalis","label":"Pogonomyrmex occidentalis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"poncirus trifoliata","label":"Poncirus trifoliata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"populus deltoides","label":"Populus deltoides","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"potato virus y","label":"Potato virus Y","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"primula magellanica","label":"Primula magellanica","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pristionchus pacificus","label":"Pristionchus pacificus","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"prunus persica","label":"Prunus persica","imageSrc":"","imageAlt":"","mod":"Genome Database for Rosaceae","modLink":"https://www.rosaceae.org/","linkVariable":""},{"value":"psalmopoeus iriminia","label":"Psalmopoeus iriminia","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pseudanabaena sp.","label":"Pseudanabaena 
sp.","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pseudomonas","label":"Pseudomonas","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pseudomonas aeruginosa","label":"Pseudomonas aeruginosa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pseudomonas glycinae","label":"Pseudomonas glycinae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pseudomonas putida","label":"Pseudomonas putida","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pseudomonas syringae","label":"Pseudomonas syringae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"pterophyllum scalare","label":"Pterophyllum scalare","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"python regius","label":"Python regius","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"quercus macrocarpa","label":"Quercus macrocarpa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ralstonia solanacearum","label":"Ralstonia solanacearum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ranitomeya imitator","label":"Ranitomeya imitator","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ranunculus peduncularis","label":"Ranunculus peduncularis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"rat","label":"Rat","imageSrc":"","imageAlt":"","mod":"RGD","modLink":"https://rgd.mcw.edu","linkVariable":""},{"value":"rheinheimera","label":"Rheinheimera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ribes rubrum","label":"Ribes rubrum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"sars-cov-2","label":"SARS-CoV-2","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"s. 
cerevisiae","label":"Saccharomyces cerevisiae","imageSrc":"yeast.png","imageAlt":"Yeast graphic by Zoe Zorn CC BY 4.0","mod":"SGD","modLink":"https://yeastgenome.org","linkVariable":""},{"value":"saccharomyces paradoxus","label":"Saccharomyces paradoxus ","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"s. uvarum","label":"Saccharomyces uvarum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"schistosoma","label":"Schistosoma","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"schizosaccharomyces japonicus","label":"Schizosaccharomyces japonicus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"s. pombe","label":"Schizosaccharomyces pombe","imageSrc":"pombe.png","imageAlt":"Pombe graphic by Zoe Zorn © Caltech","mod":"PomBase","modLink":"https://www.pombase.org/reference/PMID:","linkVariable":"pmId"},{"value":"schmidtea mediterranea","label":"Schmidtea mediterranea","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"senecio sp","label":"Senecio sp","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"simocephalus","label":"Simocephalus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"siraitia grosvenorii","label":"Siraitia grosvenorii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"solanum lycopersicum","label":"Solanum lycopersicum","imageSrc":"","imageAlt":"","mod":"Solgenomics Network","modLink":"https://solgenomics.net/organism/1/view/","linkVariable":""},{"value":"sorghum","label":"Sorghum","imageSrc":"","imageAlt":"","mod":"SorghumBase","modLink":"https://www.sorghumbase.org","linkVariable":""},{"value":"spiroplasma eriocheiris","label":"Spiroplasma eriocheiris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"staphylococcus aureus","label":"Staphylococcus 
aureus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"staphylococcus epidermidis","label":"Staphylococcus epidermidis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"steinernema carpocapsae","label":"Steinernema carpocapsae","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"https://wormbase.org","linkVariable":""},{"value":"steinernema hermaphroditum","label":"Steinernema hermaphroditum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"stenotrophomonas geniculata","label":"Stenotrophomonas geniculata","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"streptococcus gordonii ","label":"Streptococcus gordonii ","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"streptococcus mutans","label":"Streptococcus mutans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":" streptococcus pneumoniae","label":"Streptococcus pneumoniae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"s. 
purpuratus","label":"Strongylocentrotus purpuratus","imageSrc":"","imageAlt":"","mod":"Echinobase","modLink":"https://www.echinobase.org","linkVariable":""},{"value":"strongyloides ratti","label":"Strongyloides ratti","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"sulfolobus","label":"Sulfolobus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"symphoricarpos albus","label":"Symphoricarpos albus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"syncirsodes","label":"Syncirsodes","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"synechococcus elongatus","label":"Synechococcus elongatus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"syrphidae","label":"Syrphidae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tarantobelus jeffdanielsi","label":"Tarantobelus jeffdanielsi","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"taraxacum officinale","label":"Taraxacum officinale","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tatochila theodice","label":"Tatochila theodice","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tetrahymena","label":"Tetrahymena","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tetramorium immigrans","label":"Tetramorium immigrans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tomato brown rugose fruit virus","label":"ToBRFV","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"trachemys scripta","label":"Trachemys scripta","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tribolium castaneum","label":"Tribolium 
castaneum","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"trichoptera","label":"Trichoptera","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"trichuris muris","label":"Trichuris muris","imageSrc":"","imageAlt":"","mod":"WormBase","modLink":"www.wormbase.org","linkVariable":""},{"value":"trifolium repens","label":"Trifolium repens","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"trypoxylus dichotomus","label":"Trypoxylus dichotomus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"tsuga canadensis","label":"Tsuga canadensis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"ulva expansa","label":"Ulva expansa","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"universal","label":"Universal","imageSrc":"","imageAlt":"","mod":null,"modLink":null,"linkVariable":null},{"value":"vargula hilgendorfii","label":"Vargula hilgendorfii","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"vespula vulgaris","label":"Vespula vulgaris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"virus","label":"Virus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"watasenia scintillans","label":"Watasenia scintillans","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"wolbachia pipientis","label":"Wolbachia pipientis","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"xenopus","label":"Xenopus","imageSrc":"xenopus.png","imageAlt":"Xenopus graphic by Zoe Zorn CC BY 4.0","mod":"XenBase","modLink":"https://xenbase.org","linkVariable":""},{"value":"xenorhabdus griffiniae","label":"Xenorhabdus griffiniae","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"yramea cytheris","label":"Yramea 
cytheris","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"zaprionus indianus","label":"Zaprionus indianus","imageSrc":"","imageAlt":"","mod":"","modLink":"","linkVariable":""},{"value":"zea mays","label":"Zea mays","imageSrc":"","imageAlt":"","mod":"MaizeGDB","modLink":"https://www.maizegdb.org","linkVariable":""},{"value":"zebrafish","label":"Zebrafish","imageSrc":"zebrafish.png","imageAlt":"Zebrafish graphic by Zoe Zorn CC BY 4.0","mod":"ZFIN","modLink":"https://zfin.org","linkVariable":""}]}},"pageContext":{"id":"7b28f659-47c2-41d4-b7f2-31f8b6c9f5b8","citedBy":[],"parsedCsv":{"csvHeader":[],"csvData":[]}}},
    "staticQueryHashes": ["2114697108"]}