Database introduction

PMADS workflow

We developed a literature mining pipeline that uses search terms derived from the dbPTM vocabulary and retrieves articles automatically through NCBI E-utilities (EDirect) and Biopython. To broaden the corpus, we used the NCBI TranslationSet to expand both the post-translational modification (PTM) terms and the keyword ontologies. Multi-class biomedical entities were recognized with PubTator and custom regular expressions, and relation extraction was supported by pretrained natural language processing models (Stanza, using models trained on the CRAFT biomedical corpus). Candidate associations were filtered with manually curated inter-entity rules and then validated by expert review. To date, the pipeline has processed 20,310,267 documents, and manual curation has identified more than 4,500 PTM-drug associations, over 2,500 of which are cancer-related.
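As a minimal sketch of the retrieval step, the snippet below combines PTM terms and keywords into a boolean PubMed query and reads the TranslationSet block that ESearch returns, which shows how PubMed expanded each term (for example, to MeSH headings). The term lists, the sample XML, and the helper names `build_query` and `parse_translation_set` are hypothetical; in a live run the XML would come from Biopython's `Entrez.esearch(db="pubmed", term=query)`, which is deliberately not called here so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical seed vocabularies; the real PTM terms come from dbPTM.
PTM_TERMS = ["phosphorylation", "acetylation", "ubiquitination"]
KEYWORDS = ["drug", "inhibitor"]

def build_query(ptm_terms, keywords):
    """Combine PTM terms and keywords into one boolean PubMed query."""
    ptm_block = " OR ".join(f'"{t}"' for t in ptm_terms)
    kw_block = " OR ".join(f'"{k}"' for k in keywords)
    return f"({ptm_block}) AND ({kw_block})"

def parse_translation_set(esearch_xml):
    """Map each submitted term to PubMed's expanded form, read from an ESearch XML reply."""
    root = ET.fromstring(esearch_xml)
    return {tr.findtext("From"): tr.findtext("To") for tr in root.iter("Translation")}

# Illustrative fragment shaped like an ESearch response (not real search output).
SAMPLE_XML = """<eSearchResult>
  <Count>0</Count>
  <TranslationSet>
    <Translation>
      <From>phosphorylation</From>
      <To>"phosphorylation"[MeSH Terms] OR "phosphorylation"[All Fields]</To>
    </Translation>
  </TranslationSet>
</eSearchResult>"""

if __name__ == "__main__":
    print(build_query(PTM_TERMS, KEYWORDS))
    print(parse_translation_set(SAMPLE_XML))
```

The expanded `To` strings can then be fed back into later queries, which is one way a thesaurus expansion of this kind could widen recall.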

Workflow of PMADS
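The regular-expression and rule-filtering stages of the workflow can be illustrated with a simplified stand-in. PubTator annotations and Stanza models are not reproduced here: a naive sentence splitter replaces Stanza's tokenizer, and the entity patterns and the trigger-word co-occurrence rule are hypothetical examples of an inter-entity rule, not the curated PMADS rules. Expert review would still follow.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical entity dictionaries expressed as regular expressions.
PTM_RE = re.compile(r"\b(phosphorylation|acetylation|ubiquitination|SUMOylation)\b", re.IGNORECASE)
DRUG_RE = re.compile(r"\b(gefitinib|erlotinib|imatinib|cisplatin)\b", re.IGNORECASE)
# Residue-level site such as "Tyr1068" or "S473".
SITE_RE = re.compile(r"\b(Ser|Thr|Tyr|Lys|S|T|Y|K)(\d{1,4})\b")
# Hypothetical trigger words required by the co-occurrence rule.
TRIGGER_RE = re.compile(r"\b(inhibit|reduc|increas|induc|block|abolish)\w*", re.IGNORECASE)

class Candidate(NamedTuple):
    ptm: str
    drug: str
    site: Optional[str]
    sentence: str

def split_sentences(text: str) -> List[str]:
    """Naive splitter on sentence-final punctuation (a stand-in for Stanza)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_candidates(text: str) -> List[Candidate]:
    """Keep a PTM-drug pair only when both co-occur with a trigger word in one sentence."""
    candidates = []
    for sentence in split_sentences(text):
        ptms = PTM_RE.findall(sentence)
        drugs = DRUG_RE.findall(sentence)
        if not (ptms and drugs and TRIGGER_RE.search(sentence)):
            continue  # rule not satisfied: discard this sentence
        site_match = SITE_RE.search(sentence)
        site = "".join(site_match.groups()) if site_match else None
        for ptm in ptms:
            for drug in drugs:
                candidates.append(Candidate(ptm.lower(), drug.lower(), site, sentence))
    return candidates

if __name__ == "__main__":
    text = "Gefitinib inhibited EGFR phosphorylation at Tyr1068. Cisplatin was well tolerated."
    for c in extract_candidates(text):
        print(c)
```

Requiring a trigger word is a cheap precision filter: a sentence that merely lists a drug and a PTM side by side produces no candidate.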

In addition to literature-curated associations, PMADS includes predicted PTM–drug–disease associations inferred from large-scale public proteomics data. We analyzed 50 PRIDE datasets with MaxQuant for peptide identification and PTM site quantification, retaining only sites with a MaxQuant score > 40, a localization probability > 0.8, and a delta score > 40, and excluding potential contaminants and reverse (decoy) sequences. Differential analysis used Wilcoxon rank-sum tests: PTM changes with a Benjamini–Hochberg-adjusted p < 0.05 were retained, and changes with an unadjusted p < 0.05 and an absolute log₂ fold change > 10 were also retained but classified as low-confidence. The current release contains over 43,800 inferred associations spanning 7,878 proteins, 24,195 PTM sites, 39 drugs, and 18 disease types, most of them cancers.
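The filtering and significance rules above can be expressed compactly. This sketch rests on stated assumptions: site records are plain dictionaries whose keys (`score`, `loc_prob`, `delta_score`, `contaminant`, `reverse`) are hypothetical stand-ins for the corresponding MaxQuant columns; the rank-sum p-value uses the large-sample normal approximation without tie correction (the same approach as `scipy.stats.ranksums`); the log₂ fold change is assumed to be the difference of group means of log₂ intensities; and the labels `"standard"` and `"low-confidence"` are illustrative names.

```python
import math

def passes_site_filters(site):
    """Apply the stated quality thresholds to one site record (dictionary keys are hypothetical)."""
    return (
        site["score"] > 40
        and site["loc_prob"] > 0.8
        and site["delta_score"] > 40
        and not site["contaminant"]
        and not site["reverse"]
    )

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation (no tie correction)."""
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted

def log2_fold_change(treated_log2, control_log2):
    """Difference of group means, assuming intensities are already log2-transformed."""
    return sum(treated_log2) / len(treated_log2) - sum(control_log2) / len(control_log2)

def classify(p_raw, p_adj, log2fc):
    """Label a site change using the thresholds in the text; returns None if not reported."""
    if p_adj < 0.05:
        return "standard"
    if p_raw < 0.05 and abs(log2fc) > 10:
        return "low-confidence"
    return None
```

In use, `rank_sum_p` would be run once per filtered site across the two sample groups of a comparison, and the resulting p-values adjusted together with `benjamini_hochberg` before calling `classify`.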

PTM types in PMADS

Diseases in PMADS

Disease category

Key features

AI-Driven Literature Mining

Curated from over 20 million biomedical publications using natural language processing (NLP) and large language models (LLMs), with dual validation by domain experts.

Multi-Omics Integration

Integrates experimental mass spectrometry data from the PRIDE database, processed through the MaxQuant-based pipeline described above, with text-mined PTM-drug associations. Provides residue-level PTM sites, regulatory pathways, and drug response correlations.
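One way to picture this integration is to key both evidence sources on the same (protein, site, drug) triple and merge them, so that each association records whether it is literature-curated, proteomics-inferred, or both. The key layout, the case normalization, and the evidence labels below are hypothetical and are not the PMADS schema; the example inputs are illustrative, not records taken from the database.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key: (UniProt accession, residue-level site, drug name).
Key = Tuple[str, str, str]

def normalize(key: Key) -> Key:
    """Canonicalize case so that both sources produce matching keys."""
    accession, site, drug = key
    return accession.upper(), site.upper(), drug.lower()

def merge_evidence(curated: Iterable[Key], inferred: Iterable[Key]) -> Dict[Key, Set[str]]:
    """Union literature-curated and proteomics-inferred triples, tagging each with its sources."""
    evidence: Dict[Key, Set[str]] = defaultdict(set)
    for key in curated:
        evidence[normalize(key)].add("literature")
    for key in inferred:
        evidence[normalize(key)].add("proteomics")
    return dict(evidence)

if __name__ == "__main__":
    # Illustrative inputs only.
    curated = [("P00533", "Y1068", "Gefitinib")]
    inferred = [("P00533", "Y1068", "gefitinib"), ("P31749", "S473", "cisplatin")]
    for key, sources in merge_evidence(curated, inferred).items():
        print(key, sorted(sources))
```

Normalizing keys before merging matters because the two sources name the same drug or residue with different capitalization; without it, one association would be split into two.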

Clinically Actionable Insights

Annotates 4,500+ experimentally validated PTM-drug relationships and 500+ cross-species modification sites linked to 1,000+ therapeutic agents. Supports biomarker discovery, drug repositioning, and therapy response prediction.

Dynamic Knowledge Expansion

Continuously updated through ongoing literature surveillance and the addition of AI-predicted PTM-drug interactions.
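Incremental literature surveillance can be sketched as a periodic ESearch restricted to a recent date window, followed by dropping PMIDs that earlier runs already processed. `db`, `term`, `datetype`, `mindate`, `maxdate`, and `retmax` are documented ESearch parameters; the seven-day window, the in-memory list of processed PMIDs, and the helper names are hypothetical, and the network request itself is omitted so the example runs offline.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_update_url(query: str, end: date, days: int = 7, retmax: int = 10000) -> str:
    """Build an ESearch URL limited to records that entered PubMed in the last `days` days."""
    start = end - timedelta(days=days)
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "edat",  # Entrez date: when the record was added to PubMed
        "mindate": start.strftime("%Y/%m/%d"),
        "maxdate": end.strftime("%Y/%m/%d"),
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"

def new_pmids(returned, already_processed):
    """Keep only PMIDs not seen in earlier runs, preserving order and dropping repeats."""
    seen = set(already_processed)
    fresh = []
    for pmid in returned:
        if pmid not in seen:
            seen.add(pmid)
            fresh.append(pmid)
    return fresh

if __name__ == "__main__":
    print(build_update_url("phosphorylation AND drug", date(2024, 5, 1)))
    print(new_pmids(["3", "1", "2", "3"], ["1"]))
```

Restricting by Entrez date rather than publication date picks up older articles that were only recently indexed, which suits an append-only update loop.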

Applications

Basic Research: Elucidate molecular networks of PTM-regulated drug mechanisms.

Drug Development: Guide targeted drug design by mapping "druggable" PTM hotspots.

Precision Medicine: Identify patient-specific PTM signatures for personalized treatment optimization.