AMPHunter: Scanning Hot Springs for Antimicrobial Peptides
Our first full biome scan is complete. We screened metagenomic sequences from hot spring environments and identified 2,001 predicted antimicrobial peptide candidates.
The Pipeline
Each candidate goes through a multi-stage screening pipeline:
- smORF Extraction — Identify small open reading frames (10-50 amino acids) from metagenomic assemblies
- ESM-2 Scoring — Meta’s 650M parameter protein language model scores each sequence for AMP probability
- Biophysical Filtering — Net charge, hydrophobicity, physicochemical properties
- Safety Stack — Hemolysis prediction (HemoPi3), toxicity screening (ToxinPred3)
- Novelty Assessment — BLAST against DRAMP, APD3, and AMPSphere databases
- Structure Prediction — ESMFold and ColabFold for top candidates
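To make the first and third stages concrete, here is a minimal Python sketch. It is not the AMPHunter code. It assumes ATG-only start codons, the standard genetic code, integer net charge counted as (K + R) minus (D + E), and mean Kyte-Doolittle hydropathy as the hydrophobicity measure. The function names and the toy DNA string are invented for this example.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_smorfs(dna, min_aa=10, max_aa=50):
    """Return peptides of min_aa..max_aa residues from ATG...stop ORFs on both strands."""
    dna = dna.upper()
    peptides = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            current = None  # residues of the ORF being read, or None outside an ORF
            for i in range(frame, len(strand) - 2, 3):
                aa = CODON_TABLE.get(strand[i:i + 3], "X")  # X marks ambiguous codons
                if current is None:
                    if aa == "M":
                        current = ["M"]
                elif aa == "*":
                    if min_aa <= len(current) <= max_aa and "X" not in current:
                        peptides.append("".join(current))
                    current = None
                else:
                    current.append(aa)
    return peptides


def net_charge(peptide):
    """Integer net charge approximation: basic (K, R) minus acidic (D, E) residues."""
    return sum(peptide.count(r) for r in "KR") - sum(peptide.count(r) for r in "DE")


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KD[r] for r in peptide) / len(peptide)


# Toy input: a made-up 39-nt sequence encoding one 12-residue peptide.
dna = "ATGAAAAAGCTGCTGAAACGTCTGCTGAAGAAAGCGTAA"
peptides = find_smorfs(dna)
for pep in peptides:
    print(pep, len(pep), net_charge(pep), round(mean_hydropathy(pep), 3))
```

A production extractor would usually also handle alternative start codons, assembly edges, and pH-dependent charge, all of which this sketch ignores.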
Results
| Metric | Value |
|---|---|
| smORFs screened | ~120,000 |
| AMP candidates | 2,001 |
| Tier-1 leads | 501 |
| Non-hemolytic | 1,431 |
| Database-novel | 2,001 (100%) |
| Avg AMP score | 0.9444 |
| Avg length | 37 amino acids |
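For scale, the counts in the table translate into the following rates. This is plain arithmetic on the reported numbers; because the smORF total is approximate, the overall hit rate is approximate too.

```python
# Counts copied from the results table; "screened" is an approximate figure.
screened = 120_000
candidates = 2_001
tier1 = 501
non_hemolytic = 1_431

hit_rate = candidates / screened                 # smORFs that became AMP candidates
tier1_share = tier1 / candidates                 # candidates ranked as Tier-1 leads
non_hemolytic_share = non_hemolytic / candidates # candidates predicted non-hemolytic

print(f"hit rate:            {hit_rate:.1%}")             # 1.7%
print(f"Tier-1 share:        {tier1_share:.1%}")          # 25.0%
print(f"non-hemolytic share: {non_hemolytic_share:.1%}")  # 71.5%
```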
Every candidate is database-novel: none matches a known AMP in DRAMP, APD3, or AMPSphere above our identity threshold. That’s expected for metagenomic sequences from extreme environments, but it also means every one of them is completely uncharacterized.
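A novelty check of this kind can be sketched as a filter over BLAST tabular output (`-outfmt 6`, where column 1 is the query ID and column 3 is percent identity). The 70% cutoff, the function name, and the sample hits below are placeholders invented for illustration; the threshold actually used in the pipeline is not stated here.

```python
def novel_candidates(candidate_ids, blast_tsv_lines, max_identity=70.0):
    """Return candidates whose best hit does not exceed max_identity percent identity.

    max_identity is a hypothetical cutoff, not the pipeline's real threshold.
    Candidates with no hits at all are treated as novel.
    """
    best = {}  # query ID -> highest percent identity seen
    for line in blast_tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[0], float(fields[2])
        best[qseqid] = max(best.get(qseqid, 0.0), pident)
    return [cid for cid in candidate_ids if best.get(cid, 0.0) <= max_identity]


# Made-up hits in outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
hits = [
    "cand_001\tknown_A\t91.7\t36\t3\t0\t1\t36\t1\t36\t1e-15\t70.1",
    "cand_001\tknown_B\t55.0\t20\t9\t0\t1\t20\t5\t24\t0.002\t28.0",
    "cand_002\tknown_C\t48.3\t29\t15\t0\t2\t30\t1\t29\t0.05\t24.3",
]
novel = novel_candidates(["cand_001", "cand_002", "cand_003"], hits)
print(novel)  # ['cand_002', 'cand_003']
```

Percent identity is computed over the aligned region only, so a stricter version would also require a minimum alignment coverage before counting a hit as a match.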
Top Candidates
Our top-scoring candidates have predicted AMP probabilities above 0.9999, favorable net charges (+6 to +13), and non-hemolytic predictions. Several also have ColabFold structure predictions with pLDDT scores above 70, the range generally read as a confidently modeled backbone.
Caveats
⚠️ These are computational predictions only. No candidate has been synthesized or tested in a laboratory. Our AMP classifier was trained on known AMPs and may have biases. Hemolysis predictions have ~85% accuracy. Novelty scores depend on database completeness.
We share these results transparently as a starting point for further investigation, not as validated drug candidates.
What’s Next
We’re currently running the same pipeline across permafrost, glacier, abyssal ocean, and deep-sea vent metagenomes. A cross-biome comparison will follow once all scans complete.
All code and data are available at github.com/Nearik42/esm2-amp