AMPHunter: Scanning Hot Springs for Antimicrobial Peptides


Our first full biome scan is complete. We screened metagenomic sequences from hot spring environments and identified 2,001 predicted antimicrobial peptide (AMP) candidates.

The Pipeline

Each candidate goes through a multi-stage screening pipeline:

  1. smORF Extraction: Identify small open reading frames (10-50 amino acids) in metagenomic assemblies
  2. ESM-2 Scoring: Meta’s 650M-parameter protein language model scores each sequence for AMP probability
  3. Biophysical Filtering: Net charge, hydrophobicity, and other physicochemical properties
  4. Safety Stack: Hemolysis prediction (HemoPi3) and toxicity screening (ToxinPred3)
  5. Novelty Assessment: BLAST against the DRAMP, APD3, and AMPSphere databases
  6. Structure Prediction: ESMFold and ColabFold for top candidates
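
To make steps 1 and 3 more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily how the repository implements them: it scans only the forward strand, accepts only ATG starts, uses the standard genetic code, approximates net charge as (K + R) minus (D + E), and measures hydrophobicity with the Kyte-Doolittle scale. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def find_smorfs(dna, min_aa=10, max_aa=50):
    """Return peptides from ATG...stop ORFs on the forward strand only."""
    dna = dna.upper()
    peptides = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            residues, j, closed = [], i, False
            while j + 3 <= len(dna):
                aa = CODON_TABLE.get(dna[j:j + 3], "X")  # X = ambiguous codon
                j += 3
                if aa == "*":
                    closed = True
                    break
                residues.append(aa)
            # Keep only ORFs that end in a stop codon and fit the length window.
            if closed and min_aa <= len(residues) <= max_aa:
                peptides.append("".join(residues))
            i = j  # resume scanning after this ORF
    return peptides


def net_charge(peptide):
    """Crude net charge near neutral pH: K/R count +1, D/E count -1."""
    positive = sum(peptide.count(a) for a in "KR")
    negative = sum(peptide.count(a) for a in "DE")
    return positive - negative


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy; unknown residues count as 0."""
    return sum(KD.get(a, 0.0) for a in peptide) / len(peptide)
```

A real extraction step would also scan the reverse complement and might allow alternative start codons, and a more careful charge estimate would account for histidine and the termini.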

Results

Metric           Value
smORFs screened  ~120,000
AMP candidates   2,001
Tier-1 Leads     501
Non-hemolytic    1,431
Database-novel   2,001 (100%)
Avg AMP score    0.9444
Avg length       37 amino acids
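
The counts above can be read as a screening funnel. The short sketch below works out the rates. It assumes that the Tier-1 and non-hemolytic counts are subsets of the 2,001 candidates, and it treats the approximate screened total as exactly 120,000.

```python
# Counts copied from the results table; the screened total is approximate.
SCREENED = 120_000
CANDIDATES = 2_001
SUBSETS = {"Tier-1 leads": 501, "Non-hemolytic": 1_431, "Database-novel": 2_001}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def funnel_rates():
    """Map each funnel stage to its rate, in percent."""
    rates = {"Candidates / screened": percent(CANDIDATES, SCREENED)}
    for name, count in SUBSETS.items():
        rates[f"{name} / candidates"] = percent(count, CANDIDATES)
    return rates


if __name__ == "__main__":
    for label, value in funnel_rates().items():
        print(f"{label}: {value:.2f}%")
```

Under those assumptions, about 1.67% of screened smORFs become candidates, about 25% of candidates are Tier-1 leads, and about 71.5% are predicted non-hemolytic.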

Every single candidate is database-novel: none matches any known AMP in public databases above our identity threshold. That’s expected for metagenomic sequences from extreme environments, but it also means these peptides are completely uncharacterized.
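
As a rough illustration of what "database-novel" means in practice, the sketch below flags candidates whose best BLAST hit falls below an identity cutoff. It assumes BLAST tabular output (`-outfmt 6`, where the first column is the query ID and the third is percent identity). The 70% cutoff and the function name are placeholders for illustration, not the threshold used in our pipeline.

```python
def novel_candidates(candidate_ids, blast_lines, max_identity=70.0):
    """Return candidate IDs with no hit at or above max_identity percent.

    blast_lines: iterable of tab-separated BLAST outfmt 6 rows
    (qseqid, sseqid, pident, ...). Candidates with no hits count as novel.
    max_identity is a hypothetical cutoff chosen for this example.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[0], float(fields[2])
        best[qseqid] = max(best.get(qseqid, 0.0), pident)
    return [cid for cid in candidate_ids if best.get(cid, 0.0) < max_identity]


if __name__ == "__main__":
    # Toy rows with made-up IDs, for demonstration only.
    rows = [
        "pep1\tdb_entry_A\t95.0\t30\t1\t0\t1\t30\t1\t30\t1e-10\t60.0",
        "pep2\tdb_entry_B\t45.0\t20\t11\t0\t1\t20\t5\t24\t0.5\t20.0",
    ]
    print(novel_candidates(["pep1", "pep2", "pep3"], rows))
```

This version looks only at percent identity. A stricter check would also require a minimum alignment coverage, so that a short, highly identical fragment does not mask an otherwise novel peptide.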

Top Candidates

Our top-scoring candidates have predicted AMP probabilities above 99.99%, net charges of +6 to +13, and non-hemolytic predictions from the safety stack. Several also have ColabFold structure predictions with pLDDT scores above 70, the range usually treated as a confident backbone prediction.

Caveats

⚠️ These are computational predictions only. No candidate has been synthesized or tested in a laboratory. Our AMP classifier was trained on known AMPs and may have biases. Hemolysis predictions have ~85% accuracy. Novelty scores depend on database completeness.

We share these results transparently as a starting point for further investigation, not as validated drug candidates.

What’s Next

We’re currently running the same pipeline across permafrost, glacier, abyssal ocean, and deep-sea vent metagenomes. A cross-biome comparison will follow once all scans complete.


All code and data are available at github.com/Nearik42/esm2-amp