Назад към всички

solublempnn

// Solubility-optimized protein sequence design using SolubleMPNN. Use this skill when: (1) Designing for E. coli expression, (2) Optimizing solubility of designed proteins, (3) Reducing aggregation propensity, (4) Need high-yield expression, (5) Avoiding inclusion body formation. For standard design,

$ git log --oneline --stat
stars:101
forks:19
updated:January 19, 2026
SKILL.mdreadonly
SKILL.md Frontmatter
namesolublempnn
descriptionSolubility-optimized protein sequence design using SolubleMPNN. Use this skill when: (1) Designing for E. coli expression, (2) Optimizing solubility of designed proteins, (3) Reducing aggregation propensity, (4) Need high-yield expression, (5) Avoiding inclusion body formation. For standard design, use proteinmpnn. For ligand-aware design, use ligandmpnn.
licenseMIT
categorydesign-tools
tagssequence-design,inverse-folding,solubility
biomodals_scriptmodal_ligandmpnn.py

SolubleMPNN Solubility-Optimized Design

Prerequisites

RequirementMinimumRecommended
Python3.8+3.10
CUDA11.0+11.7+
GPU VRAM8GB16GB (T4)
RAM8GB16GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal (recommended)

SolubleMPNN uses the ProteinMPNN Modal wrapper with soluble model:

cd biomodals
modal run modal_proteinmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16 \
  --sampling-temp 0.1 \
  --model-name v_48_020

GPU: T4 (16GB) | Timeout: 600s default

Option 2: Local installation

git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

# Use soluble model weights
python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --model_name "v_48_020"  # Soluble model

Key parameters

ParameterDefaultRangeDescription
--pdb_pathrequiredpathInput structure
--num_seq_per_target11-1000Sequences per structure
--sampling_temp"0.1""0.0001-1.0"Temperature (string!)
--model_namev_48_020stringSoluble model variant

Model Variants

ModelDescriptionUse Case
v_48_002StandardGeneral design
v_48_020Soluble-trainedE. coli expression
v_48_030High solubilityDifficult targets

Output format

output/
├── seqs/backbone.fa
└── backbone_pdb/backbone_0001.pdb

Sample output

Successful run

$ python protein_mpnn_run.py --pdb_path backbone.pdb --model_name v_48_020 --num_seq_per_target 8
Loading soluble model weights (v_48_020)...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.1 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.31, global_score=1.24, seq_recovery=0.78
MKTAYIAKQRQISFVKSHFSRQLE...
>backbone_0002, score=1.28, global_score=1.21, seq_recovery=0.81
MKTAYIAKQRQISFVKSQFSRQLD...

What good output looks like:

  • Score: 1.0-2.0 (lower = more confident)
  • Reduced hydrophobic patches compared to standard MPNN
  • Improved charge distribution

Decision tree

Should I use SolubleMPNN?
│
├─ What expression system?
│  ├─ E. coli → SolubleMPNN ✓
│  ├─ Mammalian → ProteinMPNN (PTMs matter more)
│  └─ Yeast → Either
│
├─ History of expression problems?
│  ├─ Yes, aggregation → SolubleMPNN ✓
│  ├─ Yes, low yield → SolubleMPNN ✓
│  └─ No → ProteinMPNN is fine
│
├─ What's in the binding site?
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Nothing / protein only → SolubleMPNN ✓
│
└─ Need highest solubility?
   ├─ Yes → Use v_48_030 model
   └─ Standard → Use v_48_020 model

Typical performance

Campaign SizeTime (T4)Cost (Modal)Notes
100 backbones × 8 seq15-20 min~$2Standard
500 backbones × 8 seq1-1.5h~$8Large campaign

Expected improvement: +15-30% solubility score vs standard ProteinMPNN.


Verify

grep -c "^>" output/seqs/*.fa  # Should match backbone_count × num_seq_per_target

Troubleshooting

Still insoluble: Try v_48_030 (higher solubility bias) Low diversity: Increase temperature to 0.2 Poor folding: Use standard ProteinMPNN and optimize later

Error interpretation

ErrorCauseFix
RuntimeError: CUDA out of memoryLong protein or large batchReduce batch_size
FileNotFoundError: v_48_020Missing model weightsDownload soluble weights

Next: Structure prediction for validation → protein-qc for filtering.