
AlphaFold3 Example

To run AlphaFold3 with SBGrid you must:

1. Obtain the AF3 models
2. Download the AF3 databases

Obtain the AF3 models

SBGrid cannot distribute these. You must apply through this form:

https://forms.gle/svvpY4u2jsHEwWYS6

The link to the form is also given in the AF3 GitHub repository.

You will receive a link to download the models as a zstd-compressed file. Decompress this file to produce af3.bin and place it in a directory of your choice. You must supply this directory to the run_alphafold.py script with the --model_dir argument.
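The decompression step can be sketched as below. This is a minimal example, assuming the zstd command-line tool is installed; the helper name unpack_af3_model and the paths in the usage line are illustrative only, and the name of the archive you receive may differ.

```shell
# Sketch: decompress the downloaded model archive into a model directory.
# unpack_af3_model is a hypothetical helper, not part of AlphaFold 3.
# The output is always written as af3.bin, the name used in this guide.

unpack_af3_model() {
    local archive="$1"     # path to the downloaded .zst file
    local model_dir="$2"   # directory that will hold af3.bin

    if [ ! -f "${archive}" ]; then
        echo "archive not found: ${archive}" >&2
        return 1
    fi

    mkdir -p "${model_dir}" || return 1

    # -d decompress, -f overwrite an existing output, -o set the output file
    zstd -d -f "${archive}" -o "${model_dir}/af3.bin"
}
```

For example, `unpack_af3_model ~/Downloads/af3.bin.zst ~/models` would leave the model in ~/models, which is then the directory to pass as --model_dir.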

Download databases

The databases for AlphaFold3 (AF3) must be downloaded.

The AlphaFold3 GitHub repository provides a script for downloading them:

https://github.com/google-deepmind/alphafold3/blob/main/fetch_databases.sh

The version of AF3 curated in SBGrid reads the mmCIF files directly from the tar file pdb_2022_09_28_mmcif_files.tar.

Since then, AF3 has changed to use the individual files extracted from the mmCIF tar file. The download script will untar the file.

The resulting layout should look like this:

/programs//local/alphafold-3.0.0/databases/
├── bfd-first_non_consensus_sequences.fasta
├── compressed
│   ├── bfd-first_non_consensus_sequences.fasta.zst
│   ├── mgy_clusters_2022_05.fa.zst
│   ├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta.zst
│   ├── pdb_2022_09_28_mmcif_files.tar.zst
│   ├── pdb_seqres_2022_09_28.fasta.zst
│   ├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta.zst
│   ├── rnacentral_active_seq_id_90_cov_80_linclust.fasta.zst
│   ├── uniprot_all_2021_04.fa.zst
│   └── uniref90_2022_05.fa.zst
├── mgy_clusters_2022_05.fa
├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
├── pdb_2022_09_28_mmcif_files.tar
├── pdb_seqres_2022_09_28.fasta
├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
├── rnacentral_active_seq_id_90_cov_80_linclust.fasta
├── uniprot_all_2021_04.fa
└── uniref90_2022_05.fa
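
Before starting a long job it can help to confirm that the directory matches this layout. The sketch below checks only for the nine top-level files in the listing; check_af3_databases is a hypothetical helper written for this example, not something shipped with AlphaFold 3 or SBGrid.

```shell
# Sketch: verify that a database directory contains the files listed above.
# check_af3_databases is a hypothetical helper, not part of AlphaFold 3.
# It reports each missing file on stderr and returns non-zero if any is absent.

check_af3_databases() {
    local db_dir="$1"
    local missing=0
    local f

    for f in \
        bfd-first_non_consensus_sequences.fasta \
        mgy_clusters_2022_05.fa \
        nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta \
        pdb_2022_09_28_mmcif_files.tar \
        pdb_seqres_2022_09_28.fasta \
        rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta \
        rnacentral_active_seq_id_90_cov_80_linclust.fasta \
        uniprot_all_2021_04.fa \
        uniref90_2022_05.fa
    do
        if [ ! -e "${db_dir}/${f}" ]; then
            echo "missing: ${db_dir}/${f}" >&2
            missing=1
        fi
    done

    return "${missing}"
}
```

Call it with the directory you intend to use as DATABASE_DIR, for example `check_af3_databases /path/to/databases`. The compressed subdirectory is not checked, since only the decompressed files appear at the top level of the layout.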

Run AlphaFold

The script below provides a minimal example. You must provide the location of the model file.

Edit PROJECT_DIR, MODEL_DIR and DATABASE_DIR as needed.

We use the input JSON file given in the AF3 GitHub README, alphafold_input.json.

#!/usr/bin/env bash

#SBATCH --partition=bch-gpu
#SBATCH --gres=gpu:large:1
#SBATCH --mem-per-cpu=24G 
#SBATCH --time=04:00:00
#SBATCH --job-name=JV-AF3

nvidia-smi --query-gpu=name,memory.total --format=csv

## AlphaFold 3.0.0 example
## https://github.com/google-deepmind/alphafold3
## 
## help@sbgrid.org
## Nov 11, 2024

## Start SBGrid environment
source /programs/biogrids.shrc
export ALPHAFOLD_X=3.0.0

export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
# Memory settings used for folding up to 5,120 tokens on A100 80 GB.
export XLA_PYTHON_CLIENT_PREALLOCATE=true
export XLA_CLIENT_MEM_FRACTION=0.95

PROJECT_DIR=/temp_work/ch199734/121503_alphafold3_3.0.0
MODEL_DIR=/home/ch199734/models
DATABASE_DIR=/programs/local/biogrids/alphafold-3.0.0/databases

# Stop here if the project directory does not exist
cd "${PROJECT_DIR}" || exit 1

input_json="${PROJECT_DIR}/alphafold_input.json"

time run_alphafold.py \
    --json_path="${input_json}" \
    --output_dir="${PROJECT_DIR}/af3_example_output" \
    --db_dir="${DATABASE_DIR}" \
    --model_dir="${MODEL_DIR}"
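
A wrong path in MODEL_DIR, DATABASE_DIR or the input file is only discovered once the job has started, so a few checks ahead of the run_alphafold.py call may save a wasted allocation. The sketch below is one way to do that; af3_preflight is a hypothetical helper, and it assumes the model file is named af3.bin as described earlier.

```shell
# Sketch: sanity-check the three paths the job script depends on.
# af3_preflight is a hypothetical helper, not part of AlphaFold 3 or SBGrid.
# Returns 0 if everything is in place, 1 otherwise (details go to stderr).

af3_preflight() {
    local model_dir="$1"
    local database_dir="$2"
    local input_json="$3"
    local status=0

    if [ ! -f "${model_dir}/af3.bin" ]; then
        echo "no af3.bin in model directory: ${model_dir}" >&2
        status=1
    fi

    if [ ! -d "${database_dir}" ]; then
        echo "database directory not found: ${database_dir}" >&2
        status=1
    fi

    if [ ! -r "${input_json}" ]; then
        echo "input JSON not readable: ${input_json}" >&2
        status=1
    fi

    return "${status}"
}
```

In the job script this could be called as `af3_preflight "${MODEL_DIR}" "${DATABASE_DIR}" "${input_json}" || exit 1` just before the run_alphafold.py line.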

alphafold_input.json

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
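
To fold a different protein, the same file structure can be reused with a new name and sequence. The sketch below writes such a file for a single protein chain. write_af3_input is a hypothetical helper; it assumes the name and sequence contain only letters, digits and underscores (no JSON escaping is done), and it always uses chain id A and seed 1, mirroring the example above.

```shell
# Sketch: print a single-chain AF3 input file in the same form as the example.
# write_af3_input is a hypothetical helper, not part of AlphaFold 3.
# Assumes name and sequence need no JSON escaping.

write_af3_input() {
    local name="$1"       # job name, e.g. a PDB id
    local sequence="$2"   # one-letter amino acid sequence

    if [ -z "${name}" ] || [ -z "${sequence}" ]; then
        echo "usage: write_af3_input NAME SEQUENCE" >&2
        return 1
    fi

    cat <<EOF
{
  "name": "${name}",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "${sequence}"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}
```

For example, `write_af3_input my_protein MKTAYIAKQR > alphafold_input.json` creates a file that can be passed to --json_path. The sequence shown is a placeholder, not a real target.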