alphafold3

AlphaFold3 Example

In order to run AlphaFold3 with SBGrid you must:

Obtain the AF3 models
Download the AF3 databases
Have an NVIDIA GPU with compute capability 8.0 or higher
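
The GPU requirement can be checked before anything is downloaded. The following is only a sketch, not part of SBGrid or AF3: it assumes an nvidia-smi recent enough to support the compute_cap query field, and the function name cap_ok is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a compute capability string
# such as "8.6" is at least 8.0 (only the major number is compared).
cap_ok() {
    major=${1%%.*}
    [ "${major}" -ge 8 ]
}

# Assumption: this nvidia-smi supports the compute_cap field (recent drivers).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
        cap=$(printf '%s' "${cap}" | tr -d ' ')   # strip the space after the comma
        if cap_ok "${cap}"; then
            echo "OK: ${name} (compute capability ${cap})"
        else
            echo "Too old for AF3: ${name} (compute capability ${cap})"
        fi
    done
else
    echo "nvidia-smi not found; run this on a GPU node"
fi
```

On a cluster, run it on a GPU node rather than the login node, since login nodes often have no GPU.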

Obtain the AF3 models

SBGrid cannot distribute the AF3 models. Currently, you must apply for them yourself through the request form, which is linked from the AF3 git repository.

You will receive a link to download the models as a zstd-compressed file. Decompress this file to produce af3.bin and place it in a directory of your choice. You must supply this directory to the run_alphafold.py script with the --model_dir argument.
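
The decompression step might look like the sketch below. The function name unpack_model, the archive name af3.bin.zst, and both example paths are assumptions for illustration; use the name of the file you actually received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress the downloaded weights into a model directory.
# Usage: unpack_model <archive.zst> <model_dir>
unpack_model() {
    archive=$1
    model_dir=$2
    if [ ! -f "${archive}" ]; then
        echo "Archive not found: ${archive}" >&2
        return 1
    fi
    mkdir -p "${model_dir}" || return 1
    # -d decompresses, -o names the output file; the archive itself is kept.
    zstd -d "${archive}" -o "${model_dir}/af3.bin"
}

# Example call; both paths are placeholders and the archive name is assumed.
# unpack_model "${HOME}/Downloads/af3.bin.zst" "${HOME}/models"
```

The directory passed as the second argument is the one to use later as MODEL_DIR.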

Download databases

The databases for AlphaFold3 (AF3) must be downloaded before the first run. The uncompressed files require about 1 TB of disk space.

The AF3 GitHub repository provides a script for downloading them:

https://github.com/google-deepmind/alphafold3/blob/main/fetch_databases.sh
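
Because of the space requirement, it may be worth checking the target filesystem before starting the download. This sketch assumes a POSIX df; the function name has_space and the DB_DIR default are placeholders, not part of the AF3 script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when <dir> has at least <required_kb>
# kilobytes available, according to POSIX-format df output.
has_space() {
    dir=$1
    required_kb=$2
    # df -Pk prints one header line, then a line whose 4th column is
    # the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "${dir}" | awk 'NR == 2 { print $4 }')
    [ "${avail_kb}" -ge "${required_kb}" ]
}

DB_DIR=${DB_DIR:-/programs/local/alphafold-3.0.0/databases}   # placeholder path
ONE_TB_KB=$((1024 * 1024 * 1024))                              # 1 TiB expressed in KiB

if [ -d "${DB_DIR}" ] && has_space "${DB_DIR}" "${ONE_TB_KB}"; then
    echo "Enough room in ${DB_DIR}"
else
    echo "Less than ~1 TB free in ${DB_DIR} (or the directory is missing)"
fi
```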

The version of AF3 curated in SBGrid reads the mmCIF files from the tar archive pdb_2022_09_28_mmcif_files.tar. Newer versions of AF3 read the extracted mmCIF files instead, so the current download script untars the archive.

The resulting layout should look like this:

/programs/local/alphafold-3.0.0/databases/
├── bfd-first_non_consensus_sequences.fasta
├── compressed
│   ├── bfd-first_non_consensus_sequences.fasta.zst
│   ├── mgy_clusters_2022_05.fa.zst
│   ├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta.zst
│   ├── pdb_2022_09_28_mmcif_files.tar.zst
│   ├── pdb_seqres_2022_09_28.fasta.zst
│   ├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta.zst
│   ├── rnacentral_active_seq_id_90_cov_80_linclust.fasta.zst
│   ├── uniprot_all_2021_04.fa.zst
│   └── uniref90_2022_05.fa.zst
├── mgy_clusters_2022_05.fa
├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
├── pdb_2022_09_28_mmcif_files.tar
├── pdb_seqres_2022_09_28.fasta
├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
├── rnacentral_active_seq_id_90_cov_80_linclust.fasta
├── uniprot_all_2021_04.fa
└── uniref90_2022_05.fa
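
A quick way to compare a database directory against this layout is sketched below. The file names are taken from the listing above; the function name missing_dbs is hypothetical, and the check only tests that each file exists, not that it is complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected database file that is
# missing from <db_dir>; prints nothing when the layout is complete.
missing_dbs() {
    db_dir=$1
    for f in \
        bfd-first_non_consensus_sequences.fasta \
        mgy_clusters_2022_05.fa \
        nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta \
        pdb_2022_09_28_mmcif_files.tar \
        pdb_seqres_2022_09_28.fasta \
        rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta \
        rnacentral_active_seq_id_90_cov_80_linclust.fasta \
        uniprot_all_2021_04.fa \
        uniref90_2022_05.fa
    do
        [ -f "${db_dir}/${f}" ] || echo "${f}"
    done
}

DATABASE_DIR=${DATABASE_DIR:-/programs/local/alphafold-3.0.0/databases}
missing=$(missing_dbs "${DATABASE_DIR}")
if [ -z "${missing}" ]; then
    echo "All expected database files are present"
else
    echo "Missing from ${DATABASE_DIR}:"
    echo "${missing}"
fi
```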

Run AlphaFold

The script below provides a minimal example. You must provide the location of the model file.

Edit PROJECT_DIR, MODEL_DIR and DATABASE_DIR as needed.

We use the input JSON file from the AF3 GitHub README, saved as alphafold_input.json.

#!/usr/bin/env bash

#SBATCH --partition=my-gpu-partition
#SBATCH --gres=gpu:large:1
#SBATCH --mem-per-cpu=24G
#SBATCH --time=04:00:00
#SBATCH --job-name=AF3-Example

nvidia-smi --query-gpu=name,memory.total --format=csv

## AlphaFold 3.0.0 example
## https://github.com/google-deepmind/alphafold3
## 
## help@sbgrid.org
## Nov 11, 2024

## Start SBGrid environment
source /programs/sbgrid.shrc
export ALPHAFOLD_X=3.0.0

export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
# Memory settings used for folding up to 5,120 tokens on A100 80 GB.
export XLA_PYTHON_CLIENT_PREALLOCATE=true
export XLA_CLIENT_MEM_FRACTION=0.95

PROJECT_DIR=/tmp_space/user123/alphafold3_3.0.0
MODEL_DIR=/home/user123/models
DATABASE_DIR=/programs/local/alphafold-3.0.0/databases

# Stop early if the project directory does not exist.
cd "${PROJECT_DIR}" || exit 1

input_json="${PROJECT_DIR}/alphafold_input.json"

time run_alphafold.py \
    --json_path="${input_json}" \
    --flash_attention_implementation=xla \
    --output_dir="${PROJECT_DIR}/af3_example_output" \
    --db_dir="${DATABASE_DIR}" \
    --model_dir="${MODEL_DIR}"
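
Before submitting the job, it can save queue time to confirm that the three directories hold what the script expects. The check below is a sketch under assumptions: the function name preflight is hypothetical, and it only tests for the input JSON, af3.bin, and the database directory named in this example.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeeds only when the inputs the job
# script relies on are in place. Each problem is reported on stderr.
preflight() {
    project_dir=$1
    model_dir=$2
    database_dir=$3
    status=0
    [ -f "${project_dir}/alphafold_input.json" ] ||
        { echo "missing alphafold_input.json in ${project_dir}" >&2; status=1; }
    [ -f "${model_dir}/af3.bin" ] ||
        { echo "missing af3.bin in ${model_dir}" >&2; status=1; }
    [ -d "${database_dir}" ] ||
        { echo "missing database directory ${database_dir}" >&2; status=1; }
    return "${status}"
}

# Same placeholder paths as the job script; edit them to match your setup.
if preflight /tmp_space/user123/alphafold3_3.0.0 \
             /home/user123/models \
             /programs/local/alphafold-3.0.0/databases; then
    echo "Ready to submit"
else
    echo "Fix the problems above before submitting"
fi
```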

alphafold_input.json

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}