To run AlphaFold3 (AF3) with SBGrid, you must:
1. Obtain the AF3 models
2. Download the AF3 databases
SBGrid cannot distribute the AF3 models. Currently, you must apply to obtain them yourself through this form:
https://forms.gle/svvpY4u2jsHEwWYS6
The link to the form is also given in the AlphaFold3 git repository.
You will receive a link to download the models as a zstd-compressed file. Uncompress this file to produce af3.bin and place it in a directory of your choice. Pass this directory to the run_alphafold.py script with the --model_dir argument.
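As a rough sketch of the unpacking step, the helper below decompresses the downloaded archive into a model directory with the zstd command-line tool. The archive name af3.bin.zst, the function name unpack_af3_model, and the example paths are assumptions for illustration; substitute the name of the file you actually received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decompress the AF3 model archive into a model directory.
# Usage: unpack_af3_model <archive.zst> <model_dir>
unpack_af3_model() {
    local archive="$1"
    local model_dir="$2"

    if ! command -v zstd >/dev/null 2>&1; then
        echo "zstd not found in PATH" >&2
        return 1
    fi
    if [ ! -f "${archive}" ]; then
        echo "archive not found: ${archive}" >&2
        return 1
    fi

    mkdir -p "${model_dir}" || return 1
    # -d decompresses, -o names the output file; the archive itself is kept.
    zstd -d "${archive}" -o "${model_dir}/af3.bin"
}

# Example (assumed paths):
# unpack_af3_model "${HOME}/Downloads/af3.bin.zst" "${HOME}/models"
```

The directory given as the second argument is the one to pass later as the model directory.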
The AF3 databases must be downloaded separately. The uncompressed files require about 1 TB of disk space.
The AlphaFold3 GitHub repository provides a script for downloading them:
https://github.com/google-deepmind/alphafold3/blob/main/fetch_databases.sh
The version of AF3 curated in SBGrid reads the mmCIF files from the tar archive pdb_2022_09_28_mmcif_files.tar.
Later versions of AF3 use the extracted mmCIF files instead, so the current download script untars the archive. The SBGrid version therefore needs the tar file itself in the database directory.
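Because the uncompressed databases need roughly 1 TB, it may be worth checking free space on the target file system before starting the download. The sketch below uses df for that. The function name has_free_space and the 1000 GB threshold are illustrative assumptions, and passing the target directory as the first argument to fetch_databases.sh is an assumption about that script's interface; check the script before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if <dir> has at least <required_gb> GB available.
# Usage: has_free_space <dir> <required_gb>
has_free_space() {
    local dir="$1"
    local required_gb="$2"
    local avail_kb

    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "${dir}" | awk 'NR==2 {print $4}')
    [ -n "${avail_kb}" ] || return 1

    # Convert the requirement to KB and compare as integers.
    [ "${avail_kb}" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Example (assumed threshold of about 1 TB):
# DATABASE_DIR=/programs/local/alphafold-3.0.0/databases
# has_free_space "${DATABASE_DIR}" 1000 && bash fetch_databases.sh "${DATABASE_DIR}"
```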
The resulting layout should look like this:
/programs/local/alphafold-3.0.0/databases/
├── bfd-first_non_consensus_sequences.fasta
├── compressed
│ ├── bfd-first_non_consensus_sequences.fasta.zst
│ ├── mgy_clusters_2022_05.fa.zst
│ ├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta.zst
│ ├── pdb_2022_09_28_mmcif_files.tar.zst
│ ├── pdb_seqres_2022_09_28.fasta.zst
│ ├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta.zst
│ ├── rnacentral_active_seq_id_90_cov_80_linclust.fasta.zst
│ ├── uniprot_all_2021_04.fa.zst
│ └── uniref90_2022_05.fa.zst
├── mgy_clusters_2022_05.fa
├── nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
├── pdb_2022_09_28_mmcif_files.tar
├── pdb_seqres_2022_09_28.fasta
├── rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
├── rnacentral_active_seq_id_90_cov_80_linclust.fasta
├── uniprot_all_2021_04.fa
└── uniref90_2022_05.fa
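To confirm that a download produced this layout, a small check along the following lines can list any missing files. This is a minimal sketch: the function name check_af3_databases is hypothetical, and the file names are copied from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected database files are missing.
# Usage: check_af3_databases <database_dir>
check_af3_databases() {
    local db_dir="$1"
    local missing=0
    local name
    # File names copied from the layout listed above.
    local expected=(
        bfd-first_non_consensus_sequences.fasta
        mgy_clusters_2022_05.fa
        nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta
        pdb_2022_09_28_mmcif_files.tar
        pdb_seqres_2022_09_28.fasta
        rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta
        rnacentral_active_seq_id_90_cov_80_linclust.fasta
        uniprot_all_2021_04.fa
        uniref90_2022_05.fa
    )

    for name in "${expected[@]}"; do
        if [ ! -f "${db_dir}/${name}" ]; then
            echo "missing: ${db_dir}/${name}" >&2
            missing=$(( missing + 1 ))
        fi
    done

    # Succeed only when every expected file is present.
    [ "${missing}" -eq 0 ]
}

# Example:
# check_af3_databases /programs/local/alphafold-3.0.0/databases
```

The check only tests that the files exist; it does not verify their size or contents.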
The script below provides a minimal example. You must provide the location of the model file.
Edit PROJECT_DIR, MODEL_DIR and DATABASE_DIR as needed.
The input is alphafold_input.json, the example JSON file from the AF3 GitHub README.
#!/usr/bin/env bash
#SBATCH --partition=my-gpu-partition
#SBATCH --gres=gpu:large:1
#SBATCH --mem-per-cpu=24G
#SBATCH --time=04:00:00
#SBATCH --job-name=AF3-Example
nvidia-smi --query-gpu=name,memory.total --format=csv
## AlphaFold 3.0.0 example
## https://github.com/google-deepmind/alphafold3
##
## help@sbgrid.org
## Nov 11, 2024
## Start SBGrid environment
source /programs/sbgrid.shrc
export ALPHAFOLD_X=3.0.0
export XLA_FLAGS="--xla_gpu_enable_triton_gemm=false"
# Memory settings used for folding up to 5,120 tokens on A100 80 GB.
export XLA_PYTHON_CLIENT_PREALLOCATE=true
export XLA_CLIENT_MEM_FRACTION=0.95
PROJECT_DIR=/tmp_space/user123/alphafold3_3.0.0
MODEL_DIR=/home/user123/models
DATABASE_DIR=/programs/local/alphafold-3.0.0/databases
# Stop early if the project directory is missing.
cd "${PROJECT_DIR}" || exit 1
input_json="${PROJECT_DIR}/alphafold_input.json"
time run_alphafold.py \
    --json_path="${input_json}" \
    --output_dir="${PROJECT_DIR}/af3_example_output" \
    --db_dir="${DATABASE_DIR}" \
    --model_dir="${MODEL_DIR}"
alphafold_input.json:
{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
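Before submitting the job, it can help to confirm that the three directories set in the script contain what run_alphafold.py expects. The following is a minimal sketch under stated assumptions: the function name check_af3_inputs and the job script name af3_example.sh are hypothetical, and the checks cover only the files named in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the paths used in the job script.
# Usage: check_af3_inputs <project_dir> <model_dir> <database_dir>
check_af3_inputs() {
    local project_dir="$1"
    local model_dir="$2"
    local database_dir="$3"
    local status=0

    # The example input file is expected in the project directory.
    if [ ! -f "${project_dir}/alphafold_input.json" ]; then
        echo "no alphafold_input.json in ${project_dir}" >&2
        status=1
    fi
    # The uncompressed model file must be in the model directory.
    if [ ! -f "${model_dir}/af3.bin" ]; then
        echo "no af3.bin in ${model_dir}" >&2
        status=1
    fi
    if [ ! -d "${database_dir}" ]; then
        echo "no database directory: ${database_dir}" >&2
        status=1
    fi

    return "${status}"
}

# Example (assumed job script name):
# check_af3_inputs /tmp_space/user123/alphafold3_3.0.0 /home/user123/models \
#     /programs/local/alphafold-3.0.0/databases && sbatch af3_example.sh
```

Running the check on a login node catches path mistakes before the job waits in the queue for a GPU.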