5b392a334ebeaae71ae74bd7fbc981cdf1809229
examples/alphafold2.md
... | ... | @@ -3,16 +3,13 @@ |
3 | 3 | <!-- TOC --> |
4 | 4 | |
5 | 5 | - [ALPHAFOLD2](#alphafold2) |
6 | - - [Preparing to run Alphafold](#preparing-to-run-alphafold) |
7 | - - [Using the default run_alphafold.sh wrapper](#using-the-default-run_alphafoldsh-wrapper) |
8 | - - [Creating your own run_alphafold script](#creating-your-own-run_alphafold-script) |
9 | - - [Running the python script run_alphafold.py directly](#running-the-python-script-run_alphafoldpy-directly) |
10 | - - [GPU Memory](#gpu-memory) |
11 | - - [pTM scores](#ptm-scores) |
12 | - - [Examples](#examples) |
13 | - - [Web portal](#web-portal) |
14 | - - [Changes in AlphaFold 2.1.0](#changes-in-alphafold-210) |
15 | - - [Known issues](#known-issues) |
6 | + - [Preparing to run Alphafold](#preparing-to-run-alphafold) |
7 | + - [Running the python script run_alphafold.py](#running-the-python-script-run_alphafoldpy) |
8 | + - [GPU Memory](#gpu-memory) |
9 | + - [Web portal](#web-portal) |
10 | + - [Changes in AlphaFold 2.1.1 multimer](#changes-in-alphafold-211-multimer) |
11 | + - [Examples](#examples) |
12 | + - [Known issues](#known-issues) |
16 | 13 | |
17 | 14 | <!-- /TOC --> |
18 | 15 | ### Preparing to run Alphafold |
... | ... | @@ -27,13 +24,12 @@ AlphaFold requires a set of (large) genetic databases that must be downloaded se |
27 | 24 | These databases can be downloaded with the included download script and the aria2c program, both of which are available in the SBGrid collection. Note that these databases are large (> 2 TB) and may take a significant amount of time to download. |
28 | 25 | |
29 | 26 | ``` |
30 | -/programs/x86_64-linux/alphafold/2.0.0/alphafold/scripts/download_all_data.sh <destination path> |
27 | +/programs/x86_64-linux/alphafold/2.1.1/alphafold/scripts/download_all_data.sh <destination path> |
31 | 28 | ``` |
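Because the full set of databases exceeds 2 TB, it can help to confirm that the destination filesystem has enough room before starting the download. The following is a minimal bash sketch, not part of the SBGrid or AlphaFold tooling; the helper name `has_free_space` and the 2048 GiB threshold in the usage comment are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Minimal sketch: succeed only if the filesystem holding DIR has at least
# MIN_GIB gibibytes available. has_free_space is a hypothetical helper,
# not part of AlphaFold or SBGrid.
has_free_space() {
  local dir="$1" min_gib="$2" avail_kib
  # df -P prints one POSIX-format data line per filesystem; -k reports 1024-byte blocks.
  # Field 4 of the data line (line 2) is the available space in KiB.
  avail_kib=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  # Compare against the threshold converted from GiB to KiB.
  [ "$avail_kib" -ge $(( min_gib * 1024 * 1024 )) ]
}

# Example (2048 GiB = 2 TiB, a rough lower bound based on the size quoted above):
# has_free_space /data/alphafold_dbs 2048 || echo "not enough free space" >&2
```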
32 | 29 | |
33 | 30 | The database directory should look like this: |
34 | 31 | |
35 | -``` |
36 | -. |
32 | +``` |
37 | 33 | ├── bfd |
38 | 34 | │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata |
39 | 35 | │ ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex |
... | ... | @@ -83,43 +79,9 @@ The database directory shouuld look like this : |
83 | 79 | │ └── uniprot.fasta |
84 | 80 | ``` |
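A quick way to catch an incomplete download is to check that each path referenced by the `run_alphafold.py` examples on this page exists. This is a minimal bash sketch under that assumption; `check_alphafold_db` is a hypothetical helper, and the path list is taken from the example flags rather than from an official manifest. The `uniprot` and `pdb_seqres` entries are only used by the multimer example.

```shell
#!/usr/bin/env bash
# Minimal sketch: report database files/directories referenced by the
# run_alphafold.py example flags that are missing under a database directory.
# check_alphafold_db is a hypothetical helper; returns 0 if nothing is missing.
check_alphafold_db() {
  local db_dir="${1:-/programs/local/alphafold}"
  local missing=0 p
  for p in \
    bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata \
    bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex \
    uniclust30/uniclust30_2018_08 \
    uniref90/uniref90.fasta \
    mgnify/mgy_clusters_2018_12.fa \
    pdb_mmcif/mmcif_files \
    pdb_mmcif/obsolete.dat \
    pdb70 \
    uniprot/uniprot.fasta \
    pdb_seqres/pdb_seqres.txt; do
    if [ ! -e "$db_dir/$p" ]; then
      echo "missing: $db_dir/$p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_alphafold_db /programs/local/alphafold && echo "database layout looks complete"
```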
85 | 81 | |
86 | -### Using the default run_alphafold.sh wrapper |
87 | -Once the databases are in place, AlphaFold can be run with the wrapper script run_alphafold.sh. The default location for the databases should be `/programs/local/alphafold`, but can be changed using the ALPHAFOLD_DB variable. For example: |
88 | - |
89 | -``` |
90 | -export ALPHAFOLD_DB="/tmp/databases" |
91 | -``` |
92 | - |
93 | -specifies `/tmp/databases` as the database location in the run script in bash. |
94 | -tcsh users would use : |
95 | - |
96 | -``` |
97 | -setenv ALPHAFOLD_DB "/tmp/databases" |
98 | -``` |
99 | - |
100 | -To use the run script, specify the path to the fasta file and an output directy like so: |
101 | - |
102 | -``` |
103 | -run_alphafold.sh <path to fasta file> <path to an output directory> |
104 | -``` |
105 | - |
106 | -Other Useful variables used by this script : |
107 | - |
108 | -| Variable Name | Use | |
109 | -| ------------------ | ---------------------------------------- | |
110 | -| ALPHAFOLD_DB | Set alternative path to database files | |
111 | -| ALPHAFOLD_PTM | Use pTM models when set | |
112 | -| ALPHAFOLD_PRESET | use reduced_dbs or CASP14 databases | |
113 | -| ALPHAFOLD_TEMPLATE | date string for limiting template search | . |
114 | - |
115 | -### Creating your own run_alphafold script |
116 | - |
117 | -You can use our run_alphafold script template here to create your own run script using the SBGrid installation of Alphafold2. |
118 | - |
119 | -[run_alphafold_template.sh](run_alphafold_template.sh) |
120 | - |
121 | -### Running the python script run_alphafold.py directly |
122 | -run_alphafold.sh is a convenience wrapper script that shortens the required command arguments to run_alphafold.py. The run_alphafold.py script is also available which requires all parameters to be set explicitly, but provides greater flexibility. Pass --helpshort or --helpfull to see help on flags. |
82 | +### Running the python script run_alphafold.py |
83 | +The `run_alphafold.py` script requires all parameters to be set explicitly. |
84 | +Pass `--helpshort` or `--helpfull` to see help on flags. See the examples below. |
123 | 85 | |
124 | 86 | ### GPU Memory |
125 | 87 | GPU memory becomes a limiting factor for larger proteins. The original publication suggests some things to try: |
... | ... | @@ -136,23 +98,12 @@ XLA_PYTHON_CLIENT_MEM_FRACTION=0.5 |
136 | 98 | XLA_PYTHON_CLIENT_ALLOCATOR=platform |
137 | 99 | ``` |
138 | 100 | *Thanks Ci Ji Lim at Wisconsin for suggesting and testing these.* |
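If you prefer not to export the settings above for the whole shell session, they can be applied to a single run by prefixing the command. A minimal bash sketch; `run_with_xla_mem_settings` is a hypothetical helper name, and it sets only the two XLA variables shown above.

```shell
#!/usr/bin/env bash
# Minimal sketch: run one command with the XLA memory settings shown above,
# without exporting them into the rest of the shell session.
# run_with_xla_mem_settings is a hypothetical helper name.
run_with_xla_mem_settings() {
  # Prefix assignments apply only to the environment of this one command.
  XLA_PYTHON_CLIENT_MEM_FRACTION=0.5 \
  XLA_PYTHON_CLIENT_ALLOCATOR=platform \
    "$@"
}

# Example:
# run_with_xla_mem_settings /programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py --helpshort
```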
139 | -### pTM scores |
140 | -The pTM scores are not calculated using the default model. To get pTM scored models you need to change the model names in the input. We have provided a template wrapper script (https://sbgrid.org//wiki/examples/alphafold2) which you can change to your requirements. To get pTM scores you will need to change the model_name line to "model_1_ptm,model_2_ptm" etc. |
141 | - |
142 | -### Examples |
143 | -We include reference sequences from CASP14 in the installation. |
144 | -This command should run successfully: |
145 | - |
146 | -``` |
147 | -run_alphafold.sh /programs/x86_64-linux/alphafold/2.0.0/alphafold/data/T1050.fasta |
148 | -``` |
149 | 101 | |
150 | 102 | ### Web portal |
151 | 103 | It is possible to run AlphaFold through a web portal. See |
152 | 104 | https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb . |
153 | 105 | |
154 | - |
155 | -### Changes in AlphaFold 2.1.0 |
106 | +### Changes in AlphaFold 2.1.1 multimer |
156 | 107 | On 4 Nov 2021 we added version 2.1.0 to the installation. This version allows prediction of multimers from fasta files containing multiple sequences. This version is not currently the default, but will be after further testing. |
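Since the 2.1.x `run_alphafold.py` selects multimer prediction through the `--model_preset` flag (as in the multimer example), a script can pick the preset from the number of sequences in the input FASTA. A minimal bash sketch; `choose_model_preset` is hypothetical, it assumes one `>` header line per chain, and it assumes `monomer` is an accepted preset value (it is the flag's default in AlphaFold 2.1).

```shell
#!/usr/bin/env bash
# Minimal sketch: print "multimer" if the FASTA file holds more than one
# sequence (header lines starting with ">"), otherwise "monomer".
# choose_model_preset is a hypothetical helper.
choose_model_preset() {
  local fasta="$1" n
  if [ ! -r "$fasta" ]; then
    echo "cannot read $fasta" >&2
    return 1
  fi
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps set -e scripts running.
  n=$(grep -c '^>' "$fasta" || true)
  if [ "$n" -gt 1 ]; then
    echo multimer
  else
    echo monomer
  fi
}

# Example: preset=$(choose_model_preset test_multimer.fasta) && echo "--model_preset=$preset"
```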
157 | 108 | |
158 | 109 | To use it, set the ALPHAFOLD_X variable to 2.1.0 in the shell or in the ~/.sbgrid.conf file. |
... | ... | @@ -170,13 +121,50 @@ This is the standard SBGrid version override method. |
170 | 121 | |
171 | 122 | Some command line flags have changed since version 2.0.0. We recommend running the `run_alphafold.py` command directly and are not providing a wrapper script (as we did for 2.0.0) at the present time. |
172 | 123 | |
173 | -Multimer example : |
124 | +### Examples |
125 | +**Standard prediction example:** |
126 | +``` |
127 | +/programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py \ |
128 | +--data_dir=/programs/local/alphafold \ |
129 | +--output_dir=/scratch/data/sbgrid/alphafold/test_monomer \ |
130 | +--fasta_paths=test_monomer.fasta \ |
131 | +--max_template_date=2020-05-14 \ |
132 | +--db_preset=full_dbs \ |
133 | +--bfd_database_path=/programs/local/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \ |
134 | +--uniclust30_database_path=/programs/local/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \ |
135 | +--uniref90_database_path=/programs/local/alphafold/uniref90/uniref90.fasta \ |
136 | +--mgnify_database_path=/programs/local/alphafold/mgnify/mgy_clusters_2018_12.fa \ |
137 | +--template_mmcif_dir=/programs/local/alphafold/pdb_mmcif/mmcif_files \ |
138 | +--pdb70_database_path=/programs/local/alphafold/pdb70/pdb70 \ |
139 | +--obsolete_pdbs_path=/programs/local/alphafold/pdb_mmcif/obsolete.dat |
140 | +``` |
141 | + |
142 | +where `test_monomer.fasta` contains: |
143 | + |
144 | +``` |
145 | +>T1083 |
146 | +GAMGSEIEHIEEAIANAKTKADHERLVAHYEEEAKRLEKKSEEYQELAKVYKKITDVYPNIRSYMVLHYQNLTRRYKEAAEENRALAKLHHELAIVED |
147 | +``` |
174 | 148 | |
149 | +**Multimer example:** |
175 | 150 | ``` |
176 | -/programs/x86_64-linux/alphafold/2.1.0/bin.capsules/run_alphafold.py --data_dir=/scratch/data/sbgrid/alphafold-2.1.0 --output_dir=/scratch/data/sbgrid/alphafold/test4 --fasta_paths=test4.fasta --max_template_date=2020-05-14 --db_preset=full_dbs --bfd_database_path=/scratch/data/sbgrid/alphafold-2.1.0/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --uniclust30_database_path=/scratch/data/sbgrid/alphafold-2.1.0/uniclust30/uniclust30_2018_08/uniclust30_2018_08 --uniref90_database_path=/scratch/data/sbgrid/alphafold-2.1.0/uniref90/uniref90.fasta --mgnify_database_path=/scratch/data/sbgrid/alphafold-2.1.0/mgnify/mgy_clusters_2018_12.fa --template_mmcif_dir=/scratch/data/sbgrid/alphafold-2.1.0/pdb_mmcif/mmcif_files --model_preset=multimer --uniprot_database_path=/scratch/data/sbgrid/alphafold-2.1.0/uniprot/uniprot.fasta --pdb_seqres_database_path=/scratch/data/sbgrid/alphafold-2.1.0/pdb_seqres/pdb_seqres.txt --obsolete_pdbs_path=/scratch/data/sbgrid/alphafold-2.1.0/pdb_mmcif/obsolete.dat |
151 | +/programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py \ |
152 | +--data_dir=/programs/local/alphafold \ |
153 | +--output_dir=/scratch/data/sbgrid/alphafold/test_multimer \ |
154 | +--fasta_paths=test_multimer.fasta \ |
155 | +--max_template_date=2020-05-14 \ |
156 | +--db_preset=full_dbs \ |
157 | +--bfd_database_path=/programs/local/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \ |
158 | +--uniclust30_database_path=/programs/local/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \ |
159 | +--uniref90_database_path=/programs/local/alphafold/uniref90/uniref90.fasta \ |
160 | +--mgnify_database_path=/programs/local/alphafold/mgnify/mgy_clusters_2018_12.fa \ |
161 | +--template_mmcif_dir=/programs/local/alphafold/pdb_mmcif/mmcif_files --model_preset=multimer \ |
162 | +--uniprot_database_path=/programs/local/alphafold/uniprot/uniprot.fasta \ |
163 | +--pdb_seqres_database_path=/programs/local/alphafold/pdb_seqres/pdb_seqres.txt \ |
164 | +--obsolete_pdbs_path=/programs/local/alphafold/pdb_mmcif/obsolete.dat |
177 | 165 | ``` |
178 | 166 | |
179 | -where test4.fasta is |
167 | +where `test_multimer.fasta` contains: |
180 | 168 | |
181 | 169 | ``` |
182 | 170 | >T1083 |
... | ... | @@ -186,6 +174,7 @@ MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH |
186 | 174 | ``` |
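SBGrid does not currently provide a wrapper for 2.1.x, but if you write your own run script, deriving every database path from one directory avoids repeating it in each flag. A minimal bash sketch; `alphafold_args`, `AF_BIN` and `DB_DIR` are hypothetical names whose defaults mirror the two examples above, and it reproduces only the flags those examples use.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the run_alphafold.py flags used in the examples,
# one per line, deriving every database path from a single DB_DIR.
# alphafold_args, AF_BIN and DB_DIR are hypothetical names; defaults mirror the examples.
AF_BIN="${AF_BIN:-/programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py}"
DB_DIR="${DB_DIR:-/programs/local/alphafold}"

# Usage: alphafold_args monomer|multimer FASTA OUTPUT_DIR
alphafold_args() {
  local preset="$1" fasta="$2" outdir="$3"
  printf '%s\n' \
    "--data_dir=$DB_DIR" \
    "--output_dir=$outdir" \
    "--fasta_paths=$fasta" \
    "--max_template_date=2020-05-14" \
    "--db_preset=full_dbs" \
    "--bfd_database_path=$DB_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
    "--uniclust30_database_path=$DB_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08" \
    "--uniref90_database_path=$DB_DIR/uniref90/uniref90.fasta" \
    "--mgnify_database_path=$DB_DIR/mgnify/mgy_clusters_2018_12.fa" \
    "--template_mmcif_dir=$DB_DIR/pdb_mmcif/mmcif_files" \
    "--obsolete_pdbs_path=$DB_DIR/pdb_mmcif/obsolete.dat"
  if [ "$preset" = multimer ]; then
    # As in the multimer example: add UniProt and PDB seqres paths, omit PDB70.
    printf '%s\n' \
      "--model_preset=multimer" \
      "--uniprot_database_path=$DB_DIR/uniprot/uniprot.fasta" \
      "--pdb_seqres_database_path=$DB_DIR/pdb_seqres/pdb_seqres.txt"
  else
    # As in the standard example: PDB70 templates, default model preset.
    printf '%s\n' "--pdb70_database_path=$DB_DIR/pdb70/pdb70"
  fi
}

# Example: collect the flags into an array and run the prediction (mapfile needs bash 4+).
#   mapfile -t args < <(alphafold_args multimer test_multimer.fasta /scratch/data/sbgrid/alphafold/test_multimer)
#   "$AF_BIN" "${args[@]}"
```

Printing one flag per line and reading them into an array keeps paths that contain spaces intact as single arguments.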
187 | 175 | |
188 | 176 | ### Known issues |
189 | -- Some newer GPUs that require CUDA 11.1 or later may not be able to run this version of Alphafold. An update is planned that should correct this. |
190 | 177 | - Unified memory across GPUs does not appear to work in the current version. |
191 | 178 | - The `ptxas` executable is required to be in PATH in some cases, but not all. We cannot redistribute this binary since it is part of the NVIDIA SDK. It must be installed separately and its directory, typically `/usr/local/cuda/bin`, added to the PATH environment variable. Version 11.0.3 works well in our hands, but other versions should work. [You can download the SDK here: https://developer.nvidia.com/cuda-toolkit-archive](https://developer.nvidia.com/cuda-toolkit-archive) |
179 | +- Clashes between monomers have been reported in some cases in multimer mode. |
180 | + |
... | ... | \ No newline at end of file |