examples/alphafold2.md
... ...
@@ -3,16 +3,13 @@
3 3
<!-- TOC -->
4 4
5 5
- [ALPHAFOLD2](#alphafold2)
6
- - [Preparing to run Alphafold](#preparing-to-run-alphafold)
7
- - [Using the default run_alphafold.sh wrapper](#using-the-default-run_alphafoldsh-wrapper)
8
- - [Creating your own run_alphafold script](#creating-your-own-run_alphafold-script)
9
- - [Running the python script run_alphafold.py directly](#running-the-python-script-run_alphafoldpy-directly)
10
- - [GPU Memory](#gpu-memory)
11
- - [pTM scores](#ptm-scores)
12
- - [Examples](#examples)
13
- - [Web portal](#web-portal)
14
- - [Changes in AlphaFold 2.1.0](#changes-in-alphafold-210)
15
- - [Known issues](#known-issues)
6
+ - [Preparing to run Alphafold](#preparing-to-run-alphafold)
7
+ - [Running the python script run_alphafold.py](#running-the-python-script-run_alphafoldpy)
8
+ - [GPU Memory](#gpu-memory)
9
+ - [Web portal](#web-portal)
10
+ - [Changes in AlphaFold 2.1.1 multimer](#changes-in-alphafold-211-multimer)
11
+ - [Examples](#examples)
12
+ - [Known issues](#known-issues)
16 13
17 14
<!-- /TOC -->
18 15
### Preparing to run Alphafold
... ...
@@ -27,13 +24,12 @@ AlphaFold requires a set of (large) genetic databases that must be downloaded se
27 24
These databases can be downloaded with the included download script and the aria2c program, both of which are available in the SBGrid collection. Note that the databases are large (> 2 TB) and may take a significant amount of time to download.
28 25
29 26
```
30
-/programs/x86_64-linux/alphafold/2.0.0/alphafold/scripts/download_all_data.sh <destination path>
27
+/programs/x86_64-linux/alphafold/2.1.1/alphafold/scripts/download_all_data.sh <destination path>
31 28
```
32 29
33 30
The database directory should look like this:
34 31
35
-```
36
-.
32
+```
37 33
├── bfd
38 34
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
39 35
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
... ...
@@ -83,43 +79,9 @@ The database directory shouuld look like this :
83 79
│   └── uniprot.fasta
84 80
```
85 81
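Before running AlphaFold, it can help to confirm that the download finished with the expected layout. The following is a minimal sketch, not an SBGrid-provided tool: `check_af_db` is a hypothetical helper, and its directory list mirrors the paths used by the `full_dbs` examples in this document plus `params/`, where AlphaFold looks for model weights under `--data_dir`.

```shell
#!/bin/sh
# Sketch: check_af_db DIR reports any expected AlphaFold database
# subdirectory that is missing from DIR (hypothetical helper name).
# The list mirrors the full_dbs example paths, plus params/ (model weights).
# Returns 0 when everything is present, 1 otherwise.
check_af_db() {
  db=$1
  status=0
  for d in bfd mgnify params pdb70 pdb_mmcif pdb_seqres uniclust30 uniprot uniref90; do
    if [ ! -d "$db/$d" ]; then
      echo "missing: $db/$d" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_af_db /programs/local/alphafold && echo "database layout looks complete"` prints each missing directory to stderr and only reports success when none are missing. A `reduced_dbs` download uses a different layout and would need a different list.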
86
-### Using the default run_alphafold.sh wrapper
87
-Once the databases are in place, AlphaFold can be run with the wrapper script run_alphafold.sh. The default location for the databases should be `/programs/local/alphafold`, but can be changed using the ALPHAFOLD_DB variable. For example:
88
-
89
-```
90
-export ALPHAFOLD_DB="/tmp/databases"
91
-```
92
-
93
-specifies `/tmp/databases` as the database location in the run script in bash.
94
-tcsh users would use :
95
-
96
-```
97
-setenv ALPHAFOLD_DB "/tmp/databases"
98
-```
99
-
100
-To use the run script, specify the path to the fasta file and an output directy like so:
101
-
102
-```
103
-run_alphafold.sh <path to fasta file> <path to an output directory>
104
-```
105
-
106
-Other Useful variables used by this script :
107
-
108
-| Variable Name | Use |
109
-| ------------------ | ---------------------------------------- |
110
-| ALPHAFOLD_DB | Set alternative path to database files |
111
-| ALPHAFOLD_PTM | Use pTM models when set |
112
-| ALPHAFOLD_PRESET | use reduced_dbs or CASP14 databases |
113
-| ALPHAFOLD_TEMPLATE | date string for limiting template search | .
114
-
115
-### Creating your own run_alphafold script
116
-
117
-You can use our run_alphafold script template here to create your own run script using the SBGrid installation of Alphafold2.
118
-
119
-[run_alphafold_template.sh](run_alphafold_template.sh)
120
-
121
-### Running the python script run_alphafold.py directly
122
-run_alphafold.sh is a convenience wrapper script that shortens the required command arguments to run_alphafold.py. The run_alphafold.py script is also available which requires all parameters to be set explicitly, but provides greater flexibility. Pass --helpshort or --helpfull to see help on flags.
82
+### Running the python script run_alphafold.py
83
+The `run_alphafold.py` script requires all parameters, including every database path, to be set explicitly on the command line.
84
+Pass `--helpshort` or `--helpfull` to see help on the available flags. See the examples below.
123 85
124 86
### GPU Memory
125 87
GPU memory becomes a limiting factor for larger proteins. The original publication suggests some things to try:
... ...
@@ -136,23 +98,12 @@ XLA_PYTHON_CLIENT_MEM_FRACTION=0.5
136 98
XLA_PYTHON_CLIENT_ALLOCATOR=platform
137 99
```
138 100
*Thanks to Ci Ji Lim at Wisconsin for suggesting and testing these.*
139
-### pTM scores
140
-The pTM scores are not calculated using the default model. To get pTM scored models you need to change the model names in the input. We have provided a template wrapper script (https://sbgrid.org//wiki/examples/alphafold2) which you can change to your requirements. To get pTM scores you will need to change the model_name line to "model_1_ptm,model_2_ptm" etc.
141
-
142
-### Examples
143
-We include reference sequences from CASP14 in the installation.
144
-This command should run successfully:
145
-
146
-```
147
-run_alphafold.sh /programs/x86_64-linux/alphafold/2.0.0/alphafold/data/T1050.fasta
148
-```
149 101
150 102
### Web portal
151 103
It is possible to run AlphaFold through a web portal (a Google Colab notebook). See
152 104
https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb .
153 105
154
-
155
-### Changes in AlphaFold 2.1.0
106
+### Changes in AlphaFold 2.1.1 multimer
156 107
On 4 Nov 2021 we added version 2.1.0 to the installation. This version allows prediction of multimers from FASTA files containing multiple sequences. It is not currently the default, but will become the default after further testing.
157 108
158 109
To use it, set the ALPHAFOLD_X variable to 2.1.0 in the shell or in the ~/.sbgrid.conf file.
... ...
@@ -170,13 +121,50 @@ This is the standard SBGrid version override method.
170 121
171 122
Some command line flags have changed since version 2.0.0. We recommend running the `run_alphafold.py` command directly; unlike for 2.0.0, we are not providing a wrapper script at present.
172 123
173
-Multimer example :
124
+### Examples
125
+**Standard prediction example:**
126
+```
127
+/programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py \
128
+--data_dir=/programs/local/alphafold/ \
129
+--output_dir=/scratch/data/sbgrid/alphafold/test_monomer \
130
+--fasta_paths=test_monomer.fasta \
131
+--max_template_date=2020-05-14 \
132
+--db_preset=full_dbs \
133
+--bfd_database_path=/programs/local/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
134
+--uniclust30_database_path=/programs/local/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
135
+--uniref90_database_path=/programs/local/alphafold/uniref90/uniref90.fasta \
136
+--mgnify_database_path=/programs/local/alphafold/mgnify/mgy_clusters_2018_12.fa \
137
+--template_mmcif_dir=/programs/local/alphafold/pdb_mmcif/mmcif_files \
138
+--pdb70_database_path=/programs/local/alphafold/pdb70/pdb70 \
139
+--obsolete_pdbs_path=/programs/local/alphafold/pdb_mmcif/obsolete.dat
140
+```
141
+
142
+where `test_monomer.fasta` contains:
143
+
144
+```
145
+>T1083
146
+GAMGSEIEHIEEAIANAKTKADHERLVAHYEEEAKRLEKKSEEYQELAKVYKKITDVYPNIRSYMVLHYQNLTRRYKEAAEENRALAKLHHELAIVED
147
+```
174 148
149
+**Multimer example:**
175 150
```
176
-/programs/x86_64-linux/alphafold/2.1.0/bin.capsules/run_alphafold.py --data_dir=/scratch/data/sbgrid/alphafold-2.1.0 --output_dir=/scratch/data/sbgrid/alphafold/test4 --fasta_paths=test4.fasta --max_template_date=2020-05-14 --db_preset=full_dbs --bfd_database_path=/scratch/data/sbgrid/alphafold-2.1.0/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --uniclust30_database_path=/scratch/data/sbgrid/alphafold-2.1.0/uniclust30/uniclust30_2018_08/uniclust30_2018_08 --uniref90_database_path=/scratch/data/sbgrid/alphafold-2.1.0/uniref90/uniref90.fasta --mgnify_database_path=/scratch/data/sbgrid/alphafold-2.1.0/mgnify/mgy_clusters_2018_12.fa --template_mmcif_dir=/scratch/data/sbgrid/alphafold-2.1.0/pdb_mmcif/mmcif_files --model_preset=multimer --uniprot_database_path=/scratch/data/sbgrid/alphafold-2.1.0/uniprot/uniprot.fasta --pdb_seqres_database_path=/scratch/data/sbgrid/alphafold-2.1.0/pdb_seqres/pdb_seqres.txt --obsolete_pdbs_path=/scratch/data/sbgrid/alphafold-2.1.0/pdb_mmcif/obsolete.dat
151
+/programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py \
152
+--data_dir=/programs/local/alphafold \
153
+--output_dir=/scratch/data/sbgrid/alphafold/test_multimer \
154
+--fasta_paths=test_multimer.fasta \
155
+--max_template_date=2020-05-14 \
156
+--db_preset=full_dbs \
157
+--bfd_database_path=/programs/local/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
158
+--uniclust30_database_path=/programs/local/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
159
+--uniref90_database_path=/programs/local/alphafold/uniref90/uniref90.fasta \
160
+--mgnify_database_path=/programs/local/alphafold/mgnify/mgy_clusters_2018_12.fa \
161
+--template_mmcif_dir=/programs/local/alphafold/pdb_mmcif/mmcif_files \
+--model_preset=multimer \
162
+--uniprot_database_path=/programs/local/alphafold/uniprot/uniprot.fasta \
163
+--pdb_seqres_database_path=/programs/local/alphafold/pdb_seqres/pdb_seqres.txt \
164
+--obsolete_pdbs_path=/programs/local/alphafold/pdb_mmcif/obsolete.dat
177 165
```
178 166
179
-where test4.fasta is
167
+where `test_multimer.fasta` contains:
180 168
181 169
```
182 170
>T1083
... ...
@@ -186,6 +174,7 @@ MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
186 174
```
187 175
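The monomer and multimer examples above share most of their flags and differ only in the template/MSA databases and `--model_preset`. As a hedged sketch (not an SBGrid-provided wrapper), the same invocations can be built from a single database root. `ALPHAFOLD_DB` and `MULTIMER` are illustrative variables read only by this script, not by AlphaFold; the paths are copied from the examples. In multimer mode the script passes the UniProt and PDB seqres databases and omits PDB70, as the multimer example does.

```shell
#!/bin/sh
# Sketch: run the monomer or multimer example without repeating every path.
# ALPHAFOLD_DB and MULTIMER are illustrative variables used only by this script.
set -eu

ALPHAFOLD_DB=${ALPHAFOLD_DB:-/programs/local/alphafold}
RUN_AF=/programs/x86_64-linux/alphafold/2.1.1/bin.capsules/run_alphafold.py

fasta=${1:-test_monomer.fasta}
outdir=${2:-/scratch/data/sbgrid/alphafold/test_monomer}

# Flags shared by the monomer and multimer examples.
set -- \
  --data_dir="$ALPHAFOLD_DB" \
  --output_dir="$outdir" \
  --fasta_paths="$fasta" \
  --max_template_date=2020-05-14 \
  --db_preset=full_dbs \
  --bfd_database_path="$ALPHAFOLD_DB/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
  --uniclust30_database_path="$ALPHAFOLD_DB/uniclust30/uniclust30_2018_08/uniclust30_2018_08" \
  --uniref90_database_path="$ALPHAFOLD_DB/uniref90/uniref90.fasta" \
  --mgnify_database_path="$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa" \
  --template_mmcif_dir="$ALPHAFOLD_DB/pdb_mmcif/mmcif_files" \
  --obsolete_pdbs_path="$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat"

if [ "${MULTIMER:-0}" = 1 ]; then
  # Multimer mode: UniProt and PDB seqres instead of PDB70.
  set -- "$@" \
    --model_preset=multimer \
    --uniprot_database_path="$ALPHAFOLD_DB/uniprot/uniprot.fasta" \
    --pdb_seqres_database_path="$ALPHAFOLD_DB/pdb_seqres/pdb_seqres.txt"
else
  # Monomer mode: PDB70 template search.
  set -- "$@" --pdb70_database_path="$ALPHAFOLD_DB/pdb70/pdb70"
fi

if [ -x "$RUN_AF" ]; then
  "$RUN_AF" "$@"
else
  # Not on a host with the SBGrid install: print the command instead.
  printf '%s\n' "$RUN_AF" "$@"
fi
```

Saved as, say, `run_af.sh` (a hypothetical name), the multimer example would become `MULTIMER=1 sh run_af.sh test_multimer.fasta /scratch/data/sbgrid/alphafold/test_multimer`.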
188 176
### Known issues
189
-- Some newer GPUs that require CUDA 11.1 or later may not be able to run this version of Alphafold. An update is planned that should correct this.
190 177
- Unified memory across GPUs does not appear to work in the current version.
191 178
- The `ptxas` executable must be on the PATH in some cases, but not all. We cannot redistribute this binary since it is part of the NVIDIA CUDA Toolkit. It must be installed separately and its directory, typically `/usr/local/cuda/bin`, added to the `PATH` environment variable. Version 11.0.3 works well in our hands, but other versions should work. [You can download the CUDA Toolkit here](https://developer.nvidia.com/cuda-toolkit-archive).
179
+- Clashes between monomers have been reported in some cases in multimer mode.
180
+
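For bash or sh users, the `ptxas` PATH fix described in the known issues can be wrapped in a small function. This is a minimal sketch: `ensure_ptxas` is a hypothetical helper name, and `/usr/local/cuda/bin` is only the typical default mentioned above, so pass your own CUDA `bin` directory if it differs.

```shell
#!/bin/sh
# Sketch: ensure_ptxas [DIR] prepends DIR (default /usr/local/cuda/bin) to PATH
# when ptxas is not already found and DIR contains an executable ptxas.
# Returns 0 if ptxas is available afterwards, 1 otherwise.
ensure_ptxas() {
  cuda_bin=${1:-/usr/local/cuda/bin}
  if ! command -v ptxas >/dev/null 2>&1 && [ -x "$cuda_bin/ptxas" ]; then
    PATH="$cuda_bin:$PATH"
    export PATH
  fi
  if command -v ptxas >/dev/null 2>&1; then
    return 0
  fi
  echo "ptxas not found; install the CUDA Toolkit and add its bin directory to PATH" >&2
  return 1
}
```

Calling `ensure_ptxas` (or `ensure_ptxas /path/to/cuda/bin`) in the same shell before `run_alphafold.py` leaves an existing `ptxas` on the PATH untouched.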
... ...
\ No newline at end of file