examples/alphafold2.md
@@ -7,6 +7,8 @@
 - [Using the default run_alphafold.sh wrapper](#using-the-default-run_alphafoldsh-wrapper)
 - [Creating your own run_alphafold script](#creating-your-own-run_alphafold-script)
 - [Running the python script run_alphafold.py directly](#running-the-python-script-run_alphafoldpy-directly)
+ - [GPU Memory:](#gpu-memory)
+ - [pTM scores:](#ptm-scores)
 - [Examples:](#examples)
 - [Web portal](#web-portal)
 - [Known issues](#known-issues)
@@ -15,6 +17,8 @@
 ### Preparing to run Alphafold
 The ALPHAFOLD2 source is an implementation of the inference pipeline of AlphaFold v2.0, using a completely new model that was entered in CASP14. This is not a production application per se, but a reference implementation capable of producing structures from a single amino acid sequence.

+From the developers' original publication: "The provided inference script is optimized for predicting the structure of a single protein, and it will compile the neural network to be specialized to exactly the size of the sequence, MSA, and templates. For large proteins, the compile time is a negligible fraction of the runtime, but it may become more significant for small proteins or if the multi-sequence alignments are already precomputed. In the bulk inference case, it may make sense to use our make_fixed_size function to pad the inputs to a uniform size, thereby reducing the number of compilations required."
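To illustrate why padding helps in the bulk case, here is a minimal sketch that uses a pure-Python stand-in for the compiler. It is not AlphaFold's `make_fixed_size` and does not reproduce its API; the helpers `bucket_for`, `pad_to_bucket`, and `count_compilations`, and the sizes in `BUCKETS`, are hypothetical. The assumption is that each distinct input shape triggers one compilation, so rounding sequence lengths up to a few fixed sizes lets many sequences share one compiled network.

```python
from typing import List, Sequence, Tuple

# Hypothetical bucket sizes in residues (an assumption, not AlphaFold's values).
BUCKETS = (256, 512, 1024, 2048)


def bucket_for(length: int, buckets: Sequence[int] = BUCKETS) -> int:
    """Return the smallest bucket that can hold `length` residues."""
    for size in buckets:
        if length <= size:
            return size
    raise ValueError(f"length {length} exceeds the largest bucket ({buckets[-1]})")


def pad_to_bucket(
    residues: Sequence[int], pad_value: int = 0, buckets: Sequence[int] = BUCKETS
) -> Tuple[List[int], List[int]]:
    """Pad an integer-encoded sequence up to its bucket size.

    Returns (padded, mask), where mask is 1 for real residues and 0 for padding.
    pad_value is a placeholder: the mask, not the value, marks padded positions.
    """
    size = bucket_for(len(residues), buckets)
    n_pad = size - len(residues)
    padded = list(residues) + [pad_value] * n_pad
    mask = [1] * len(residues) + [0] * n_pad
    return padded, mask


def count_compilations(lengths: Sequence[int], use_buckets: bool) -> int:
    """Count distinct input shapes, i.e. how many times a compiler that
    specializes on exact input size would have to compile the network."""
    if use_buckets:
        return len({bucket_for(n) for n in lengths})
    return len(set(lengths))


if __name__ == "__main__":
    lengths = [120, 180, 250, 300, 410, 700, 950]
    print("exact-size compilations:", count_compilations(lengths, use_buckets=False))
    print("bucketed compilations:", count_compilations(lengths, use_buckets=True))
```

For the seven example lengths, exact-size inputs give seven distinct shapes while the bucketed inputs collapse to three (256, 512, and 1024). The trade-off is extra computation on padded positions, which the mask lets downstream code ignore.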
+
 The SBGrid installation of Alphafold2 does not require Docker to run, but it does require a relatively recent NVIDIA GPU and an up-to-date driver.

 AlphaFold requires a set of (large) genetic databases that must be downloaded separately. See https://github.com/deepmind/alphafold#genetic-databases for more information.
@@ -91,13 +95,23 @@ run_alphafold.sh <path to fasta file> <path to an output directory>

 ### Creating your own run_alphafold script

-You can use our run_alphafold script template here to create your own run script.
+You can use our run_alphafold script template here to create your own run script using the SBGrid installation of Alphafold2.

 [run_alphafold_template.sh](run_alphafold_template.sh)

 ### Running the python script run_alphafold.py directly
 run_alphafold.sh is a convenience wrapper script that shortens the command-line arguments required by run_alphafold.py. You can also call run_alphafold.py directly; it requires every parameter to be set explicitly but provides greater flexibility. Pass --helpshort or --helpfull to see help on its flags.

+### GPU Memory:
+GPU memory becomes a limiting factor for larger proteins. The original publication describes the limits and how the developers worked around them:
+
+"Inferencing large proteins can easily exceed the memory of a single GPU. For a V100 with 16 GB of memory, we can predict the structure of proteins up to ~1,300 residues without ensembling and the 256- and 384-residue inference times are using a single GPU’s memory."
+
+"The memory usage is approximately quadratic in the number of residues, so a 2,500 residue protein involves using unified memory so that we can greatly exceed the memory of a single V100. In our cloud setup, a single V100 is used for computation on a 2,500 residue protein but we requested four GPUs to have sufficient memory."
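The quadratic relationship in the quote supports a rough capacity estimate. The sketch below is our own extrapolation, not a figure from the publication: it assumes memory scales purely with the square of the residue count and anchors on the single data point quoted above (about 1,300 residues in a 16 GB V100 without ensembling). `estimate_memory_gb` and `gpus_needed` are hypothetical helpers, and real usage also depends on MSA depth, templates, and ensembling.

```python
import math

# Reference point taken from the quoted publication: ~1,300 residues fit in a
# 16 GB V100 without ensembling. Treated here as an assumption, not a constant.
REF_RESIDUES = 1300
REF_MEMORY_GB = 16.0


def estimate_memory_gb(residues: int) -> float:
    """Estimate GPU memory assuming memory grows as residues**2 (pure quadratic)."""
    return REF_MEMORY_GB * (residues / REF_RESIDUES) ** 2


def gpus_needed(residues: int, gpu_memory_gb: float = 16.0) -> int:
    """Number of GPUs whose pooled memory (e.g. via unified memory) covers the estimate."""
    return math.ceil(estimate_memory_gb(residues) / gpu_memory_gb)


if __name__ == "__main__":
    for n in (500, 1300, 2500):
        print(f"{n:>5} residues: ~{estimate_memory_gb(n):5.1f} GB, "
              f"{gpus_needed(n)} x 16 GB GPU(s)")
```

Under this assumption a 2,500-residue protein needs roughly 59 GB, which rounds up to four 16 GB GPUs, consistent with the four GPUs the developers requested.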
+
+### pTM scores:
+The default models do not calculate pTM scores. To get pTM-scored models, you need to use the pTM model names in the input. We have provided a template wrapper script (https://sbgrid.org//wiki/examples/alphafold2) that you can adapt to your requirements; to get pTM scores, change its model_name line to "model_1_ptm,model_2_ptm", etc.
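If your wrapper builds the model list programmatically, the pTM variants follow the `model_N_ptm` naming, and AlphaFold v2.0 ships five of them (`model_1_ptm` through `model_5_ptm`). The `ptm_model_names` helper below is hypothetical: a minimal sketch for producing the comma-separated value, assuming your copy of the template reads that value from its model_name line.

```python
from typing import Iterable

# AlphaFold v2.0 provides five pTM model variants: model_1_ptm ... model_5_ptm.
PTM_MODEL_IDS = range(1, 6)


def ptm_model_names(model_ids: Iterable[int] = PTM_MODEL_IDS) -> str:
    """Build a comma-separated list of pTM model names (hypothetical helper)."""
    names = []
    for i in model_ids:
        if not 1 <= i <= 5:
            raise ValueError(f"no pTM model numbered {i}; expected 1-5")
        names.append(f"model_{i}_ptm")
    return ",".join(names)


if __name__ == "__main__":
    print(ptm_model_names())        # model_1_ptm,model_2_ptm,model_3_ptm,model_4_ptm,model_5_ptm
    print(ptm_model_names([1, 2]))  # model_1_ptm,model_2_ptm
```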
+
 ### Examples:

 We include reference sequences from CASP14 in the installation.