f8a93a7bf7015bfe2f747efa8751d27d927a25af
examples/alphafold2.md
... | ... | @@ -7,6 +7,8 @@ |
7 | 7 | - [Using the default run_alphafold.sh wrapper](#using-the-default-run_alphafoldsh-wrapper) |
8 | 8 | - [Creating your own run_alphafold script](#creating-your-own-run_alphafold-script) |
9 | 9 | - [Running the python script run_alphafold.py directly](#running-the-python-script-run_alphafoldpy-directly) |
10 | + - [GPU Memory:](#gpu-memory) |
|
11 | + - [pTM scores:](#ptm-scores) |
|
10 | 12 | - [Examples:](#examples) |
11 | 13 | - [Web portal](#web-portal) |
12 | 14 | - [Known issues](#known-issues) |
... | ... | @@ -15,6 +17,8 @@ |
15 | 17 | ### Preparing to run Alphafold |
16 | 18 | The ALPHAFOLD2 source is an implementation of the inference pipeline of AlphaFold v2.0, using a completely new model that was entered in CASP14. This is not a production application per se, but a reference implementation capable of producing structures from a single amino acid sequence.
17 | 19 | |
20 | +From the developers' original publication: "The provided inference script is optimized for predicting the structure of a single protein, and it will compile the neural network to be specialized to exactly the size of the sequence, MSA, and templates. For large proteins, the compile time is a negligible fraction of the runtime, but it may become more significant for small proteins or if the multi-sequence alignments are already precomputed. In the bulk inference case, it may make sense to use our make_fixed_size function to pad the inputs to a uniform size, thereby reducing the number of compilations required." |
|
21 | + |
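The `make_fixed_size` padding idea mentioned above can be illustrated with a minimal, hypothetical sketch (this is not the actual AlphaFold `make_fixed_size` function; it just shows why padding variable-length inputs to one shape avoids recompilation in bulk inference):

```python
import numpy as np

def pad_to_fixed_size(arr, target_len, axis=0, pad_value=0):
    """Hypothetical stand-in for make_fixed_size: pad one axis to target_len."""
    pad_amount = target_len - arr.shape[axis]
    if pad_amount < 0:
        raise ValueError("target_len is smaller than the input along this axis")
    pad_width = [(0, 0)] * arr.ndim
    pad_width[axis] = (0, pad_amount)
    return np.pad(arr, pad_width, constant_values=pad_value)

# Two "sequences" of different lengths padded to a common size, so a
# shape-specialized (compiled) network sees one input shape for both.
a = pad_to_fixed_size(np.ones((100, 21)), 128)
b = pad_to_fixed_size(np.ones((90, 21)), 128)
assert a.shape == b.shape == (128, 21)
```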
|
18 | 22 | The SBGrid installation of Alphafold2 does not require Docker to run, but does require a relatively recent NVIDIA GPU and an updated driver.
19 | 23 | |
20 | 24 | AlphaFold requires a set of (large) genetic databases that must be downloaded separately. See https://github.com/deepmind/alphafold#genetic-databases for more information. |
... | ... | @@ -91,13 +95,23 @@ run_alphafold.sh <path to fasta file> <path to an output directory> |
91 | 95 | |
92 | 96 | ### Creating your own run_alphafold script |
93 | 97 | |
94 | -You can use our run_alphafold script template here to create your own run script. |
|
98 | +You can use our run_alphafold script template here to create your own run script using the SBGrid installation of Alphafold2. |
|
95 | 99 | |
96 | 100 | [run_alphafold_template.sh](run_alphafold_template.sh) |
97 | 101 | |
98 | 102 | ### Running the python script run_alphafold.py directly |
99 | 103 | run_alphafold.sh is a convenience wrapper script that shortens the required command arguments to run_alphafold.py. The run_alphafold.py script is also available; it requires all parameters to be set explicitly, but provides greater flexibility. Pass --helpshort or --helpfull to see help on flags.
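A sketch of a direct invocation (flag names follow AlphaFold v2.0; all paths are placeholders, and the exact set of required flags depends on your installation, so check `--helpfull`):

```shell
python run_alphafold.py \
  --fasta_paths=/path/to/query.fasta \
  --output_dir=/path/to/output \
  --data_dir=/path/to/alphafold_databases \
  --model_names=model_1,model_2,model_3,model_4,model_5 \
  --max_template_date=2021-07-14
```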
100 | 104 | |
105 | +### GPU Memory: |
|
106 | +GPU memory becomes a limiting factor for larger proteins. The original publication suggests some strategies:
|
107 | + |
|
108 | +"Inferencing large proteins can easily exceed the memory of a single GPU. For a V100 with 16 GB of memory, we can predict the structure of proteins up to ~1,300 residues without ensembling and the 256- and 384-residue inference times are using a single GPU’s memory."
|
109 | + |
|
110 | +"The memory usage is approximately quadratic in the number of residues, so a 2,500 residue protein involves using unified memory so that we can greatly exceed the memory of a single V100. In our cloud setup, a single V100 is used for computation on a 2,500 residue protein but we requested four GPUs to have sufficient memory." |
|
111 | + |
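The unified-memory approach described above is controlled through environment variables in the upstream AlphaFold documentation; a sketch, assuming they are exported before launching your run script:

```shell
# Enable unified memory so inference can spill beyond a single GPU's RAM.
export TF_FORCE_UNIFIED_MEMORY=1
# Let the process address a multiple of one GPU's memory
# (here roughly four GPUs' worth, as in the 2,500-residue example above).
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0
```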
|
112 | +### pTM scores: |
|
113 | +The pTM scores are not calculated by the default models. To get pTM-scored models you need to change the model names in the input. We have provided a template wrapper script (https://sbgrid.org//wiki/examples/alphafold2) which you can adapt to your requirements. To get pTM scores, change the model_name line to "model_1_ptm,model_2_ptm", etc.
|
114 | + |
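As a sketch, the change in your copy of the wrapper template might look like this (the variable or flag name may differ by AlphaFold version, so check your template before editing):

```shell
# Switch the default models to their pTM counterparts to get pTM scores.
model_names="model_1_ptm,model_2_ptm,model_3_ptm,model_4_ptm,model_5_ptm"
```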
|
101 | 115 | ### Examples: |
102 | 116 | |
103 | 117 | We include reference sequences from CASP14 in the installation. |