f8a93a7bf7015bfe2f747efa8751d27d927a25af
examples/alphafold2.md
... | ... | @@ -7,6 +7,8 @@ |
7 | 7 | - [Using the default run_alphafold.sh wrapper](#using-the-default-run_alphafoldsh-wrapper) |
8 | 8 | - [Creating your own run_alphafold script](#creating-your-own-run_alphafold-script) |
9 | 9 | - [Running the python script run_alphafold.py directly](#running-the-python-script-run_alphafoldpy-directly) |
10 | + - [GPU Memory:](#gpu-memory) |
|
11 | + - [pTM scores:](#ptm-scores) |
|
10 | 12 | - [Examples:](#examples) |
11 | 13 | - [Web portal](#web-portal) |
12 | 14 | - [Known issues](#known-issues) |
... | ... | @@ -15,6 +17,8 @@ |
15 | 17 | ### Preparing to run Alphafold |
16 | 18 | The ALPHAFOLD2 source is an implementation of the inference pipeline of AlphaFold v2.0, using a completely new model that was entered in CASP14. This is not a production application per se, but a reference implementation capable of producing structures from a single amino acid sequence.
17 | 19 | |
20 | +From the developers' original publication: "The provided inference script is optimized for predicting the structure of a single protein, and it will compile the neural network to be specialized to exactly the size of the sequence, MSA, and templates. For large proteins, the compile time is a negligible fraction of the runtime, but it may become more significant for small proteins or if the multi-sequence alignments are already precomputed. In the bulk inference case, it may make sense to use our make_fixed_size function to pad the inputs to a uniform size, thereby reducing the number of compilations required." |
|
21 | + |
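The `make_fixed_size` padding idea mentioned above can be illustrated with a minimal, hypothetical sketch (this is not the actual AlphaFold `make_fixed_size` function; it just shows why padding variable-length inputs to one shape avoids recompilation in bulk inference):

```python
import numpy as np

def pad_to_fixed_size(arr, target_len, axis=0, pad_value=0):
    """Hypothetical stand-in for make_fixed_size: pad one axis to target_len."""
    pad_amount = target_len - arr.shape[axis]
    if pad_amount < 0:
        raise ValueError("target_len is smaller than the input along this axis")
    pad_width = [(0, 0)] * arr.ndim
    pad_width[axis] = (0, pad_amount)
    return np.pad(arr, pad_width, constant_values=pad_value)

# Two "sequences" of different lengths padded to a common size, so a
# shape-specialized (compiled) network sees one input shape for both.
a = pad_to_fixed_size(np.ones((100, 21)), 128)
b = pad_to_fixed_size(np.ones((90, 21)), 128)
assert a.shape == b.shape == (128, 21)
```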
|
18 | 22 | The SBGrid installation of Alphafold2 does not require Docker to run, but does require a relatively recent NVIDIA GPU and an updated driver.
19 | 23 | |
20 | 24 | AlphaFold requires a set of (large) genetic databases that must be downloaded separately. See https://github.com/deepmind/alphafold#genetic-databases for more information. |
... | ... | @@ -91,13 +95,23 @@ run_alphafold.sh <path to fasta file> <path to an output directory> |
91 | 95 | |
92 | 96 | ### Creating your own run_alphafold script |
93 | 97 | |
94 | -You can use our run_alphafold script template here to create your own run script. |
|
98 | +You can use our run_alphafold script template here to create your own run script using the SBGrid installation of Alphafold2. |
|
95 | 99 | |
96 | 100 | [run_alphafold_template.sh](run_alphafold_template.sh) |
97 | 101 | |
98 | 102 | ### Running the python script run_alphafold.py directly |
99 | 103 | run_alphafold.sh is a convenience wrapper script that shortens the required command arguments to run_alphafold.py. The run_alphafold.py script is also available; it requires all parameters to be set explicitly, but provides greater flexibility. Pass --helpshort or --helpfull to see help on flags.
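A sketch of a direct invocation (flag names follow AlphaFold v2.0; all paths are placeholders, and the exact set of required flags depends on your installation, so check `--helpfull`):

```shell
python run_alphafold.py \
  --fasta_paths=/path/to/query.fasta \
  --output_dir=/path/to/output \
  --data_dir=/path/to/alphafold_databases \
  --model_names=model_1,model_2,model_3,model_4,model_5 \
  --max_template_date=2021-07-14
```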
100 | 104 | |
105 | +### GPU Memory: |
|
106 | +GPU memory becomes a limiting factor for larger proteins. The original publication suggests some strategies:
|
107 | + |
|
108 | +"Inferencing large proteins can easily exceed the memory of a single GPU. For a V100 with 16 GB of memory, we can predict the structure of proteins up to ~1,300 residues without ensembling and the 256- and 384-residue inference times are using a single GPU’s memory."
|
109 | + |
|
110 | +"The memory usage is approximately quadratic in the number of residues, so a 2,500 residue protein involves using unified memory so that we can greatly exceed the memory of a single V100. In our cloud setup, a single V100 is used for computation on a 2,500 residue protein but we requested four GPUs to have sufficient memory." |
|
111 | + |
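The unified-memory approach described above is controlled through environment variables in the upstream AlphaFold documentation; a sketch, assuming they are exported before launching your run script:

```shell
# Enable unified memory so inference can spill beyond a single GPU's RAM.
export TF_FORCE_UNIFIED_MEMORY=1
# Let the process address a multiple of one GPU's memory
# (here roughly four GPUs' worth, as in the 2,500-residue example above).
export XLA_PYTHON_CLIENT_MEM_FRACTION=4.0
```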
|
112 | +### pTM scores: |
|
113 | +The pTM scores are not calculated by the default models. To get pTM-scored models you need to change the model names in the input. We have provided a template wrapper script (https://sbgrid.org//wiki/examples/alphafold2) which you can adapt to your requirements. To get pTM scores, change the model_name line to "model_1_ptm,model_2_ptm", etc.
|
114 | + |
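As a sketch, the change in your copy of the wrapper template might look like this (the variable or flag name may differ by AlphaFold version, so check your template before editing):

```shell
# Switch the default models to their pTM counterparts to get pTM scores.
model_names="model_1_ptm,model_2_ptm,model_3_ptm,model_4_ptm,model_5_ptm"
```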
|
101 | 115 | ### Examples: |
102 | 116 | |
103 | 117 | We include reference sequences from CASP14 in the installation. |