Making it Rain [1] is a front end for running molecular dynamics (MD) simulations using cloud-based resources. It makes use of the OpenMM toolkit, the Google Colab framework, and Jupyter notebooks.
Before I start running MD simulations in Colab, I will work through some Colab-Jupyter tutorials outlined in the paper by Engelberger et al. (2021) [2]. These tutorials are available from the Cloud-based Tutorials on Structural Bioinformatics GitHub repository.
Also see this blog: New paper describes Google Colab notebooks to efficiently run molecular dynamics simulations of proteins
First follow these instructions:
Copy the 01.00-How-to-Get-Started.ipynb notebook to your Google Drive. My copy is 01.00-How-to-Get-Started-dgo.ipynb.
Run the 01.00-How-to-Get-Started-dgo.ipynb Jupyter notebook. The output reads: PyRosetta installed at 'prefix'... Please click "Runtime → Restart runtime" before using it.
Success!
Success!
if not os.getenv("DEBUG"):
    !pip install attrs billiard biopython blosc dask dask-jobqueue distributed GitPython graphviz jupyter matplotlib numpy pandas py3Dmol scipy seaborn traitlets --user
I got the following error:
NameError: name 'os' is not defined
Add the following to the code at the top:
import os
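The guard can also be written in plain Python, which makes it easier to see why the import matters. This is only a sketch: the helper name `build_pip_command` and the shortened package list are my own illustration and are not part of the PyRosetta notebook, which uses the `!pip` magic instead.

```python
import os  # without this import, os.getenv raises NameError
import sys

# Shortened, illustrative package list (the notebook installs many more).
PACKAGES = ["attrs", "biopython", "pandas"]


def build_pip_command(packages, debug_env="DEBUG"):
    """Return the pip command to run, or None when the DEBUG variable is set."""
    if os.getenv(debug_env):
        return None
    return [sys.executable, "-m", "pip", "install", "--user", *packages]


print(build_pip_command(PACKAGES))
```

The returned list could be passed to `subprocess.run` outside of a notebook; inside Colab the original `!pip` line does the same job.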
Success!
Now I will try the Cloud-based Tutorials on Structural Bioinformatics.
In lab00_software-dgo.ipynb, change
if sys.version_info.major != 3 or sys.version_info.minor != 6:
to
if sys.version_info.major != 3 or sys.version_info.minor != 7:
I also needed to change the version downloaded to:
http://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.MinSizeRel.python37.linux/PyRosetta4.MinSizeRel.python37.linux.release-300.tar.bz2
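The version check and the download link have to agree, so it may help to derive the link from the Python version. The sketch below assumes the URL pattern seen in the link above holds for other versions, which I have not verified; `pyrosetta_url` is a hypothetical helper name.

```python
import sys

# URL pattern copied from the download link above; only the Python tag and the
# release number vary. Whether other combinations exist on the server is untested.
URL_TEMPLATE = (
    "http://graylab.jhu.edu/download/PyRosetta4/archive/release/"
    "PyRosetta4.MinSizeRel.python{tag}.linux/"
    "PyRosetta4.MinSizeRel.python{tag}.linux.release-{release}.tar.bz2"
)


def pyrosetta_url(major, minor, release):
    """Build the archive URL for a given Python version and release number."""
    tag = f"{major}{minor}"
    return URL_TEMPLATE.format(tag=tag, release=release)


# URL for whatever Python version this runtime is using, at release 300.
print(pyrosetta_url(sys.version_info.major, sys.version_info.minor, 300))
```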
Success! File is downloading.
I’m not sure why the instructions say to follow the PyRosetta installation from the PyRosetta notebooks and then install it again here. It appeared to be already installed in a prefix directory from my first installation attempt.
I saw a fatal error early in the script's output cell. However, the streamed output was truncated to the last 5000 lines, and I cannot find where the log files (if any) are located. The error had something to do with .pth files. The installation appeared to succeed: it ended with PyRosetta setup took: 687.3s..., which is the last line of the script.
I ran the checks for PyRosetta and the scripts ran without problems. There was one user warning:
Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
This is separate from the ipykernel package so we can avoid doing imports until
It looks like the warning was truncated.
As a result, change
from rosetta import *
from pyrosetta import *
from rosetta.protocols.rigid import *
to
from pyrosetta.rosetta import *
from pyrosetta import *
from pyrosetta.rosetta.protocols.rigid import *
In my case, consider changing
from pyrosetta import *
from rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
to
from pyrosetta import *
from pyrosetta.rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
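If several cells use the deprecated top-level import, the rewrite can be done mechanically. This is a minimal sketch using a regular expression; `modernize_imports` is a hypothetical helper, and it only handles `from rosetta ...` lines, not `import rosetta`.

```python
import re

# Matches "from rosetta" at the start of a line, followed by "." or whitespace,
# so "from pyrosetta ..." lines are left untouched.
_DEPRECATED = re.compile(r"^(\s*from\s+)rosetta(?=[.\s])", flags=re.MULTILINE)


def modernize_imports(source):
    """Rewrite 'from rosetta...' imports to 'from pyrosetta.rosetta...'."""
    return _DEPRECATED.sub(r"\1pyrosetta.rosetta", source)


cell = (
    "from pyrosetta import *\n"
    "from rosetta.protocols.rosetta_scripts import *\n"
)
print(modernize_imports(cell))
```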
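I had some problems with the GROMACS installation. The script reported each stage as completed, but the output contained build errors: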
GROMACS set up completed
. . .
GROMACS extraction completed
. . .
-- Downloading... done
-- extracting... done
. . .
Making all in generic-simd256
. . .
Making all in .
. . .
Making all in tools
. . .
Making install in .
. . .
[ 1%] Completed 'fftwBuild'
. . .
src/gromacs/CMakeFiles/libgromacs.dir/build.make:63: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o' failed
CMakeFiles/Makefile2:3038: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/all' failed
Makefile:162: recipe for target 'all' failed
GROMACS building completed
src/gromacs/CMakeFiles/libgromacs.dir/build.make:63: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o' failed
CMakeFiles/Makefile2:3038: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/all' failed
CMakeFiles/Makefile2:1091: recipe for target 'CMakeFiles/check.dir/rule' failed
Makefile:587: recipe for target 'check' failed
GROMACS testing completed
src/gromacs/CMakeFiles/libgromacs.dir/build.make:63: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o' failed
CMakeFiles/Makefile2:3038: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/all' failed
Makefile:162: recipe for target 'all' failed
GROMACS installation completed. Please check if any errors occurred during installation
/content/gromacs-2020.3/build/src/external/build-fftw/fftwBuild-prefix/src/fftwBuild/configure: line 8325: /usr/bin/file: No such file or directory
nvcc fatal : Unsupported gpu architecture 'compute_30'
CMake Error at libgromacs_generated_nbnxm_cuda.cu.o.Release.cmake:219 (message):
Error generating
/content/gromacs-2020.3/build/src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/./libgromacs_generated_nbnxm_cuda.cu.o
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o] Error 1
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2
nvcc fatal : Unsupported gpu architecture 'compute_30'
CMake Error at libgromacs_generated_nbnxm_cuda.cu.o.Release.cmake:219 (message):
Error generating
/content/gromacs-2020.3/build/src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/./libgromacs_generated_nbnxm_cuda.cu.o
make[3]: *** [src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o] Error 1
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
nvcc fatal : Unsupported gpu architecture 'compute_30'
CMake Error at libgromacs_generated_nbnxm_cuda.cu.o.Release.cmake:219 (message):
Error generating
/content/gromacs-2020.3/build/src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/./libgromacs_generated_nbnxm_cuda.cu.o
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o] Error 1
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2
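Because the script prints "completed" even when the build fails, it helps to filter the log for the lines that matter. The sketch below assumes the log has been captured as text; the patterns are taken from the output above, and `find_build_errors` is a hypothetical helper name.

```python
import re

# Patterns taken from the log above: nvcc fatal errors, CMake errors, make failures.
ERROR_PATTERNS = re.compile(r"nvcc fatal|CMake Error|\*\*\* \[.*\] Error \d+")


def find_build_errors(log_text):
    """Return the unique error lines in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if ERROR_PATTERNS.search(stripped) and stripped not in seen:
            seen.append(stripped)
    return seen


sample = (
    "GROMACS building completed\n"
    "nvcc fatal : Unsupported gpu architecture 'compute_30'\n"
    "make: *** [all] Error 2\n"
)
for error in find_build_errors(sample):
    print(error)
```

Run against the full log, this would reduce the three repeated error blocks to a handful of distinct lines, with the `nvcc fatal` message first.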
To fix this, use GROMACS 2020.4.
Restart Runtime (factory reset).
2021-11-01 15:41:58 (16.1 MB/s) - ‘gromacs-2020.4.tar.gz’ saved [29149899/29149899]
Start over from top.
PyRosetta setup took: 5000.8s...
Change
import pandas as pd
from pyrosetta import *
from rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
init, pose_from_sequence, pose_from_file, Pose, MoveMap, create_score_function, get_fa_scorefxn,
MonteCarlo, TrialMover, SwitchResidueTypeSetMover, PyJobDistributor,
)
to
import pandas as pd
from pyrosetta import *
from pyrosetta.rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
init, pose_from_sequence, pose_from_file, Pose, MoveMap, create_score_function, get_fa_scorefxn,
MonteCarlo, TrialMover, SwitchResidueTypeSetMover, PyJobDistributor,
)
Change the download command to
!wget http://ftp.gromacs.org/pub/gromacs/gromacs-2020.4.tar.gz
In the following code, change gromacs-2020.3 to gromacs-2020.4.
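Since the version string appears in the URL, the tarball name, and the source directory, keeping it in one variable avoids missing an occurrence. A minimal sketch, where `GROMACS_VERSION` and `gromacs_names` are hypothetical names of my own and not from the tutorial notebook:

```python
GROMACS_VERSION = "2020.4"  # was 2020.3, which failed to build


def gromacs_names(version):
    """Return (download URL, tarball name, source directory) for a release."""
    tarball = f"gromacs-{version}.tar.gz"
    url = f"http://ftp.gromacs.org/pub/gromacs/{tarball}"
    return url, tarball, f"gromacs-{version}"


url, tarball, source_dir = gromacs_names(GROMACS_VERSION)
print(url)
print(tarball)
print(source_dir)
```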
It appears that code cells that use %%bash scripts do not show intermediate output. You only get output when the entire script is done.
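One possible workaround is to launch the command from Python and print each line as it arrives. This is a sketch using the standard `subprocess` module, not something from the tutorial notebooks, and I have not tested it on the GROMACS build; how promptly lines appear also depends on the child process flushing its own output. `run_streaming` is a hypothetical helper name.

```python
import subprocess
import sys


def run_streaming(command):
    """Run a command, echo each output line as it arrives, return (code, lines)."""
    lines = []
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on the reading side
    ) as proc:
        for line in proc.stdout:
            print(line, end="", flush=True)
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode, lines


code, output = run_streaming([sys.executable, "-c", "print('step 1'); print('step 2')"])
print("exit code:", code)
```

Returning the collected lines also keeps the whole log available afterwards, which gets around the truncated cell output.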
Success!
GROMACS installation completed. Please check if any errors occurred during installation
The next cell executed successfully. Output of gmx -h was the GROMACS help file.
Success!
Downloading SBM-enhanced GROMACS was successful.
Extracting and installing it with %%bash did not show output until the end.
SBM-enhanced GROMACS installation completed. Please check if any errors occurred during installation
Installation check: gmx -h returned the help file. Success!
Backing up:
GROMACS successfully backed up!
Software notebook, lab00_software-dgo.ipynb, successfully completed!
From this website.
Colab.
Great! Now you have an app on your computer that will launch Colab in a new window.
GitHub tab, open this link.
As soon as you make a copy of the notebook, the copy is opened in the browser, which defeats the purpose of using the Colab shortcut app.
From the Seaborn website:
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
There is a nice Seaborn tutorial presented as a Jupyter notebook in Colab: 04.14-Visualization-With-Seaborn.ipynb.
An introduction to Pandas is provided by Google as a Colab notebook, and a tutorial can be found at Pandas DataFrame UltraQuick Tutorial.ipynb.
There are tutorials available on the Pandas documentation website.
simple tutorial for sharing jupyter notebooks on GitHub
From the nbpages documentation:
nbpages is a command line tool for publishing a collection of Jupyter notebooks to Github Pages. This project was inspired by the tools included with the Python Data Science Handbook by Jake Vanderplas.
nbextensions
The jupyter notebook extensions are a useful set of tools that makes working with jupyter notebooks easier. See 10 Jupyter Notebook Extensions Making My Lyfe Easier for some of them.
Install with pip (conda is not installed on Colab):
!pip install jupyter_contrib_nbextensions
# it appears many of the packages are already installed
!pip install jupyter_nbextensions_configurator
# Requirement already satisfied:
!jupyter contrib nbextension install --user
!jupyter nbextensions_configurator enable --user
Success!
How to save Google Colab Notebooks from runtime timeouts has a keep-alive script that you can get from the sour4bh/stop-cursing-colab repository. It uses AutoHotKey.
From Stackoverflow.
More Stackoverflow answers, but these are from a few years ago.
Kaggle access free GPUs
Offers real-time collaboration, version control, chat, and other features.
Also has been used for teaching since 2013, so has useful teaching features.
Pricing: $14 per student, plus subscriptions priced per month, core, project, etc.
Academic Research Group: $831/yr
Hobbyist: $70/yr, but only 1GB/project
conda on Colab
Check for conda.
!conda --version
/bin/bash: conda: command not found
Conda is not installed.
Install condacolab with the pip command.
!pip install -q condacolab
import condacolab
condacolab.install()
I may not need this if I stick with pip.
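The same check can be done from Python before deciding whether condacolab is needed at all. A minimal sketch using the standard library's `shutil.which`; the helper names are hypothetical.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


def needs_condacolab():
    """True when no conda executable is found, i.e. condacolab would be needed."""
    return find_tool("conda") is None


print("conda path:", find_tool("conda"))
print("needs condacolab:", needs_condacolab())
```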
[1] Arantes, P.R., Polêto, M.D., Pedebos, C., and Ligabue-Braun, R. (2021). Making it Rain: Cloud-Based Molecular Simulations for Everyone. J Chem Inf Model 61: 4852–4856. doi: 10.1021/acs.jcim.1c00998.
[2] Engelberger, F., Galaz-Davison, P., Bravo, G., Rivera, M., and Ramírez-Sarmiento, C.A. (2021). Developing and Implementing Cloud-Based Tutorials That Combine Bioinformatics Software, Interactive Coding, and Visualization Exercises for Distance Learning on Structural Bioinformatics. doi: 10.1021/acs.jchemed.1c00022.