Making It Rain

Making it Rain [1] is a front end for running molecular dynamics simulations on cloud-based resources. It uses the OpenMM toolkit, the Google Colab framework, and Jupyter notebooks.

Before running MD simulations in Colab, I will work through some Colab-Jupyter tutorials outlined in the paper by Engelberger et al. (2021) [2]. These tutorials are available from the Cloud-based Tutorials on Structural Bioinformatics GitHub repository.

Also see this blog: New paper describes Google Colab notebooks to efficiently run molecular dynamics simulations of proteins

PyRosetta notebooks

PyRosetta Google Drive Setup

First follow these instructions:

  • Visit the RosettaCommons/PyRosetta.notebooks GitHub page.
  • Click on Chapter 1: How to get started.
  • Change to the Google work account.
  • (Step 1) Get the PyRosetta License.
  • Mount Google Drive.
  • (Step 2) Copy the 01.00-How-to-Get-Started.ipynb notebook to your Google Drive.
  • Rename the copy to 01.00-How-to-Get-Started-dgo.ipynb.
  • (Step 3) Run the first two cells in the 01.00-How-to-Get-Started-dgo.ipynb Jupyter notebook.
PyRosetta installed at 'prefix'... Please click "Runtime → Restart runtime" before using it.

Success!

  • Run the code in cell 2.

Success!

Install extensions

  • (Step 4) Run the cell with the following code:
if not os.getenv("DEBUG"):
    !pip install attrs billiard biopython blosc dask dask-jobqueue distributed GitPython graphviz jupyter matplotlib numpy pandas py3Dmol scipy seaborn traitlets --user

I got the following error:

NameError: name 'os' is not defined

To fix it, add the following at the top of the cell:

import os

Success!
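For reference, here is a minimal sketch of the same cell written as plain Python rather than with the notebook's `!pip` magic. The function names `build_pip_command` and `install_extensions` are my own, not part of the notebook. The package list is copied from the cell above, and the sketch assumes that installing through `sys.executable -m pip` is an acceptable substitute for `!pip`.

```python
import os
import subprocess
import sys

# Package list copied from the notebook cell above.
PACKAGES = [
    "attrs", "billiard", "biopython", "blosc", "dask", "dask-jobqueue",
    "distributed", "GitPython", "graphviz", "jupyter", "matplotlib",
    "numpy", "pandas", "py3Dmol", "scipy", "seaborn", "traitlets",
]


def build_pip_command(packages):
    """Return the pip command as an argument list (nothing is executed)."""
    return [sys.executable, "-m", "pip", "install", "--user", *packages]


def install_extensions(packages=PACKAGES):
    """Install the packages unless DEBUG is set; return True if pip was run."""
    if os.getenv("DEBUG"):
        # Same guard as the notebook cell: skip installation in debug mode.
        return False
    subprocess.check_call(build_pip_command(packages))
    return True


# In a notebook cell, call install_extensions() to perform the installation.
```

Because `os` is imported at the top, the NameError above cannot occur in this version.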

Now I will try the Cloud-based Tutorials on Structural Bioinformatics.


p3lab (IBM3202) tutorials

Lab.00 Software

I changed the Python version check from

if sys.version_info.major != 3 or sys.version_info.minor != 6:

to

if sys.version_info.major != 3 or sys.version_info.minor != 7:

I also needed to change the version downloaded to:

http://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.MinSizeRel.python37.linux/PyRosetta4.MinSizeRel.python37.linux.release-300.tar.bz2   

Success! File is downloading.
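The two changes above (the version check and the download URL) have to agree with each other, so here is a small sketch that derives both from one pair of numbers. The helper names are hypothetical. The URL pattern is inferred from the single Python 3.7 link above, and I am only assuming that builds for other Python versions or release numbers follow the same naming.

```python
import sys

# Base path taken from the download link above.
BASE_URL = "http://graylab.jhu.edu/download/PyRosetta4/archive/release"


def pyrosetta_archive_url(major, minor, release):
    """Build the archive URL, assuming the naming seen in the 3.7 link."""
    build = f"PyRosetta4.MinSizeRel.python{major}{minor}.linux"
    return f"{BASE_URL}/{build}/{build}.release-{release}.tar.bz2"


def python_matches(major, minor, version_info=None):
    """Return True if the interpreter version is exactly major.minor."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] == major and version_info[1] == minor


# Reproduces the link used above for Python 3.7, release 300.
print(pyrosetta_archive_url(3, 7, 300))
```

Calling `python_matches(3, 7)` before downloading gives the same check as the edited `if` statement, without having to edit two places by hand.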

I’m not sure why the instructions say to follow the PyRosetta installation from the PyRosetta notebooks and then install it again here. It appeared to be already installed in a prefix directory from my first installation attempt.

I saw a fatal error early in the script's output cell. However, the streamed output was truncated to the last 5000 lines, and I could not find where the log files (if any) are located. The error had something to do with .pth files. The installation appeared to succeed: it ended with PyRosetta setup took: 687.3s..., which is the last line of the script.

I ran the checks for PyRosetta and the scripts ran without problems. There was one user warning:

Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
  This is separate from the ipykernel package so we can avoid doing imports until

It looks like the warning was truncated.

According to this site, the fix is to change

from rosetta import *
from pyrosetta import *
from rosetta.protocols.rigid import *

to

from pyrosetta.rosetta import *
from pyrosetta import *
from pyrosetta.rosetta.protocols.rigid import *

In my case, consider changing

from pyrosetta import *
from rosetta.protocols.rosetta_scripts import *
from pyrosetta import (

to

from pyrosetta import *
from pyrosetta.rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
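If many cells need this change, the rewrite can be automated. The following is a rough sketch, not something from the tutorials: `modernize_rosetta_imports` is a hypothetical helper that only handles lines of the form `from rosetta... import ...`. It does not touch plain `import rosetta` statements.

```python
import re

# Matches "from rosetta" at the start of a line, but not "from pyrosetta"
# and not other modules whose names merely begin with "rosetta".
_OLD_IMPORT = re.compile(r"^([ \t]*from[ \t]+)rosetta(?=[.\s])", re.MULTILINE)


def modernize_rosetta_imports(source):
    """Rewrite deprecated top-level 'rosetta' imports to 'pyrosetta.rosetta'."""
    return _OLD_IMPORT.sub(r"\1pyrosetta.rosetta", source)


old_cell = """from rosetta import *
from pyrosetta import *
from rosetta.protocols.rigid import *
"""

print(modernize_rosetta_imports(old_cell))
```

Lines that already start with `from pyrosetta` are left alone, so running the helper twice on the same text is harmless.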

Installation of GROMACS

  • Follow the instructions in the notebook.

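I had some problems: the build failed because nvcc reported an unsupported GPU architecture, compute_30. The output was: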

GROMACS set up completed
. . . 
GROMACS extraction completed
. . . 
-- Downloading... done
-- extracting... done
. . .
Making all in generic-simd256
. . . 
Making all in .
. . .
Making all in tools
. . .
Making install in .
. . .
[  1%] Completed 'fftwBuild'
. . .
src/gromacs/CMakeFiles/libgromacs.dir/build.make:63: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o' failed
CMakeFiles/Makefile2:3038: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/all' failed
Makefile:162: recipe for target 'all' failed
GROMACS building completed
src/gromacs/CMakeFiles/libgromacs.dir/build.make:63: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o' failed
CMakeFiles/Makefile2:3038: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/all' failed
CMakeFiles/Makefile2:1091: recipe for target 'CMakeFiles/check.dir/rule' failed
Makefile:587: recipe for target 'check' failed
GROMACS testing completed
src/gromacs/CMakeFiles/libgromacs.dir/build.make:63: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o' failed
CMakeFiles/Makefile2:3038: recipe for target 'src/gromacs/CMakeFiles/libgromacs.dir/all' failed
Makefile:162: recipe for target 'all' failed
GROMACS installation completed. Please check if any errors occurred during installation
/content/gromacs-2020.3/build/src/external/build-fftw/fftwBuild-prefix/src/fftwBuild/configure: line 8325: /usr/bin/file: No such file or directory
nvcc fatal   : Unsupported gpu architecture 'compute_30'

CMake Error at libgromacs_generated_nbnxm_cuda.cu.o.Release.cmake:219 (message):
  Error generating
  /content/gromacs-2020.3/build/src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/./libgromacs_generated_nbnxm_cuda.cu.o


make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o] Error 1
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2
nvcc fatal   : Unsupported gpu architecture 'compute_30'
CMake Error at libgromacs_generated_nbnxm_cuda.cu.o.Release.cmake:219 (message):
  Error generating
  /content/gromacs-2020.3/build/src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/./libgromacs_generated_nbnxm_cuda.cu.o

make[3]: *** [src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o] Error 1
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
nvcc fatal   : Unsupported gpu architecture 'compute_30'
CMake Error at libgromacs_generated_nbnxm_cuda.cu.o.Release.cmake:219 (message):
  Error generating
  /content/gromacs-2020.3/build/src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/./libgromacs_generated_nbnxm_cuda.cu.o


make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/nbnxm/cuda/libgromacs_generated_nbnxm_cuda.cu.o] Error 1
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2
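Note that the "completed" messages in the log above are printed even though the build failed, so they cannot be trusted on their own. Below is a minimal sketch of how the captured output could be scanned for real failures. The function name and the list of patterns are my own choices, based only on the error lines seen in this log, so other failure messages may slip through.

```python
import re

# Patterns chosen from the error lines in the log above (an assumption,
# not an exhaustive list of everything a GROMACS build can report).
ERROR_PATTERN = re.compile(
    r"\bfatal\b|\bError\b|\*\*\*|No such file or directory"
)


def find_build_errors(log_text):
    """Return the distinct log lines that look like errors, in order."""
    seen = set()
    errors = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped and ERROR_PATTERN.search(stripped) and stripped not in seen:
            seen.add(stripped)
            errors.append(stripped)
    return errors


sample_log = """GROMACS building completed
nvcc fatal   : Unsupported gpu architecture 'compute_30'
make: *** [all] Error 2
nvcc fatal   : Unsupported gpu architecture 'compute_30'
GROMACS installation completed. Please check if any errors occurred during installation
"""

for error_line in find_build_errors(sample_log):
    print(error_line)
```

An empty result does not prove the build succeeded. Running gmx -h afterwards remains the real check.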

The fix, from this Stack Overflow issue, is to use GROMACS 2020.4.

Restart Runtime (factory reset).

  • Reinstall GROMACS using the 2020.4 version.
2021-11-01 15:41:58 (16.1 MB/s) - ‘gromacs-2020.4.tar.gz’ saved [29149899/29149899]

Start over from the top.

PyRosetta setup took: 5000.8s...

Change

import pandas as pd
from pyrosetta import *
from rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
    init, pose_from_sequence, pose_from_file, Pose, MoveMap, create_score_function, get_fa_scorefxn,
    MonteCarlo, TrialMover, SwitchResidueTypeSetMover, PyJobDistributor,
)

to

import pandas as pd
from pyrosetta import *
from pyrosetta.rosetta.protocols.rosetta_scripts import *
from pyrosetta import (
    init, pose_from_sequence, pose_from_file, Pose, MoveMap, create_score_function, get_fa_scorefxn,
    MonteCarlo, TrialMover, SwitchResidueTypeSetMover, PyJobDistributor,
)

Installing GROMACS

Change the download command to

!wget http://ftp.gromacs.org/pub/gromacs/gromacs-2020.4.tar.gz

In the code that follows, change gromacs-2020.3 to gromacs-2020.4.

It appears that code cells that use %%bash scripts do not show intermediate output. You only get output when the entire script is done.
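One possible workaround, which I have not tried in the tutorial notebooks, is to launch the long-running command from Python and print each line as it arrives. This is a sketch using only the standard library, and `run_streaming` is a hypothetical helper name. Programs that buffer their own output may still appear to stall.

```python
import subprocess
import sys


def run_streaming(command):
    """Run a command, echoing each output line as soon as it is produced.

    Returns the exit code and the list of output lines.
    """
    lines = []
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    ) as process:
        for line in process.stdout:
            print(line, end="", flush=True)
            lines.append(line.rstrip("\n"))
    return process.returncode, lines


# Small demonstration using the current Python interpreter as the command.
code, output = run_streaming(
    [sys.executable, "-c", "print('step 1'); print('step 2')"]
)
```

For a build script, the command list would be something like `["bash", "install.sh"]`, where `install.sh` is a placeholder for whatever the `%%bash` cell contains.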

Success!

GROMACS installation completed. Please check if any errors occurred during installation

The next cell executed successfully. The output of gmx -h was the GROMACS help text.

Backup GROMACS to Google Drive

Success!

Installation of SBM-enhanced GROMACS

Downloading SBM-enhanced GROMACS was successful.

The %%bash cell that extracts and installs it did not show any output until the end.

SBM-enhanced GROMACS installation completed. Please check if any errors occurred during installation

Installation check: gmx -h returned the help file. Success!

Backing up:

GROMACS successfully backed up!

Software notebook, lab00_software-dgo.ipynb, successfully completed!

Colab tips

From this website.

  • Go to Colab.
  • Change to the Google work account.
  • Click on the 3 dots in the upper right corner of your Chrome browser.
  • Go to More Tools -> Create Shortcut…
  • Give the shortcut a name such as Colab.
  • Click the box, Open as window.
  • Click Create.

Great! Now you have an app on your computer that will launch Colab in a new window.

Lab.01 warmup

  • Open the Colab app.
  • In the GitHub tab, open this link.
  • Open the tutorials/lab01_intro.ipynb notebook.
  • Save a copy to Drive, and rename it with the -dgo suffix.

As soon as you make a copy of the notebook, the copy is opened in the browser, which defeats the purpose of using the Colab shortcut app.

  • Continue with tutorial.

Resources for data visualization

Seaborn

From the Seaborn website:

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

There is a nice Seaborn tutorial presented as a Jupyter notebook in Colab: 04.14-Visualization-With-Seaborn.ipynb.

Pandas

An introduction to Pandas is provided by Google as a Colab notebook, and a tutorial can be found at Pandas DataFrame UltraQuick Tutorial.ipynb.

There are tutorials available on the Pandas documentation website.

Data science tutorials

Data Science 101: Build your first Machine Learning Model with Pandas, Scikit-Learn, and Google Colab

Simple tutorial for sharing Jupyter notebooks on GitHub

  1. git init
  2. git branch -M main (change master to main)
  3. git add README.md
  4. git add GeeksForGeeks.ipynb
  5. git commit -m "notebook first commit"
  6. git remote add origin https://github.com/{Your repo}/GeeksForGeeks.git
  7. git push -u origin main

Organizing notebooks

extensions for code review

nbpages

From the nbpages documentation:

nbpages is a command line tool for publishing a collection of Jupyter notebooks to Github Pages. This project was inspired by the tools included with the Python Data Science Handbook by Jake Vanderplas.

Installing nbextensions

The Jupyter notebook extensions are a useful set of tools that make working with Jupyter notebooks easier. See 10 Jupyter Notebook Extensions Making My Life Easier for some of them.

  • Try these commands (note: conda is not installed on Colab):
!pip install jupyter_contrib_nbextensions
# it appears many of the packages are already installed
!pip install jupyter_nbextensions_configurator
# Requirement already satisfied:
!jupyter contrib nbextension install --user 
!jupyter nbextensions_configurator enable --user

Installation of PyRosetta

  • Get the PyRosetta License.
  • Mount Google Drive.

Success!

Avoiding Colab timeouts

How to save Google Colab Notebooks from runtime timeouts has a keep-alive script that you can get from the sour4bh/stop-cursing-colab repository. It uses AutoHotKey.


Adding CSS to Colab notebooks

From Stack Overflow.

More Stack Overflow answers, but these are from a few years ago.

This might work

Other online jupyter notebook environments

Kaggle: access free GPUs

CoCalc

Offers real-time collaboration, version control, chat, and other features.
Also has been used for teaching since 2013, so has useful teaching features.
Pricing: $14 per student, plus subscriptions priced per month, core, project, etc.
Academic Research Group: $831/yr
Hobbyist: $70/yr, but only 1GB/project

Installing conda on Colab

!conda --version
/bin/bash: conda: command not found

Conda is not installed.

  • Install the condacolab package with pip, then use it to install conda.
!pip install -q condacolab
import condacolab
condacolab.install()

I may not need this if I stick with pip.
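The check for conda can also be done from Python, which avoids the "command not found" message. This is a small sketch using only the standard library. `tool_version` is a hypothetical helper name, and it assumes the tool prints its version when given `--version`, as conda does.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the output of `<name> --version`, or None if it is not on PATH."""
    executable = shutil.which(name)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True
    )
    # Some tools print their version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


# Example: tool_version("conda") returns None on a runtime without conda,
# in which case the condacolab steps above would be needed.
```

A result of None corresponds to the "conda: command not found" message shown above.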




  1. Arantes, P.R., Polêto, M.D., Pedebos, C., and Ligabue-Braun, R. (2021). Making it Rain: Cloud-Based Molecular Simulations for Everyone. J Chem Inf Model 61: 4852–4856. doi: 10.1021/acs.jcim.1c00998.

  2. Engelberger, F., Galaz-Davison, P., Bravo, G., Rivera, M., and Ramírez-Sarmiento, C.A. (2021). Developing and Implementing Cloud-Based Tutorials That Combine Bioinformatics Software, Interactive Coding, and Visualization Exercises for Distance Learning on Structural Bioinformatics. doi: 10.1021/acs.jchemed.1c00022.

DGO, 29 October 2021