Lab 3: Simulation of water¶
Learning goals:
To learn how MD simulations work in practise
To learn the basic protocols applied in MD simulations
To understand the difference between energy minimization and simulation
To be able to do the most basic analysis and plot the results
To be able to trace errors and determine how to fix them
Keywords: Particle simulations, molecular dynamics, Monte Carlo, continuum simulations, molecular topology, force field, interactions
Note:
The links are provided for additional information for those interested, not as required reading.
Plotting with
xmgrace
is the easiest and most straightforward way. You may also plot using Jupyter Lab (that’s how the sample plots were done).
The tasks consist of the following:
Preliminaries: Checking that Gromacs[2] has been installed
Preliminaries: Checking that the systems finds Gromacs
System Setup
Energy minimization
Pre-equilibration
Simulation
Data analysis
All parts involve quick visualizations, error checking and visualizations.
Preliminary: Create yourself a working directory¶
Create a new directory under
CHEM3300G
. Give the directory a descriptive name. For example,lab_water_1
or something like.Keep track of the commands and processes. You should have an electronic lab book with notes.
Preliminary: Test that the tools are in place¶
Test if your OS can find Gromacs [5][2]:
which gmx
If yes, then everything is ok.
If not, then you didn’t complete the installation procedure. At the end of installation, the system had to be told where to find Gromacs. Go back to the installation guide and complete the final steps.
Once you have verified that Gromacs (the command gmx
) is found by your system, try also the following command:
echo $GMXDATA
If this returns an empty output, then you have not set up Gromacs properly and you should go back to the installation procedure, to part where you have to let the system to know where Gromacs is.
The output should something like this depending where Gromacs is installed:
/home/sam/CHEM3300G/gmx/share/gromacs
If the command worked, you are ready to proceed.
Preliminary: How to get help in Gromacs¶
You can get more information about any command using gmx help <command>
. For example, help on Gromacs’ energy calculation:
gmx help energy
The same pattern works for all Gromacs commands. The first part, molecular definitions, is easy since we are simulating water and the model (as well as many other models) already exist and we will be able simply use the model of choice (SPC water [3]).
The other steps have several steps as will be seen.
Important: Always the on-line manual and the help pages if you end up to problems. And if/when you end up to problems, always save the error messages. They tell, or at least give a clue of, what went wrong.
Preliminary: MD workflow¶
The figure below (from lectures) is a reminder of the general MD workflow.
Background: Simulation of water using Gromacs¶
Background: Gromacs distribution provides 3 basic water models¶
Gromacs provides 3 basic water models. These models are for
The corresponding .gro
files in the directory $GMXDATA/top
and they can be found easily by command
ls -l $GMXDATA/top/*.gro
The output contains lines such as
spc216.gro
tip4p.gro
tip5p.gro
The figure below provides a qualitative explanation of the different models.
Background: Notes¶
Gromacs provides several 3-point models: SPC, SPC/E and TIP3P. For all 3-point models the file
spc216.gro
(located in$GMXDATA/gromacs/top
) can be used. There are no separatetip3p.gro
orspce.gro
and other files. The filespc216.gro
contains 216 pre-equilibrated SPC (3-point) molecules.Gromacs provides one 4-point model (in that same location as the 3-point model),
tip4p.gro
. The filetip4p.gro
has 216 pre-equilibrated (at 300 K) TIP4P water molecules.Gromacs also provides one 5-point model (again at the same location as the two above):
tip5p.gro
. Tiletip5p.gro
has 512 pre-equilibrated (at 300 K) TIP5P water molecules.You can, of course, always build your system starting from a single H\(_2\)O and clone that.
The directory
$GMXDATA/gromacs/top
has also force field directories, such asgromos54a7.ff
within it. Each of these force field directories have their own respective
.itp
files for the water models to be used with them. In addition, each of these directories has a filewatermodels.dat
which contains some info regarding the force field and the recommended water models. Not all water models can be used with all force fields.
Background: Water models not provided with the Gromacs distribution¶
There are lots of other water models. If you wish use them, you just have to build the topologies, pay attention to force fields and compatibility with possible solutes and their force fields.
Coarse-grained models (such as the Martini model [10]): Must be obtained/built separately.
Background: Notes on setting up the simulation box¶
There are several possible ways to set up a simulation for a water box, we will follow the setup procedure below as it is very general and useful. This is what we will do:
Insert (
gmx editconf
) the pre-equilibrated 216 water molecules from thespc216.gro
centered in a simulation box of chosen geometry (we will use cubic). After setting up the initial system and defining the size (gmx editoconf
),the simulation box is then filled with more water (
gmx solvate
).
The steps above constitute the system setup process. There are a few things worth repeating before we start:
Use descriptive file names. That is particularly useful if you need to trace back your steps, start new simulations using the same structures, and in finding errors. This is the approach we will follow.
Pay attention to the error messages and other messages the software produces.
Keep also in mind that at almost each step, a new
.gro
file is created. Using unique and descriptive file names helps to keep track of the.gro
files created at various stages.
Background: Homework task¶
What’s the deal with 216 SPC molecule file (spc216.gro
) and the box vectors?
Figure it out the answers by finding the solutions to these questions.
Normal density of water is 1 g/cm\(^3\)
We have 216 water molecules
Mass of a water molecule is:
<your answer here>
216 water molecules weigh:
<your answer here>
The last line says:
<your answer here>
This gives a box of volume:
<your answer here>
Step 1: Generate simulation box¶
Now we can start the procedure. The first step is to generate a simulation box with a desired geometry and choose the boundary conditions (we’ll use periodic boundary conditions or PBC).
See also
gmx editconf
in Gromacs manualNote 1: Remember the command
source
.Note 2: Below: when necessary, replace username by your own username
Note 3: Unless otherwise said, the commands must be on one single line
First, we generate a cubic box with side of length 4 nm with the system of 216 waters centered inside it using the command below. Important: this must be on one continuous line so be careful if you are copy-pasting:
gmx editconf -f ${GMXDATA}/top/spc216.gro -o spc216-cubic-4.gro -bt cubic -box 4
Command details:
-f # input file. Typically .gro, .pdb or .tpr. We use .gro
-o # output file. Typically .gro
-bt # box type.
-box # Vector length(s) [box size]. For cubic and dodecahedral boxes only one value
# is needed. Centers the system (no -c needed). Gromacs uses nanometers.
Step 1: Homework task¶
Pay attention to the output and find the answers to the following questions.
What is the size of the box that was generated?
How do you know that the box is cubic?
Sample editoconf output
GROMACS: gmx editconf, version 2020.4
Executable: /data/TEACHING/CHEM3300G/gmx/bin/gmx
Data prefix: /data/TEACHING/CHEM3300G/gmx
Working dir: /data/TEACHING/CHEM3300G/water_1/Lab
Command line:
gmx editconf -f /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/spc216.gro -o spc216-cubic-4.gro -bt cubic -box 4
Note that major changes are planned in future for editconf, to improve usability and utility.
Read 648 atoms
Volume: 6.45626 nm^3, corresponds to roughly 2900 electrons
No velocities found
system size : 1.968 1.997 1.969 (nm)
diameter : 3.059 (nm)
center : 0.009 -0.001 -0.009 (nm)
box vectors : 1.862 1.862 1.862 (nm)
box angles : 90.00 90.00 90.00 (degrees)
box volume : 6.46 (nm^3)
shift : 1.991 2.001 2.009 (nm)
new center : 2.000 2.000 2.000 (nm)
new box vectors : 4.000 4.000 4.000 (nm)
new box angles : 90.00 90.00 90.00 (degrees)
new box volume : 64.00 (nm^3)
GROMACS reminds you: "Stay Cool, This is a Robbery" (Pulp Fiction)
Step 1: Notes¶
The box size of 4 nm will give roughly 1,940 water molecules (as you will in the next step). This is ok even with older computers. You can also change it to a smaller number to speed up computations later on, but it should be larger than 2.8. If you change the box size, it may be a good idea to change the name so that it describes what the file contains.
What is happening above: you must have the input file spc216.gro
. The 216 water molecules are put inside a new cubic simulation of size 4 nm. You should check both the original spc216.gro
and the new file spc216-cubic-4.gro
using VMD. The file spc216.gro
is the only input file here. A new file is generated as specified by the option -o
(stands for output). This new file is called spc216-cubic-4.gro
. The number after -box
determines the vector length (remember that Gromacs’ units are in nanometers), that is, -box 4
generates a cubic box with a size of 4 nm.
Step 1: What can go wrong¶
Gromacs didn’t find
spc216.gro
. One possibility is then to copy it from$GMXDATA/top
to your current work directory or refer to it directly (that’s what we are doing above).The above commands must be on a single line so be careful when you copy-paste. The same applies to the rest.
Step 2. The simulation box needs to be filled with water.¶
gmx solvate
in Gromacs manual
In the previous step, we generated a box of linear size 4 nm and put 216 SPC [3] water molecules in the middle of it. The next command will add more water molecules to fill the box:
gmx solvate -cp spc216-cubic-4.gro -cs spc216.gro -o spc216-solvated-cubic-4.gro
Step 2: Homework task¶
Pay attention to the output and write down (copy-paste) the answer to the following questions. You will need them in the next assignment unless you want to perform this task again.
What is the precise number water molecules?
How many atoms in total?
Were there any overlaps? If so, how many?
What is the exact density of the system?
What are the exact dimensions of the box (hint: you may want to check the new
gro
file)?
Output from solvate
Output looks something like this:
GROMACS: gmx solvate, version 2020.4
Executable: /data/TEACHING/CHEM3300G/gmx/bin/gmx
Data prefix: /data/TEACHING/CHEM3300G/gmx
Working dir: /data/TEACHING/CHEM3300G/water_1/Lab
Command line:
gmx solvate -cp spc216-cubic-4.gro -cs /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/spc216.gro -o spc216-cubic-solvated-4.gro
Reading solute configuration
Reading solvent configuration
Initialising inter-atomic distances...
WARNING: Masses and atomic (Van der Waals) radii will be guessed
based on residue and atom names, since they could not be
definitively assigned from the information in your input
files. These guessed numbers might deviate from the mass
and radius of the atom type. Please check the output
files if necessary.
NOTE: From version 5.0 gmx solvate uses the Van der Waals radii
from the source below. This means the results may be different
compared to previous GROMACS versions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
A. Bondi
van der Waals Volumes and Radii
J. Phys. Chem. 68 (1964) pp. 441-451
-------- -------- --- Thank You --- -------- --------
Generating solvent configuration
Will generate new solvent configuration of 3x3x3 boxes
Solvent box contains 7998 atoms in 2666 residues
Removed 1503 solvent atoms due to solvent-solvent overlap
Removed 657 solvent atoms due to solute-solvent overlap
Sorting configuration
Found 1 molecule type:
SOL ( 3 atoms): 1946 residues
Generated solvent containing 5838 atoms in 1946 residues
Writing generated configuration to spc216-cubic-solvated-4.gro
Output configuration contains 6486 atoms in 2162 residues
Volume : 64 (nm^3)
Density : 1010.58 (g/l)
Number of solvent molecules: 1946
GROMACS reminds you: "Life need not be easy, provided only that it is not empty." (Lise Meitner)
The exact number of molecules will most likely be slightly different for you (and for everyone for that matter). Why? Because the process of solvation involves random numbers. You should check, however, that the density is reasonable.
Command details:
-cp # input file. This contains the solute.
# box size is included in this (we defined this with editconf above).
-cs # input file. This contains the solvent.
-o # output file. The part noff just tells that no force field has been incorporated yet.
After completing Step 2, we have the initial structures ready. We have not, however, associated force fields with them or any other parameters for that matter – if you have used Gromacs[2] before to, say, solvate a protein, you notice that we did not specify a topology file above. This is what we will do next.
You should open the file spc216-solvated-cubic-4.gro
in VMD and see how the system looks like.
Although the small 216 water molecule system was equilibrated, these larger new systems are not. We used the small 216 molecule system since it was pre-equilibrated with a density close to real water density. By multiplying that system, we obtained a new larger one that has density close to what the density of water should be. The new system is, however, not in equilibrium.
Step 3. Generate the topology and index files¶
The above procedure defined the initial coordinates but not the force fields and topologies. This is done with the command gmx pdb2gmx
.
When executed, pdb2gmx
asks for a force field and the specific water model
So far we have only defined the water model to be a 3-site model instead of 4- or 5-site model but we have not defined if the model is SPC [3], SPC/E [3] or TIP3P [6]. These models are all 3-site models and we have to choose one of them. In our case, we will choose SPC (but could have equally well chosen one of the others).
Topology: defines the molecular connections. In Gromacs [2], there are three types of topology files: 1) .itp
for individual molecules. The file gives the connectivity information for the different atoms, atomic charges, angles and so on. 2) The files having the suffix .top
are system topology files. They contain topology description for the simulation system. 3) The final one is residue topology. These files have the suffix .rtp
. They are for proteins and are used by the Gromacs routine pdbgmx
.
Here, we use .itp
files implicitly (they are already included in the force field description and we don’t have to do anything about them). We generate a system topology (.top
) file. Since we don’t have any proteins, .rtp
files are not used here.
Note: The command gmx pdb2gmx
below produces 3 files: topology, position restraints and .gro
(or other specified).
Generate a cubic simulation box:
gmx pdb2gmx -f spc216-solvated-cubic-4.gro -o spc-oplsaa-solvated-cubic.gro -p spc-oplsaa-solvated-cubic.top -n spc-oplsaa-solvated-cubic.ndx
When prompted, select OPLS-AA (Optimized Parameters for Liquid Simulations) [7] for the force field and SPC (Simple Point Charge) [4] for the water model. The output files above have been named with that assumption. Change the names if you change the force field / water model.
Pay attention to the messages if there are any errors or warnings.
Again, select OPLS-AA [7] for the force field and SPC [4] for the water model.
Command details:
-f # input file.
-o # output file.
-p # output file. Topology.
-n # output file. Contains reordered groups.
The dropdowns
below show sample outputs during the process.
Selection of force field
GROMACS: gmx pdb2gmx, version 2020.4
Executable: /data/TEACHING/CHEM3300G/gmx/bin/gmx
Data prefix: /data/TEACHING/CHEM3300G/gmx
Working dir: /data/TEACHING/CHEM3300G/water_1/Lab
Command line:
gmx pdb2gmx -f spc216-cubic-solvated-4.gro -o spc-oplsaa-solvated-cubic.gro -p spc-oplsaa-solvated-cubic.top -n spc-oplsaa-solvated-cubic.ndx
Select the Force Field:
From '/data/TEACHING/CHEM3300G/gmx/share/gromacs/top':
1: AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
2: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
3: AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
4: AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
5: AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
6: AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
7: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
8: CHARMM27 all-atom force field (CHARM22 plus CMAP for proteins)
9: GROMOS96 43a1 force field
10: GROMOS96 43a2 force field (improved alkane dihedrals)
11: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
12: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
13: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
14: GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40,, 843-856, DOI: 10.1007/s00249-011-0700-9)
15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
Selection of water model
Using the Oplsaa force field in directory oplsaa.ff
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/watermodels.dat
Select the Water Model:
1: TIP4P TIP 4-point, recommended
2: TIP4PEW TIP 4-point with Ewald
3: TIP3P TIP 3-point
4: TIP5P TIP 5-point (see http://redmine.gromacs.org/issues/1348 for issues)
5: TIP5P TIP 5-point improved for Ewald sums
6: SPC simple point charge
7: SPC/E extended simple point charge
8: None
Output after selection of water model
6
going to rename oplsaa.ff/aminoacids.r2b
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/aminoacids.r2b
Reading spc216-cubic-solvated-4.gro...
Read '216H2O,WATJP01,SPC216,SPC-MODEL,300K,BOX(M)=1.86206NM,WFVG,MAR. 1984', 6486 atoms
Analyzing pdb file
Splitting chemical chains based on TER records or chain id changing.
There are 0 chains and 1 blocks of water and 2162 residues with 6486 atoms
chain #res #atoms
1 ' ' 2162 6486 (only water)
No occupancies in spc216-cubic-solvated-4.gro
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/atomtypes.atp
Reading residue database... (Oplsaa)
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/aminoacids.rtp
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/aminoacids.hdb
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/aminoacids.n.tdb
Opening force field file /data/TEACHING/CHEM3300G/gmx/share/gromacs/top/oplsaa.ff/aminoacids.c.tdb
Processing chain 1 (6486 atoms, 2162 residues)
Problem with chain definition, or missing terminal residues.
This chain does not appear to contain a recognized chain molecule.
If this is incorrect, you can edit residuetypes.dat to modify the behavior.
8 out of 8 lines of specbond.dat converted successfully
Checking for duplicate atoms....
Generating any missing hydrogen atoms and/or adding termini.
Now there are 2162 residues with 6486 atoms
Making bonds...
Number of bonds was 4324, now 4324
Generating angles, dihedrals and pairs...
Making cmap torsions...
There are 0 dihedrals, 0 impropers, 2162 angles
0 pairs, 4324 bonds and 0 virtual sites
Total mass 38949.295 a.m.u.
Total charge 0.000 e
Writing coordinate file...
--------- PLEASE NOTE ------------
You have successfully generated a topology from: spc216-cubic-solvated-4.gro.
The Oplsaa force field and the spc water model are used.
--------- ETON ESAELP ------------
GROMACS reminds you: "There's an old saying among scientific guys: “You can't make an omelet without breaking eggs, ideally by dropping a cement truck on them from a crane." (Dave Barry)
Step 3: Homework task¶
The title mentions index file. Find the Gromacs manual from the web and explain in your own words hat an index file is and for what it is needed.
Step 4. Run the grompp pre-processor.¶
This generates the run file for relaxation
gmx grompp
in Gromacs manual
This is the first time we use grompp
and we do need a .mdp
file. The suffix mdp
comes from Molecular Dynamics Parameters. As the name suggests, that file contains the various parameters for the simulation.
This .mdp
file must be created before executing the commands below. The .mdp
file contains all the runtime parameters and protocols. However, for the minimization step, very little is needed.
To do: The .mdp
file for energy minimization can be as simple as the one below in the dropdown. It must be plain ASCII. Generate this file now – otherwise the command below will not work. Name the file em.mdp
:
MDP file for energy minimization
The .mdp
file for energy minimization can be as simple as the one below and it must be plain ASCII. Generate this file now – otherwise the command below will not work.
Name the file em.mdp
.
;=======================================================================
; FILENAME: em.mdp
;-----------------------------------------------------------------------
; .mdp file for energy minimization
;=======================================================================
;
; Note: emtol and emstep may limit convergence. They
; can be commented out. In that case the criterion is set
; by nsteps and machine precision whichever is met first
;
integrator = steep ; steepest descents, cg for conjugate grad.
; emtol = 1000.0 ; Stop when max force < 1000.0 kJ/mol/nm
; emstep = 0.01 ; Energy step size
nsteps = 1000 ; max # of steps unless process converges 1st
;
; Interactions and neighbours:
;------------------------------
;
nstlist = 1 ; doesn't matter with verlet (next)
cutoff-scheme = verlet ; works for GPU too.
vdw-type = cut-off ; how to handle van der Waals interactions
rvdw = 1.0 ; van der Waals cutoff
coulombtype = pme ; safe and good. Avoid cutoffs.
rcoulomb = 1.0 ; cutoff for real-space part of PME
pbc = xyz ; periodic boundary conditions
;=======================================================================
This is an energy minimization step (not a true simulation) and since we are using deterministic optimization, some of the above parameters are not critical. It is, however, good to set the correctly from the beginning (pme, etc.).
Why do we need to do this? Inserting molecules in the system is more or less a random process. There may be steric problems (=partial overlaps or too much empty space between the molecules). Conformations may be energetically very unfavourable and so on. Thus, the structure needs to be relaxed. It is important to keep in mind that energy minimization is not an actual simulations. It is simply a process in which the energy is minimized based on the positions of the molecules, time is not involved and Newton’s equations of motion are not solved. The two most common minimization methods are the method of steepest descents and the conjugate gradient method. We will not go into detail here, but they are so-called deterministic optimization methods.
Notice: The same
.mdp
file is good for both of the cubic and dodecahedral cases (as well as others).Important: The
grommp
command below combines information fromem.mdp
and the solvated system we created above and generates a run input file (.tpr
)Notice: When you execute the commands below, pay attention to possible errors / warnings.
gmx grompp -v -f em.mdp -c spc-oplsaa-solvated-cubic.gro -p spc-oplsaa-solvated-cubic.top -o spc-oplsaa-solvated-cubic-minimize
This generates a run input file named spc-oplsaa-solvated-cubic-minimize.tpr
. The suffix .tpr
denotes a run input file and as the name suggests, that file is used for running a simulation or a minimization procedure. The latter will be done in the next step using this file.
Command details:
-v # verbose mode. Show all the messages.
-f # input: the .mdp file.
-c # input file. Structure as generated above.
-o # output file. The run input file of .tpr type.
Output from grompp for energy minimization
GROMACS: gmx grompp, version 2020.4
Executable: /data/TEACHING/CHEM3300G/gmx/bin/gmx
Data prefix: /data/TEACHING/CHEM3300G/gmx
Working dir: /data/TEACHING/CHEM3300G/water_1/Lab
Command line:
gmx grompp -v -f em.mdp -c spc-oplsaa-solvated-cubic.gro -p spc-oplsaa-solvated-cubic.top -o spc-oplsaa-solvated-cubic-minimize
checking input for internal consistency...
NOTE 1 [file em.mdp]:
With Verlet lists the optimal nstlist is >= 10, with GPUs >= 20. Note
that with the Verlet scheme, nstlist has no effect on the accuracy of
your simulation.
Setting the LD random seed to -1426617453
processing topology...
Generated 330891 of the 330891 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 330891 of the 330891 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
processing coordinates...
double-checking input for internal consistency...
renumbering atomtypes...
converting bonded parameters...
initialising group options...
processing index file...
Analysing residue names:
There are: 2162 Water residues
Making dummy/rest group for T-Coupling containing 6486 elements
Making dummy/rest group for Acceleration containing 6486 elements
Making dummy/rest group for Freeze containing 6486 elements
Making dummy/rest group for Energy Mon. containing 6486 elements
Making dummy/rest group for VCM containing 6486 elements
Number of degrees of freedom in T-Coupling group rest is 12969.00
Making dummy/rest group for User1 containing 6486 elements
Making dummy/rest group for User2 containing 6486 elements
Making dummy/rest group for Compressed X containing 6486 elements
Making dummy/rest group for Or. Res. Fit containing 6486 elements
Making dummy/rest group for QMMM containing 6486 elements
T-Coupling has 1 element(s): rest
Energy Mon. has 1 element(s): rest
Acceleration has 1 element(s): rest
Freeze has 1 element(s): rest
User1 has 1 element(s): rest
User2 has 1 element(s): rest
VCM has 1 element(s): rest
Compressed X has 1 element(s): rest
Or. Res. Fit has 1 element(s): rest
QMMM has 1 element(s): rest
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 36x36x36, spacing 0.111 0.111 0.111
Estimate for the relative computational load of the PME mesh part: 0.26
This run will generate roughly 1 Mb of data
writing run input file...
There was 1 note
GROMACS reminds you: "Don't Follow Me Home" (Throwing Muses)
Step 4: What can go wrong¶
This should be a straightforward step: Information is just collected into the run input file (
.tpr
) to be used in the next step.
Step 4: Homework task¶
Go to the Gromacs manual and read what the parameters
integrator
,nsteps
andpbc
are. Explain (in your own words) their significance.
Step 5: Relaxation using energy minimization (steepest descents or conjugate gradient)¶
gmx mdrun
in Gromacs manual
Now that we have our run input files (.tpr
), we can perform energy minimization, that is, to relax the systems. This process is there to ensure that possible steric conflicts will be take care of. In other words: it may have happened that during solvation some of the molecules were put in too close contact or too far way from each other. The former leads to enormous energies (atoms cannot overlap) and the latter can lead to the molecules accelerating too fast when the actual simulation starts.
We have now defined the initial structures, topologies and force field. In this step we apply the method of steepest descents (conjugate gradient would work as well) to relax the system. Notice that both steepest descents and conjugate gradient are methods of deterministic optimization, they are not real simulations (that sounds like an oxymoron…). In other words, the process will stop when they the first local energy minimum is found. This means that 1) it is sometimes a good idea to repeat this after the Equilibration (next) and 2) increasing the number of relaxation steps does not lead to better convergence since these algorithms stop after they find the first local minimum.
Important: If you for some reason wish to change the parameters in the .mdp
file at this time, you must return to previous stage and re-run it.
We use cubic geometry:
gmx mdrun -v -deffnm spc-oplsaa-solvated-cubic-minimize
The syntax:
mdrun # obvious
-v # verbose output so that you can follow what is happening
-deffnm # file names for input and output (all files will be named using this)
The following files will be generated by mdrun:
[filename].log # log file
[filename].edr # energy. Binary file
[filename].trr # Trajectory. Full precision, binary format.
[filename].gro # Energy minimized structure
where [filename] is either water-cubic-minimize or water-dode-minimize.
Did you get the message: Energy minimization reached the maximum number of steps before…
Did you get the warning message
Energy minimization reached the maximum number of steps defined by parameter `nsteps` before the forces
reached the requested precision Fmax < 10.
at the end? This was then probably preceded by something like
step 14: One or more water molecules can not be settled.
Check for bad contacts and/or reduce the time step if appropriate.
somewhere along the process.
It means, that the minimization procedure could not find an energy-minimized position for an atom (or atoms). When this happens, the program writes a .pdb
file/.pdb
file which allow one to go back and fix things. The file is named according to the step when this happened. In this case the file is called step14.pdb
. If you got such .pdb
files, take a look at them using VMD.
In practise, this message often means that the parameter nsteps
in em.mdp
file must me changed. This also means that one has to re-do the previous step and produce a new run input (.tpr
file). Re-running without changing the value of nsteps
will produce precisely the same result. Why? Remember: Conjugate gradient and steepest descents are deterministic minimization (or optimization) methods.
Should you worry about this or re-do the previous step? The answer is that it depends. What you should do is this:
you can go back to the previous step, change the value of
nsteps
and re-do the stepyou can open VMD and see that everything is at least superficially (=visually) ok.
run gromacs’ energy analysis to see if energy converged. For this the
.edr
file that was produced during this step is used. This is the command (change the file name if necessary):
gmx energy -f spc-oplsaa-solvated-cubic-minimize -o energy-check.xvg
You should plot at least potential energy.
Even the above error message and your energy seems to converge, you can try the next step. If it doesn’t crash, there is no need to return this is minimization step – and you should try to convince yourself why that is the case.
If you don’t know what all those options are, just remember how one gets help in gromacs:
gmx help energy
Check that energy converged
gmx energy -f spc-oplsaa-solvated-cubic-minimize -o energy-check.xvg
This gives you a menu you can use to select which quantities(s) you want to plot. The very least you should look at the potential energy. You can view the .xvg
file with xmgrace
(if you use Matlab or something else, you may need to change the comment style in the file; if you do so, you must again use an ASCII editor).
If everything looks more or less ok, you can proceed to the next step (=the problem was not a serious one). In the not so fortunate case, you have two choices: 1) try to produce a new file starting from the beginning or 2) use one of the .pdb
files produced during minimization, identify the problematic molecule and rotate or move it slightly and try again (by editing the file, using VMD or other software).
Step 5: Homework task:¶
Answer the following questions using your own results:
How many picoseconds did it take for the minimization to converge?
How do you know if the minimization actually worked out?
What other possible methods there are for energy minimization (=methods)?
Use VMD to visualize how the minimization process worked: pick residues 10 and 11, and superimpose the the starting structure and the minimized structure. Color the molecules In the initial structure in red and in the optimized structure in orange.
Based on the previous question: Describe the rotations and translations of the two water molecules (residues 10 and 11). Explain your observations.
Step 6: Pre-equilibration run in the NVE ensemble¶
In this step, we will perform an NVE (constant particle number, volume and energy) simulation – this is a real MD simulation unlike the minimization step above (the minimization is technically speaking deterministic optimization).
In this simulation, we keep the number particles (N), volume (V) and energy (E) constant. To be able to do all that, a new .mdp
file is needed, the previous one cannot be used. You must create a new file (the one below) and I suggest you name it nve-equib.mdp
. Remember that it must be an ASCII file, this is an absolute requirement. Notice also how there are comments in the .mdp
file. Always put in comments, they are very helpful.
nve-equib.mdp
;=================================================================
; Fairly minimal mdp file for NVE equilibration
; Name: nve-equib.mdp
; If in doubt: http://manual.gromacs.org/online/mdp_opt.html
;==================================================================
;
; No position restraints in this one. If needed, add them here.
;
;== RUN CONTROL. Time in ps. =================================
;
title = NVE equilibration of SPC water
integrator = md ; standard leap-frog. others are available incl. vv
dt = 0.001 ; time step in ps. No constraints: 0.001, w/: 0.002.
nsteps = 20000 ; # of steps. This would mean 20 ps for MD (20000*0.001).
comm-mode = linear ; removes both translation and rotation of COM (center of mass).
nstcomm = 100 ; frequency (in steps) to remove COM motion
;
;== NEIGHBOR LISTS ============================
;
cutoff-scheme = Verlet ; Verlet is needed for GPU. Works with CPUs too
nstlist = 20 ; neighbor list update frq. Verlet: 20-40, group <10. 20 fs.
ns_type = simple ; method for constructing neighbor lists. Grid is faster
pbc = xyz ; Full periodic boundary conditions
verlet-buffer-tolerance = 0.005 ; May have to use smaller value for NVE but let’s start with this
;
;== OUTPUT CONTROL ===================================================
;
nstxout = 500 ; write out coordinates every 0.5 ps. Standard: 1.0 ps
nstvout = 500 ; frq for writing out velocities
nstenergy = 500 ; frq for writing out energies
nstlog = 1000 ; frq for updating the log file
;
;== Electrostatics. Default order: 4, tol: 1e-5, spacing: 0.12 nm, geom: 3d ==
;
coulombtype = pme ; you really do want to use PME instead of cutoff
rcoulomb = 1.0 ; Coulomb real space cutoff
;
;== Van der Waals interactions.================================
;
; rvdw MUST be chosen to be commensurate with rcoulomb
;
vdwtype = cut-off ;
rvdw = 1.0 ; vdW cutoff.
dispcorr = EnerPres ; Dispersion correction applied to energy & pressure
;
;== TEMPERATURE COUPLING: NVT / NpT RUNS=======================
;
tcoupl = no ; NVE RUN needs no thermostat
;
;== PRESSURE COUPLING: NpT RUNS =====================
;
pcoupl = no ; NVE RUN (use this also for NVT) needs no barostat
;
;== GENERATE VELOCITIES: NEEDED FOR EQUILIBRATION AND SET UP =======================
;
; NOTE: gen_vel = no if the run is a continuation!
;
gen_vel = yes ; generate velocities from a Maxwell distribution
gen-seed = -1 ; random seed for random number generation
gen_temp = 300 ; temperature for the above Maxwell distribution
;
;== Continuing a previous simulation? ======================================
;
; Also needed when running back-to-back equilibration runs! (NVT-> NpT)
; IMPORTANT: You may need to set gen_vel=no above.
;
continuation = no ; restarting a previous run?
;
;====================================================================================
Now that you have nve-equib.mdp generated, we must grompp
again. As above, grompp
takes the parameter file (nve-equib.mdp
), the file after minimization and created a run input file.
Step 6.1: Grompp to set up an NVE run¶
In the previous step we minimized energy. The reason for that was to remove possible unphysical voids or/and molecular overlaps. The next step is to run a short simulation to ensure that the system is stable. If not, we need to go back to the setup procedure and solve possible problems. The first simulation is done in the NVE ensemble. For that we need a new .mdp
file with some new parameters. We also need to generate a new run input - the previous one was for minimization only.
The above means that we need to run the grompp
preprocessor and use the new .mdp
file. This process again combines the input parameters (nve-equib.mdp
), topology and the minimized structure from the last step:
gmx grompp -f nve-equib.mdp -c spc-oplsaa-solvated-cubic-minimize.gro -p spc-oplsaa-solvated-cubic.top -o spc-oplsaa-cubic-nve.tpr
Pay close attention to the messages grompp
produces. This process produces a new run input file (.tpr
). It will be needed in the next step for the NVE simulation.
-f nvt.mdp ; input, not modified. Uses posres.top
-c em.gro ; the structure for energy minimization as input. not modified
-p topol.top ; topology
-o nvt.tpr ; trajectory. Output. Modified
Sample output from the above command
One must always pay attention to the messages from the grommp
preprocessor, especially notes, warnings and errors. In this case, there were only 2 notes.
GROMACS: gmx grompp, version 2020.4
Executable: /data/TEACHING/CHEM3300G/gmx/bin/gmx
Data prefix: /data/TEACHING/CHEM3300G/gmx
Working dir: /data/TEACHING/CHEM3300G/water_1/Lab
Command line:
gmx grompp -f nve-equib.mdp -c spc-oplsaa-solvated-cubic-minimize.gro -p spc-oplsaa-solvated-cubic.top -o spc-oplsaa-cubic-nve.tpr
Ignoring obsolete mdp entry 'title'
Ignoring obsolete mdp entry 'ns_type'
Setting the LD random seed to -635203190
Generated 330891 of the 330891 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 330891 of the 330891 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Setting gen_seed to -1686823819
Velocities were taken from a Maxwell distribution at 300 K
Analysing residue names:
There are: 2162 Water residues
Number of degrees of freedom in T-Coupling group rest is 12969.00
NOTE 1 [file nve-equib.mdp]:
NVE simulation: will use the initial temperature of 300.000 K for
determining the Verlet buffer size
NOTE 2 [file nve-equib.mdp]:
You are using a Verlet buffer tolerance of 0.005 kJ/mol/ps for an NVE
simulation of length 20 ps, which can give a final drift of 2%. For
conserving energy to 1% when using constraints, you might need to set
verlet-buffer-tolerance to 2.5e-03.
Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K
Calculated rlist for 1x1 atom pair-list as 1.039 nm, buffer size 0.039 nm
Set rlist, assuming 4x4 atom pair-list, to 1.000 nm, buffer size 0.000 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 36x36x36, spacing 0.111 0.111 0.111
Estimate for the relative computational load of the PME mesh part: 0.26
This run will generate roughly 7 Mb of data
There were 2 notes
GROMACS reminds you: "You Will Be Surprised At What Resides In Your Inside" (Arrested Development)
Step 6.2: Perform pre-equilibration NVE run.¶
Now we are ready to simulate. The following command runs a short (20 ps) real NVE molecular dynamics simulation:
gmx mdrun -s spc-oplsaa-cubic-nve.tpr -o -x -deffnm spc-oplsaa-cubic-nve-run
Note: This leaves the command running on the foreground. Completion shouldn’t take too long:
On an i7 laptop with NVIDIA GTX1070 this took 4.9 seconds (CUDA 11.0, gcc 9.3.0)
One an -7 laptop with 12 cores it took about 20 seconds
Shouldn’t take more than 2 minutes even on an old computer. You can put it in the background by adding
&
character to the end of the line. There are also other ways of doing that but we will discuss them later.As a separate benchmark: On an i7 laptop with NVIDIA GTX1070 running 1 ns took 268 seconds (CUDA 11.0, gcc 9.3.0)
Command details:
mdrun # obvious
-s # run input file. Typically .tpr (the one created in the above step)
-x # produces compressed trajectory (xtc)
-o # produces full precision trajectory
-deffnm # file names (all files will be named using this)
At the end, you will get the details of the execution time and other stuff. Pay attention to the messages. Here is a sample:
starting mdrun '216H2O,WATJP01,SPC216,SPC-MODEL,300K,BOX(M)=1.86206NM,WFVG,MAR. 1984'
20000 steps, 20.0 ps.
Writing final coordinates.
Core t (s) Wall t (s) (%)
Time: 206.962 17.247 1200.0
(ns/day) (hour/ns)
Performance: 100.197 0.240
Again, you should take a look at what is going on using VMD.
You can even seen an on-screen movie: first load the .gro
file (spc-oplsaa-cubic-nve-run.gro
) in VMD and then use the “Load data into molecule” option in VMD to load the .trr
file (spc-oplsaa-cubic-nve-run.trr
is, the trajectory; the trajectory contains the coordinates at different times as defined by the parameter nstxout
in the .mdp
file).
You should also take a look at the different quantities. The first one to take a look at is total energy since in an NVE simulation the total energy must be conserved. It is also interesting to take a look at kinetic energy and potential energy.
We use the same procedure as above:
gmx energy -f spc-oplsaa-cubic-nve-run -o energy-check-nve.xvg
As above, create plots of the data and interpret what they indicate.
Here are sample plots, yours will be, obviously, somewhat different.
Step 6: Homework task¶
Answer the following questions using the plots below:
Examine the 3 different energies. Is the system well-behaved or are there indications of problems? Provide clear reasons for your answer.
Kinetic energy increases. What was the starting temperature? What is the final temperature (approximately)? How large is the increase?
Why does kinetic energy increase and potential energy decrease?
What can be done to maintain constant temperature?
NVE ensemble is, by definition, constant energy. Discuss the behavior of the total energy.
Compare your plots to these. Are there differences? Describe them and justify their origin.
Step 7: Set up a pre-equilibration NVT run¶
After the brief NVE run, let’s perform an NVT (constant particle number, volume and temperature; this is the canonical ensemble – NVE is called the microcanonical ensemble) run. For this, the .mdp file needs modifications in the respective sections as given below. Let’s save a copy of the current NVE .mdp
file under the name nvt-equilib.mdp
and make the changes in this new file. Then make the changes as indicated below. As you see, some new parameters come into play.
Makes these changes to the .mdp
file and
tcoupl = v-rescale ; NVT RUN
ref_t = 300 ; temperature in Kelvin
tau_t = 0.1 ; time constant for coupling. 0.1 ps
tc_grps = System ; Let's put the whole system on single thermostat
gen_vel = no ; we continue our previous run
continuation = yes ; restarting a previous run
gen_vel = no ; we continue our previous run
continuation = yes ; restarting a previous run
Important: If you took a look at the directory in which you performed the NVE run, you noticed that there is also a new file, a .cpt
file. That is a checkpoint file (binary) and it will be used when restarting. There is also a .gro file that was written at the same time. That contains the coordinates from the final stage of the simulations and they will be used for the restart.
Below is the grompp
command. Notice that the topology file is still the same. Makes sense since the molecular connectivity has not changed.
gmx grompp -f nvt-equib.mdp -c spc-oplsaa-cubic-nve-run.gro -t spc-oplsaa-cubic-nve-run.cpt -p spc-oplsaa-solvated-cubic.top -o spc-oplsaa-cubic-nvt.tpr
Command details:
-f ; input, not modified. Uses posres.top
-c ; the structure from energy minimization. not modified
-p ; topology
-o ; trajectory. Output. Modified
-t ; continuation for a previous run (.cpt file)
Step 7.1: Start the NVT equilibration run¶
We can now run an NVT simulation. Since we are continuing the previous simulation using the .cpt
it generated, we just continue where we left off after NVE. Whether or not that is necessary depends on the situation.
gmx mdrun -s spc-oplsaa-cubic-nvt.tpr -o -x -deffnm spc-oplsaa-cubic-nvt-run
Command details:
mdrun # obvious
-s # run input file. Typically .tpr (the one created in the above step)
-x # produces compressed trajectory (xtc)
-o # produces full precision trajectory
-deffnm # file names for input and output (all files will be named using this)
Here are the benchmarks, the run took just over 17 seconds:
Core t (s) Wall t (s) (%)
Time: 209.866 17.489 1200.0
(ns/day) (hour/ns)
Performance: 98.810 0.243
As above, perform the simple energy analysis using
gmx energy -f spc-oplsaa-cubic-nvt-run -o energy-check-nvt.xvg
and create plots of the data and interpret what the plots indicate.
Step 7: Homework Task¶
Answer the following questions using the plots below and your own results:
The first data point in the plots is from the final point of the previous NVE simulation. It seems that for all of the energy components, there is a very large jump. Is that indicative of a problem? Explain your answer carefully.
The kinetic energy seems to settle around 16,000 kJ/mol. Is that correct, too large or too small? Calculate the temperature to what 16,000 kJ/mol corresponds for this particular system.
Compare the kinetic energy to your simulation. Is it the same? Calculate the temperature for your simulation.
This is an NVT simulation, that is constant temperature. Kinetic energy and temperature are related. It seems that the kinetic energy is not constant. Is that a problem? Explain carefully.
Sample plots:
Step 8: Final simulation in the NpT ensemble¶
So far we have performed short simulation runs in the NVE and NVT ensembles. Now we will perform an NpT (constant particle number, pressure and temperature; this is also the canonical ensemble) simulation. We start the simulation starting from where we left off after the NVT run.
First we need, however, to grommp
again. This also means that the .mdp
file needs to be modified again since NpT has its own requirements. Remember: the letter p
means constant pressure. This allows the simulation box to adjust to the pressure (and hence density will fluctuate). That means we have to let the system size to change and for that we have to have a barostat (Parrinello-Rahman below).
Below are the changes to the .mdp
file, copy the previous file to create a new called npt-equilib.mdp
and make the changes as indicated below in that:
pcoupl = parrinello-rahman ; sets a parrinello-rahman barostat
pcoupltype = isotropic ; isotropic coupling
tau_p = 2.0 ; coupling time, 2 ps
ref_p = 1.0 ; pressure in bars
refcoord-scaling = all ; all coordinates scaled
compressibility = 4.5e-5 ; as it says, compressibility
Now we must grompp
.
gmx grompp -f npt-equib.mdp -c spc-oplsaa-cubic-nvt-run.gro -t spc-oplsaa-cubic-nvt-run.cpt -p spc-oplsaa-solvated-cubic.top -o spc-oplsaa-cubic-npt.tpr
Step 8.1: Run an NpT simulation.¶
After all that grompping, we are now ready to run an NpT simulation:
gmx mdrun -s spc-oplsaa-cubic-npt.tpr -o -x -deffnm spc-oplsaa-cubic-npt-run
Command details:
mdrun # obvious
-deffnm # file names for input and output (all files will be named using this)
Step 8: Homework task¶
Answer the following questions using your own results:
This is an NpT simulation, that is constant temperature and pressure. Plot 1) pressure and 2) the dimensions of the simulation box. In the case of the dimensions, you should plot x, y and z-dimensions in the same graph.
Are temperature and pressure constant (you need plots)? Explain your observations very carefully.
Discuss why and when one would use NVE, NVT and NpT simulations. Discuss the following: Should one expect the same results from all of them? In particular: If we would perform long simulations of the water system using NVE, NVT and NpT, should we expect the same results in terms of thermodynamic properties, hydrogen bonding, coordination and such?
You have now ran 20 ps simulations using NVE, NVT and NpT ensembles. Is 20 ps long or short? More importantly, is it enough? How can one determine the appropriate length of a simulation. Discuss and justify your answers carefully.
Use VMD to make a movie of the system.
Kinetic energy and temperature are related. It seems that the kinetic energy is not constant. Is that a problem? Explain carefully.
References¶
- 1(1,2)
J. L. F. Abascal and C. Vega. A general purpose model for the condensed phases of water: tip4p/2005. The Journal of Chemical Physics, 123(23):234505, 2005. doi:10.1063/1.2121687.
- 2(1,2,3,4,5)
Mark James Abraham, Teemu Murtola, Roland Schulz, Szilárd Páll, Jeremy C. Smith, Berk Hess, and Erik Lindahl. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1-2:19–25, sep 2015. doi:10.1016/j.softx.2015.06.001.
- 3(1,2,3,4,5)
H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma. The missing term in effective pair potentials. The Journal of Physical Chemistry, 91(24):6269–6271, Nov 1987. doi:10.1021/j100308a038.
- 4(1,2,3,4)
H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, and J. Hermans. Interaction models for water in relation to protein hydration. In The Jerusalem Symposia on Quantum Chemistry and Biochemistry, pages 331–342. Springer Netherlands, 1981. doi:10.1007/978-94-015-7658-1_21.
- 5
Berk Hess, Carsten Kutzner, David van der Spoel, and Erik Lindahl. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. Journal of Chemical Theory and Computation, 4(3):435–447, Mar 2008. doi:10.1021/ct700301q.
- 6(1,2,3,4,5)
William L. Jorgensen, Jayaraman Chandrasekhar, Jeffry D. Madura, Roger W. Impey, and Michael L. Klein. Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79(2):926–935, jul 1983. doi:10.1063/1.445869.
- 7(1,2)
William L. Jorgensen and Julian Tirado-Rives. The OPLS [optimized potentials for liquid simulations] potential functions for proteins, energy minimizations for crystals of cyclic peptides and crambin. Journal of the American Chemical Society, 110(6):1657–1666, mar 1988. doi:10.1021/ja00214a001.
- 8(1,2)
Yuriy Khalak, Björn Baumeier, and Mikko Karttunen. Improved general-purpose five-point model for water: TIP5p/2018. The Journal of Chemical Physics, 149(22):224507, dec 2018. doi:10.1063/1.5070137.
- 9(1,2)
Michael W. Mahoney and William L. Jorgensen. A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. The Journal of Chemical Physics, 112(20):8910, 2000. doi:10.1063/1.481505.
- 10
Siewert J. Marrink, H. Jelger Risselada, Serge Yefimov, D. Peter Tieleman, and Alex H. de Vries. The MARTINI force field: coarse grained model for biomolecular simulations. The Journal of Physical Chemistry B, 111(27):7812–7824, jul 2007. doi:10.1021/jp071097f.