Gromacs Tutorial

gromacs Tutorial (Apo Structure):

Generate Input Files

Choose correct PDB: On PDB website:

R-value (mismatched part of density vs. model; lower than 0.2 is good)
Check wwPDB Validation on rcsb (clashscore, ramachandran outliers, sidechain outliers)
Check if WT sequence
Check ligands, to know what is there in the structure -- want to run apo but it might not actually be apo

We chose 2PFK (unliganded) even tho the stats aren't as good bc it is apo

Using the structure of 2PFK + C-term ends added from 1PFK (my structure)

Download PDB and open pdb file.

Look for disulfide bonds: search file for "ssbond" -- want to know if it's there is actually a disulfide bond and if the pdb structure reflects this (ie. would want to make it if it's actually there)
Search "missing" -- keep in mind missing residues; or might want to fix it for a real simulation

In 2PFK: missing C-term resi

I fixed via aligning to 1PFK on pymol

Can fix via rosetta, alphafold etc.

Check if there is bad density in the PDB file
Make sure file reflects the correct biological assembly (oligomeric state of protein) -- make tet if tet.
Protonation state needs to be specified first otherwise gromacs will decide it for you which might be wrong

Ie. protonation state for His, Asp, Glu

Propka website: predicts protonation state

Install

CL: propka3 input.pdb

Outputs input.pka (prediction of pka)

If pH you want to work w is between pka and model-pka, then you would need to manually adjust the protonation state of the structure. At pH 7, everything is ok for us.

H++ will generate pdb with updated protonation state.

But then you can't use ignore hydrogens flag in gromacs bc it will not use the structure you produced with H++`

Only amber compatible (not charmm)`

In gromacs: generate topology:

Need forcefield file (ie. need gromacs-compatible charmm36 to make) (can also use amber)
Convert pdb file to gmx file and generate topology and forcefield for protein:

CL: gmx pdb2gmx -f input.pdb -o outputfile.gro -ignh

Choice of forcefield pops up: Charmm36

Select water model: water model Is compatible with ff model --Tip3P_Charmm is recommended (3 point water model; there are also other models 3 point might not be good for everything but it is good enough for us rn)

Error: gromacs does not recognize h atoms (crystal structure doesn't have h-atoms but rosetta structure does)

-ignh flag is needed

Output files: .gro file, .itp files, .top file -- Important to check each file (ie. via text editor like sublime text)

.gro: structure file like .pdb (smaller than pdb but has lost some info ie. missing resi, disulfide bonds, chain info, dimension of box at the end of the file)

Number of atoms in file -- check if matches pdb (but if the gro file you're using is generated by gromacs, this will match the pdb bc you just generated it)

Atom type is compatible with force field

Numbering scheme (index) will match the input pdb file (ie. if pdb starts with resi 32, .gro file will start with resi 32)

.top file: topology -- connection relationship between atoms

; are comments in gromacs

'#' are not comments

Includes all the forcefield files (.itp) generated

.itp file: forcefield file -- defines interactions between two atoms in the structure

Ie. ffbonded.itp: defines interaction between two atoms; key thing is atom types

gromacs looks at information in the tables in the .itp file to update the coordinates based on f = ma

Bonds in .itp file: ie. between atom 1 and atom 2 there is bond type 1 (function) -- function 1 defines the interactions between this bond type and includes the parameters (determined by atom type)

There is also info on angles and dihedrals

Simulate Water Molecules

Simulating water molecules:

In MD simulation we run the simulation with explicit water molecules -- we need to put water molecules in. To do this, we specify how large is the box (unit cell) around the protein and how many water molecules we need to add:

Generate box: CL: gmx editconf -f input.gro -o output_box.gro -c -d 1.2 <distance between max dimension of protein and edge of the box; put 1.2> -princ

Check output in VMD

PBC (Periodic boundary conditions): " A set of boundary conditions which are often chosen for approximating a large (infinite) system by using a small part called a unit cell."

When protein crosses edge of the box, it will go into the box of "another protein" that is "next door", however these "other" proteins are actually all the same protein and don't really exist in the simulation; the movement of the main protein within the unit cell/box is mirrored in all the "other proteins" movements and this motion is continuous: Ie. if the protein goes out of box on the top, it is actually coming into the box at the bottom which is bad and means your unit cell is too small-- This is hard to explain with words...

/var/folders/nw/n7k6g6_50qg4g_4ztrc0p90c0000gn/T/com.microsoft.Word/Content.MSO/C10EB83.tmp

Make solvated file: add water molecules and other solvents:

CL: gmx solvate -cp output_box.gro -cs spc216.gro -p topol.top -o output_sol.gro

If you run a step multiple times it will update the top file every time. So delete the extra runs that are bad from the top file otherwise error.

Generate .tpr file (binary run file for gromacs) needed to neutralize the system:

CL: gmx grompp -f ions.mdp -c output_sol.gro -p topol.top -o ions.tpr -maxwarn 1

Tpr file is needed when you want to input gromacs with mdp file (has params and options for simulation)

Mdp files are used for each simulation -- Chenlin has optimized this file (ions.mdp)

Running this this also checks the gro file and top file -- if there is error that means there is a mismatch.

We see an warning: warning 1 -- there is a net charge --> add flag -maxwarn 1 (get rid of warning 1) for this step we are trying to neutralize the system, so we can add this flag

PFK has a really high net charge (-36) bc of many charged resi

Neutralize the system by adding counter ions (default is using NaCl, but you can specify the type)

CL: gmx genion -s ions.tpr -o output_ions.gro -neutral -p topol.top

Select a group to replace (replace solvent with counter ions): 13 (choose solvent (counter ions) to neutralize the charge)

NaCl (default), KCl are best -- use metals on upper left of periodic table; Mg and Ca are also ok. Fe is more complicated to simulate and needs more params

You can also add more ions ie. to increase the concentration of salt if you wanna run simulations

Check generated gro file for added counterions

Adding the counter ions is actually a simulation -- uses steep descent algorithm. Because of this the addition of counter ions is homogeneous and addition will not increase total energy too much or add clashes. If you add them manually the energy will usually increase a lot.

Minimize Input Structure

Before the actual simulation we need to minimize the input structure:

To minimize the structure, we first need to generate a tpr file (em.tpr) to specify that we want to minimize during the next step:

Generate tpr file for the simulation:

.tpr is the binary file needed to run the simulation

CL: gmx grompp -f mini.mdp -c output_ions.gro -p topol.top -o em.tpr -maxwarn 1

mini.mdp is Chenlin's mdp file for minimizing

Maxwarn 1 : Error: SOD (gmx) vs. NA (charm36) -- name is diff but this is ok bc we know what it means

Run simulation to minimize energy:

CL: gmx mdrun -v -deffnm em <generate all files using input file name using em.tpr and generate all related files with em; set default file name for all options>

The em file tells mdrun to minimize

Check potential energy to see if min step worked: CL: gmx energy -f em.edr -o potential.xvg

Edr file: saves al energy related stuff (

Pops out options: can check diff types of energy in the force field

Check potential -- CL: xmgrace potential.xvg (read potential energy)

A good value is -Xe6

Check output of minimization
NVT Simulation: Simulation of system based on constant number (N), constant-volume (V), and constant-temp (T)

Heat up the system (we actually increasing the temp):

Look at the mdp file for heating up the sim

CL: sub nvt.mdp (look at mdp file for heating up sim)

Define = -DPOSRES (add constraints to the protein so it doesn't change too much when you are heating up the system; will mostly change the water molec)

Specify integer: md simulation

dt: 0.002 (2 fempto seconds; don't change)

Nsteps = 50000 (100ps; how long you want to run sim)

Temperature coupling is on (at macro scale, temp is just velocity)

ref_t = 298.15 (what is temp of simulation (298.15K = 25C))

Pressure coupling is off

No pressure coupling in nvt (you can't change pressure if you change vol and the temp -- "ensemble" -- from thermodynamics)

pbc = xyz (periodic boundary conditions) -- sometimes you don't want to turn on pbs in all axes (ie. membrane proteins) and this would need a bigger height of box)

gen_seed = -1 (generate random seed)

Run simulation (heating up):

CL: gmx grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr -r em.gro

Generated the tpr for this vt

Run actual NVT simulation:

CL: gmax mdrun -v deffnm nvt

Produces nvt.edr (energy)

-v deffnm nvt: To use the following files as the input: nvt.gro (last frame of simulation); em.gro (first frame); xtc file (trajectory file (coordinates of all atoms, but doesn't contain info for atoms, to visualize it you need gro and xtc file))

automatically produces a log file

Visualize md run: CL: vmd em.gro nvt.xtc

Constrained simulation so the protein is not moving a lot; but water molec are not constrained so they will move a lot bc now we are heating up the system.

Specify temperature:

CL: gmx energy -f nvt.edr -o temperature.xvg

Choose temperature and 0

In em file there is no option for temp, so you have to specify this in the nvt file

Check if temp is heated to specified value (visualize the xvg file):

CL: xmgrace temperature.xvg

NPT Simulation: Constant Number, Pressure, Temperature

Run NPT Simulation:

Now that we heated the system, we will control the pressure -- run NPT simulation: Chenlin likes controll temp and then pressure separately

Generate tpr file for NPT simulation:

CL: gmx grompp -f npt.mdp -c nvt.gro -r npt.gro -p topol.top -t nvt.cpt -o npt.tpr

Look at npt.mdp file:

control the pressure and temp

gen_vel = no (if you generate the v here, you are changing the temp from the previous step; we have already generated the velocity) velocity generation off after NVT

nvt.cpt file = use already equilibrated velocity simulated from previous step NPT step

Start from end of nvt run: restrain from last frame of nvt run (-r), coordinates to last frame of nvt run (-c), specify velocity from last frame of nvt run (-t)

phase space: p= moment; coordinates + velocity then you know everything about of the phase state.

Run simulation:

CL: gmx mdrun -v -deffnm npt

Check trajectory:

CL: vmd em.gro npt.xtc

xtc file: binary file (compressed file): contains xyz info not atom info at each time step; this is why you need the gro file when you visualize.

Checking to see that protein didn't change too much

CL: sub npt.gro

Notice: volume of box changed (if you try to control the pressure the volume will change)

Specify pressure:

CL: gmx energy -f npt.edr -o pressure.xvg

Check pressure:

CL: xmgrace pressure.xvg

Pressure control is v difficult compared to temp-- lots of deviations but overall it is controlled.

Check the temperature in the npt run: shouldn't change much bc used from nvt run

CL: gmx energy -f npt.edr -o temp_npt.xvg

Production Run

Run actual production run:

Generate tpr file for simulation:

CL: gmx grompp -f md.mdp -c npt.gro -r npt.gro -p topol.top -t npt.cpt -o md.tpr -maxwarn -1

Look at md.mdp file:

Define = (no constraint)

Comm-grps = protein; comm-mode = angular (get rid of rotation and transl)

Pcoupl = parrinello-rahman (changed pressure control algorithm, now more suitable for production run)

Nsteps = 1000000 ; 1mil fempto sec = 1ns

Control pressure and temp

Gen_vel = no (use velocity generated by npt run -- use last frame from last run of npt) *should always use last frame from last run

Run simulation:

CL: gmx mdrun -v deffnm md

Check md simulation: 2ns = 200 frames

Check rmsd: compared to minimized structure -- first relaxes a bit and then becomes almost stable by 1ns

What do you get from apo run?

if rmsd is unchanged throughout the simulation -- protein is stable
rms -- check flexibility
Need apo simulation to compare to holo simulation -- a reference
How long resi stay in a position (compare to crystal structure which is a snapshot in time)
Coloumb interactions vs. vdw interactions -- is water involved in these interactions ? If water is involved the water molecules will stay there for a long time.

*MD simulation is not theoretical but it is an "experiment on a computer" (not derived math, etc. This is not quantum chemistry); everything that happens in an experiment can happen here like random things; rosetta is also an experiment

gromacs Tutorial (Ligand Bound Structure):

Make topology for ligand: GaussView software to add hydrogens to ligand (get correct charge state for ligand):

Save as .gjf. Can delete everything in file except atom type and coordinates

Get global minimum via quantum calculation – get charge of ligand
o In .gjf file: Add two lines of code (has keywords) for optimization and then frequency analysis to make sure this conformation is the global min 
	%chk=TPB_opt.chk
	"#p" opt freq b3lyp/6-311+G** empiricaldispersion=….
o In .gjf file: Specify multiplicity of spin 
o Check output: Optimized molecule should be planar; Check frequency to make sure there are no negative freq (if there are neg, it is not optimized to the global min)
Single point calculation: different keywords than for global min calculation -- calc the energy of this state and doesn’t change molecular info; produce checkpoint file which calculates a wave function to use for charge calculation:
o in Gaussian 16: formchk file.chk –> produces .fchk file 
o choose Restrained Electrostatic potential 
o output .chg file with all the charges for each atom 
Topology depends on the forcefield you are using. CL prefers Charm: CGenFF website
o convert your pdb file to a .mol2 file
o check to make sure the atoms have the right amount of bonds and correct if not
o upload .mol2 file to CGenFF to generate .str file (used for generating topology)

Generate forcefield:

o charm forcefield script: convert .str file to a file that can be read by gromacs.
o Create conda environment suitable for this script
o CL: python script.py TPB file.mol2 file.str file.ff (make sure resname is correct in mol2 and str files)

  ○ Outputs .itp, .prm, .top, .pdb files

o After generating forcefield, you have to change the charge in the .itp with the charge generated by the dft calculation (in the .chg file)

Make protein only .pdb file from your original pdb (and add hydrogen to make it compatible w ff file)

CL: gmx pdb2gmx -f file.pdb -o file.pdb -ignh

Choose choice 1: Charmm36; then choice 1 again o Generate .itp file: defines every resi, charge, etc.

Add files to .top file: order matters a lot
o Add .prm file after .itp file : #include file.itp”
o Add ____ file above Include water topology (since not adding water to this system in this example)
o Add ligand to end of .top file (ie. TBP 1)
Add ligand to .pdb file: above solvent section
o Make sure atom naming scheme is the same as in the file generated from script 

Following steps are the same as in the -ligand tutorial (see other notes):

Add a box around protein
Make solvated file a. Keep the waters that were originally there and add more
Make .tpr file
Neutralize the system by adding counter ions
Minimize input structure
NVT simulation
NPT simulation
Production Run

Home

How to use this wiki

General

Wet lab

Dry lab

Using server
C2B2 HPC access
Update lab website
Cluster parallel processing
Mercury at CUIMC
Getting started with Rosetta
- Install Pyrosetta
- Tutorials
Clone Github
Gromacs-Tutorial
Cluster Specs
Deep MSA and Statistical Coupling Analysis
MMseqs2: Make MSA and analyze taxonomy
Useful tools

Provide feedback

Saved searches

Use saved searches to filter your results more quickly