-
Notifications
You must be signed in to change notification settings - Fork 1
Gromacs Tutorial
Generate Input Files
- Choose correct PDB: On PDB website:
- R-value (mismatched part of density vs. model; lower than 0.2 is good)
- Check wwPDB Validation on rcsb (clashscore, ramachandran outliers, sidechain outliers)
- Check if WT sequence
- Check ligands, to know what is there in the structure -- want to run apo but it might not actually be apo
- We chose 2PFK (unliganded) even tho the stats aren't as good bc it is apo
- Using the structure of 2PFK + C-term ends added from 1PFK (my structure)
- Download PDB and open pdb file.
- Look for disulfide bonds: search file for "ssbond" -- want to know if it's there is actually a disulfide bond and if the pdb structure reflects this (ie. would want to make it if it's actually there)
- Search "missing" -- keep in mind missing residues; or might want to fix it for a real simulation
- In 2PFK: missing C-term resi
- I fixed via aligning to 1PFK on pymol
- Can fix via rosetta, alphafold etc.
- Check if there is bad density in the PDB file
- Make sure file reflects the correct biological assembly (oligomeric state of protein) -- make tet if tet.
- Protonation state needs to be specified first otherwise gromacs will decide it for you which might be wrong
- Ie. protonation state for His, Asp, Glu
- Propka website: predicts protonation state
- Install
- CL: propka3 input.pdb
- Outputs input.pka (prediction of pka)
- If pH you want to work w is between pka and model-pka, then you would need to manually adjust the protonation state of the structure. At pH 7, everything is ok for us.
- H++ will generate pdb with updated protonation state.
- But then you can't use ignore hydrogens flag in gromacs bc it will not use the structure you produced with H++`
- Only amber compatible (not charmm)`
- In gromacs: generate topology:
- Need forcefield file (ie. need gromacs-compatible charmm36 to make) (can also use amber)
- Convert pdb file to gmx file and generate topology and forcefield for protein:
- CL: gmx pdb2gmx -f input.pdb -o outputfile.gro -ignh
- Choice of forcefield pops up: Charmm36
- Select water model: water model Is compatible with ff model --Tip3P_Charmm is recommended (3 point water model; there are also other models 3 point might not be good for everything but it is good enough for us rn)
- Error: gromacs does not recognize h atoms (crystal structure doesn't have h-atoms but rosetta structure does)
- -ignh flag is needed
- Output files: .gro file, .itp files, .top file -- Important to check each file (ie. via text editor like sublime text)
- .gro: structure file like .pdb (smaller than pdb but has lost some info ie. missing resi, disulfide bonds, chain info, dimension of box at the end of the file)
- Number of atoms in file -- check if matches pdb (but if the gro file you're using is generated by gromacs, this will match the pdb bc you just generated it)
- Atom type is compatible with force field
- Numbering scheme (index) will match the input pdb file (ie. if pdb starts with resi 32, .gro file will start with resi 32)
- .top file: topology -- connection relationship between atoms
- ; are comments in gromacs
- '#' are not comments
- Includes all the forcefield files (.itp) generated
- .itp file: forcefield file -- defines interactions between two atoms in the structure
- Ie. ffbonded.itp: defines interaction between two atoms; key thing is atom types
- gromacs looks at information in the tables in the .itp file to update the coordinates based on f = ma
- Bonds in .itp file: ie. between atom 1 and atom 2 there is bond type 1 (function) -- function 1 defines the interactions between this bond type and includes the parameters (determined by atom type)
- There is also info on angles and dihedrals
Simulate Water Molecules
- Simulating water molecules:
In MD simulation we run the simulation with explicit water molecules -- we need to put water molecules in. To do this, we specify how large is the box (unit cell) around the protein and how many water molecules we need to add:
- Generate box: CL: gmx editconf -f input.gro -o output_box.gro -c -d 1.2 <distance between max dimension of protein and edge of the box; put 1.2> -princ
- Check output in VMD
- PBC (Periodic boundary conditions): " A set of boundary conditions which are often chosen for approximating a large (infinite) system by using a small part called a unit cell."
- When protein crosses edge of the box, it will go into the box of "another protein" that is "next door", however these "other" proteins are actually all the same protein and don't really exist in the simulation; the movement of the main protein within the unit cell/box is mirrored in all the "other proteins" movements and this motion is continuous: Ie. if the protein goes out of box on the top, it is actually coming into the box at the bottom which is bad and means your unit cell is too small-- This is hard to explain with words...
- Make solvated file: add water molecules and other solvents:
- CL: gmx solvate -cp output_box.gro -cs spc216.gro -p topol.top -o output_sol.gro
- If you run a step multiple times it will update the top file every time. So delete the extra runs that are bad from the top file otherwise error.
- Generate .tpr file (binary run file for gromacs) needed to neutralize the system:
- CL: gmx grompp -f ions.mdp -c output_sol.gro -p topol.top -o ions.tpr -maxwarn 1
- Tpr file is needed when you want to input gromacs with mdp file (has params and options for simulation)
- Mdp files are used for each simulation -- Chenlin has optimized this file (ions.mdp)
- Running this this also checks the gro file and top file -- if there is error that means there is a mismatch.
- We see an warning: warning 1 -- there is a net charge --> add flag -maxwarn 1 (get rid of warning 1) for this step we are trying to neutralize the system, so we can add this flag
- PFK has a really high net charge (-36) bc of many charged resi
- Neutralize the system by adding counter ions (default is using NaCl, but you can specify the type)
- CL: gmx genion -s ions.tpr -o output_ions.gro -neutral -p topol.top
- Select a group to replace (replace solvent with counter ions): 13 (choose solvent (counter ions) to neutralize the charge)
- NaCl (default), KCl are best -- use metals on upper left of periodic table; Mg and Ca are also ok. Fe is more complicated to simulate and needs more params
- You can also add more ions ie. to increase the concentration of salt if you wanna run simulations
- Check generated gro file for added counterions
- Adding the counter ions is actually a simulation -- uses steep descent algorithm. Because of this the addition of counter ions is homogeneous and addition will not increase total energy too much or add clashes. If you add them manually the energy will usually increase a lot.
- Minimize Input Structure
- Before the actual simulation we need to minimize the input structure:
To minimize the structure, we first need to generate a tpr file (em.tpr) to specify that we want to minimize during the next step:
- Generate tpr file for the simulation:
- .tpr is the binary file needed to run the simulation
- CL: gmx grompp -f mini.mdp -c output_ions.gro -p topol.top -o em.tpr -maxwarn 1
- mini.mdp is Chenlin's mdp file for minimizing
- Maxwarn 1 : Error: SOD (gmx) vs. NA (charm36) -- name is diff but this is ok bc we know what it means
- Run simulation to minimize energy:
- CL: gmx mdrun -v -deffnm em <generate all files using input file name using em.tpr and generate all related files with em; set default file name for all options>
- The em file tells mdrun to minimize
- Check potential energy to see if min step worked: CL: gmx energy -f em.edr -o potential.xvg
- Edr file: saves al energy related stuff (
- Pops out options: can check diff types of energy in the force field
- Check potential -- CL: xmgrace potential.xvg (read potential energy)
- A good value is -Xe6
-
Check output of minimization
-
NVT Simulation: Simulation of system based on constant number (N), constant-volume (V), and constant-temp (T)
- Heat up the system (we actually increasing the temp):
- Look at the mdp file for heating up the sim
- CL: sub nvt.mdp (look at mdp file for heating up sim)
- Define = -DPOSRES (add constraints to the protein so it doesn't change too much when you are heating up the system; will mostly change the water molec)
- Specify integer: md simulation
- dt: 0.002 (2 fempto seconds; don't change)
- Nsteps = 50000 (100ps; how long you want to run sim)
- Temperature coupling is on (at macro scale, temp is just velocity)
- ref_t = 298.15 (what is temp of simulation (298.15K = 25C))
- Pressure coupling is off
- No pressure coupling in nvt (you can't change pressure if you change vol and the temp -- "ensemble" -- from thermodynamics)
- pbc = xyz (periodic boundary conditions) -- sometimes you don't want to turn on pbs in all axes (ie. membrane proteins) and this would need a bigger height of box)
- gen_seed = -1 (generate random seed)
- Run simulation (heating up):
- CL: gmx grompp -f nvt.mdp -c em.gro -p topol.top -o nvt.tpr -r em.gro
- Generated the tpr for this vt
- Run actual NVT simulation:
- CL: gmax mdrun -v deffnm nvt
- Produces nvt.edr (energy)
- -v deffnm nvt: To use the following files as the input: nvt.gro (last frame of simulation); em.gro (first frame); xtc file (trajectory file (coordinates of all atoms, but doesn't contain info for atoms, to visualize it you need gro and xtc file))
- automatically produces a log file
- Visualize md run: CL: vmd em.gro nvt.xtc
- Constrained simulation so the protein is not moving a lot; but water molec are not constrained so they will move a lot bc now we are heating up the system.
- Specify temperature:
- CL: gmx energy -f nvt.edr -o temperature.xvg
- Choose temperature and 0
- In em file there is no option for temp, so you have to specify this in the nvt file
- Check if temp is heated to specified value (visualize the xvg file):
- CL: xmgrace temperature.xvg
- NPT Simulation: Constant Number, Pressure, Temperature
- Run NPT Simulation:
Now that we heated the system, we will control the pressure -- run NPT simulation: Chenlin likes controll temp and then pressure separately
- Generate tpr file for NPT simulation:
- CL: gmx grompp -f npt.mdp -c nvt.gro -r npt.gro -p topol.top -t nvt.cpt -o npt.tpr
- Look at npt.mdp file:
- control the pressure and temp
- gen_vel = no (if you generate the v here, you are changing the temp from the previous step; we have already generated the velocity) velocity generation off after NVT
- nvt.cpt file = use already equilibrated velocity simulated from previous step NPT step
- Start from end of nvt run: restrain from last frame of nvt run (-r), coordinates to last frame of nvt run (-c), specify velocity from last frame of nvt run (-t)
- phase space: p= moment; coordinates + velocity then you know everything about of the phase state.
- Run simulation:
- CL: gmx mdrun -v -deffnm npt
- Check trajectory:
- CL: vmd em.gro npt.xtc
- xtc file: binary file (compressed file): contains xyz info not atom info at each time step; this is why you need the gro file when you visualize.
- Checking to see that protein didn't change too much
- CL: sub npt.gro
- Notice: volume of box changed (if you try to control the pressure the volume will change)
- Specify pressure:
- CL: gmx energy -f npt.edr -o pressure.xvg
- Check pressure:
- CL: xmgrace pressure.xvg
- Pressure control is v difficult compared to temp-- lots of deviations but overall it is controlled.
- Check the temperature in the npt run: shouldn't change much bc used from nvt run
- CL: gmx energy -f npt.edr -o temp_npt.xvg
- Production Run
- Run actual production run:
- Generate tpr file for simulation:
- CL: gmx grompp -f md.mdp -c npt.gro -r npt.gro -p topol.top -t npt.cpt -o md.tpr -maxwarn -1
- Look at md.mdp file:
- Define = (no constraint)
- Comm-grps = protein; comm-mode = angular (get rid of rotation and transl)
- Pcoupl = parrinello-rahman (changed pressure control algorithm, now more suitable for production run)
- Nsteps = 1000000 ; 1mil fempto sec = 1ns
- Control pressure and temp
- Gen_vel = no (use velocity generated by npt run -- use last frame from last run of npt) *should always use last frame from last run
- Run simulation:
- CL: gmx mdrun -v deffnm md
- Check md simulation: 2ns = 200 frames
- Check rmsd: compared to minimized structure -- first relaxes a bit and then becomes almost stable by 1ns
What do you get from apo run?
- if rmsd is unchanged throughout the simulation -- protein is stable
- rms -- check flexibility
- Need apo simulation to compare to holo simulation -- a reference
- How long resi stay in a position (compare to crystal structure which is a snapshot in time)
- Coloumb interactions vs. vdw interactions -- is water involved in these interactions ? If water is involved the water molecules will stay there for a long time.
*MD simulation is not theoretical but it is an "experiment on a computer" (not derived math, etc. This is not quantum chemistry); everything that happens in an experiment can happen here like random things; rosetta is also an experiment
Make topology for ligand: GaussView software to add hydrogens to ligand (get correct charge state for ligand):
- Save as .gjf. Can delete everything in file except atom type and coordinates
- Get global minimum via quantum calculation – get charge of ligand
o In .gjf file: Add two lines of code (has keywords) for optimization and then frequency analysis to make sure this conformation is the global min %chk=TPB_opt.chk "#p" opt freq b3lyp/6-311+G** empiricaldispersion=…. o In .gjf file: Specify multiplicity of spin o Check output: Optimized molecule should be planar; Check frequency to make sure there are no negative freq (if there are neg, it is not optimized to the global min)
- Single point calculation: different keywords than for global min calculation -- calc the energy of this state and doesn’t change molecular info; produce checkpoint file which calculates a wave function to use for charge calculation:
o in Gaussian 16: formchk file.chk –> produces .fchk file o choose Restrained Electrostatic potential o output .chg file with all the charges for each atom
- Topology depends on the forcefield you are using. CL prefers Charm: CGenFF website
o convert your pdb file to a .mol2 file o check to make sure the atoms have the right amount of bonds and correct if not o upload .mol2 file to CGenFF to generate .str file (used for generating topology)
Generate forcefield:
o charm forcefield script: convert .str file to a file that can be read by gromacs. o Create conda environment suitable for this script o CL: python script.py TPB file.mol2 file.str file.ff (make sure resname is correct in mol2 and str files)
○ Outputs .itp, .prm, .top, .pdb files
o After generating forcefield, you have to change the charge in the .itp with the charge generated by the dft calculation (in the .chg file)
- Make protein only .pdb file from your original pdb (and add hydrogen to make it compatible w ff file)
- CL: gmx pdb2gmx -f file.pdb -o file.pdb -ignh
- Choose choice 1: Charmm36; then choice 1 again o Generate .itp file: defines every resi, charge, etc.
- Add files to .top file: order matters a lot
o Add .prm file after .itp file : #include file.itp” o Add ____ file above Include water topology (since not adding water to this system in this example) o Add ligand to end of .top file (ie. TBP 1)
- Add ligand to .pdb file: above solvent section
o Make sure atom naming scheme is the same as in the file generated from script
Following steps are the same as in the -ligand tutorial (see other notes):
- Add a box around protein
- Make solvated file a. Keep the waters that were originally there and add more
- Make .tpr file
- Neutralize the system by adding counter ions
- Minimize input structure
- NVT simulation
- NPT simulation
- Production Run
- New member onboarding
- Lab jobs
- Seminar schedules
- How to order
- Group meeting schedule
- Lab notebooks
- Funding opportunities
- Philosophy of science
- Wet lab basics
- Lab safety
- Waste disposal
- Chemical inventory
- -20C inventory
- Molecular biology
- Buffers and reagents
- Protocols library
- DNA synthesis and primers
- 80C freezer organization
- Using server
- C2B2 HPC access
- Update lab website
- Cluster parallel processing
- Mercury at CUIMC
- Getting started with Rosetta
- Install Pyrosetta
- Tutorials
- Clone Github
- Gromacs-Tutorial
- Cluster Specs
- Deep MSA and Statistical Coupling Analysis
- MMseqs2: Make MSA and analyze taxonomy
- Useful tools