Found the excellent set of utilities yesterday called Open Babel.
The utility babel is a command line program that allows you to easily switch between chemical file formats. There is a GUI which I tried on my mac but couldn't get to work. However, the command line program is so easy to use that I don't really care!
Wednesday, 15 October 2008
Wednesday, 1 October 2008
Autodock - virtual screening
As I was on a role from my attempts described below I decided to try some virtual screening:
1. I went to the free ZINC database and downloaded a couple thousand molecules.
2. The molecules were downloaded as a single huge .mol2 file which I split using:
split_multi_mol2_file.py -i filename
getting split_multi_mol2_file.py from the AutoDock Virtual Screening Tutorial.
3. Next I converted the mol2 files into pdbqt's using:
foreach f (`ls *`)
echo $f
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py -l $f -d ligand_dict.py
end
4. I divided the pdbqt files into separate directories (of about 1000 compounds) and then carried out the multiple ligand docking pocedure outlined in my posts below. I ran about eight jobs in total on different processors (sort of manual parallel processing!) which took about 24 hours in total processing time.
5. To evaluate the thousands of dockings I ran the script:
#!/bin/csh
cd ./Dockings
foreach d (`/bin/ls`)
echo $d
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/summarize_results4.py -d $d -t 2. -L -a -o ../summary_run1.txt
end
This gave me the list:
largestCl_dlgfn #runs #cl #LC LE_LC rmsd_LC #ats #tors
ZINC00000077_rigid/ZINC00000077_rigid, 20, 14, 3, -6.0700, 56.6501, 25, 6
ZINC00000123_rigid/ZINC00000123_rigid, 20, 4, 8, -5.7200, 51.5977, 25, 1
ZINC00000154_rigid/ZINC00000154_rigid, 20, 3, 11, -6.4300, 52.8504, 23, 4
ZINC00000169_rigid/ZINC00000169_rigid, 20, 8, 7, -6.6700, 56.0226, 23, 4
where:
largestCl_dlgfn: filename of the dlg file which contains the result with
the lowest energy in the largest cluster.
#runs: total number of runs found in all the dlg files in the specified
directory
#cl: number of distinct clusters formed in the clustering done on all the
dlg files in the specified directory
LE_LC: the energy of the lowest energy conformation in the largest cluster
rmsd_LC: the rmsd difference between the lowest energy conformation in the
largest cluster and the reference ligand conformation. This is the input
ligand conformation unless a different reference is specified with the
'-f' flag
#ats: number of atoms in the ligand
#tors: number of active torsions in the ligand
1. I went to the free ZINC database and downloaded a couple thousand molecules.
2. The molecules were downloaded as a single huge .mol2 file which I split using:
split_multi_mol2_file.py -i filename
getting split_multi_mol2_file.py from the AutoDock Virtual Screening Tutorial.
3. Next I converted the mol2 files into pdbqt's using:
foreach f (`ls *`)
echo $f
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py -l $f -d ligand_dict.py
end
4. I divided the pdbqt files into separate directories (of about 1000 compounds) and then carried out the multiple ligand docking pocedure outlined in my posts below. I ran about eight jobs in total on different processors (sort of manual parallel processing!) which took about 24 hours in total processing time.
5. To evaluate the thousands of dockings I ran the script:
#!/bin/csh
cd ./Dockings
foreach d (`/bin/ls`)
echo $d
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/summarize_results4.py -d $d -t 2. -L -a -o ../summary_run1.txt
end
This gave me the list:
largestCl_dlgfn #runs #cl #LC LE_LC rmsd_LC #ats #tors
ZINC00000077_rigid/ZINC00000077_rigid, 20, 14, 3, -6.0700, 56.6501, 25, 6
ZINC00000123_rigid/ZINC00000123_rigid, 20, 4, 8, -5.7200, 51.5977, 25, 1
ZINC00000154_rigid/ZINC00000154_rigid, 20, 3, 11, -6.4300, 52.8504, 23, 4
ZINC00000169_rigid/ZINC00000169_rigid, 20, 8, 7, -6.6700, 56.0226, 23, 4
where:
largestCl_dlgfn: filename of the dlg file which contains the result with
the lowest energy in the largest cluster.
#runs: total number of runs found in all the dlg files in the specified
directory
#cl: number of distinct clusters formed in the clustering done on all the
dlg files in the specified directory
LE_LC: the energy of the lowest energy conformation in the largest cluster
rmsd_LC: the rmsd difference between the lowest energy conformation in the
largest cluster and the reference ligand conformation. This is the input
ligand conformation unless a different reference is specified with the
'-f' flag
#ats: number of atoms in the ligand
#tors: number of active torsions in the ligand
Autodock - evaluating dockings
To evaluate my dockings I did it the hard way with a graphics program, however first needed to split the docking log files into multiple pdb files that could be read by the graphics programs. To do this I invoked the following script in each of my /Dockings/ligands directories:
#!
grep '^DOCKED' T000015_rigid.dlg | cut -c9- > my_docking.pdbqt
cut -c-66 my_docking.pdbqt > my_docking.pdb
# csh to split pdb files from autodock output.
# edit outputname.
#
set outputname=T000015
set a=`grep ENDMDL my_docking.pdb | wc -l`
set b=`expr $a - 2`
csplit -k -s -n 3 -f $outputname. my_docking.pdb '/^ENDMDL/+1' '{'$b'}'
foreach f ( $outputname.[0-9][0-9][0-9] )
mv $f $f.pdb
end
This gave me a whole bunch of pdb files which I loaded into PyMOL to examine using the PyMOL script:
load T000015.000.pdb, conf0
load T000015.001.pdb, conf1
load T000015.002.pdb, conf2
load T000015.003.pdb, conf3
etc...
#!
grep '^DOCKED' T000015_rigid.dlg | cut -c9- > my_docking.pdbqt
cut -c-66 my_docking.pdbqt > my_docking.pdb
# csh to split pdb files from autodock output.
# edit outputname.
#
set outputname=T000015
set a=`grep ENDMDL my_docking.pdb | wc -l`
set b=`expr $a - 2`
csplit -k -s -n 3 -f $outputname. my_docking.pdb '/^ENDMDL/+1' '{'$b'}'
foreach f ( $outputname.[0-9][0-9][0-9] )
mv $f $f.pdb
end
This gave me a whole bunch of pdb files which I loaded into PyMOL to examine using the PyMOL script:
load T000015.000.pdb, conf0
load T000015.001.pdb, conf1
load T000015.002.pdb, conf2
load T000015.003.pdb, conf3
etc...
Autodock 4 - making gpf & dpf files
I made my grid parameter files using Autodock tools, with the atom types found in the top of summary.txt from my ligands directories (simply add the atom types to the pop-up box in ADT) as per the tutorial. I ran AutoGrid and then checked to make sure I had all the .map files I was expecting.
Next I copied all of my ligand pdbq files into the same directory as my grid files. I then ran the following script, adapted from the virtual screening tutorial:
#!/bin/csh
mkdir Dockings
cd Dockings
foreach f (`ls ../*.pdbqt`)
set name = `basename $f .pdbqt`
echo $name
mkdir "$name"_rigid
cd "$name"_rigid
cp ../"$f" .
ln -s ../../protein.pdbqt .
ln -s ../../protein*map* .
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_dpf4.py -l `basename $f` -r protein.pdbqt \
-p ga_num_evals=1750000 \
-p ga_pop_size=150 \
-p ga_run=20 \
-p rmstol=2.0
mv "$name"_protein.dpf "$name"_rigid.dpf
cd ..
end
This made a new directory called "Dockings", within this a directory for each of my ligands, and within each ligand directory a link to the appropriate map files and a docking parameter file for the AutoDock run for each ligand.
To run the multiple autodockings I made a file called ligands.list containing the name of all my ligand directories (ls > ligands.list then delete the file names) and invoked multiple autodock runs with:
#!/bin/csh
foreach d (`cat docking.list`)
echo $d
cd $d
/home/progs/AutoDock/autodocksuite/bin/i8664Linux2/autodock4 -p $d.dpf -l $d.dlg
cd ..
end
Next I copied all of my ligand pdbq files into the same directory as my grid files. I then ran the following script, adapted from the virtual screening tutorial:
#!/bin/csh
mkdir Dockings
cd Dockings
foreach f (`ls ../*.pdbqt`)
set name = `basename $f .pdbqt`
echo $name
mkdir "$name"_rigid
cd "$name"_rigid
cp ../"$f" .
ln -s ../../protein.pdbqt .
ln -s ../../protein*map* .
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_dpf4.py -l `basename $f` -r protein.pdbqt \
-p ga_num_evals=1750000 \
-p ga_pop_size=150 \
-p ga_run=20 \
-p rmstol=2.0
mv "$name"_protein.dpf "$name"_rigid.dpf
cd ..
end
This made a new directory called "Dockings", within this a directory for each of my ligands, and within each ligand directory a link to the appropriate map files and a docking parameter file for the AutoDock run for each ligand.
To run the multiple autodockings I made a file called ligands.list containing the name of all my ligand directories (ls > ligands.list then delete the file names) and invoked multiple autodock runs with:
#!/bin/csh
foreach d (`cat docking.list`)
echo $d
cd $d
/home/progs/AutoDock/autodocksuite/bin/i8664Linux2/autodock4 -p $d.dpf -l $d.dlg
cd ..
end
Autodock 4 - overview & preparing ligand & protein
I need to try some dockings of ligands to a protein that will remain nameless until we publish the paper fairly shortly. Essentially I have solved a number of crystal structures and want to design some new compounds but would like an idea of how they bind before the chemists go ahead and make them. I could dock them by eye using a program like COOT or PyMOL, but find it very difficult trying to get all the distances and angles correct. I am not really interested in the calculated delta G's as AutoDock is notoriously bad at estimating these. If I can get a rough ranking that would be great, but otherwise it is the structural detail I am more interested in.
The first thing I did was get my IT manager to install Autodock 4, AutoGrid and AutoDockTools (ADT).
I then performed five steps (I'll talk about 1 & 2 in this post and 3-5 in later posts):
1. Prepare the protein both as a rigid docking receptor and using the new "flexible residues" feature to see if that makes a difference.
2. Prepare the ligands for auto-docking.
3. Make the grid parameter file (.gpf) based on the atoms in the ligands and run AutoGrid.
4. Make the docking parameter file (.dpf) and run AutoDock for each of my ligands.
5. Analyse the results.
For step 1 I used the tutorial: "Using AutoDock 4 with AutoDockTools: A Tutorial" by Ruth Huey and Garrett Morris.
However for step 2 I borrowed some ideas from the virtual screening tutorial as I had twelve ligands to prepare:
1. Firstly I made pdb files of all my ligands using the PRODRG2 server.
2. Next I invoked the file "prepare_ligand4.py" from the ADT utilities24 directory using a c-shell script adapted from the virtual screening tutorial called make_pdbqt.csh and chmod +x'ed:
#!/bin/csh
foreach f (`ls *`)
echo $f
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py -l $f -d ligand_dict.py
end
This produced a bunch of pdbqt files (pdb + partial charges + AutoDock atom types) for each of my ligands.
3. I then "examined" the pdbqt files to get a list of atoms in all the ligands using examine_ligand_dict.py again from the ADT utilities directory, which I copied into my working directory and invoked with the command:
./examine_ligand_dict.py > summary.txt
The first thing I did was get my IT manager to install Autodock 4, AutoGrid and AutoDockTools (ADT).
I then performed five steps (I'll talk about 1 & 2 in this post and 3-5 in later posts):
1. Prepare the protein both as a rigid docking receptor and using the new "flexible residues" feature to see if that makes a difference.
2. Prepare the ligands for auto-docking.
3. Make the grid parameter file (.gpf) based on the atoms in the ligands and run AutoGrid.
4. Make the docking parameter file (.dpf) and run AutoDock for each of my ligands.
5. Analyse the results.
For step 1 I used the tutorial: "Using AutoDock 4 with AutoDockTools: A Tutorial" by Ruth Huey and Garrett Morris.
However for step 2 I borrowed some ideas from the virtual screening tutorial as I had twelve ligands to prepare:
1. Firstly I made pdb files of all my ligands using the PRODRG2 server.
2. Next I invoked the file "prepare_ligand4.py" from the ADT utilities24 directory using a c-shell script adapted from the virtual screening tutorial called make_pdbqt.csh and chmod +x'ed:
#!/bin/csh
foreach f (`ls *`)
echo $f
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py -l $f -d ligand_dict.py
end
This produced a bunch of pdbqt files (pdb + partial charges + AutoDock atom types) for each of my ligands.
3. I then "examined" the pdbqt files to get a list of atoms in all the ligands using examine_ligand_dict.py again from the ADT utilities directory, which I copied into my working directory and invoked with the command:
./examine_ligand_dict.py > summary.txt
Saturday, 2 August 2008
Autodock..ing
My very first experience with automated docking came about four years ago when my boss decided to find out whether docking algorithms could be helpful in our search for drugs that bound to the pentraxin proteins serum amyloid P component (SAP) and C-reactive protein (CRP). As we had a number of crystal structures, and I had spent the first year of my PhD experimentally measuring Kd's, enthalpy and entropy of binding using Isothermal titration calorimetry, my boss figured it would make a nice chapter in my thesis if I evaluated automated docking. I chose to use AUTODOCK (then ver.2.4) because it was free and the developers seemed quite keen to help.
I have posted a snippet of my methods and results from the chapter of my thesis relating to this work here. Essentially I found that AUTODOCK was unable to deal with calcium atoms. After a bit of to-ing and fro-ing with the developers I was able to come up with some parameters that seemed to allow the ligands to dock in a semi-satisfactory manner, and although the actual orientations seemed correct (and the program was mostly able to order the ligands in order of affinity) the calculated delta-G's bared little resemblance to the experimentally measured parameters. There was also evidence that the electrostatics of the interaction completely dominated the calculations whilst the critically important hydrophobic interactions and hydrogen bonding were less important for the docking.
In retrospect i should have written up and published these results at the time, however the experiments were done prior to some big publications and we did not want to risk the chances of our Nature papers!
I have posted a snippet of my methods and results from the chapter of my thesis relating to this work here. Essentially I found that AUTODOCK was unable to deal with calcium atoms. After a bit of to-ing and fro-ing with the developers I was able to come up with some parameters that seemed to allow the ligands to dock in a semi-satisfactory manner, and although the actual orientations seemed correct (and the program was mostly able to order the ligands in order of affinity) the calculated delta-G's bared little resemblance to the experimentally measured parameters. There was also evidence that the electrostatics of the interaction completely dominated the calculations whilst the critically important hydrophobic interactions and hydrogen bonding were less important for the docking.
In retrospect i should have written up and published these results at the time, however the experiments were done prior to some big publications and we did not want to risk the chances of our Nature papers!
Friday, 25 July 2008
Agenda
I am starting this blog pretty much as a reminder to myself of what I have tried, and what is available for ligand design and optimisation based upon structural data.
Subscribe to:
Posts (Atom)