Wednesday 15 October 2008

Convert pdb to sdf or mol2

Found the excellent set of utilities yesterday called Open Babel.

The utility babel is a command line program that allows you to easily switch between chemical file formats. There is a GUI which I tried on my mac but couldn't get to work. However, the command line program is so easy to use that I don't really care!

Wednesday 1 October 2008

Autodock - virtual screening

As I was on a role from my attempts described below I decided to try some virtual screening:

1. I went to the free ZINC database and downloaded a couple thousand molecules.

2. The molecules were downloaded as a single huge .mol2 file which I split using:

split_multi_mol2_file.py -i filename

getting split_multi_mol2_file.py from the AutoDock Virtual Screening Tutorial.

3. Next I converted the mol2 files into pdbqt's using:

foreach f (`ls *`)
echo $f
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py -l $f -d ligand_dict.py
end

4. I divided the pdbqt files into separate directories (of about 1000 compounds) and then carried out the multiple ligand docking pocedure outlined in my posts below. I ran about eight jobs in total on different processors (sort of manual parallel processing!) which took about 24 hours in total processing time.

5. To evaluate the thousands of dockings I ran the script:

#!/bin/csh
cd ./Dockings
foreach d (`/bin/ls`)
echo $d
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/summarize_results4.py -d $d -t 2. -L -a -o ../summary_run1.txt
end

This gave me the list:

largestCl_dlgfn #runs #cl #LC LE_LC rmsd_LC #ats #tors
ZINC00000077_rigid/ZINC00000077_rigid, 20, 14, 3, -6.0700, 56.6501, 25, 6
ZINC00000123_rigid/ZINC00000123_rigid, 20, 4, 8, -5.7200, 51.5977, 25, 1
ZINC00000154_rigid/ZINC00000154_rigid, 20, 3, 11, -6.4300, 52.8504, 23, 4
ZINC00000169_rigid/ZINC00000169_rigid, 20, 8, 7, -6.6700, 56.0226, 23, 4

where:

largestCl_dlgfn: filename of the dlg file which contains the result with
the lowest energy in the largest cluster.

#runs: total number of runs found in all the dlg files in the specified
directory

#cl: number of distinct clusters formed in the clustering done on all the
dlg files in the specified directory

LE_LC: the energy of the lowest energy conformation in the largest cluster

rmsd_LC: the rmsd difference between the lowest energy conformation in the
largest cluster and the reference ligand conformation. This is the input
ligand conformation unless a different reference is specified with the
'-f' flag

#ats: number of atoms in the ligand

#tors: number of active torsions in the ligand

Autodock - evaluating dockings

To evaluate my dockings I did it the hard way with a graphics program, however first needed to split the docking log files into multiple pdb files that could be read by the graphics programs. To do this I invoked the following script in each of my /Dockings/ligands directories:

#!
grep '^DOCKED' T000015_rigid.dlg | cut -c9- > my_docking.pdbqt
cut -c-66 my_docking.pdbqt > my_docking.pdb
# csh to split pdb files from autodock output.
# edit outputname.
#
set outputname=T000015
set a=`grep ENDMDL my_docking.pdb | wc -l`
set b=`expr $a - 2`
csplit -k -s -n 3 -f $outputname. my_docking.pdb '/^ENDMDL/+1' '{'$b'}'
foreach f ( $outputname.[0-9][0-9][0-9] )
mv $f $f.pdb
end

This gave me a whole bunch of pdb files which I loaded into PyMOL to examine using the PyMOL script:

load T000015.000.pdb, conf0
load T000015.001.pdb, conf1
load T000015.002.pdb, conf2
load T000015.003.pdb, conf3
etc...

Autodock 4 - making gpf & dpf files

I made my grid parameter files using Autodock tools, with the atom types found in the top of summary.txt from my ligands directories (simply add the atom types to the pop-up box in ADT) as per the tutorial. I ran AutoGrid and then checked to make sure I had all the .map files I was expecting.

Next I copied all of my ligand pdbq files into the same directory as my grid files. I then ran the following script, adapted from the virtual screening tutorial:

#!/bin/csh
mkdir Dockings
cd Dockings
foreach f (`ls ../*.pdbqt`)
set name = `basename $f .pdbqt`
echo $name
mkdir "$name"_rigid
cd "$name"_rigid
cp ../"$f" .
ln -s ../../protein.pdbqt .
ln -s ../../protein*map* .
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_dpf4.py -l `basename $f` -r protein.pdbqt \
-p ga_num_evals=1750000 \
-p ga_pop_size=150 \
-p ga_run=20 \
-p rmstol=2.0
mv "$name"_protein.dpf "$name"_rigid.dpf
cd ..
end

This made a new directory called "Dockings", within this a directory for each of my ligands, and within each ligand directory a link to the appropriate map files and a docking parameter file for the AutoDock run for each ligand.

To run the multiple autodockings I made a file called ligands.list containing the name of all my ligand directories (ls > ligands.list then delete the file names) and invoked multiple autodock runs with:

#!/bin/csh
foreach d (`cat docking.list`)
echo $d
cd $d
/home/progs/AutoDock/autodocksuite/bin/i8664Linux2/autodock4 -p $d.dpf -l $d.dlg
cd ..
end

Autodock 4 - overview & preparing ligand & protein

I need to try some dockings of ligands to a protein that will remain nameless until we publish the paper fairly shortly. Essentially I have solved a number of crystal structures and want to design some new compounds but would like an idea of how they bind before the chemists go ahead and make them. I could dock them by eye using a program like COOT or PyMOL, but find it very difficult trying to get all the distances and angles correct. I am not really interested in the calculated delta G's as AutoDock is notoriously bad at estimating these. If I can get a rough ranking that would be great, but otherwise it is the structural detail I am more interested in.

The first thing I did was get my IT manager to install Autodock 4, AutoGrid and AutoDockTools (ADT).

I then performed five steps (I'll talk about 1 & 2 in this post and 3-5 in later posts):

1. Prepare the protein both as a rigid docking receptor and using the new "flexible residues" feature to see if that makes a difference.
2. Prepare the ligands for auto-docking.
3. Make the grid parameter file (.gpf) based on the atoms in the ligands and run AutoGrid.
4. Make the docking parameter file (.dpf) and run AutoDock for each of my ligands.
5. Analyse the results.

For step 1 I used the tutorial: "Using AutoDock 4 with AutoDockTools: A Tutorial" by Ruth Huey and Garrett Morris.

However for step 2 I borrowed some ideas from the virtual screening tutorial as I had twelve ligands to prepare:

1. Firstly I made pdb files of all my ligands using the PRODRG2 server.

2. Next I invoked the file "prepare_ligand4.py" from the ADT utilities24 directory using a c-shell script adapted from the virtual screening tutorial called make_pdbqt.csh and chmod +x'ed:

#!/bin/csh
foreach f (`ls *`)
echo $f
/home/progs/MGLTools-1.5.1/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_ligand4.py -l $f -d ligand_dict.py
end

This produced a bunch of pdbqt files (pdb + partial charges + AutoDock atom types) for each of my ligands.

3. I then "examined" the pdbqt files to get a list of atoms in all the ligands using examine_ligand_dict.py again from the ADT utilities directory, which I copied into my working directory and invoked with the command:

./examine_ligand_dict.py > summary.txt