In 1962 Max F. Perutz won the Nobel Prize in Chemistry for deducing the structure of haemoglobin from data collected with x-ray crystallography. Today virtual laboratories use supercomputers and computer grids to solve protein structure problems by working from first chemical principles There are nice discussions at Stanford's folding@home page and at the Blueprint project's protein folding page. Each of these sites offers you the chance to enlist your own home computer in the protein folding task. (Also see http://www.technologyreview.com/articles/waldrop0502.asp.)
Recent news events reminding us of mad cow disease point to the importance of folding properly: the disease seems to be the result of a protein that has folded in a wrong way.
Brian White, a colleague here, must teach his beginning biology students that changing a single amino acid in a protein can dramatically alter its shape, and hence its biologicial activity. He asked me if I could write a program to solve the structure problem quickly (in seconds or minutes on a PC) so that students could appreciate this phenomenon. Since that was unlikely, I looked for a simple solvable analogous problem that he could use along with some software he's developed (along with students in my software engineering classes). You can see the project at intro.bio.umb.edu/aipotu/.
We model an amino acid as a disk (or sphere) of fixed radius with a hydrophobic index that describes how much an acid wants (index < 0) or does not want (index > 0) to be exposed to its watery environment. We imagine that the disks can occupy the cells of a regular tessellation of the plane (or space). A chain of amino acids folds so as to occupy a sequence of cells each sharing an edge (or face) with the next, in such a way as to minimize the energy, measured as the total number of edges (or faces) of occupied cells exposed to the environment (unoccupied cells), weighted by the hydrophobic indices of the amino acids occupying the cells.
Here's a Java applet that allows you to experiment with that model - instructions follow if you need them.
To tell the program what you want it to do for you:
For random generation, specify the amino acid table: either the natural table of real amino acids or the virtual table of amino acids with hydrophobic indices uniformly distributed between 1.0 and -1.0.
If you are using the text area for input, enter a sequence of amino acids separated by commas or spaces. You may enter either three character names chosen from the natural table or decimal fractions between -1.0 and 1.0, separated by commas or spaces (but you can't mix the two kinds of input).
To specify a previously folded chain in the text area, enter the acid names with directions between them separated by colons, like this:
Leu : E : Met : E : Arg : NW : Ala : W : Leu : W : Leu : none :Directions should be chosen as appropriate for the grid you select (see below).
Note: the dodecahedral folding algorithm seems sometimes to freeze the browser. If it's my bug, sorry, but I don't have time to fix it now.
In either case, the software needs to be told how to fold when two candidates have the same energy. The possibilites are to continue in straight ahead (the default) or to try to bend. You get to pick.
You will find some statistics in the message area.
Even a chain as small as 10 amino acids will allow you explore how small changes in a single hydrophobic index can dramatically affect the folded shape. You might enjoy trying to create various possible patterns. Polypeptide tangrams or polyominoes are a small step on the way toward designer drugs.
The sequences
-0.96 0.11 0.14 0.68 0.96and
-0.961 0.111 0.136 0.682 0.966fold differently. Can you see why? Can you produce the same effect changing just one of the hydrophobic indices?
Bogdan Calota is helping with the code and will help with the statistics.
You can run the application on your own computer by downloading folding.jar and invoking it it from any command prompt this way:
java -cp folding.jar FoldingWindowThe source code is available too, if you want to play with that.
I conjecture that finding the absolute minimum is an NP-complete problem. (I'd love to find a proof.) The brute force backtracking search through all the candidate configurations is expensive. For the hexagonal plane grid with n=1,2,3,... the number of possibilities is 1, 6, 30, 138, 618, 2730, 11946, 51882, 224130, 964134, 4133166, ... - that's sequence A001334 in the On-Line Encyclopedia of Integer Sequences. The growth is rate is approximately 4n.
Here are pictures of minimum energy foldings for some random chains of length 17. Red acids are hydrophobic, green are hydrophilic. Unfortunately, computing each answer took about 12 hours on my PC.
For a powerpoint presentation of this material, visit clark.ppt.