Talk:Protein structure prediction

This is the talk page for discussing improvements to the Protein structure prediction article.
This is not a forum for general discussion of the article's subject.


Molecular Biology: MCB / COMPBIO High‑importance

	This article is within the scope of WikiProject Molecular Biology, a collaborative effort to improve the coverage of Molecular Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has been rated as High-importance on the importance scale.
	This article is supported by the Molecular and Cell Biology task force (assessed as High-importance).
	This article is supported by the Computational Biology task force (assessed as High-importance).

Comparative structure prediction

In the third paragraph, should comparative structure prediction be linked to comparative protein modeling? 98.192.58.208 21:23, 17 October 2007 (UTC)

chaperonins

I have a couple of suggested amendments to the text.

Davjon, be bold and make the changes! Stewart Adcock 19:05, 21 Mar 2004 (UTC)

Firstly the statement about chaperonins is not correct. The discovery of chaperonins (and the small chaperone proteins) did not affect the statement that amino acid sequence entirely encodes the fold of a protein (i.e. the conclusion reached by Anfinsen). All chaperonins have been shown to do is to assist (i.e. increase the yield of) the folding process - mainly to prevent aggregation of the protein when hydrophobic surface area is exposed during folding. In other words, chaperonins allow a protein to fold into the conformation encoded by its amino acid sequence more efficiently. There is still no reason to doubt that all the information required to reproduce the fold of a protein is in the amino acid sequence, particularly in the context of protein structure prediction where inter-molecular aggregation is clearly not an issue. See the recent review by Saibil & Ranson (Trends Biochem Sci. 2002 Dec;27(12):627-632) for a more thorough discussion of what is currently known about chaperonins. User:Davjon 08:35, 20 March 2004

Yes, the article is not very clear on this point. Stewart Adcock 19:05, 21 Mar 2004 (UTC)

conditional conformation

Having said that, an additional complication is that some proteins are only able to fold into their biologically useful conformation under particular conditions (e.g. high pH or low temperature) or in the presence of other molecules (e.g. metals) or even other protein chains (in the formation of certain protein complexes). User:Davjon 08:35, 20 March 2004

You are, of course, correct again. But the problem is deciding how much information to put in the article while still keeping it simple and understandable. Most readers will have very little background knowledge about proteins. Stewart Adcock 19:05, 21 Mar 2004 (UTC)

I have modified the article to reflect the lack of evidence that primary sequence does not determine protein folding. I noted that chaperones and glycosylation can be important in protein folding. --Antelan 20:52, 4 October 2006 (UTC)

ab initio / de novo

Lastly, the correct term is ab initio protein structure prediction rather than de novo modelling. Although both Latin terms have a similar meaning, de novo has historically been used in the phrase "de novo protein design" and not in the context of protein structure prediction. User:Davjon 08:35, 20 March 2004

Try arguing with a quantum chemist about this point ;-) When I wrote that, I was almost certainly introducing my own POV. I agree that ab initio is the common term, but not necessarily the correct term, being semantically incorrect. Again, this is probably just my POV and you should be bold. Stewart Adcock 19:05, 21 Mar 2004 (UTC)

PS Davjon, please sign your posts on talk pages. (You can use three or four tilde characters to sign without or with a date stamp, respectively).

crystallography terms

I reverted two recent changes. Here's why:

1. restored link to "X-ray crystallography" over "crystallography" because very few protein structures are solved by electron/neutron diffraction (not surprising since they destroy the sample).

2. first sentence. "Computational biology" is something very different to "Computational molecular biology". The most appropriate article on Wikipedia seems to be bioinformatics, so I've pointed the link at that.

Stewart Adcock 00:04, 2 Feb 2004 (UTC)

I've added theoretical chemistry because ab initio calculations belong to that field. --Zivilverteidigung

Back in the pre-genome, pre-proteomics era, quite a few structures were determined by neutrons. Electrons are essential to do 2D crystallography of membrane proteins, which is all you can do with most of them. The sentence as you have it now is reasonable with "typically" and X-ray crystallography, but I consider crystallography a much better article. Really I think the two should be merged. 168... 00:14, 2 Feb 2004 (UTC)

I agree that they should be merged (but as they currently stand, the X-ray crystallography one is more relevant to this article). I don't think either article is particularly brilliant though. Stewart Adcock 01:50, 3 Feb 2004 (UTC)

NP-hard modeling

I just reverted User:Micha's changes. Here is the reverted version:

De novo protein modelling methods seek to build three-dimensional protein models "from scratch". There are many possible procedures that either attempt to mimic protein folding or apply some stochastic method to search possible solutions. It has been shown that the complete computation of the structure of a protein is actually NP-hard [1]. Accordingly, these procedures require vast computational resources. Blue Gene is a powerful supercomputer designed to push the frontier to larger structures. Today, approaches that use a Monte Carlo method are most successful. Starting from a randomly assembled protein, a set of new structures is generated by making small random changes. From this set, the best structure is selected with an energy function, and used as a starting point for a new iteration. With many iterations and many initial structures, good structural proposals can be obtained within short times.

I dispute that "computation of the structure of a protein" is NP-hard. The reference given is to a paper (that I know very well) about protein design, which is an NP-hard problem. Protein fold prediction is much more complex.
I also dispute that Monte Carlo methods are the most successful approaches used today (although they are probably amongst the most successful). This would be easy to confirm by looking at the methods applied during the most recent CASP competition. Monte Carlo methods are included in the set of "stochastic methods" already mentioned. I agree that more detail is probably warranted here.

Stewart Adcock 07:20, 6 Feb 2004 (UTC)
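
The iterative procedure described in the reverted paragraph above (random start, a set of small random changes, keep the lowest-energy candidate, repeat) can be sketched roughly as follows. This is only an illustration: `toy_energy`, `perturb` and `iterative_search` are hypothetical names, the "conformation" is just a list of angles, and the energy function is a made-up quadratic rather than any real force field. Note that it always keeps the best candidate, which is a greedy stochastic search and not a full Metropolis Monte Carlo scheme or the Rosetta protocol.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical energy: minimal (0.0) when every angle equals -1.0 rad."""
    return sum((a + 1.0) ** 2 for a in angles)

def perturb(angles, step, rng):
    """Return a copy with one randomly chosen angle nudged by a small amount."""
    new = list(angles)
    i = rng.randrange(len(new))
    new[i] += rng.uniform(-step, step)
    return new

def iterative_search(n_angles=10, n_iter=500, n_candidates=20, step=0.3, seed=0):
    rng = random.Random(seed)
    # Start from a randomly assembled "conformation".
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    start_energy = toy_energy(current)
    for _ in range(n_iter):
        # Generate a set of new structures by making small random changes.
        candidates = [perturb(current, step, rng) for _ in range(n_candidates)]
        # Keep the current structure in the pool so the energy never goes up.
        candidates.append(current)
        # Select the best structure as the starting point for the next iteration.
        current = min(candidates, key=toy_energy)
    return start_energy, toy_energy(current), current

if __name__ == "__main__":
    start, final, _ = iterative_search()
    print(f"start energy {start:.2f}, final energy {final:.4f}")
```

On a smooth toy function like this the loop converges easily; the difficulty with real proteins is that the energy landscape is rugged and the search space is vastly larger.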

Okay, I was perhaps a bit rash about the NP-hard description. However, I'm sure about the Monte Carlo methods: In CASP5 and CASP4 the Rosetta method of Dr. David Baker (UW) was most successful. [2] The description I wrote applies to the Rosetta method. [3]

Micha 05:00, 8 Feb 2004 (UTC)

Hi Micha. You are probably correct in saying that the single best approach to date is the Baker group's Rosetta, which does use a Monte Carlo algorithm. However, the fact that Rosetta uses MC isn't what makes it so good. Feel free to specifically mention Rosetta, but I don't think it is fair to say approaches that use a Monte Carlo method are most successful just because Rosetta does. If I was describing Rosetta, I'd make sure that I mentioned the fact that it starts by deriving a set of short fragments, and then assembles these. One reason why I didn't review some of the more interesting approaches is that I was trying to keep the article as simple as possible for anyone to understand. Do you think we should add more specific details? Stewart Adcock 18:55, 8 Feb 2004 (UTC)

Hello Stewart. I see your point in keeping the general article as simple as possible. Perhaps it would be better to create De novo protein structure prediction, and go into more detail there. Then the current article would stay as it is, and in the other article one could address CASP and other things. (I know CASP really addresses comparative modelling as well, so we could also mention/link it in the general article.) I don't have time to do it right now, but I'd do it eventually. Micha 21:06, 8 Feb 2004 (UTC)

That seems like a good idea to me. CASP is probably significant enough to warrant its own article, even. Stewart Adcock 22:49, 8 Feb 2004 (UTC)

Okay. Let's start with CASP first. Micha 02:06, 9 Feb 2004 (UTC)

Ligand binding

Hi everyone, I was wondering whether it is appropriate to mention that another way to determine a partial protein structure is to study ligand binding sites; in particular, to use ligand analogues to determine the binding site's spatial features. I know it's only a partial structure, but should it be mentioned here? Volantares 12:25, 7 December 2006 (UTC)

That would be of interest if you have any particular references on the subject. Information gathered from the literature is often used as a hint by CASP competitors, but in an empirical way. If there is a methodology for retrieving structural information from ligand binding, that would be worth mentioning; if it's just personal research, no. Blastwizard 18:45, 7 December 2006 (UTC)

The software section needs a major revamp

The software for structure prediction could use a major overhaul, or should just link to the dedicated software page. The preceding unsigned comment was added by Mndoci (talk • contribs) 20:55, 5 February 2007 (UTC).

adding a new reference to "protein structure prediction"

Within the Software section, reference might be made to a new prediction engine called LOMETS from Wu and Zhang at U of Kansas. The announcement article is

Wu S, Zhang Y. LOMETS: a local meta-threading-server for protein structure prediction. Nucleic Acids Res. 2007;35(10):3375-82. Epub 2007 May 3.

LOMETS is a tuned consensus of 9 engines, all supported in a server cluster by the authors. 152.2.146.67 22:19, 17 June 2007 (UTC)

I think we need to add some textbooks to the reference list. Preceding unsigned comment added by Dreamcarrior (talk • contribs) 15:19, 12 June 2009 (UTC)

Now More Important Than Ever

"The practical role of protein structure prediction is now more important than ever." The phrase sounds like an advertisement. Perhaps something more concrete... "Protein structure prediction bridges the increasing divide between the faster modern sequence gathering technologies and the slower protein analysis [...etc]"JeramieHicks (talk) 23:12, 21 November 2008 (UTC)[reply]

How does homochirality factor into modeling?

I am not a scientist, so pardon if this is a stupid question. How does homochirality factor into modeling algorithms? Since every molecule is capable of being arranged into an identical mirror image, but the mirror image is often inactive or toxic, then it would appear that half the search space can be thrown away and not computed. DMahalko (talk) 20:59, 11 May 2009 (UTC)

Homology modelling algorithms already ignore the D form of the residues, since they occur only in extremely rare cases in bacteria, so nothing new there. Blastwizard (talk) 09:19, 24 June 2009 (UTC)
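
As a rough illustration of why the mirror image is a distinct structure rather than an equivalent one, the handedness of a centre with four different substituents can be expressed as the sign of a scalar triple product, and a reflection flips that sign. The sketch below uses made-up coordinates, not real amino acid geometry, and `signed_volume` and `mirror` are hypothetical helper names. A builder that only ever places residues with one sign (the L form) never generates the mirrored conformations in the first place.

```python
def signed_volume(center, a, b, c):
    """Scalar triple product (a - center) . ((b - center) x (c - center))."""
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    cross = (by * cz - bz * cy, bz * cx - bx * cz, bx * cy - by * cx)
    return ax * cross[0] + ay * cross[1] + az * cross[2]

def mirror(point):
    """Reflect a point through the yz-plane (x -> -x)."""
    return (-point[0], point[1], point[2])

# Hypothetical positions of a central atom and three of its substituents.
center = (0.0, 0.0, 0.0)
sub_a = (1.0, 0.0, 0.0)
sub_b = (0.0, 1.0, 0.0)
sub_c = (0.0, 0.0, 1.0)

original = signed_volume(center, sub_a, sub_b, sub_c)
reflected = signed_volume(mirror(center), mirror(sub_a), mirror(sub_b), mirror(sub_c))
print(original, reflected)  # the two values have opposite signs
```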

Automatic structure prediction servers

CASP is not about automatic structure prediction servers; this is a confusion with CAFASP. In CASP, human predictions are also assessed, although human prediction normally involves manually fiddling with server predictions. Also, why reduce this section to the CASP experiment? Blastwizard (talk) 09:26, 24 June 2009 (UTC)

Calculated free energy of native state

I would like to know whether the commonly used computer programs always give a free energy for the native structure (as determined by NMR or X-rays) less than or equal to that for the best solutions that the programs find. (If someone answers, drop me a note please.) Eric Kvaalen (talk) 14:55, 13 August 2010 (UTC)

Of course not. A typical program would produce something like minus several thousand kcal/mol calculated with CHARMM or another force field. This is supposed to be the enthalpy in vacuum/vapor, which is totally irrelevant to protein folding, ligand binding or anything of biological significance (but this number is not even that enthalpy, because force fields neglect the environment dependence of vdW and other forces). Biophys (talk) 18:08, 13 August 2010 (UTC)

But as the article says, "The protein structure prediction remains an extremely difficult and unresolved undertaking. The two main problems are calculation of protein free energy and finding the global minimum of this energy. A protein structure prediction method must explore the space of possible protein structures which is astronomically large." I have read that Rosetta, for instance, calculates free energy. My question is not whether the energy calculation is really good; my question is whether, if you put the true structure in, the program will agree that its free energy is lower than (or equal to) the free energy of what it came up with. In other words, do the programs fail to find the true configuration because they don't manage to find the true minimum (according to their energy functions), or because their energy functions are slightly wrong, so that even if they find the true minimum according to those functions, it doesn't correspond to the conformation that minimizes the true free energy? Eric Kvaalen (talk) 12:06, 14 August 2010 (UTC)
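
The distinction being asked about here can be phrased as a simple check, sketched below under the assumption that a failed prediction has already been identified (the best structure found is not native-like) and that the same scoring function can be evaluated on the experimental structure. `diagnose_failure` is a hypothetical helper, not part of Rosetta or any other package, and the scores are in whatever arbitrary units the scoring function uses.

```python
def diagnose_failure(native_score, best_found_score, tolerance=0.0):
    """Classify a failed prediction, assuming lower scores are better.

    Precondition (assumption): the best structure found is NOT native-like.
    """
    if native_score < best_found_score - tolerance:
        # The function prefers the native structure, but the search never reached it.
        return "sampling"
    # The function ranks a wrong structure at least as well as the native one.
    return "scoring"

# Made-up scores for illustration only.
print(diagnose_failure(native_score=-250.0, best_found_score=-230.0))
print(diagnose_failure(native_score=-225.0, best_found_score=-230.0))
```

In practice the experimental structure usually needs some relaxation in the scoring function before such a comparison is meaningful, which this sketch leaves out.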

Rosetta combines force fields in vacuum (in modified form) with empirical terms that describe free energy of solvation and conformational entropy. The key question: can it reproduce experimental deltaG differences between the unfolded state and the folded protein structure? Such values have been measured for hundreds of proteins and are rather small (only several kcal/mol). There are empirical free energy functions (not MM force fields) that claim to reproduce delta deltaG values for protein mutants and deltaG of ligand binding with reasonable precision. Whether these functions are good enough to distinguish the native and misfolded structures (which sometimes differ only by 3-4 kcal/mol according to experimental studies) is a hard question. I think the answer is "no" in general, but "yes" for certain special cases, but this is a matter of ongoing "original research". Biophys (talk) 19:32, 14 August 2010 (UTC)

Thanks for the reply. 3-4 kcal/mol is actually a lot, since RT is only 0.6 kcal/mol. I hope you'll come back with your results. You can put original research here on the discussion page. Eric Kvaalen (talk) 10:26, 17 August 2010 (UTC)
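
The arithmetic behind "RT is only 0.6 kcal/mol" can be checked in a few lines. The sketch assumes a temperature of 300 K and a simple two-state Boltzmann picture, which is a simplification; `thermal_energy` and `population_ratio` are illustrative names.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def thermal_energy(temp_k=300.0):
    """Return RT in kcal/mol."""
    return R_KCAL * temp_k

def population_ratio(delta_g, temp_k=300.0):
    """Boltzmann ratio of the higher-energy state to the lower-energy state."""
    return math.exp(-delta_g / thermal_energy(temp_k))

print(f"RT at 300 K: {thermal_energy():.3f} kcal/mol")
for dg in (3.0, 4.0):
    print(f"dG = {dg} kcal/mol -> population ratio {population_ratio(dg):.4f}")
```

Under these assumptions a gap of 3-4 kcal/mol is roughly 5-7 RT, so the higher-energy state would be populated at well under 1% relative to the lower one.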

Posting WP:OR here would be against the policies. Obviously, this should be larger than RT; otherwise, the system would be unstable. No, 3-4 kcal/mol is actually nothing compared to the 3-4 thousand kcal/mol produced by CHARMM for a typical protein. The CHARMM energy of an unfolded protein would also be large. So, let's subtract one from the other, and do not forget about conformational entropy, which could be ~1-2.5 kcal/mol per residue at 300 K? Be serious. Assuming an error of ~10% in 6-12 potential functions, it would be statistically impossible to calculate the small difference with the required precision (article by Galaktionov in Russian "Biophysics"). The real problem, however, is that such functions have been developed to calculate an irrelevant energy (yes, that's an energy, but not the energy one needs - this is something rather common in physics). Biophys (talk) 17:47, 17 August 2010 (UTC)
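
The subtraction argument above can be made concrete with a back-of-the-envelope error propagation. The numbers are hypothetical (two totals of about -3500 kcal/mol differing by 4 kcal/mol), and the sketch makes two simplifying assumptions that are not in the post: the ~10% relative error is applied to the total energy rather than to individual 6-12 terms, and the errors on the two totals are treated as independent.

```python
import math

def difference_uncertainty(e_folded, e_unfolded, rel_error=0.10):
    """Standard error of (e_folded - e_unfolded) for independent relative errors."""
    sigma_folded = abs(e_folded) * rel_error
    sigma_unfolded = abs(e_unfolded) * rel_error
    return math.sqrt(sigma_folded ** 2 + sigma_unfolded ** 2)

# Hypothetical totals in kcal/mol, for illustration only.
e_folded = -3500.0
e_unfolded = -3496.0

gap = e_unfolded - e_folded
sigma = difference_uncertainty(e_folded, e_unfolded)
print(f"gap = {gap:.1f} kcal/mol, uncertainty of the difference = {sigma:.0f} kcal/mol")
```

With these inputs the uncertainty of the difference is on the order of hundreds of kcal/mol, about two orders of magnitude larger than the gap itself. If the errors in the two states were strongly correlated, much of this would cancel, so the independence assumption is the pessimistic case.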

TODO

The secondary structure prediction section needs to be shortened. Currently, it is too long and highly redundant with [[4]] Hsp90 (talk) 09:19, 27 October 2014 (UTC)

Non-polarity as 'reason' for sheets/helices

Is the statement/implication I added correct, that the non-polarity of the (normally polar) amino acids in alpha and beta secondary structures can be thought of as the "reason" they form in the first place? (This makes me think of the Blind Watchmaker meets Intelligent Design.)

I'm guessing that's a correct interpretation of what it said about the polarity changes, but is that always true? I.e. are there "regular cases" where, say, a beta sheet will be close to another residue(s) in the protein which will "deform" the sheet in some way (creating a "bump" in an otherwise "smooth" sheet)?

Note that this is my first reading of the article, and I'm basically making clarifications to make my own understanding "easier" (if one were going to read this article as a section from a textbook - which would probably be much more useful - does anyone have a suggestion for a good "protein structure/folding" textbook?). Jimw338 (talk) 20:11, 9 March 2015 (UTC)

Hi Jim, I'm not quite sure what you mean by this - you mention that you added a statement, but I don't see an edit from you to this article recently; did I miss it?

Secondary structures aren't necessarily nonpolar - in fact, an alpha helix, regardless of sequence, has an overall dipole moment. For those sequences that form regular, stable secondary structure, the "reason" they do so is that the secondary structure is lower in energy than the unfolded/disordered state - i.e., the formation of hydrogen bonds, hydrophobic interactions, etc. outweigh the loss of entropy and the desolvation effects associated with helix or sheet formation.

There can definitely be irregularities in secondary structures. These are often introduced by the presence of proline or glycine residues, but can also occur due to local packing in a folded protein. Secondary structures aren't actually as rigid as they look in pictures illustrating their hydrogen-bonding patterns; they often fray at their ends.

As for textbooks, Branden and Tooze is a classic: ISBN 978-0815323051. Opabinia regalis (talk) 04:40, 10 March 2015 (UTC)

Assessment comment

The comment(s) below were originally left at Talk:Protein structure prediction/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

Changed rating to "high" as this is an important method in bioinformatics. - tameeria 22:17, 18 February 2007 (UTC)

Last edited at 22:17, 18 February 2007 (UTC). Substituted at 03:29, 30 April 2016 (UTC)

AlphaFold

Should DeepMind/Google's new algorithm have its own section about such inference from CS? 72.174.71.134 (talk) 09:45, 4 December 2020 (UTC)

Yes, but this should probably be a more general section about using "deep learning" and other similar methods. You are welcome to contribute. My very best wishes (talk) 00:19, 30 December 2020 (UTC)