NetCSSP is a neural network-based predictor of contact-dependent secondary structure propensity (CSSP) in protein sequences. The local secondary structure of a protein can change depending on its tertiary environment, for example through inter-molecular interactions or misfolding. One prominent example is non-native beta-strand formation in most amyloid fibrils.
We quantitatively standardized the tertiary interaction energies of various proteins and then trained neural networks (NetCSSP) to predict the secondary structure of a protein from these standardized energies together with a sequence context (7-residue span). The strength of the NetCSSP method is therefore that it shows how secondary structure propensities change across the entire range of tertiary interaction energies for a given amino acid sequence, without any knowledge of its 3D structure.
One important application of NetCSSP is the accurate identification of the core sequences of amyloid fibril formation, where the switch in local secondary structure occurs.
Dual artificial neural network for CSSP prediction
The dual network architecture consists of two single networks that use distinct output nodes.
One network has output nodes for alpha-helix and non-helix, and thus predicts helical propensity. The other network has output nodes for beta-strand and non-beta-strand, and thus predicts beta propensity.
The input layer of both the single and the dual networks contains an additional variable node for the query sequence. In the single network architecture, this node is set to the computed >(i,i±4) energy term of the query sequence. In the dual network architecture, the alpha-helix-predicting network takes the (i,i±4) energy term of the query sequence as the additional input, while the beta-strand-predicting network takes the >(i,i±4) energy term.
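The dual arrangement described above can be sketched roughly as follows. This is only an illustration under several assumptions that are not stated in the text: one-hot encoding of the 7-residue window, a single hidden layer of 8 units, softmax over the two output nodes, and random untrained weights. The names encode_window, TwoOutputNet and dual_predict are hypothetical, and the outputs carry no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 7  # 7-residue sequence context, as stated in the text


def encode_window(fragment, energy):
    """One-hot encode a 7-residue fragment and append the energy node.

    The one-hot encoding is an assumption; the text only says that the
    input layer holds the sequence context plus one additional variable node.
    """
    if len(fragment) != WINDOW:
        raise ValueError("fragment must be exactly 7 residues long")
    vec = []
    for aa in fragment:
        one_hot = [0.0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(aa)] = 1.0
        vec.extend(one_hot)
    vec.append(float(energy))  # the additional variable node
    return vec


class TwoOutputNet:
    """Tiny feed-forward net with two output nodes (illustrative, untrained)."""

    def __init__(self, n_in, n_hidden, seed):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(2)]

    def forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        # Softmax over the two output nodes (assumed output activation).
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def dual_predict(fragment, local_energy, nonlocal_energy, helix_net, beta_net):
    """Helix net sees the (i,i±4) energy; beta net sees the >(i,i±4) energy."""
    p_helix, _p_non_helix = helix_net.forward(encode_window(fragment, local_energy))
    p_beta, _p_non_beta = beta_net.forward(encode_window(fragment, nonlocal_energy))
    return p_helix, p_beta


n_in = WINDOW * len(AMINO_ACIDS) + 1  # 140 sequence inputs + 1 energy node
helix_net = TwoOutputNet(n_in, n_hidden=8, seed=0)
beta_net = TwoOutputNet(n_in, n_hidden=8, seed=1)
p_helix, p_beta = dual_predict("NFGAILS", -1.0, 1.5, helix_net, beta_net)
```

Keeping the two networks separate means each one receives only the energy term relevant to the structure it predicts, which is the distinction the text draws between the single and dual architectures.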
During operation, the additional input typically spans the range of scaled energy values from -2.0 to 2.0. Because individual amino acids differ in side-chain length, composition, and hydrophobicity, the interaction energy of an input sequence is pre-normalized by the average interaction energy of the corresponding amino acids (see our publications for details).
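Sweeping the additional input over its operating range could look something like the sketch below. The grid spacing (9 evenly spaced points) and the helper names energy_grid, propensity_profile and toy_predictor are assumptions for illustration. The toy predictor is an arbitrary logistic stand-in for a trained network, so the direction and shape of its response are not meaningful. The normalization step is not shown, since its exact form is given in the publications rather than here.

```python
import math


def energy_grid(low=-2.0, high=2.0, steps=9):
    """Evenly spaced scaled-energy values covering [low, high]."""
    if steps < 2:
        raise ValueError("need at least two grid points")
    width = (high - low) / (steps - 1)
    return [low + i * width for i in range(steps)]


def propensity_profile(fragment, predictor, energies):
    """Evaluate one fragment at every energy value on the grid."""
    return [(energy, predictor(fragment, energy)) for energy in energies]


def toy_predictor(fragment, energy):
    # Hypothetical stand-in for a trained network: ignores the sequence
    # and maps the energy through a logistic curve.
    return 1.0 / (1.0 + math.exp(-energy))


grid = energy_grid()
profile = propensity_profile("NFGAILS", toy_predictor, grid)
for energy, propensity in profile:
    print(f"{energy:+.1f}  {propensity:.3f}")
```

Replacing toy_predictor with a trained network's forward pass would give the propensity of one sequence window as a function of the scaled energy input.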
Training set: 440,884 fragments of 7-residue length from SCOP20
Single neural network: 74%
(Find details in Yoon et al., Comp Biol & Chem, 2007)
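As a small illustration of what 7-residue fragments look like, the sketch below cuts a sequence into overlapping windows. Whether the training fragments were extracted with an overlapping sliding window is an assumption, and the function name fragments and the example sequence are hypothetical.

```python
def fragments(sequence, length=7):
    """Return every overlapping window of `length` residues, in order."""
    if length < 1:
        raise ValueError("length must be positive")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


example = "KCNTATCATQ"  # short illustrative sequence, 10 residues
windows = fragments(example)
for start, window in enumerate(windows, start=1):
    print(start, window)
```

A sequence of n residues yields n - 6 such windows, and a sequence shorter than 7 residues yields none.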
NetCSSP calculates P(helix), P(beta) and P(coil) for each residue of a given sequence
Test set 1 includes 104 fragments
Top figure: The CSSP profile of human IAPP4-34 sequence
(Find details in Yoon & Welsh, Protein Science, 2004 and Proteins, 2005)
Experimental data sets were retrieved from Chiti et al., Nat Struc Biol, 2002
Data from Yoon & Welsh, 2004, Protein Science
Data from Yoon & Welsh, 2005, Proteins