MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
———————
NGUYEN BA HUNG
THE ROLE OF HYDROPHOBIC AND POLAR SEQUENCE
ON FOLDING MECHANISMS OF PROTEINS AND
AGGREGATION OF PEPTIDES
Major: Theoretical and computational physics
Code: 9 44 01 03
SUMMARY OF PHYSICS DOCTORAL THESIS
HANOI − 2018
INTRODUCTION
The problem of protein folding has long been of prime concern in molecular
biology. Under normal physiological conditions, most proteins acquire well-defined,
compact three-dimensional shapes, known as native conformations, in which
they are biologically active. When proteins unfold or misfold, they
not only lose their biological activity but can also aggregate into
insoluble fibrillar structures called amyloids, which are known to be involved in
many degenerative diseases such as Alzheimer’s disease, Parkinson’s disease, type
2 diabetes, cerebral palsy, and mad cow disease. Thus, determining the folded
structure and clarifying the folding mechanism of proteins play an important
role in our understanding of living organisms as well as of human health.
Protein aggregation and amyloid formation have also been studied extensively
in recent years. These studies have led to the hypothesis that amyloid is a generic
state accessible to all proteins and is the ground state of the system when proteins
can form intermolecular interactions. Thus, the tendency towards aggregation and
amyloid formation exists for all proteins and competes with
protein folding. However, experiments have also shown that the aggregation
propensity and the aggregation rates depend on solvent conditions and on the amino acid
sequence of the protein. Some studies have shown that short amino acid segments
in the protein chain may have a significant effect on the aggregation propensity. As
a result, knowledge of the link between amino acid sequence and aggregation
propensity is essential for understanding amyloid-related diseases as well as for
finding ways to treat them.
Although all-atom simulations are now widely used in molecular biology,
applying these methods to the protein folding problem is generally not feasible
because of the limits of computer speed. A suitable approach to the protein folding
problem is to use simple theoretical models. There are quite a number of models
with different ideas and levels of simplicity, most notably the Go model,
the HP lattice model, and the tube model.
Considerations of tubular polymers suggest that tubular symmetry is a
fundamental feature of protein molecules, one that gives rise to the secondary
structures of proteins (α-helices and β-sheets). Based on this idea, the tube model
for proteins was developed by the team of Hoang and Maritan and proposed in 2004.
Results obtained with the tube model suggest that it is a simple model that
describes many of the basic features of proteins well. The tube model is also
currently the only model that can be used to study both folding and
aggregation processes.
In this thesis, we use the tube model to study the role of the hydrophobic and
polar sequence in the folding mechanisms of proteins and the aggregation of peptides.
The self-avoidance of the tubular polymer and the hydrogen bonds in the model play the
role of background interactions and are independent of the amino acid sequence.
The amino acid sequence we consider in the simplified model consists of two
types of amino acids, hydrophobic (H) and polar (P). To study the effect of the HP
sequence on the folding process, we compare the folding properties of the
tube model with hydrophobic interactions (HP tube model) with those of the tube model
with pairwise interactions similar to those of the Go model (Go tube model).
This comparison helps to clarify the role of non-native interactions in the
folding process. To study the role of the HP sequence in protein aggregation, we
compare the aggregation propensity of peptides with different HP
sequences, considering both the shapes of the aggregate structures
and the properties of the aggregation transition. In addition, in the study of
protein aggregation, we propose an improved model for the hydrophobic interaction
in the tube model that takes into account the orientation of the side chains of
hydrophobic amino acids. Our research shows that this improved model makes it
possible to obtain highly ordered, elongated aggregate structures similar to amyloid fibrils.
1. The objectives of the thesis:
The aim of the study is to gain a fundamental understanding of the role of the
hydrophobic and polar sequence in the folding mechanisms of proteins and the
aggregation of peptides.
2. The main contents of the thesis:
A general overview of proteins, protein folding, and protein aggregation
is given in Chapters 1 and 2 of this thesis. Chapter 3 presents the methods
used to simulate and analyze the data. The results on the role of the HP
sequence in protein folding are presented in Chapter 4. The results on the role of
the HP sequence in protein aggregation are presented in Chapter 5.
Chapter 1
Protein folding
1.1 Structural properties of proteins
Proteins are macromolecules that are synthesized in the cell and are responsible
for the most basic and important aspects of life. Proteins are polymers
(polypeptides) formed from sequences of 20 different types of amino acids, the monomers
of the polymer. The amino acids in a protein differ only in their side chains
and are linked together through peptide bonds to form a linear sequence in a
particular order.
Under normal physiological conditions, most proteins acquire well-defined,
compact three-dimensional shapes, known as native conformations, in which
they are biologically active.
The amino acid sequence of a protein determines its structure and function.
Proteins have four levels of structure.
Primary structure: the chemical sequence of amino acids along the
backbone of the protein. The amino acids in the chain are linked together by peptide
bonds.
Secondary structure: the local spatial arrangement of the amino acids. There are two
types of such structures, α-helices and β-sheets. These structures
maximize the number of hydrogen bonds (H-bonds) between the CO and
NH groups of the backbone.
Tertiary structure: a compact packing of the secondary structures forms the
tertiary structure. Usually, this is the full three-dimensional structure of the
protein. Tertiary structures of large proteins are usually composed of several
domains.
Quaternary structure: some proteins are composed of more than one polypeptide
chain. The polypeptide chains may have identical or different amino acid
sequences, depending on the protein. Each polypeptide chain is called a subunit and has its
own tertiary structure. The spatial arrangement of these subunits in the protein
is called the quaternary structure.
A number of semi-empirical interactions have been introduced by
chemists and physicists to describe the interactions in proteins: disulfide bridges,
Coulomb interactions, hydrogen bonds, van der Waals interactions, and hydrophobic
interactions.
1.2 Protein folding phenomenon
Once translated by a ribosome, each polypeptide folds from a random coil into its
characteristic three-dimensional structure. Since the fold is maintained by a
network of interactions between amino acids in the polypeptide, the native state
of the protein chain is determined by the amino acid sequence (the thermodynamic
hypothesis).
1.3 Levinthal’s paradox
Levinthal’s paradox poses the question: how can proteins possibly
find their native state if the number of possible conformations of a polypeptide
chain is astronomically large?
1.4 Folding funnel
Based on theoretical and empirical research findings, Onuchic and his
colleagues proposed the idea of the folding funnel, as depicted in Figure
1.1. In the funnel picture, folding is a simultaneous reduction
of both energy and entropy. As the protein begins to fold, the free energy
decreases and the number of accessible configurations decreases (represented by the
narrowing width of the well).
Figure 1.1: Sketch of the funnel describing the protein folding energy landscape.
The labels in the figure are energy, entropy, folding, and the native state N.
Figure 1.2: Free energy landscape in the two-state model. ∆FN and ∆FD are the heights of the
barrier measured from the unfolded and folded states, respectively, and ∆F is the free energy
difference between the N and U states.
In the canonical depiction of the folding funnel, the depth of the well repre-
sents the energetic stabilization of the native state versus the denatured state, and
the width of the well represents the conformational entropy of the system. The
surface outside the well is shown as relatively flat to represent the heterogeneity
of the random coil state.
1.5 The minimum frustration principle
The minimum frustration principle was introduced in 1989 by Bryngelson
and Wolynes based on spin glass theory. This principle holds that the amino acid
sequences of natural proteins have been optimized through natural selection so that the
frustration caused by the interactions in the native state is minimal.
1.6 Two-state model for protein folding
Experimental observations suggest that the two-state model is a common
mechanism for describing the folding dynamics of the majority of small, globular
proteins. In a two-state model of protein folding, a single-domain protein can
occupy only one of two states: the unfolded state (U) or the folded state (N).
The free energy diagram of the two-state model is characterized by a large barrier
separating the folded and unfolded states, which correspond to minima of the
free energy along a reaction coordinate. The free energy difference between the N
and U states (∆F), called the folding free energy, characterizes the stability of
the folded state. The folding rate kf and the unfolding rate ku obey the
van ’t Hoff-Arrhenius law:

$$k_{f,u} = \nu_0 \exp\left(-\frac{\Delta F_{N,D}}{k_B T}\right) \qquad (1.1)$$

where ν0 is a constant, T is the temperature, and kB is the Boltzmann constant;
∆FN enters the folding rate kf and ∆FD enters the unfolding rate ku.
Changes in conditions such as temperature, pressure, and concentration may affect
∆F.
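As a minimal numerical sketch of Eq. (1.1), the Python snippet below evaluates the two rates and the resulting equilibrium population of the folded state. The barrier heights, the prefactor, and the helper names `rate` and `folded_fraction` are illustrative assumptions, not values or code from the thesis. It also assumes both barriers are measured to the same transition state, so that ∆F = ∆FN − ∆FD.

```python
import math

def rate(nu0, barrier, kT):
    """Arrhenius-type rate k = nu0 * exp(-barrier / kT), as in Eq. (1.1)."""
    return nu0 * math.exp(-barrier / kT)

def folded_fraction(dF, kT):
    """Equilibrium population of N in a two-state model, with dF = F_N - F_U."""
    return 1.0 / (1.0 + math.exp(dF / kT))

# Hypothetical values in units where kB*T = 1 (not taken from the thesis).
kT = 1.0
nu0 = 1.0
dF_N = 5.0   # barrier seen from the unfolded state (controls folding)
dF_D = 8.0   # barrier seen from the folded state (controls unfolding)

k_f = rate(nu0, dF_N, kT)
k_u = rate(nu0, dF_D, kT)
dF = dF_N - dF_D               # folding free energy F_N - F_U
p_N = folded_fraction(dF, kT)
print(k_f, k_u, p_N)
```

With these assumptions the kinetic and thermodynamic descriptions agree: kf / (kf + ku) equals the equilibrium folded population.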
1.7 Cooperativity of protein folding
Cooperativity is a phenomenon displayed by systems of identical or
near-identical elements that act dependently of each other. The folding of
proteins is a cooperative process. For proteins, cooperativity refers to the
two-state process and is understood as the sharpness of the thermodynamic transition.
In practice, cooperativity is quantified by the ratio of the van ’t Hoff enthalpy
to the calorimetric enthalpy:

$$\kappa_2 = \frac{\Delta H_{vH}}{\Delta H_{cal}} \qquad (1.2)$$

The closer κ2 is to 1, the higher the cooperativity and the better the system
satisfies the two-state criterion.
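The following sketch shows one way κ2 might be evaluated from a heat-capacity curve. It assumes a commonly used estimate of the van ’t Hoff enthalpy, ∆HvH = 2 Tmax sqrt(kB Cmax), and takes ∆Hcal as the integral of the heat capacity over the transition peak. The source does not state which estimate the thesis uses. The toy two-state system (folded state at energy 0, unfolded state at energy dH with degeneracy exp(ln_g)) and all numbers are hypothetical.

```python
import math

def heat_capacity(T, dH, ln_g, kB=1.0):
    """Heat capacity of a toy two-state system: folded state at energy 0,
    unfolded state at energy dH with degeneracy exp(ln_g)."""
    x = ln_g - dH / (kB * T)            # log of the unfolded/folded weight ratio
    p_u = 1.0 / (1.0 + math.exp(-x))    # unfolded population
    return dH ** 2 * p_u * (1.0 - p_u) / (kB * T ** 2)

def kappa2(temps, caps, kB=1.0):
    """kappa_2 = dH_vH / dH_cal from a sampled heat-capacity curve."""
    i_max = max(range(len(caps)), key=caps.__getitem__)
    dH_vH = 2.0 * temps[i_max] * math.sqrt(kB * caps[i_max])
    # Calorimetric enthalpy: trapezoidal integral of C(T) over the peak.
    dH_cal = sum(0.5 * (caps[i] + caps[i + 1]) * (temps[i + 1] - temps[i])
                 for i in range(len(temps) - 1))
    return dH_vH / dH_cal

temps = [0.5 + 1e-4 * i for i in range(10001)]   # T from 0.5 to 1.5
caps = [heat_capacity(T, dH=50.0, ln_g=50.0) for T in temps]
print(kappa2(temps, caps))
```

For this ideal two-state toy system the ratio comes out very close to 1, which is the limit the text describes as high cooperativity.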
1.8 Hydrophobic interaction
The hydrophobic effect is the observed tendency of nonpolar substances (such
as oil and fat) to aggregate in a polar solvent, usually water, and to exclude
water molecules. In protein folding, the
hydrophobic effect is important for understanding the structure of proteins: it
is considered to be the major driving force for the folding of
globular proteins and results in the burial of the hydrophobic residues in the core
of the protein.
1.9 HP lattice model
In the HP lattice model, there are two types of amino acids with respect to
their hydrophobicity: polar (P), which tend to be exposed to the solvent on the
protein surface, and hydrophobic (H), which tend to be buried inside the protein
globule. The protein chain is represented as a self-avoiding walk on a 2D or 3D
lattice. Using this model, Dill designed HP sequences whose minimum-energy
state among the compact conformations is unique. The designed sequences
show a highly cooperative folding transition. This research shows that aggregation
of hydrophobic residues is the main driving force for folding.
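To make the lattice picture concrete, here is a small sketch of an HP energy function. It assumes the usual convention that each pair of H residues occupying neighbouring lattice sites, but not adjacent along the chain, contributes a fixed energy. The value `e_hh = -1` and the function name `hp_energy` are assumptions, since the text above does not specify them. The function checks self-avoidance but, for brevity, not chain connectivity.

```python
def hp_energy(sequence, coords, e_hh=-1.0):
    """Energy of an HP chain on a square or cubic lattice.

    sequence: string of 'H' and 'P'; coords: list of integer lattice sites
    (tuples), one per residue. Each pair of H residues that are lattice
    neighbours but not adjacent along the chain contributes e_hh
    (an assumed value).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if dist == 1:          # nearest neighbours on the lattice
                    energy += e_hh
    return energy

# A four-residue chain folded into a square: residues 0 and 3 touch.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # prints -1.0
```

The same function works for 3D conformations if the coordinates are given as three-component tuples.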
1.10 Go model
The Go model ignores the specificity of the amino acid sequence in the protein
chain; its interaction potential is built from the structure of the folded state.
The basis of the Go model is the principle of maximum consistency of the
interactions in the folded state. Studies show that the folding mechanisms obtained
with the Go model agree quite well with experiment, especially in determining
the contribution of individual amino acid positions in the polypeptide chain to the
transition state during protein folding. Because the model is based on the native-state
structure, the Go model cannot predict the protein structure from the amino
acid sequence; it can only be used to study the folding process of a known structure.
1.11 Tube model
Considerations of symmetry and geometry lead to a description of the
protein backbone as a thick polymer or a tube. At low temperatures, a
homopolymer modeled as a short tube exhibits two conventional phases, a swollen,
essentially featureless phase and a conventional compact phase, along with a novel
marginally compact phase in between, which has relatively few optimal structures made
up of α-helices and β-sheets. The tube model predicts the existence of a fixed
menu of folds determined by geometry, clarifies the role of the amino acid
sequence in selecting the native-state structure from this menu, and explains the
propensity for amyloid formation.
Chapter 2
Amyloid Formation
2.1 The structure of amyloid fibril
Figure 2.1: Three-dimensional structure of Alzheimer’s amyloid-β(1-42) fibrils (PDB code 2BEG):
(a) view along the fibril axis; (b) view perpendicular to the fibril axis.
Amyloid fibrils possess a cross-β structure, in which β-strands are oriented
perpendicularly to the fibril axis and are assembled into β-sheets that run the
length of the fibrils (Figure 2.1). They generally comprise 2-4 protofilaments that
often twist around each other. Repeated interactions between hydrophobic and
polar groups run along the fibril axis.
2.2 Mechanism of amyloid aggregation
The formation of amyloid can be considered to involve at least three stages,
generally referred to as the lag phase, the growth (or elongation) phase,
and the equilibration phase. Seeding involves the addition of preformed fibrils to
a monomer solution, which increases the rate of conversion to amyloid fibrils.
The addition of seeds shortens the lag phase by eliminating the slow nucleation step.
Chapter 3
Methods and Models for simulations
3.1 HP tube model
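The backbone of the protein is modeled as a chain of Cα atoms separated by
a distance of 3.8 Å, forming a flexible tube of thickness ∆ = 2.5 Å. The tube
thickness imposes a constraint on all three-body radii of the chain (local and
non-local): the radius Rijk of the circle through any three Cα atoms i, j, k must
not be smaller than ∆. The three-body potential describing this condition is
(Figure 3.1)

$$V_{\mathrm{tube}}(i,j,k) = \begin{cases} \infty & \text{if } R_{ijk} < \Delta \\ 0 & \text{if } R_{ijk} \ge \Delta \end{cases} \quad \forall\, i,j,k \qquad (3.1)$$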
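The bending potential in the tube model is related to the spatial constraints of
the polypeptide chain. The bending potential at position i is given by (Figure 3.1)

$$V_{\mathrm{bend}}(i) = \begin{cases} \infty & \text{if } R_{i-1,i,i+1} < \Delta \\ e_R & \text{if } \Delta \le R_{i-1,i,i+1} < 3.2\ \mathrm{\AA} \\ 0 & \text{if } R_{i-1,i,i+1} \ge 3.2\ \mathrm{\AA} \end{cases} \qquad (3.2)$$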
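The two three-body potentials can be sketched in a few lines of Python. The sketch assumes that the three-body radius is the radius of the circle through the three Cα positions, with a tube thickness of 2.5 Å, a bending penalty of 0.3 (in units of the local hydrogen-bond energy), and a 3.2 Å threshold, following the values quoted in this section. The function names are illustrative and not taken from the thesis.

```python
import math

def circumradius(p, q, r):
    """Radius of the circle through three points in 3D (inf if collinear)."""
    a, b, c = math.dist(p, q), math.dist(q, r), math.dist(p, r)
    s = 0.5 * (a + b + c)
    area_sq = s * (s - a) * (s - b) * (s - c)   # Heron's formula
    if area_sq <= 0.0:
        return math.inf
    return a * b * c / (4.0 * math.sqrt(area_sq))

def v_tube(R, delta=2.5):
    """Three-body tube potential of Eq. (3.1)."""
    return math.inf if R < delta else 0.0

def v_bend(R_local, delta=2.5, e_R=0.3, r_max=3.2):
    """Bending potential of Eq. (3.2), in units of the local H-bond energy."""
    if R_local < delta:
        return math.inf
    if R_local < r_max:
        return e_R
    return 0.0

# Three consecutive beads with 3.8 A bonds meeting at a right angle.
R = circumradius((3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 3.8, 0.0))
print(R, v_tube(R), v_bend(R))
```

In a simulation, an infinite value of either potential simply means that the trial conformation is rejected.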
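Here eR = 0.3ε > 0, and the energy unit ε corresponds to the energy of a local hydrogen
bond. In the tube model, local hydrogen bonds are formed between atoms i and i+3 and
are assigned an energy of −ε. Non-local hydrogen bonds are formed between
atoms i and j > i + 4 and have an energy of −0.7ε. The energy and geometric
constraints for a local hydrogen bond between atom i and atom j are
defined as follows:

$$\begin{aligned}
& j = i + 3 \\
& e_{\mathrm{hbond}} = -\epsilon \\
& 4.7\ \mathrm{\AA} \le r_{ij} \le 5.6\ \mathrm{\AA} \\
& |\vec{b}_i \cdot \vec{b}_j| > 0.8 \\
& |\vec{b}_j \cdot \vec{c}_{ij}| > 0.94 \\
& |\vec{b}_i \cdot \vec{c}_{ij}| > 0.94 \\
& (\vec{r}_{i,i+1} \times \vec{r}_{i+1,i+2}) \cdot \vec{r}_{i+2,i+3} > 0
\end{aligned} \qquad (3.3)$$

Here b_i is the binormal vector at bead i and c_ij is the unit vector pointing from
bead i to bead j; the last condition selects right-handed helices.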
The corresponding conditions for a non-local hydrogen bond are:

$$\begin{aligned}
& j > i + 4 \\
& e_{\mathrm{hbond}} = -0.7\,\epsilon \\
& 4.1\ \mathrm{\AA} \le r_{ij} \le 5.3\ \mathrm{\AA} \\
& |\vec{b}_i \cdot \vec{b}_j| > 0.8 \\
& |\vec{b}_j \cdot \vec{c}_{ij}| > 0.94 \\
& |\vec{b}_i \cdot \vec{c}_{ij}| > 0.94
\end{aligned} \qquad (3.4)$$

Figure 3.1: Sketch of the potentials used in the tube model of the protein (panels: local radius of
curvature, non-local radius of curvature, hydrophobic interaction). r and y are the local and
non-local radii of curvature, respectively; z is the distance between two amino acid residues;
eR and eW are the bending energy and the hydrophobic energy.
In the tube model, hydrophobic interactions are introduced as a pairwise
potential between Cα atoms that are not consecutive along the sequence (j > i + 1):

$$V_{\mathrm{hydrophobic}}(i,j) = \begin{cases} e_W & r_{ij} \le 7.5\ \mathrm{\AA} \\ 0 & r_{ij} > 7.5\ \mathrm{\AA} \end{cases} \qquad (3.5)$$

where eW denotes the hydrophobic interaction energy per contact, which depends
on the hydrophobicity of amino acids i and j. In most studies, the
values eHH = −0.5ε and eHP = ePP = 0 were chosen.
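A minimal sketch of how the total hydrophobic energy of Eq. (3.5) could be accumulated for an HP sequence is given below. The function name and the test coordinates are hypothetical. Energies are in units of ε, and only H-H pairs contribute because eHP = ePP = 0.

```python
import math

def hydrophobic_energy(sequence, coords, e_hh=-0.5, cutoff=7.5):
    """Sum of the pairwise contact energies of Eq. (3.5), in units of epsilon.

    sequence: string of 'H' and 'P'; coords: list of (x, y, z) C-alpha
    positions in angstroms. Only H-H pairs contribute because
    e_HP = e_PP = 0.
    """
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):              # non-consecutive pairs, j > i + 1
            if sequence[i] == "H" and sequence[j] == "H":
                if math.dist(coords[i], coords[j]) <= cutoff:
                    energy += e_hh
    return energy

# Hypothetical four-bead conformation with 3.8 A bonds (for illustration only).
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(hydrophobic_energy("HPHH", square))  # prints -1.0
```

The double loop is quadratic in the chain length, which is acceptable for the short chains considered in coarse-grained models of this kind.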
3.2 Go tube model
The Go tube model is a tube model in which the hydrophobic interaction energy
is replaced by a Go-like interaction energy:

$$E = E_{\mathrm{bend}} + E_{\mathrm{hbond}} + E_{\mathrm{Go}} \qquad (3.6)$$

Thus, the Go tube model retains the geometric and symmetry properties, the
bending energy, and the hydrogen bonds of the tube model. The Go-type energy is built on
the structure of a given native state. The Go interaction is given by:

$$V_{\mathrm{Go}}(i,j) = \begin{cases} C_{ij}\, e_W & r_{ij} \le 7.5\ \mathrm{\AA} \\ 0 & r_{ij} > 7.5\ \mathrm{\AA} \end{cases} \qquad (3.7)$$

where Cij are the elements of the native contact map: Cij = 1 if a contact between i
and j exists in the native state and Cij = 0 otherwise. A contact in the
native state is defined when the distance between two non-consecutive Cα atoms is
less than 7.5 Å.
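The construction of the native contact map and the Go-type energy can be sketched as follows. The contact energy `e_w = -1` is an assumed placeholder, since the value used in the Go tube model is not given in this summary. The function names and the example coordinates are hypothetical as well.

```python
import math

def native_contact_map(native_coords, cutoff=7.5):
    """Set of pairs (i, j), j > i + 1, closer than the cutoff in the native state."""
    n = len(native_coords)
    return {(i, j)
            for i in range(n) for j in range(i + 2, n)
            if math.dist(native_coords[i], native_coords[j]) < cutoff}

def go_energy(coords, contacts, e_w=-1.0, cutoff=7.5):
    """Go-type energy of Eq. (3.7): only native pairs (C_ij = 1) contribute."""
    return sum(e_w for (i, j) in contacts
               if math.dist(coords[i], coords[j]) <= cutoff)

# Hypothetical "native" and extended conformations with 3.8 A bonds.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
contacts = native_contact_map(native)
print(go_energy(native, contacts), go_energy(extended, contacts))
```

Storing the contact map as a set of index pairs keeps the energy evaluation proportional to the number of native contacts rather than to the square of the chain length.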
3.3 Tube Model with correlated side chain orientations
We apply an additional constraint on the hydrophobic contact by taking into
account the side chain orientation: ni · cij < 0.5 and −nj · cij < 0.5, where ni
and nj are the normal vectors of the Frenet frames associated with beads i and
j, respectively, and cij is a unit vector pointing from bead i to bead j. The new
constraint is in accordance with the statistics drawn from an analysis of PDB
structures.
3.4 Structural protein parameters
To study the folding of a protein to its native state, we examine the properties
of the protein configurations obtained from the simulations through a number
of characteristic quantities, including the number of native contacts, the root mean
square deviation (rmsd), and the radius of gyration (Rg).
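Of these quantities, the radius of gyration is the simplest to compute. The sketch below assumes equal masses for all beads, which is a natural choice for a Cα-only model but is not stated explicitly in the text. The rmsd is not shown because it additionally requires an optimal superposition onto the native structure.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centre of mass
    (equal masses assumed)."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centre) ** 2 for c in coords) / n)

# Two beads at unit distance from their midpoint.
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))  # prints 1.0
```

A compact folded conformation has a smaller Rg than an extended one of the same length, which is why Rg is commonly used to follow the collapse of the chain.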
3.5 Monte Carlo simulation method
To study the folding and aggregation of proteins, we carry out multiple
independent Monte Carlo (MC) simulations with the Metropolis algorithm. The trial
moves used in the models are pivot and crank-shaft moves
for protein folding, and pivot, crank-shaft, and translocation moves
for protein aggregation.
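The Metropolis acceptance rule at the heart of such simulations can be sketched as follows. The function name is illustrative, temperatures are in units where kB = 1, and the generation of the trial moves themselves (pivot, crank-shaft, translocation) is not shown.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random):
    """Accept a trial move with probability min(1, exp(-delta_e / T)), kB = 1.

    delta_e is the energy change of the move; an infinite value (for example
    a violation of the tube constraint) is always rejected.
    """
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

rng = random.Random(0)
accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(20000))
print(accepted / 20000)   # fluctuates around exp(-1)
```

Moves that lower the energy are always accepted, while uphill moves are accepted with the Boltzmann factor, so the chain of accepted states samples the canonical ensemble at the given temperature.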
3.6 Parallel tempering
Parallel tempering, also known as replica exchange MCMC sampling, is a
simulation method aimed at improving the dynamic properties of Monte Carlo
simulations of physical systems, and of Markov chain Monte Carlo (MCMC)
sampling methods more generally, by exchanging configurations between simulations
run at different temperatures.
The swap of two configurations is accepted according to the Metropolis criterion:

$$k_{BA} = \min\left\{1, \exp\left[(\beta_i - \beta_j)(E_i - E_j)\right]\right\} \qquad (3.8)$$

where kBA is the probability of moving from state A to state B, and βi and Ei are
the inverse temperature and the energy of replica i. This method is very
effective at finding the ground state while still yielding an equilibrium
ensemble at each temperature, and it is easily implemented on parallel computers.
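A compact sketch of the replica-exchange step built on Eq. (3.8) is shown below. Attempting swaps only between neighbouring temperatures, and sweeping over the pairs in order, are assumptions made for illustration. The thesis summary does not specify the swap schedule, and the function names are hypothetical.

```python
import math
import random

def swap_probability(beta_i, beta_j, e_i, e_j):
    """Acceptance probability of Eq. (3.8) for exchanging two replicas."""
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def attempt_swaps(betas, energies, configs, rng=random):
    """Try to exchange each pair of neighbouring replicas once, in place."""
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1],
                             energies[k], energies[k + 1])
        if rng.random() < p:
            configs[k], configs[k + 1] = configs[k + 1], configs[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]

# The colder replica (larger beta) holds the higher energy: swap is certain.
print(swap_probability(2.0, 1.0, -3.0, -5.0))   # prints 1.0
```

Between swap attempts, each replica is evolved independently with ordinary Metropolis moves at its own temperature.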
3.7 The weighted histogram analysis method
The Weighted Histogram Analysis Method (WHAM) allows for optimal anal-
ysis of data obtained from MC simulations as well as other simulations over a
wide range of parameter