.. role:: todo .. _theor-sop: Self-Organized Polymer model ============================ Method ------ In the structure-based Self-Organized Polymer (SOP) model, each amino acid residue is usually represented by a single interaction center described by the corresponding :math:`C_\alpha`-atom, or two interaction centers described by the corresponding :math:`C_\alpha` and :math:`C_\beta` atoms. In the first case the protein backbone is represented by a collection of the :math:`C_\alpha-C_\alpha` covalent bonds. In the second case, backbone atoms are replaced by :math:`C_\alpha` bead and side-chain atoms are replaced by one :math:`C_\beta` bead, connected covalently to the :math:`C_\alpha` bead of the same amino acid. The coarse-graining procedure with one interaction center representating each residue is illustrated on Figure 1. The potential energy function of a protein conformation :math:`U_{SOP}` is specified in terms of the coordinates of the :math:`C_\alpha` and :math:`C_\beta`-beads :math:`\{r_i\} = r_1, r_2,\dots, r_N`, where :math:`N` is the total number of beads in coarse-grained model. :math:`U_{SOP}` is given by [Hyeon2006]_, [Mickler2007]_: .. math:: U_{SOP} = U_{FENE} + U_{NB}^{ATT} + U_{NB}^{REP}. :label: usop Eq. :eq:`usop`, the first term is the finite extensible nonlinear elastic (FENE) potential: .. math:: U_{FENE}=-\sum_{covalent}\frac{k R_0}{2}\log \left(1-\frac{(r_{ij}-r_{ij}^0)^2}{R_0^2}\right). :label: ufene Here, :math:`k=14` N/m is the spring constant, and :math:`R_0=2.0` Å is the tolerance to the change of the covalent bond length. The FENE potential describes the backbone chain connectivity (:math:`C_\alpha-C_\alpha`), side-chain connectivity (:math:`C_\alpha-C_\beta`) and other covalent links (e.g. disulfide bonds). The distance between the particles (:math:`C_\alpha-C_\alpha` or :math:`C_\alpha-C_\beta`) :math:`i` and :math:`j`, is :math:`r_{ij}`, and :math:`r^0_{ij}` is its value in the native structure. The summation (:math:`\Sigma_{bonds}`) is performed over all pairs of beads :math:`i` and :math:`j` that are covalently bonded. To account for the non-covalent (non-bonded) interactions that stabilize the native state, the Lennard-Jones potential is used: .. math:: U_{NB}^{ATT} = \sum_{native}\varepsilon_h\left[ \left( \frac{r_{ij}^0}{r_{ij}} \right)^{12} - 2\left( \frac{r_{ij}^0}{r_{ij}} \right)^{6} \right]. :label: uatt Here, the :math:`r_{ij}` is the distance between two beads :math:`i` and :math:`j` and :math:`r^0_{ij}` is the equilibrium distance taken from the initial structure. The value of :math:`\varepsilon_h` quantifies the strength of the non-bonded interactions. The summation :math:`\sum_{native}` goes over all beads :math:`i` and :math:`j` that are assumed forming native contacts. The definition of the native contact can vary, the most general definition follows. If the two be beads :math:`i` and :math:`j` are separated by more than 2 covalent bonds and :math:`r_{ij}`_ [Box1958]_. In the one-RNG-per-thread approach, the basic idea is to partition a single sequence of random numbers among many computational threads running concurrently across an entire GPU device, each producing a stream of random numbers. Since most RNG algorithms, including LCG, Ran2, and Hybrid Taus, are based on sequential transformations of the current state [Press1994]_, then the most common way of partitioning the sequence is to provide each thread with different seeds while also separating the threads along the sequence so as to avoid possible inter-stream correlations (see Figure 4, left panel). On the other hand, several generators, including the `Mersenne Twister `_ and `Lagged Fibonacci `_ algorithms, which employ recursive transformations, allow one to leap ahead in a sequence of random variates and to produce the :math:`(n+1)`-st random number without knowing the previous, :math:`n`-th number [Mascagni2004]_. The leap size, which, in general, depends on a choice of parameters for an RNG, can be properly adjusted to the number of threads (number of particles :math:`N`), or multiples of :math:`N` :math:`(M \times N)`. Then, all :math:`N` random numbers can be obtained simultaneously, i.e. the :math:`j`-th thread produces numbers :math:`j, j+N, j+2N,...,` etc. :math:`(j=1,2,...,n)`. At the end of each simulation step, threads of execition must be syncronized to update the current RNG state. Hence, the same RNG state can be shared by all threads, each updating just one elements of the state. We refer to this as the one-RNG-for-all-threads approach (Figure 4, right panel). .. figure:: rng.png :align: center :figwidth: 70% **Figure 4:** A simplified schematic of the one-RNG-per-thread approach (*left panel*) and the one-RNG-for-all-threads approach (*right panel*). In the one-RNG-per-thread approach, one RNG produces a stream of pseudo-random numbers in each :math:`j`-th thread of execution :math:`(j=1,2,...,n)`, i.e., the same RNG algorithm (realized in many RNGs) is running in each thread generating different subsequences of the same sequence of random numbers. The one-RNG-for-all-threads approach builds on the ability of different threads to communicate, and, hence, to share the state of just one RNG across an entire GPU device. We employed these methods to develop GPU-based implementations of the Linear Congruent Generator (LCG) [Press1992]_, and the Ran2 [Press1992]_, Hybrid Taus [Press1992]_, [Tausworthe1965]_, and additive Lagged Fibonacci algoritms [Press1992]_, [[Mascagni2004]_. These generators have been incorporated into the program for Langevin simulations of biomolecules fully implemented on the GPU. .. _rng-bench: Benchmark simulations --------------------- We tested RNGs implemented on a GPU in Langevin simulations of :math:`N` Brownian oscillators using the Hybrid Taus and additive Lagged Fibonacci algorithms. We compared the computational time as a function of the system size :math:`N` for three different implementations of Langevin simulations: - random numbers and Langevin Dynamics are generated on a CPU; - random numbers, obtained on the CPU, are transfered to the GPU and used to generate Langevin Dynamics on the GPU; - random numbers and Langevin Dynamics are generated on the GPU. The results obtained for the 2.83 GHz Intel Core i7 930 CPU, for the 1.15GHz Tesla C2050 (MIPT) show that starting from :math:`\approx10^2` particles, it becomes computationally expensive to generate random numbers on the CPU and transfer them to the GPU in order to generate stochastic trajectories on the GPU (Figure 3, left panel). We observed a ~10-250-fold speedup for Langevin simulations of :math:`N=10^3-10^6` Brownian particles on the GPU (Figure 5, right panel). .. figure:: rng-benchmark1.png :align: center :figwidth: 80% **Figure 5:** *Left panel:* The computational time for Langevin Dynamics (LD) of :math:`N` Brownian oscillators with the Hybrid Taus and additive Lagged Fibonacci RNGs. Considered are three implementations, where random numbers and LD are generated on the CPU (Hybrid Taus (CPU) + Dynamics (CPU)), random numbers are obtained on the CPU, transfered to the GPU and used to propagate LD on the GPU (Hybrid Taus (CPU) + Dynamics (GPU)), and random numbers and LD are generated on the GPU (Hybrid Taus (GPU) + Dynamics (GPU) and Lagged Fibonacci (GPU) + Dynamics (GPU)). *Right panel:* The computational speedup (CPU time/GPU time) for LD simulations fully implemented on the GPU and on the single CPU core. Compared are two options when an RNG (Hybrid Taus or Lagged Fibonacci) is organized in a separate kernel or is inside the main (integration) kernel. We also benchmarked the computational efficiency of the GPU-based realizations of the Ran2, Hybrid Taus, and Lagged Fibonacci algorithms using Langevin simulations of :math:`N` Brownian oscillators in three dimensions. For each system size :math:`N`, we ran one trajectory for :math:`10^6` simulation steps. All :math:`N` threads were synchronized at the end of each step to emulate an LD simulation run of a biomolecule on a GPU. The associated execution time and memory usage are profiled in Figure 6 below. .. figure:: rng-benchmark2.png :align: center :figwidth: 80% **Figure 6:** The computational performance of LCG, and the Ran2, Hybrid Taus, and Lagged Fibonacci algorithms in Langevin simulations of :math:`N` Brownian oscillators on the GPU device. *Left panel:* The execution time (CPU time for Langevin simulations with Ran2 and Hybrid Taus RNGs is shown for comparison). *Right panel:* The memory demand, i.e. the amount of memory needed for an RNG to store its current state. Step-wise increases in the memory usage for Lagged Fibonacci are due to the change of constant parameters for this RNG. On a GPU Ran2 is the most demanding generator as compared to the Hybrid Taus, and Lagged Fibonacci RNGs (Figure 6, left panel). Using Ran2 in Langevin simulations to obtain a single trajectory over :math:`10^9` steps for a system of :math:`N=10^4` particles requires additional ~264 hours of wall-clock time. The associated memory demand for Ran2 RNG is quite high, i.e. >250MB for :math:`N=10^6` (Figure 6, right panel). Because in biomolecular simulations a large memory area is needed to store parameters of the force field, Verlet lists, interparticle distances, etc., the high memory demand might prevent one from using Ran2 in the simulations of a large system. Also, implementing the Ran2 RNG in Langevin simulations on the GPU does not lead to a substantial speedup (Figure 6, left panel). By contrast, the Hybrid Taus and Lagged Fibonacci RNGs are both light and fast in terms of the memory usage and the execution time (Figure 6). These generators require a small amount of memory, i.e. <15-20MB, even for a large system of as many as :math:`N=10^6` particles. .. _use-sop: Using SOP-GPU program ===================== Running SOP-GPU program requires specification of a configuration file (regular text file), containing information about the system of interest and parameters of the simulation:: sop-gpu config_file.conf All the information about the simulation protocol and current process is printed out in terminal screen as well as in separate files specified in configuration file. There are six regimes of simulation available in SOP-GPU package: minimization simulation, equilibrium simulation, point-/plane-pulling simulation, force indentation and heating simulation. Also, SOP-GPU package has implemented hydrodynamic interactions, which can by optionally included in calculation. Parameters and output files for each of these regimes are described in sections below. .. _gen-out: General output -------------- The general output files for any regime of simulation are following: - Energy output file (usual format *.dat*). - Trajectory coordinates file (format *.dcd*). - Restart coordinates file (format *.pdb*). - Reference coordinates file (first frame of the trajectory, format *.pdb*). - Final coordinates file (format *.pdb*). The columns of standard energy output file are: 1. Current simulation step. 2. Average Maxwell-Boltzmann temperature (:math:`T`, in kcal/mol). 3. Potential energy of covalent bonds (:math:`U_{FENE}`, in kcal/mol). 4. Potential energy of native interactions (:math:`U_{NB}^{ATT}`, in kcal/mol). 5. Potential energy of repulsive (long range) interactions (:math:`U_{NB}^{REP}`, in kcal/mol). 6. Number of native contacts not ruptured (:math:`Q`). 7. Total potential energy (:math:`U_{SOP}`, in kcal/mol). 8. Gyration radius (:math:`R_{gyr}`, optional). 9. Deviation of hydrodynamic tensor from diagonal form (:math:`\epsilon` (see Eq. :eq:`tea-beta-prime`, optional). .. _theor-hd: Hydrodynamic interactions ------------------------- In Langevin Dynamics simulations in the overdamped limit, equations of motion for particles of the system are propagated forward in time (see Eq. :eq:`ld` and Eq. :eq:`drnum` below) with the amplitude of random force :math:`\rho=\sqrt{2k_BT\zeta/h}=k_BT \sqrt{2/D_{\alpha\alpha}h}`, where :math:`\alpha` runs over all degrees of freedom. In this approach, which ignores the hydrodynamic coupling of degrees of freedom, all particles are described by the same diffusion coefficient :math:`D=D_{\alpha\alpha}=k_BT/\zeta`. To account for solvent-mediated many-body effects, one can use an approach proposed originally by Ermak and McCammon [Ermak1978]_ . In this approach, the equation of motion :eq:`drnum` is transformed (in absence of external flow) into the following equation: .. math:: \Delta r_\alpha = \sum_{\beta=1}^{3N} {\frac{D_{\alpha\beta}}{kT} F_\beta h} + \sqrt{2h} \sum_{\beta=1}^{3N} {B_{\alpha\beta} g_\beta} :label: ermak-dr The first term on the right-hand side is a hydrodynamic tensor :math:`\mathbf{D}` --- a real :math:`3N\times3N` matrix, in which an entry :math:`D_{\alpha\beta}` is a contribution to the diffusion of :math:`\alpha`-th degree of freedom from the :math:`\beta`-th degree of freedom. Alternatively, tensor :math:`\mathbf{D}` can be represented by an :math:`N\times N` matrix of :math:`3\times 3` submatrices :math:`\mathbf{D}_{ij}`, each corresponding to a pair of particles :math:`i` and :math:`j`. Also, for the correct distribution of random forces, in the second term in equation :eq:`ermak-dr` a real :math:`3N\times3N` matrix :math:`\mathbf{B}` must satisfy the condition :math:`\mathbf{B}^\intercal \mathbf{B}=\mathbf{D}`, where the superscript :math:`{}^\intercal` represents the transpose of a matrix. It is easy to show that when in equation :eq:`ermak-dr` :math:`\mathbf{D}` is a diagonal matrix with the identical matrix elements :math:`D_{\alpha\alpha}=kT/\zeta`, we recover equation :eq:`ermak-dr`. In SOP-GPU program, we use the Rotne-Prager-Yamakawa (RPY) form of the hydrodynamic tensor :math:`\mathbf{D}` [Rotne1969]_ [Yamakawa1970]_, which is a positive-definite quantity. The submatrices :math:`\mathbf{D}_{ij}` of RPY tensor are given by the following expressions: .. math:: \mathbf{D}_{ij} = \frac{kT}{\zeta} \begin{cases} \mathbf{I} & \text{, if } i=j\text{,} \\ \left( 1 - \frac{9\left|\mathbf{r}_{ij}\right|}{32 a} \right) \mathbf{I} + \left( \frac {3\left|\mathbf{r}_{ij}\right|}{32a} \right) \mathbf{\hat{r}}_{ij} \times \mathbf{\hat{r}}_{ij} & \text{, if } i \neq j \text{ and } \left|\mathbf{r}_{ij}\right| < 2a_{HD}\text{,} \\ \left( 1 + \frac{2a^2}{3\left|\mathbf{r}_{ij}\right|^2} \right) \mathbf{I} + \left( 1 - \frac{2a^2}{\left|\mathbf{r}_{ij}\right|^2} \right) \mathbf{\hat{r}}_{ij} \times \mathbf{\hat{r}}_{ij} & \text{, if } i \neq j \text{ and } \left|\mathbf{r}_{ij}\right| \ge 2a_{HD }\text{.} \end{cases} :label: rpy In equation :eq:`rpy`, :math:`\mathbf{I}` is the identity matrix of rank 3, :math:`a_{HD}` is the hydrodynamic radius of the particle (we assume that :math:`a_{HD}` is same for all particles, the denotation ":math:`\times`" is used to define the tensor product. In SOP-GPU program, we utilized an exact approach of computing :math:`\mathbf{B}` using Cholesky decomposition of :math:`\mathbf{D}`, as well as fast Truncated Expansion approximation (TEA) approach [Geyer2009]_. In the TEA-based approach, the matrix elements of :math:`\mathbf{B}` can be rewritten as :math:`B_{\alpha\beta}=C_\alpha b_{\alpha\beta} D_{\alpha\beta}`, and equation :eq:`ermak-dr` can be recast as .. math:: \Delta r_\alpha = \frac{h}{\zeta} \sum_{\beta=1}^{3N} \frac{D_{\alpha\beta}}{D_{\alpha\alpha}} \left( F_\beta + C_\alpha b_{\alpha\beta} \cdot \rho g_\beta \right) \text{,} :label: tea-dr where .. math:: b_{\alpha\beta} = \begin{cases} 1 & \text{ if } \alpha = \beta, \\ b' & \text{ if } \alpha \neq \beta. \end{cases} :label: tea-beta In Eqs. :eq:`tea-dr` and :eq:`tea-beta`, :math:`C_\alpha` and :math:`b'` are given by .. math:: C_\alpha = \left( 1 + \sum_{\beta \neq \alpha} {b'^2 \frac{D_{\alpha\beta}}{D_{\alpha\alpha}D_{\beta\beta}}} \right)^{\frac{1}{2}} \text{,} :label: tea-ci .. math:: b' = \frac{1-\sqrt{1-[(N-1)\epsilon^2-(N-2)\epsilon]}}{\sqrt{(N-1)\epsilon^2-(N-2)\epsilon}}, :label: tea-beta-prime where :math:`\epsilon=\langle D_{\alpha\beta}/D_{\alpha\alpha}\rangle`. This linearization procedure allows us to efficiently parallelize the integration algorithm on a GPU. Cholesky algorithm is implemented by-the-book, i.e. straightforward computation of lower-left-triangular matrix :math:`B` is carried out with :math:`O(N^3)` complexity. Due to implementation design, the single trajectory can not contain more than 128 particles is Cholesky factorization is applied. There is no agreement regarding the value of the hydrodynamic radius :math:`a_{HD}`. The proposed values vary between :math:`a_{HD}=1.5-5.3` Å [Cieplak2009]_ [Frembgen-Kesner2009]_. However, one must keep in mind that, although the TEA handles overlaps correctly, the RPY tensor is better suited for description of non-overlapping beads. Since the inter-bead :math:`C_{\alpha}-C_{\alpha}`-distance in a polypeptide chain is about :math:`3.8` Å, which corresponds to the length of a peptide bond, :math:`a_{HD}` should not exceed :math:`1.9` Å. For hydrodynamic interactions parameters see Section :ref:`par-hd`. .. _theor-pull: Pulling simulations ------------------- Pulling simulations were designed to mimic force-ramp and force-clamp AFM experiments. In this regime, cantilever base is represented by the virtual particle, connected by a harmonic spring to a specified ("pulled") amino acid, mimicking adsorption of residues on the cantilever tip. The system particles specified as "fixed" will be firmly constrained mimicking molecule absorption on the surface. The cantilever base moving with constant velocity (:math:`\nu_f`) extends the cantilever spring, translating into the molecule extension, with the time-dependent force (force-ramp) :math:`{\bf f}(t)=f(t){\bf n}` in the pulling direction :math:`{\bf n}`. The force magnitude, :math:`f(t)=r_f t`, applied to cantilever tip, i.e. to the pulled end of the molecule, increases linearly in time :math:`t` with the force-loading rate :math:`r_f=\kappa \nu_f` [Zhmurov2011]_. For pulling simulation parameters see Section :ref:`par-pull`. When pulling is enabled, program will save additional output file (usual format *.dat*) with pulling data. This file has following columns: 1. Current simulation step. 2. Absolute value of the end-to-end distance (:math:`X`, in Å). 3. Projection of the end-to-end distance on pulling vector (:math:`X_{proj}`, in Å). 4. Absolute value of the cantilever spring force (:math:`\kappa \Delta x`, in kcal/molÅ). 5. Force vector component (:math:`F_x`, in kcal/molÅ). 6. Force vector component (:math:`F_y`, in kcal/molÅ). 7. Force vector component (:math:`F_z`, in kcal/molÅ). .. _theor-ppull: Plane-pulling simulations ------------------------- .. _theor-indent: Force indentation simulations ----------------------------- Nanoindentation regime adds to the system a cantilever and surface models. In this regime, the cantilever base is represented by the virtual particle, connected to the spherical bead of radius :math:`R_{tip}`, mimicking the cantilever tip (indentor), by a harmonic spring. The tip interacts with the particles via the Lennard-Jones potential .. math:: U_{tip} = \sum_{i=1}^{N}{\varepsilon_{tip} \left [A_{tip}\left( \frac{\sigma_{tip}}{|r_i - r_{tip}| - R_{tip}} \right)^{12} + B_{tip} \left( \frac{\sigma_{tip}}{|r_i - r_{tip}| - R_{tip}} \right)^6 \right ]} :label: utip thereby producing an indentation on the particle's outer surface. In Eq. :eq:`utip`, :math:`r_i` and :math:`r_{tip}` are coordinates of the :math:`i`-th particle and the center of the tip, respectively, :math:`\varepsilon_{tip}` and :math:`\sigma_{tip}` are the parameters of interaction, and the summation is performed over all the particles under the tip. The factors :math:`A_{tip}` and :math:`B_{tip}` define the attractive and repulsive contributions into the indentor-particle interactions, respectively. For the standard Lennard-Jones potential :math:`A_{tip}=1` and :math:`B_{tip}=-2`. If :math:`A_{tip}=0` and :math:`B_{tip}=1` the interactions are repulsive only. For the cantilever tip, we solve numerically the following Langevin equation of motion: .. math:: \xi_{tip} \frac{dr_{tip}}{dt} = - \frac{\partial U_{tip}(r_{tip})}{\partial r_{tip}} + \kappa((r_{tip}^0 - \nu_f t) - r_{tip}) :label: ldtip where :math:`r_{tip}^0` is the initial position of spherical tip center (:math:`\nu_f` is the cantilever base velocity; :math:`\kappa` is the cantilever spring constant), and :math:`\xi_{tip}` is the friction coefficient for a spherical particle of radius :math:`R_{tip}` in water. To generate the dynamics of the biological particle of interest tested mechanically, the Eqs. :eq:`usop` --- :eq:`ld` for the particle (see above) and Eqs. :eq:`utip` and :eq:`ldtip` for the indentor (spherical tip) should be solved numerically. The substrate surface is also modeled using Lennard-Jones potential with parameters of interactions :math:`\varepsilon_{surf}` and :math:`\sigma_{surf}` and factors :math:`A_{surf}` and :math:`B_{surf}` (see Eq. :eq:`utip`). The surface is represented by a number of particles and interaction potential is calculated between each particle in system and particles on the surface. The cantilever base moving with constant velocity (:math:`\nu_f`) exerts (through the tip) the time-dependent force (force-ramp) :math:`{\bf f}(t)=f(t){\bf n}` in the direction :math:`{\bf n}` perpendicular to the particle surface. The force magnitude, :math:`f(t)=r_f t`, exerted on the particle increases linearly in time :math:`t` with the force-loading rate :math:`r_f=\kappa \nu_f` [Kononova2013b]_ [Kononova2014]_ . For force indentation simulation parameters see Section :ref:`par-indent`. The results of indentation will be saved in additional output file (usual format *.dat*) with the following columns: 1. Current simulation step. 2. Distance traveled by cantilever base (:math:`Z`, in Å). 3. Average molecular force acting on a cantilever tip projected onto chip movement direction (:math:`F_{proj}`, in kcal/molÅ). 4. Average absolute value of a molecular force, acting on a cantilever tip, (:math:`F`, in kcal/molÅ). 5. Absolute value of the cantilever spring force at a given step (:math:`\kappa\Delta x`, in kcal/molÅ). 6. Absolute value of the cantilever spring force average (:math:`\overline{\kappa\Delta x}`, in kcal/molÅ). 7. Molecular force vector component (:math:`F_x`, in kcal/molÅ). 8. Molecular force vector component (:math:`F_y`, in kcal/molÅ). 9. Molecular force vector component (:math:`F_z`, in kcal/molÅ). 10. Current cantilever tip coordinate (:math:`X_x`, in Å). 11. Current cantilever tip coordinate (:math:`X_y`, in Å). 12. Current cantilever tip coordinate (:math:`X_z`, in Å). 13. Current cantilever base coordinates (:math:`Z_x`, in Å). 14. Current cantilever base coordinates (:math:`Z_y`, in Å). 15. Current cantilever base coordinates (:math:`Z_z`, in Å). .. _theor-heat: Heating simulations ------------------- Although coarse-grained models are known to be not very accurate in describing heat-induced unfolding of molecules, SOP-model still can provide good qualitative results. When heating option is on, temperature of the water bath (i.e. strength of random force, see Eq. :eq:`drnum` below) increases gradually during the simulation process. Heating parameters are described in Section :ref:`par-heat`. .. _units: Units ===== For numerical evaluation of the Eq. :eq:`ld` in time, it can be written in form .. math:: \xi \frac{r_i^{t+1} - r_i^t}{\Delta t} = F_i^t + G_i^t :label: lnum When divide both sides of Eq. :eq:`lnum` by particle mass :math:`m` and express the change of coordinates :math:`\Delta r_i^t=r_i^{t+1} - r_i^t` arrive to .. math:: \Delta r_i^t = \frac{\Delta t}{\xi/m}\frac{1}{m}(F_i^t + G_i^t) From the equation for harmonic oscillator, :math:`\xi/m=\zeta/\tau_L` is damping coefficient. Here :math:`\zeta` is dimensionless damping ratio and :math:`\tau_L=\sqrt{m a^2/\varepsilon_h}` is characteristic time for underdamped motion of spherical particle of mass :math:`m` and radius :math:`a` with energy scale :math:`\varepsilon_h`. According to Langevin equation, the random force :math:`G_i^t=g_i^t\sqrt{2\zeta k_BT/h}`, where :math:`g_i^t` is random number from the interval :math:`[0,1]`. Hence .. math:: \Delta r_i^t = \frac{\Delta t \tau_L}{\zeta m}(F_i^t + g_i^t\sqrt{2\zeta k_BT/h}) :label: drnum From the Stokes-Einstein friction theory :math:`\xi=6 \pi \eta a` for a spherical particle of radius :math:`a` in a liquid with viscosity :math:`\eta`. Therefore :math:`\zeta = 6 \pi \eta a^2/\sqrt{m \varepsilon_h}`. In the program :math:`\zeta=50`. This was obtained for :math:`a \sim 5` Å, :math:`m \sim 3\times10^{-22}` g (mass of a residue) and the bulk water viscosity :math:`\eta=0.01` gs :math:`^{-1}` cm :math:`^{-1}`. In general, :math:`a` varies between :math:`3.8` Å to :math:`5` Å, while :math:`m` varies between :math:`3\times10^{-22}` g to :math:`5\times10^{-22}` g. In the simulations :math:`a=3.8` Å. Because of the fact that :math:`\zeta` depends on :math:`\varepsilon_h`, every time when :math:`\varepsilon_h` was changed, valid :math:`m` value should be calculated, which gives the value :math:`\zeta=50`. Example: for :math:`\varepsilon_h=1` kcal/mol from the above equation for :math:`\zeta` we find that :math:`m=4.3\times10^{-22}` g which is a valid value. For :math:`\varepsilon_h=1.5` kcal/mol, we get :math:`m=3\times10^{-22}` g which is still a valid value. After finding the mass :math:`m`, we can go back to the expression for :math:`\tau_L` and get its value. For example, for :math:`\varepsilon_h=1` kcal/mol we get :math:`\tau_L=3` ps while for :math:`\varepsilon_h=1.5` kcal/mol, we get :math:`\tau_L=` ps. For the overdamped Langevin dynamics the characteristic time is :math:`\tau_H=\zeta\varepsilon_h\tau_L/kT=6\pi \eta a^3 / kT`. In order to get it in units of ps, both :math:`\varepsilon_h` and :math:`k_BT` need to be of the same units. Since :math:`\varepsilon_h` is in kcal/mol, :math:`k_BT` should be also in kcal/mol (at :math:`T=300` K :math:`k_BT=0.6` kcal/mol). Therefore the simulation time step :math:`\Delta t=h\cdot\tau_H` is also in units of ps. With the standard parameters (:math:`\eta=0.01` gs :math:`^{-1}` cm :math:`^{-1}`, :math:`T=300` K and :math:`a=3.8` Å), :math:`\tau_H=248` ps. The parameter :math:`h` can be specified in configuration file. In the pulling/indentation simulation, cantilever velocity is defined as :math:`\nu_f=\Delta x/(n_{av} \cdot h \cdot \tau_H)` where :math:`\Delta x` is displacement of virtual bead, representing cantilever base, during :math:`n_{av}` steps, it is given in Å. The force is calculated in kcal/(molÅ), to get the force in pN, one need to multiplied by :math:`70`. Therefore, the cantilever spring constant :math:`\kappa` should be also specified in the units of kcal/(mol :math:`Å^{2}`). .. _theor-top: Topology ======== The SOP-GPU package includes two utilities for coarse-graining the system. The old one, ``sop-top`` can only create :math:`C_\alpha`-based model, but there is a functionality to make tandems out of the monomer. The new utility ``sop-top2`` uses a flexible coarse-graining configuration config, which allows one to create various coarse-grained models (e.g. :math:`C_\alpha` or :math:`C_\alpha-C_\beta`). Old sop-top utility ------------------- Creating of coarse-grained structure with corresponding topology file can be performed running ``sop-top`` utility as follow:: sop-top top_config_file.top As with the main program, configuration file should be passed as the first parameter to ``sop-top``. Executing the command above will generate new, coarse-grained PDB file and the topology file in Gromacs TOP format. The PDB file is used only to store coordinates of the particles and all the connectivities are described in TOP file. This configuration file can use the same features as configuration file for SOP-GPU, as described in Section :ref:`par-input`. Topology is created from the original (full-atomic) PDB file using its ``ATOM`` and ``SSBOND`` entries. All :math:`C_\alpha` atoms are added into ``[ atoms ]`` section of topology file generated. Backbone connectivity and disulfide bonds along with their equilibrium (PDB) distances are collected into ``[ bonds]`` section. Native contacts are determined based on two cut-off distances. The first one relates to a maximum :math:`C_\alpha-C_\alpha` distance for two amino-acids in native contact (*simple Go definition*), the second one is the cut-off for the minimal distance of two heavy atoms in corresponding amino-acids side-chains (*full Go definition*). Along with the indexes of amino-acids :math:`i` and :math:`j`, PDB distance :math:`r^0_{ij}` and value of :math:`\varepsilon_h` are saved for each pair qualify. :math:`\varepsilon_h` can be specified as constant value for all native pairs or can be taken from occupancy of beta columns of original PDB. In later case, geometric average of two values listed for amino-acids :math:`i` and :math:`j` are taken. New sop-top (sop-top2) utility ------------------------------ In some cases, the :math:`C_\alpha`-representation is not just sufficient. The ``sop-top2`` utility allows for the custom coarse-graining of the initial full-atomic system. The coarse-graining in this case relies on the coarse-graining configuration file, in which one can find a description on how to coarse-grain a particular amino-acid. For the convinience, two configs are supplied with the SOP-GPU package: one to get the :math:`C_\alpha` representation and the other --- to get the :math:`C_\alpha-C_\beta` representation of the protein system. The ``sop-top2`` program takes the path to the configuration file as an argument. In this file, one should specify the following parameters as an input: the path to the initial (all-atom) PDB file and the path to the coarse-grained configuration file. The coarse-graining configuration: :math:`C_\alpha-C_\beta` model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The coarse-graining configuration file starts with the list of masses for all the atoms present in the system:: MASS 1 H 1.00800 ! Hydrogen MASS 2 C 12.01100 ! Carbon MASS 3 N 14.00700 ! Nitrogen MASS 4 O 15.99900 ! Oxygen MASS 5 S 32.06000 ! Sulphur The description of the coarse-graining for each amino-acid follows. For instance, in the :math:`C_\alpha-C_\beta` approach, the alanine entry will be:: RESI ALA BEAD CA CA REPR N HN CA HA C O COOR CA CHAR 0.0 CONN +CA CB ENDBEAD BEAD CB SC REPR CB HB1 HB2 HB3 COOR CB CHAR 0.0 ENDBEAD ENDRESI Here, ``RESI`` and ``ENDRESI`` keywords encapsulate the description of the residue, which contain two beads entries: one for the :math:`C_\alpha`-bead and one for the side-chain (:math:`C_\beta`) bead. Each bead starts with the keyword `BEAD` followed by the name and type of the bead. For each bead, the following information should be provided: (1) Which atoms this bead represents (their names as they are in the initial PDB file are listed after ``REPR`` keyword). (2) Where the created bead should be placed (the name (or names) for the positioning atoms should be provided after ``COOR`` keyword). (3) The charge (``CHAR``) is the charge assigned to the bead (not used in SOP model). The entry ``CONN`` lists the covalent bonds that should be added for a particular bead. After this keyword listed are the names of the beads with which the current bead is connected to. The syntax resembles the CHARMM forcefield topology file: the ``+`` sign means that the connection is with the next residue in the polypeptide chain, ``-`` --- with the preceeding. Each covalent bond should be added once (i.e. if :math:`C_\alpha-C_\beta` bond is added for the :math:`C_\alpha`-atom, there is no necessity to add this bond for the :math:`C_\beta`-atom, as the alanine entry above illustrates). In the entry above, the :math:`C_\alpha`-bead (``CA``) is connected to the :math:`C_\alpha`-bead of the next residue (``+CA``) and to the :math:`C_\beta`-bead of the same residue (``CB``). There is no ``CONN`` entry for the :math:`C_\beta`-bead, since the :math:`C_\alpha-C_\beta` is already listed in the the :math:`C_\alpha` bead section. In other words, the entry for the alanine above, reads: In the residue ``ALA``, first bead is the ``CA`` (:math:`C_\alpha`) bead of the type ``CA`` (``BEAD CA CA``). It represents the atoms of the backbone (``REPR N HN CA HA C O``) and should be placed on the position of the :math:`C_\alpha`-atom of the alanine residue (``COOR CA``). Its charge is zero (``CHAR 0.0``). It is covalently connected to the :math:`C_\alpha`-bead of the next residue and the side-chain (:math:`C_\beta`) bead of the same residue. The second bead of the alanine residue is the ``CB`` (:math:`C_\beta`) bead of type ``SC`` (``BEAD CB SC``). It represents the side-chain atoms (``REPR CB HB1 HB2 HB3``) and should be placed on the position of the :math:`C_\beta`-atom from the initial all-atom PDB (``COOR CB``). Its charge is also zero (``CHAR 0.0``). The description of the bead and residue ends here. The corse-graining configuration file should provide similar description for all the residues in the initial PDB file (in general the description of the coarse-graining for all 20 essential amino-acids should be suffitient). If you system has some non-standart residues, sugars, nucleic acids, etc., you will need to add the coarse-graining description to the coarse-graining config file you use. To do so, you need to decide, how many beads for the residue you want to add, where you want to place them, which atoms they represent, what is the total charge of these atoms. The connection entry for each bead should include the covalent connectivity within the residue and(or) the connectivity to the next (preceding) residues, marked with ``+`` (``-``) sign. The provided with SOP-GPU :math:`C_\alpha-C_\beta` coarse-graining configuration file is called ``aa_to_cg.inp`` and includes the following description for the side chains of 20 essential amino-acids: (i) there is no side-chain for GLY; (ii) for the aliphatic amino acids (ALA, VAL, LEU, and ILE), the :math:`C_\beta`-bead is placed at the position of the center of mass of the side-chain; (iii) for residues THR and SER, the :math:`C_\beta`-atom is placed at the position of the hydroxyl oxygen; (iv) the side-chain of the acidic amino acids (ASP and GLU) is placed at the center of mass of the :math:`COO`-group; (v) the side-chain of the basic amino acids (LYS and ARG) is placed at the center of mass of the :math:`NH_3+`-group; (vi) for ASN and GLN, the :math:`C_\beta`-atom is placed at the position of the center of mass of the group :math:`CO-NH_2`; (vii) aromatic side-chains in PHE and TYR are represented by a single :math:`C_\beta`-bead placed at the geometrical center of the rings (for TYR, the bead representing the :math:`OH`-group is also added); (viii) TRP side-chain having a double-ring structure is represented by two beads placed in the geometrical centers of the rings; (ix) HIS is represented by a single bead placed at the geometrical center of the five-member ring forming the side-chain; (x) sulfur-containing amino acids (MET, CYS) are represented by a side-chain bead, placed at the position of the sulfur atom; and (xi) the :math:`C_\gamma`-atom in PRO is represented by the :math:`C_\beta`-bead linked to its :math:`C_\alpha` bead and to the :math:`C_\alpha` bead of the residue before, thus forming a cyclic bond structure. The coarse-graining configuration: :math:`C_\alpha` model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The coarse-graining configuration file :math:`C_\alpha`-based model is also provided with the package. It is called ``aa_to_cg_ca.inp``. The entry for Alanine residue in this file is:: RESI ALA BEAD CA CA REPR CA COOR CA CHAR 0.0 CONN +CA ENDBEAD ENDRESI Here, only one :math:`C_\alpha` bead of type ``CA`` is added (``BEAD CA CA``) on the position of the :math:`C_\alpha` atom (``COOR CA``). It has zero charge (``CHAR 0.0``) and connected to the :math:`C_\alpha` bead of the next residue in the polypeptide chain (``CONN +CA``). Additional covalent bonds ------------------------- In many proteins, the covalent bonding is not limited by the polypeptide backbone. The most common example is the disilfide bonding. In the ``sop-top2`` utility, these bonds can be added by providing additional file, that contain the list of additional bonds to be added. The path to this file can be specified by the parameter **additional_bonds**. If this parameter is absent, no additional bonds will be added. In this file, each line correspond to one bond and starts with the ``CONN`` keyword followed by the chain-residue-name triplet for two beads to be connected:: CONN A 49 SG B 76 SG The line above tell the programm to add the disulfide bond between the residue 49 from the chain A and residue 76 of the chain B. The names ``SG`` for both of these residues are the names for the side-chain atoms of the cystene residues in the :math:`C_\alpha-C_\beta` coarse-graining approach. Since there are no SG beads in the :math:`C_\alpha` model, the same disulfide bond would be:: CONN A 49 CA B 76 CA Topology file ------------- There are three types of interactions in SOP model: covalent interactions, native interactions and repulsive pairs. All these should be listed in Gromacs-style topology file (*.top*). SOP topology file has four sections: ``[ atoms ]``, that lists all the particles (:math:`C_\alpha` and :math:`C_\beta` atoms) in the system (including information about residue ID, residue and chain name, etc.), and three sections that correspond to three types of interactions: ``[ bonds ]`` for covalent bonds, ``[ native ]`` for native interactions and ``[ pairs ]`` for repulsive pairs. The ``[ atoms ]`` section follows the Gromacs-style atom description:: [atoms] ; nr type resnr residue atom cgnr charge mass 0 CA 1 LEU CA A 0.00 56.044 1 CG 1 LEU CG A 0.00 57.116 2 CA 2 ILE CA A 0.00 56.044 3 CD 2 ILE CD A 0.00 57.116 ... Here, the columns correspond to particle ID, particle type, number of residue, particle atom name, charge and mass. The last three sections consist of the list of interacting particles IDs, function type and set of specific parameters. Particle IDs correspond to internal program indexes and start from 0, function type column is set to 1 for all pairs and ignored, parameters are specific for each interaction type as described below. More details on file format can be found in Gromacs Manual. ``[ bonds ]`` section ^^^^^^^^^^^^^^^^^^^^^ Typical ``[ bonds ]`` section includes lines similar to the following:: [ bonds ] ; ai aj funct c0 c1 c2 c3 0 1 1 3.81188 1 2 1 3.77232 2 3 1 3.79319 ... Covalent bonds include backbone interactions and disulfide S-S bonds. Potential energy function term that corresponds to covalent bonds interaction is described by :math:`U_{FENE}` (Eq. :eq:`ufene`) in Eq. :eq:`usop`, where summation is made over all lines in ``[ bonds ]`` section of the topology file, :math:`i` and :math:`j` correspond to the particles IDs listed in the line, distance :math:`r_{ij}` is computed from particles coordinates, :math:`r^0_{ij}` is the distance between two corresponding :math:`C_{\alpha}` atoms in native state (PDB file), listed as the first parameter in the line (column ``c0``, see sample listing above). ``[ native ]`` section ^^^^^^^^^^^^^^^^^^^^^^ :: [ native ] ; ai aj funct c0 c1 c2 c3 5 9 1 5.85792 1.50000 5 10 1 7.06482 1.50000 5 35 1 6.64479 1.50000 ... In SOP model, native interactions (:math:`U^{ATT}_{NB}`, see Eq. :eq:`usop`) are described by full Lennard-Jones potential (Eq. :eq:`uatt`). Each term in the sum corresponds to one line in ``[ native ]`` section. Apart from IDs of interacting particles, equilibrium distance :math:`r^0_{ij}` (column ``c0``) and the strength of non-bonded energy interaction, :math:`\varepsilon_h` (column ``c1``), are listed. :math:`r^0_{ij}` is the distance between :math:`C_\alpha` atoms in native state (PDB file), value of :math:`\varepsilon_h` is usually between :math:`1.0` and :math:`1.5` kcal/mol and can be obtained from att-atom MD simulations. ``[ pairs ]`` section ^^^^^^^^^^^^^^^^^^^^^ :: [ pairs ] ; ai aj funct c0 c1 c2 c3 0 2 1 0 3 1 0 4 1 ... Pairs section correspond the third term in Eq. :eq:`usop` (Eq. :eq:`urep`). There is no pair-specific parameters in this section, only indexes are listed. Note, that this list scales as :math:`\sim N^2` with the system size :math:`N`, and saving all possible repulsive pairs in the topology file would lead to very large files. In SOP-GPU program, this section is used only for small systems and when trajectory massive-production is employed. When large system is simulated, dual-range cut-off algorithm is utilized and only pairs withing bigger cut-off are kept (pairlist). Pairlist is updated using exclusion principle: only those pairs that are withing cut-off distance but not in the list of excluded pairs added. Verlet list is built from this pairlist based on smaller cut-off distance and used when potential function and forces are computed. Excluded pairs are those already listed in ``[ bonds ]`` and ``[ native ]`` sections. Parameter for the topology creation ----------------------------------- Both ``sop-top`` and ``sop-top2`` use the parameters file, path to which is passed as a first argument. The parameters one can use are: - **structure** ** Type: Path to the file. Status: Required. Purpose: Path to the initial (all-atomic) PDB file. - **additional_bonds** ** Type: Path to the file. Status: Optional. Purpose: Path to the file with the list of additional bonds (e.g. S-S bonds) - **topology** ** Type: Path to the file. Status: Required. Purpose: Path to the output topology (*.top*) file. - **coordinates** ** Type: Path to the file. Status: Required. Purpose: Path to the output coarse-grained *.pdb* file. - **topology_psf** ** Type: Path to the file. Status: Optional. Uses with ``sop-top2`` only. Purpose: The path to save the topology in *.psf* (NAMD) format (for VMD visualisation). - **topology_natpsf** ** Type: Path to the file. Status: Optional. Uses with ``sop-top2`` only. Purpose: The path to save the topology in *.psf* (NAMD) format (for VMD visualisation). Native contacts will be saved instead of covalent bonds in the corresponding section. Convinient for the native contacts inspection. - **R_limit_bond** ** Type: Float. Status: Required. Purpose: The cut-off value for the distance between beads. If two beads are within this distance in the provided structure, they considered to form native contact. - **SC_limit_bond** ** Type: Float. Status: Optional. Default value: Purpose: The cut-off value for the distance between side-chain beads. If two atoms listed in the ``REPR`` section of the amino acid are within this distance in the provided structure, the beads considered to form native contact. - **eh** ** Type: Float or ``O``/``B``. Status: Required. Purpose: The value for the :math:`\varepsilon_h` parameter. If the float value is given, the value is the same for all contacts. If the ``O`` or ``B`` is specified, the value is taken as a geometric average of the beta or occupancy column value. .. _par-input: Input parameters file ===================== .. _gen_feat: General features ---------------- Input parameters file contains all the simulation parameters listed as tab or space separated pairs of name and value. Remarks are allowed using ":math:`\#`" character. To simplify creation of multiple configuration/output files, parameters values support macroses. This can be use full in order to avoid overwriting of the output files if multiple trajectories are running in parallel, for example when many-runs-per-GPU approach is used. Any parameter name in the file can be used as macros, additional macroses can be added using same name-value syntax as for regular parameters. To use macros, parameter name included in any other parameter value should be surrounded with ":math:`<`" and ":math:`>`" characters. For example, the following lines:: run 3 DCDfile .dcd result in the value for the output file name "*3.dcd*". .. _par-device: Device parameters ----------------- - **device** ** Type: Integer. Status: Required. Default value: 0. Purpose: ID of NVidia card to run simulations on. Use "nvidia-smi" or "deviceQuery" from NVidia SDK to check devices. - **block_size** ** Type: Integer. Status: Optional. Default value: 256. Purpose: Set the number of threads per block. Can be specified for every potential individually, using **block_size_covalent**, **block_size_native**, **block_size_pairs**, **block_size_pairlist** and **block_size_possiblepairs**. - **max_covalent**: ** Type: Integer. Status: Optional. Default value: 8. Purpose: Set the maximum number of pairs per residue for covalent interactions. - **max_native** ** Type: Integer. Status: Optional. Default value: 128. Purpose: Set the maximum number of pairs per residue for native interactions. - **max_pairs** ** Type: Integer. Status: Optional. Default value: 512. Purpose: Set the maximum number of pairs per residue for pairs list. - **max_possiblePairs** ** Type: Integer. Status: Optional. Default value: 4096. Purpose: Set the maximum number of pairs per residue for possible pairs list. .. _par-struct: Structure parameters -------------------- - **name** ** Type: String. Status: Required. Purpose: Name, assigned to the structure. Used mostly for files naming. - **topology** ** Type: Path to the file. Format: .top Status: Required. Purpose: Path to the structure topology file (see Section :ref:`theor-top`). - **coordinates** ** Type: Path to the file. Format: .pdb Status: Required. Purpose: Path to the structure initial coordinates file. .. _par-sim: General simulation parameters ----------------------------- - **numsteps** ** Type: Long integer. Status: Required. Purpose: Number of simulation steps. - **timestep** *