
The energy dot plot

A nucleic acid secondary structure dot plot is a triangular plot that depicts base pairs as dots or other symbols; we shall refer to all such symbols as dots. A dot in column i and row j of a triangular array, $\{ (i,j) \vert 1 \leq i \leq j \leq n \}$, represents the base pair i.j. The advantage of a dot plot is that it can display the base pairs of more than one folding simultaneously. It can be used to compare a few foldings, or to show the base pair distribution in many millions of foldings.

Mfold computes a number, $\Delta G(i,j)$, for every possible base pair i.j. This is the minimum free energy of any folding that contains the i.j base pair. As above, we let $\Delta G$ be the overall minimum folding free energy, and $\Delta \Delta G$ a user selected free energy increment. Clearly,

\begin{displaymath}\Delta G= \min_{1 \leq i < j \leq n} \Delta G(i,j).
\end{displaymath}

The energy increment is derived from $\Delta G$ and the percentage P. That is, $\Delta \Delta G= P \times \vert\Delta G\vert / 100$, so that the increment is a non-negative quantity. The current convention is to lower $\Delta \Delta G$ to 12 kcal/mole when it would otherwise be greater, and to raise it to 1 kcal/mole when it would otherwise be smaller. The energy dot plot is then defined to be the collection of all base pairs i.j satisfying:

\begin{displaymath}\Delta G(i,j) \leq \Delta G+ \Delta \Delta G.
\end{displaymath}

This dot plot contains the superposition of all possible foldings whose folding energy is within $\Delta \Delta G$ of the minimum folding energy. Typically, $\vert\Delta \Delta G\vert$ is small compared to $\vert\Delta G\vert$; that is, P is a small percentage. In this case, the energy dot plot contains the superposition of all close-to-optimal foldings.
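The selection rule above can be sketched in Python as follows. This is only an illustration, not mfold's own code: the function names, the dictionary `pair_energy` mapping (i, j) to $\Delta G(i,j)$ in kcal/mole, and the use of the magnitude of $\Delta G$ (so that the increment is positive) are all assumptions made for the example.

```python
def energy_increment(delta_g, percent, lowest=1.0, highest=12.0):
    """Return the increment (kcal/mole) for a minimum energy and a percentage P.

    The raw value P * |delta_g| / 100 is clamped to [lowest, highest],
    following the 1 and 12 kcal/mole convention described in the text.
    """
    raw = percent * abs(delta_g) / 100.0
    return min(max(raw, lowest), highest)


def energy_dot_plot(pair_energy, percent):
    """Return the set of base pairs (i, j) that belong to the energy dot plot.

    pair_energy is a hypothetical dict {(i, j): minimum free energy of any
    folding containing the pair i.j}.
    """
    delta_g = min(pair_energy.values())  # overall minimum folding energy
    cutoff = delta_g + energy_increment(delta_g, percent)
    return {pair for pair, energy in pair_energy.items() if energy <= cutoff}
```

With a minimum energy of -97.2 kcal/mole and P = 5, the raw increment is about 4.86 kcal/mole, which lies inside the 1 to 12 kcal/mole range and is therefore used unchanged.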

The energy dot plot gives an overall visual impression of how ``well-defined'' the folding is. A cluttered plot, or a plot with cluttered regions, indicates either structural plasticity (the lack of well-defined structure) or else the inability of the algorithm to predict a structure with confidence. A couple of crude measures of ``well-definedness'' have been introduced in mfold. The first is ``P-num''. $P\!-\!num(i)$ is a measure of the level of promiscuity of base $r_i$ in its pairing with other bases in foldings within $\Delta \Delta G$ of $\Delta G$. It is the number of different base pairs, i.j or k.i, that can form in this set of foldings, and is simply the number of dots in the ith row and ith column of the energy dot plot. If $\delta(expression)$ is defined to be 1 when ``expression'' is true, and 0 otherwise, then P-num may be defined as:

\begin{displaymath}P\!-\!num(i) = \sum_{k < i} \delta (\Delta G(k,i) \leq \Delta G+ \Delta \Delta G) + \sum_{i < j} \delta (\Delta G(i,j) \leq \Delta G+ \Delta \Delta G).
\end{displaymath}
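Since P-num(i) is just the number of dots in the ith row and ith column, it can be counted directly from the set of base pairs on the energy dot plot. The sketch below assumes that set has already been computed and is passed in as an iterable of (i, j) tuples; the function name `p_num` is chosen for illustration only.

```python
from collections import Counter


def p_num(plot_pairs):
    """Count, for each base, how many dots on the plot involve it.

    plot_pairs: iterable of (i, j) base pairs that already satisfy the
    energy dot plot inequality. Each pair contributes one dot to base i
    and one dot to base j. Bases with no dots report 0.
    """
    counts = Counter()
    for i, j in plot_pairs:
        counts[i] += 1
        counts[j] += 1
    return counts
```

A base that appears in many different pairs gets a high count, which is the ``promiscuity'' the text describes.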

P-num pertains to individual bases. H-num is a ``well-definedness'' measure for a base pair i.j. It is the average value of the two P-num quantities, adjusted by removing the ``desirable'' i.j base pair. That is:

\begin{displaymath}H\!-\!num(i,j) = ( P\!-\!num(i) + P\!-\!num(j) - 1 )/2.
\end{displaymath}

A helix, already defined as a collection of two or more consecutive base pairs, may be described as a triple i,j,k, where k is the number of base pairs, and the actual base pairs are $i.j, i\!+\!1.j\!-\!1, \dots, i\!+\!k\!-\!1.j\!-\!k\!+\!1$. When k=1, the helix becomes a single base pair. With some abuse of notation, we may also write $H\!-\!num(i,j,k)$ to be the H-num value of the helix i,j,k. This is the average value of H-num over all the base pairs in the helix.
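The two H-num definitions translate into a few lines of Python. This is a sketch under the assumption that P-num values are available in a mapping `p_num` from base index to count; the function names are hypothetical.

```python
def h_num_pair(p_num, i, j):
    """H-num of the base pair i.j: (P-num(i) + P-num(j) - 1) / 2."""
    return (p_num[i] + p_num[j] - 1) / 2


def h_num_helix(p_num, i, j, k):
    """H-num of the helix i,j,k: the average H-num of its k base pairs.

    The base pairs are i.j, i+1.j-1, ..., i+k-1.j-k+1.
    """
    values = [h_num_pair(p_num, i + m, j - m) for m in range(k)]
    return sum(values) / k
```

For example, with P-num values {1: 2, 2: 1, 9: 1, 10: 2}, the pair 1.10 has H-num 1.5, the pair 2.9 has H-num 0.5, and the helix 1,10,2 has H-num 1.0.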

There are five files associated with the energy dot plot.

`FILE_NAME.PLOT' : This is a text file that contains all the base pairs on the energy dot plot, organized into helices for which $\Delta G(i,j)$ is constant. The first record is a header, and each subsequent record describes a single helix. The records are usually sorted by $\Delta G(i,j)$, and are often filtered so that short helices or isolated base pairs (helices of length 1) in suboptimal foldings are removed. Figure 9 shows a sample plot file.

  
Figure 9: Selected records from a plot file. ``level'' refers to a free energy range that is to be plotted in the same color, where 1 is always optimal. The ``level'' parameter is obsolete in the newer plotting programs of mfold 3.0. ``istart'', ``jstart'' and ``length'' define a helix and refer to i, j and k, respectively. The ``energy'' is the free energy expressed as an integer in 10ths of a kcal/mole. Note that this is not the free energy of the helix, but the minimum free energy of any folding that contains the helix.
   level  length istart jstart energy
      1      8    206    242   -972
      1      7    319    434   -972
      1      7    108    141   -972
      1      7     53    185   -972
      1      6    334    412   -972
      1      6    308    444   -972
      1      6    288    472   -972
      1      6    247    279   -972
     ...
      2      4      8     23   -971
      2      2     69     78   -971
      2      4      1     24   -970
      2      2     10     17   -970
      2      3    345    400   -967
      2      2    297    462   -967
     ...
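A plot file laid out as in Figure 9 can be read with a short Python routine. This sketch assumes whitespace-separated columns in the order shown in the figure, and it skips the header and any line that is not five integers; the names `Helix`, `parse_plot` and `base_pairs` are illustrative and not part of mfold.

```python
from typing import NamedTuple


class Helix(NamedTuple):
    level: int
    length: int     # k, the number of base pairs
    istart: int     # i
    jstart: int     # j
    energy: float   # kcal/mole, converted from 10ths of a kcal/mole


def parse_plot(text):
    """Parse plot file text into a list of Helix records."""
    helices = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # blank lines or elision markers
        try:
            level, length, istart, jstart, tenths = map(int, fields)
        except ValueError:
            continue  # header record
        helices.append(Helix(level, length, istart, jstart, tenths / 10))
    return helices


def base_pairs(helix):
    """Expand a helix i,j,k into its base pairs i.j, i+1.j-1, ..."""
    return [(helix.istart + m, helix.jstart - m) for m in range(helix.length)]
```

Applied to the record `2 4 8 23 -971` from Figure 9, `base_pairs` yields the pairs 8.23, 9.22, 10.21 and 11.20, each with a minimum folding energy of -97.1 kcal/mole.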

`FILE_NAME.ANN' : This file contains P-num information for a particular $\Delta \Delta G$. The ith record contains i and $P\!-\!num(i)$. This file is used for annotating plotted structures.

`FILE_NAME.H-NUM' : This file is the same as `FILE_NAME.PLOT', except that the ``energy'' column is replaced by an ``h-num'' column. These files are usually sorted by h-num from lowest to highest, that is, from best determined to worst determined. Often, only helices in optimal foldings are retained. Figure 10 shows part of a sorted and filtered h-num file corresponding to the plot file in Figure 9.

  
Figure 10: The beginning and end of an h-num file sorted by h-num and filtered to include only helices in optimal foldings. As with P-num, H-num values are relative to a particular sequence and free energy increment.
   level  length istart jstart h-num
      1      4     38    194    6.8
      1      4    215    232    7.3
      1      5     31    201    8.4
      1      7     53    185    8.4
      1      2     47    189   11.0
      1      8    206    242   11.9
      1      6     61    176   13.7
      1      4     89    163   13.8
      1      3    255    271   14.0
      1      3    104    145   15.0
      1      1     68     79   16.0
      1      4    121    131   17.0
      1      6    288    472   17.3
    ...
      1      2    353    389   35.0
      1      3    364    377   38.7
      1      3    297    459   39.0

`FILE_NAME.PS' : This is a PostScript file of the energy dot plot.

`FILE_NAME.GIF' : This is an image of the energy dot plot in ``gif'' format, suitable for display on web pages.


Michael Zuker
Institute for Biomedical Computing
Washington University in St. Louis
1998-12-05