Next: AUXILIARY AND INDIVIDUAL PROGRAMS Up: OUTPUT Previous: The energy dot plot

Optimal and suboptimal foldings

Mfold predicts a number of optimal and suboptimal foldings. They are computed in order of increasing free energy, although this order may change when the more exact efn2 program is used to re-evaluate free energies. The number of computed foldings is limited directly by the MAX parameter, and in more subtle ways by the P and W parameters. Note that while the energy dot plot rigorously displays all possible base pairs that can take part in all possible foldings within $\Delta \Delta G$ of $\Delta G$, the computation of foldings is arbitrary. The foldings do not represent a statistical sample of likely foldings, but rather a collection that shows the variation that is possible within optimal and suboptimal foldings.

The collection of triples, $i,j,\Delta G(i,j)$, for all possible base pairs is sorted in order of increasing $\Delta G(i,j)$. The algorithm to construct foldings proceeds as follows:

1. The base pair at the top of the list is selected, and an optimal folding containing the selected base pair is computed.
2. All base pairs in the computed folding, as well as all those within a distance of W of base pairs in the computed folding, are crossed off the list.
3. The computed folding is retained if it contains at least W base pairs that were not found in previous foldings.

The first structure is always retained, even if it contains fewer than W base pairs. Steps 1 to 3 are repeated until either MAX structures have been computed and retained, or there are no more base pairs on the list.
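
The selection loop described above can be sketched in Python. This is only an illustration, not mfold's own code. The names `select_foldings` and `fold_with_pair` are hypothetical, and `fold_with_pair` stands in for the step that computes an optimal folding containing a given base pair. Two details are assumptions rather than statements from the text: the distance between base pairs (i,j) and (i',j') is taken to be max(|i-i'|, |j-j'|), and "previous foldings" is taken to mean previously retained foldings.

```python
def select_foldings(pair_energies, fold_with_pair, window, max_structures):
    """Sketch of steps 1 to 3.

    pair_energies  : dict mapping (i, j) -> Delta G(i, j)
    fold_with_pair : hypothetical callable; given (i, j) it returns the set of
                     base pairs of an optimal folding containing (i, j)
    window         : the W parameter
    max_structures : the MAX parameter
    """
    # Triples sorted in order of increasing Delta G(i, j).
    ordered = sorted(pair_energies, key=pair_energies.get)
    crossed_off = set()
    seen_pairs = set()   # pairs found in previously retained foldings (assumption)
    retained = []

    for pair in ordered:
        if len(retained) >= max_structures:
            break
        if pair in crossed_off:
            continue

        # Step 1: optimal folding containing the pair at the top of the list.
        folding = set(fold_with_pair(pair))

        # Step 2: cross off pairs in the folding and those within distance W.
        # Assumed distance: max(|i - i'|, |j - j'|), with "within" read as <=.
        for cand in ordered:
            if cand in crossed_off:
                continue
            if any(max(abs(cand[0] - i), abs(cand[1] - j)) <= window
                   for i, j in folding):
                crossed_off.add(cand)
        crossed_off.add(pair)  # guarantees the loop makes progress

        # Step 3: keep the folding if it has at least W new base pairs.
        # The first structure is always retained.
        new_pairs = folding - seen_pairs
        if not retained or len(new_pairs) >= window:
            retained.append(folding)
            seen_pairs |= folding

    return retained
```

Because whole neighborhoods of base pairs are crossed off at once, a larger W yields fewer and more dissimilar foldings, which is the "more subtle" limit on the number of structures mentioned earlier.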

Mfold creates a number of files associated with predicted structures. The files marked with an optional ``html'' are created only when RUN_TYPE is html. Files that contain an underscore, `_', in their names enumerate the individual foldings, so that `file_name_i.ct' refers to the ct file for the ith predicted structure.

`FILE_NAME.OUT(.HTML)' : This is a text file (html file) containing a plain text form of output for each of the predicted foldings. It is useful because it can always be displayed and is intelligible for foldings on short sequences. The selected base pairs for computing each structure are specially marked with a `|' above and a `^' below. A sample output is shown in Figure 11.


  
Figure 11: The second and final folding of S. cerevisiae Phe-tRNA at 37°, with P=5% and W=3 (default values). (a) The selected base pair is G51-C63. The base numbers are placed so that the least significant digit, always a 0, is above or below the enumerated base. (b) The usual plotted representation. The efn2 program has adjusted $\Delta G$ from -22.3 to -22.7 kcal/mole.
 FOLDING BASES    1 TO   76 OF tRNA
 Initial ENERGY  =     -22.3
 
              10       
----      UUA      AGU 
    GCGGAU     GCUC   U
    CGCUUA     CGAG   G
ACCA      --A      AGG 
     70            20  
 
                    30     
               G       CUG 
                  CCAGA    
                  GGUCU   A
               -       AGA 
                    40     
 
                       50       
                  AGGUC  |  UUC 
                       CUGUG    
                       GACAC   G
                  -----  ^  CUA 
                           60   
(a) Text (b) Plot

`FILE_NAME_I.CT' : The ``ct'' file (connect table) contains the sequence and base pair information, and is meant to be an input file for a structure drawing program. In addition to containing base pair information, it also lists the 5' and 3' neighbor of each base, allowing for the representation of circular RNA or multiple molecules. The ct file also lists the historical base numbering in the original sequence, as bases and base pairs are numbered from 1 to the size of the folded segment. A portion of a ct file is displayed in Figure 12.

  
Figure 12: The ct file for the second and final folding of S. cerevisiae Phe-tRNA at 37°, with default parameters. The first record displays the fragment size (76), $\Delta G$ and sequence name. The ith subsequent record contains, in order, i, $r_i$ (the ith base), the index of the 5'-connecting base, the index of the 3'-connecting base, the index of the paired base and the historical numbering of the ith base in the original sequence. The 5', 3' and base pair indices are 0 when there is no connection or base pair.
   76   ENERGY = -24.4 [initially  -23.2]  yeast tRNA Phe
    1 G       0    2   72    1
    2 C       1    3   71    2
    3 G       2    4   70    3
    4 G       3    5   69    4
    5 A       4    6   68    5
    6 U       5    7   67    6
    7 U       6    8    0    7
    8 U       7    9    0    8
   ...
   67 A      66   68    6   67
   68 U      67   69    5   68
   69 U      68   70    4   69
   70 C      69   71    3   70
   71 G      70   72    2   71
   72 C      71   73    1   72
   73 A      72   74    0   73
   74 C      73   75    0   74
   75 C      74   76    0   75
   76 A      75    0    0   76
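
As an illustration of the record layout in Figure 12, the following Python sketch reads a ct file into records and extracts the base pairs. It assumes whitespace-separated fields exactly as shown in the figure; the names `CtRecord`, `parse_ct` and `base_pairs` are hypothetical and are not part of mfold.

```python
from typing import NamedTuple


class CtRecord(NamedTuple):
    index: int        # i
    base: str         # r_i, the ith base
    prev_index: int   # 5'-connecting base, 0 if none
    next_index: int   # 3'-connecting base, 0 if none
    pair_index: int   # paired base, 0 if unpaired
    historical: int   # numbering in the original sequence


def parse_ct(text):
    """Return (size, title, records) from the text of a single ct file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split(None, 1)
    size = int(header[0])                      # fragment size
    title = header[1].strip() if len(header) > 1 else ""
    records = []
    for ln in lines[1:]:
        fields = ln.split()
        # Skip anything that is not a six-field record (e.g. an elision "...").
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        records.append(CtRecord(int(fields[0]), fields[1],
                                *(int(f) for f in fields[2:6])))
    return size, title, records


def base_pairs(records):
    """List each base pair once as (i, j) with i < j."""
    return sorted((r.index, r.pair_index)
                  for r in records if 0 < r.index < r.pair_index)
```

Applied to the records shown in Figure 12, `base_pairs` would list the acceptor stem pairs (1, 72) through (6, 67).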

`FILE_NAME.DET(.HTML)' : This is a text file (html file) containing the detailed breakdown of each folding into loops, and the corresponding decomposition of the overall free energy, $\Delta G$, into the free energy contributions, $\delta \delta G$, for each loop. A sample output is shown in Table 4.

 
Table 4: Free energy details for the second and final folding of S. cerevisiae Phe-tRNA at 37°, with default folding parameters. This layout mimics the html output.
Loop Free-Energy Decomposition
Structure 3
tRNA.seq Initial Free energy = -22.3

Structural element   $\delta \delta G$   Information
External loop:       -1.7                4 ss bases & 1 closing helices.
Stack:               -3.4                External closing pair is G1-C72
Stack:               -2.4                External closing pair is C2-G71
Stack:               -1.5                External closing pair is G3-C70
Stack:               -1.3                External closing pair is G4-U69
Stack:               -1.1                External closing pair is A5-U68
Helix                -9.7                6 base pairs.
Multi-loop:           1.0                External closing pair is U6-A67
                                         10 ss bases & 4 closing helices.
Stack:               -2.1                External closing pair is C49-G65
Stack:               -2.1                External closing pair is U50-A64
Stack:               -2.2                External closing pair is G51-C63
Stack:               -2.1                External closing pair is U52-A62
Helix                -8.5                5 base pairs.
Hairpin loop:         4.8                Closing pair is G53-C61
Stack:               -3.3                External closing pair is C27-G43
Stack:               -2.1                External closing pair is C28-G42
Stack:               -2.1                External closing pair is A29-U41
Stack:               -2.4                External closing pair is G30-C40
Helix                -9.9                5 base pairs.
Hairpin loop:         5.7                Closing pair is A31-U39
Stack:               -3.4                External closing pair is G10-C25
Stack:               -2.1                External closing pair is C11-G24
Stack:               -2.4                External closing pair is U12-A23
Helix                -7.9                4 base pairs.
Hairpin loop:         3.9                Closing pair is C13-G22
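
The decomposition in Table 4 can be checked by simple arithmetic. The short Python sketch below, with values copied from the table, confirms that each helix entry is the sum of its stacks and that the loop and helix contributions add up to the initial free energy of -22.3. The variable names are illustrative only.

```python
# Stack energies (kcal/mole) for each helix, copied from Table 4.
helix_stacks = {
    "G1-C72 to U6-A67":   [-3.4, -2.4, -1.5, -1.3, -1.1],
    "C49-G65 to G53-C61": [-2.1, -2.1, -2.2, -2.1],
    "C27-G43 to A31-U39": [-3.3, -2.1, -2.1, -2.4],
    "G10-C25 to C13-G22": [-3.4, -2.1, -2.4],
}

# Each "Helix" row in the table is the sum of the stacks listed above it.
helix_energy = {name: round(sum(stacks), 1)
                for name, stacks in helix_stacks.items()}

# Loop contributions (kcal/mole): external loop, multi-loop, three hairpins.
loop_energy = {
    "External loop": -1.7,
    "Multi-loop": 1.0,
    "Hairpin G53-C61": 4.8,
    "Hairpin A31-U39": 5.7,
    "Hairpin C13-G22": 3.9,
}

# A helix of n base pairs contains n - 1 stacks.
helix_base_pairs = {name: len(stacks) + 1
                    for name, stacks in helix_stacks.items()}

# Overall free energy = helices + loops (stacks are not counted twice).
total = round(sum(helix_energy.values()) + sum(loop_energy.values()), 1)
print(helix_energy)
print(total)
```

The helix sums come to -9.7, -8.5, -9.9 and -7.9, and the total to -22.3, in agreement with the table.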

`FILE_NAME.SS-COUNT' : If l foldings are predicted, then ss-count(i) is the number of times that $r_i$ is single stranded in these foldings. Thus ss-count(i)/l is a sample-based probability for single strandedness. The ss-count file contains the number of computed foldings in the first record. The ith subsequent record contains i and ss-count(i). This file may be used to predict which regions of an RNA are likely to be single stranded, and values of ss-count, averaged over a window of perhaps 5 to 25 base pairs, are often plotted. This file is also used for annotating plotted structures.
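
A minimal Python sketch of the windowed average mentioned above follows. It assumes that the first record begins with the number of foldings and that each later record holds i and ss-count(i) separated by whitespace. The function names, the centered window and the shortened window at the sequence ends are illustrative choices, not mfold behavior.

```python
def parse_ss_count(text):
    """Return (number of foldings, list of ss-count values) from file text."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_foldings = int(rows[0][0])              # first record: number of foldings
    counts = [int(row[1]) for row in rows[1:]]  # later records: i, ss-count(i)
    return n_foldings, counts


def windowed_ss_fraction(counts, n_foldings, window=5):
    """Average ss-count(i)/l over a centered window around each base.

    The window is truncated at the ends of the sequence (an assumption).
    """
    half = window // 2
    smoothed = []
    for k in range(len(counts)):
        lo = max(0, k - half)
        hi = min(len(counts), k + half + 1)
        segment = counts[lo:hi]
        smoothed.append(sum(segment) / (len(segment) * n_foldings))
    return smoothed
```

Values near 1 indicate positions that are single stranded in nearly all computed foldings; since the foldings are not a statistical sample, these fractions are best read as a rough guide.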

`FILE_NAME_I.PLT2' : This is an intermediate, device independent plot file. It is the output of mfold's adaptation of the naview program for plotting secondary structures. This file is used as input to the plt22ps and plt22gif programs. It was originally intended to be used as input to the plt2 plotting package [39], but this software is now old and not maintained.

`FILE_NAME_I.PS' : This is a PostScript file of a secondary structure. It is the output of the plt22ps program.

`FILE_NAME_I.GIF' : This is an image file (gif) of a secondary structure. It is the output of the plt22gif program.

The progression from ct file to images of secondary structures is:
`file_name_i.ct' $\rightarrow$ naview $\rightarrow$`file_name_i.plt2' $\rightarrow$ plt22ps $\rightarrow$`file_name_i.ps'
or
`file_name_i.ct' $\rightarrow$ naview $\rightarrow$`file_name_i.plt2' $\rightarrow$ plt22gif $\rightarrow$`file_name_i.gif'

`FILE_NAME.HTML' : This is a simple html file that links together some of the output files. It is an early version of a format originally used by the mfold web server.

`FILE_NAME.LOG' : This is a log file containing the standard output and standard error of the various programs and scripts that make up mfold. It can be useful for debugging.

`FILE_NAME.PNT' : This is a human readable file containing the entire input sequence. Every 10th base is labeled. In addition, auxiliary information is incorporated, if there is any. Bases that are forced to be double stranded have the letter `F' underneath. Those that are forced to be single stranded have the letter `P' underneath. Pairs of rounded brackets `(' and `)' underline forced base pairs, and pairs of curly brackets `{' and `}' underline prohibited base pairs. If 2 disjoint segments are prohibited from pairing with one another, then these segments are highlighted by underlining the residues of the first with a common lowercase letter, and the residues of the second with the same letter in uppercase. Different letters are used for different prohibited pairs. `F' and `P' are not used in this case.


Michael Zuker
Institute for Biomedical Computing
Washington University in St. Louis
1998-12-05