Next: Common running modes Up: Wise2 Documentation (version 2.2 Previous: Authors   Contents

Introduction for the impatient

It may well be that you want to understand Wise2's functionality now, without bothering with the concepts or the installation instructions. This section is designed for you.

Wise2 has four main executable programs that take sequence inputs and are designed to give sensible access to the main algorithms. The algorithms you are most likely to be interested in are genewise, which compares protein information to genomic DNA, and estwise, which compares protein information to EST/cDNA sequence.

Other algorithms in Wise2 have their own single executables. In particular you might be interested in promoterwise.

The four main programs are listed below.

genewise
a single protein vs a single genomic DNA sequence.
genewisedb
a database of proteins vs a database of genomic DNA sequences. Read section 2.4 before you use this in anger though.
estwise
a single protein vs a single EST/cDNA sequence.
estwisedb
a database of proteins vs a database of EST/cDNA sequences. Read section 2.5 before you use this in anger though.
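
The choice between the four programs comes down to two questions: is the target genomic DNA or EST/cDNA, and is the search a single pair or database against database. As a minimal sketch of that decision, assuming a helper named wise2_program (hypothetical, not part of the Wise2 distribution):

```shell
# Sketch only: wise2_program is a hypothetical helper, not part of Wise2.
# $1: target type, "genomic" or "est"
# $2: search size, "single" or "db"
wise2_program() {
    case "$1:$2" in
        genomic:single) echo genewise ;;
        genomic:db)     echo genewisedb ;;
        est:single)     echo estwise ;;
        est:db)         echo estwisedb ;;
        *)
            echo "usage: wise2_program genomic|est single|db" >&2
            return 1
            ;;
    esac
}
```

For example, wise2_program est db prints estwisedb.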

If you see error messages like

Warning Error
        Could not open human.gf as a genefrequency file
Warning Error
        Could not read a GeneFrequency file in human.gf
...
This means that the environment variable WISECONFIGDIR has not been set up correctly. You need to find where the distribution was downloaded to (a directory called something like wise2.1.16b); inside that directory should be the configuration directory wisecfg. Set WISECONFIGDIR to that directory (for example with setenv in csh).
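
As a minimal sketch of that setup for sh-style shells: the path below is an assumption and must be replaced with the location of your own copy, and check_wisecfg is a hypothetical helper that only tests whether human.gf (the file named in the error message) is readable in the chosen directory.

```shell
# Sketch only: check_wisecfg is a hypothetical helper, not part of Wise2.
# It succeeds when the given directory exists and holds a readable human.gf,
# the file named in the error message above.
check_wisecfg() {
    [ -d "$1" ] && [ -r "$1/human.gf" ]
}

# Assumed location; replace with the wisecfg directory of your own copy.
WISECONFIGDIR="$HOME/wise2.1.16b/wisecfg"
export WISECONFIGDIR    # sh/bash syntax; csh users would use setenv instead

if check_wisecfg "$WISECONFIGDIR"; then
    echo "WISECONFIGDIR looks usable: $WISECONFIGDIR"
else
    echo "human.gf not found in $WISECONFIGDIR" >&2
fi
```

Putting the assignment and export lines in a shell startup file avoids having to repeat them in every session.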

In each of the programs the protein can either be a protein sequence or a protein profile HMM, as made by the HMMER package (both version 1 and version 2 HMMs can be read). Any of the databases can have one entry (in which case more efficient routines are used), and databases of profile HMMs, such as those provided by Pfam, can be used.

A simple run of genewise, comparing a protein sequence (from Drosophila) against a human genomic sequence, is shown below. The output goes to stdout, which can be redirected to a file in the usual Unix way.

adnah:[/birney/search]<98>: genewise road.pep hngen.fa
genewise (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       roa1_drome
Comp Matrix:         blosum62.bla
Gap open:            12
Gap extension:       2
Start/End            local
Target Sequence      HSHNRNPA
Strand:              forward
Gene Paras:          human.gf
Codon Table:         codon.table
Subs error:          1e-05
Indel error:         1e-05
Model splice?        model
Model codon bias?    flat
Model intron bias?   tied
Null model           syn
Algorithm            623
Find start end points: [25,1387][346,3962] Score 87719
Recovering alignment: Alignment recoveredExplicit read offone 94%
genewise output
Score 253.10 bits over entire alignment
Scores as bits over a synchronous coding model

Warning: The bits scores is not probablistically correct for single seqs
See WWW help for more info



roa1_drome        88 AQKSRPHKIDGRVVEPKRAVPRQ                       DID 
                     A  +RPHK+DGRVVEPKRAV R+                       D   
                     AMNARPHKVDGRVVEPKRAVSRE                       DSQ 
HSHNRNPA        1867 gaagaccagggagggcaaggtagGTGAGTG  Intron 2   TAGgtc 
                     ctacgcaataggttacagctcga<0-----[1936 : 2083]-0>aca 
                     tgtagacggtaatgaagatccaa                       tta 


roa1_drome       114 SPNAGATVKKLFVGALKDDHDEQSIRDYFQHFGNIVDINIVIDKETGKK 
                      P A  TVKK+FVG +K+D +E  +RDYF+ +G I  I I+ D+ +GKK 
                     RPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKK 
HSHNRNPA        2093 acggctagaaatgggaaggaggcccagttgctgaaggagaaagcgagaa 
                     gcgcatctaatttggtaaacaaaatgaataaagatattattcaggggaa 
                     aatccatgagatttctaactaatcaatttagtaatagtacgtcactcga 


roa1_drome       163 RGFAFVEFDDYDPVDKVV                          QKQHQ 
                     RGFAFV FDD+D VDK+V                          QK H  
                     RGFAFVTFDDHDSVDKIV          L:I[att]        QKYHT 
HSHNRNPA        2240 agtgtgatggcgtggaagAGTAAGTA  Intron 3   TAGTTcatca 
                     ggtcttctaaaactaatt <1-----[2295 : 2387]-1>  aaaac 
                     gctctactcctccgtgtc                          gactt 


roa1_drome       187 LNGKMVDVKKALPKQNDQQGGGGGR                         
                     +NG   +V+KAL KQ         R                         
                     VNGHNCEVRKALSKQEMASASSSQR          G:G[ggt]       
HSHNRNPA        2405 gagcatggaagctacgagagttacaGGTATGCT  Intron 4       
                     tagaagatgactcaaatcgcccgag <1-----[2481 : 2793]    
                     gtccctataacgagaggtttaccaa                         

...truncated
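
Because the report is written to stdout, the same run can be captured in a file with ordinary shell redirection. The sketch below wraps that in a hypothetical function, run_genewise, which first checks that genewise can be found; the output file name in the example call is an assumption.

```shell
# Sketch only: run_genewise is a hypothetical wrapper, not part of Wise2.
# $1: protein file, $2: genomic DNA file, $3: file that receives stdout
run_genewise() {
    if ! command -v genewise >/dev/null 2>&1; then
        echo "genewise not found on PATH" >&2
        return 127
    fi
    genewise "$1" "$2" > "$3"
}

# Example call, reusing the input file names from the run above:
# run_genewise road.pep hngen.fa road_vs_hngen.out
```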

The output is read as follows.

The pretty alignment shows the query protein sequence on the first line, followed by a line indicating the similarity of the match, followed by four lines representing the DNA sequence. In exons the DNA sequence is written descending in triplets, each triplet being a codon, and the translation of each codon is shown above it. In introns the DNA sequence is not shown, except for the first 7 bases (making the 5' splice site) and the last 3 bases of the 3' splice site. The extent of the intervening sequence is indicated in the square brackets. Above each phase 1 or phase 2 intron (one that splits a codon), the implied match between the protein and the conceptual gene is displayed, with the split codon in square brackets.
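
Two pieces of this report are easy to pull out with standard tools: the bit score and the bracketed intron ranges. The helpers below are a sketch written against the sample output shown in this section only; genewise_score and genewise_introns are hypothetical names, and the patterns assume the "Score ... bits over entire alignment" line and the "[start : end]" bracket format appear exactly as in that sample.

```shell
# Sketch only: both helpers are hypothetical and read genewise output on stdin.

# Print the bit score from the "Score ... bits over entire alignment" line.
genewise_score() {
    awk '/^Score .* bits over entire alignment/ { print $2 }'
}

# Print one "start end" pair per intron, taken from the "[start : end]"
# brackets. The "[25,1387][346,3962]" progress line has no " : " separator
# and is therefore not matched.
genewise_introns() {
    grep -oE '\[[0-9]+ : [0-9]+\]' | sed 's/[^0-9 ]//g' | awk '{ print $1, $2 }'
}
```

A typical use would be to pipe a saved report through either function, for example genewise_score < road_vs_hngen.out, where the file name is again an assumption.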

Generally the defaults of the options are reasonably sensible, and for the most part you should trust them until you become familiar with the package.

The following commands show how to run the other programs in a variety of different modes.



Eric DEVEAUD 2015-02-27