It may well be that you want to understand Wise2's functionality now, without bothering with the concepts or the installation instructions. This section is designed for you.
Wise2 has four main executable programs using sequence inputs, which are designed to provide sensible access to the main algorithms. The algorithms you are most likely to be interested in are genewise, which compares protein information to genomic DNA, and estwise, which compares protein information to EST/cDNA sequence.
Other algorithms in Wise2 have their own single executables. In particular you might be interested in promoterwise.
These are the programs which you might use for this.
If you see error messages like
    Warning Error Could not open human.gf as a genefrequency file
    Warning Error Could not read a GeneFrequency file in human.gf
    ...

This means that the environment variable WISECONFIGDIR has not been set up correctly. Find the directory the distribution was downloaded to (a directory called something like wise2.1.16b); inside that directory should be the configuration directory wisecfg. You need to setenv WISECONFIGDIR to that wisecfg directory.
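As a rough sketch, the check below reports whether WISECONFIGDIR points at a usable configuration directory before you run any of the programs. The function name check_wiseconfig and the path in the usage comment are hypothetical, and the test for human.gf assumes that the gene frequency file named in the error message lives directly inside wisecfg.

```shell
# Hypothetical helper (not part of Wise2): report whether WISECONFIGDIR
# is set and whether human.gf can be read from that directory.
check_wiseconfig() {
    if [ -z "${WISECONFIGDIR:-}" ]; then
        echo "WISECONFIGDIR is not set"
        return 1
    fi
    if [ ! -r "$WISECONFIGDIR/human.gf" ]; then
        echo "human.gf is not readable in $WISECONFIGDIR"
        return 1
    fi
    echo "WISECONFIGDIR looks usable: $WISECONFIGDIR"
}

# Usage in sh/bash; the path is a placeholder for your own installation.
# csh/tcsh users would write: setenv WISECONFIGDIR /path/to/wise2.1.16b/wisecfg
#
#   export WISECONFIGDIR=/path/to/wise2.1.16b/wisecfg
#   check_wiseconfig
```

The function only inspects the environment; it does not change it, so it is safe to call from a login script.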
In each of the programs the protein can either be a protein sequence or a protein profile HMM, as made by the HMMER package (both version 1 and version 2 HMMs can be read). Any of the databases can have one entry (in which case more efficient routines are used), and databases of profile HMMs, such as those provided by Pfam, can be used.
A simple run of genewise, comparing a protein sequence (Drosophila) against a human genomic sequence, is given below. The output is written to stdout, which can be redirected to a file in the usual Unix way.
    adnah:[/birney/search]<98>: genewise road.pep hngen.fa
    genewise (unreleased release)
    This program is freely distributed under a GPL. See source directory
    Copyright (c) GRL limited: portions of the code are from separate copyright

    Query protein:       roa1_drome
    Comp Matrix:         blosum62.bla
    Gap open:            12
    Gap extension:       2
    Start/End            local
    Target Sequence      HSHNRNPA
    Strand:              forward
    Gene Paras:          human.gf
    Codon Table:         codon.table
    Subs error:          1e-05
    Indel error:         1e-05
    Model splice?        model
    Model codon bias?    flat
    Model intron bias?   tied
    Null model           syn
    Algorithm            623
    Find start end points: [25,1387][346,3962] Score 87719
    Recovering alignment: Alignment recovered
    Explicit read offone 94%
    genewise output
    Score 253.10 bits over entire alignment
    Scores as bits over a synchronous coding model
    Warning: The bits scores is not probablistically correct for single seqs
    See WWW help for more info

    roa1_drome        88 AQKSRPHKIDGRVVEPKRAVPRQ                       DID
                         A  +RPHK+DGRVVEPKRAV R+                       D
                         AMNARPHKVDGRVVEPKRAVSRE                       DSQ
    HSHNRNPA        1867 gaagaccagggagggcaaggtagGTGAGTG  Intron 2   TAGgtc
                         ctacgcaataggttacagctcga<0-----[1936 : 2083]-0>aca
                         tgtagacggtaatgaagatccaa                       tta

    roa1_drome       114 SPNAGATVKKLFVGALKDDHDEQSIRDYFQHFGNIVDINIVIDKETGKK
                          P A  TVKK+FVG +K+D +E  +RDYF+ +G I  I I+ D+ +GKK
                         RPGAHLTVKKIFVGGIKEDTEEHHLRDYFEQYGKIEVIEIMTDRGSGKK
    HSHNRNPA        2093 acggctagaaatgggaaggaggcccagttgctgaaggagaaagcgagaa
                         gcgcatctaatttggtaaacaaaatgaataaagatattattcaggggaa
                         aatccatgagatttctaactaatcaatttagtaatagtacgtcactcga

    roa1_drome       163 RGFAFVEFDDYDPVDKVV                          QKQHQ
                         RGFAFV FDD+D VDK+V                          QK H
                         RGFAFVTFDDHDSVDKIV  L:I[att]                QKYHT
    HSHNRNPA        2240 agtgtgatggcgtggaagAGTAAGTA  Intron 3   TAGTTcatca
                         ggtcttctaaaactaatt <1-----[2295 : 2387]-1>  aaaac
                         gctctactcctccgtgtc                          gactt

    roa1_drome       187 LNGKMVDVKKALPKQNDQQGGGGGR
                         +NG   +V+KAL KQ         R
                         VNGHNCEVRKALSKQEMASASSSQR  G:G[ggt]
    HSHNRNPA        2405 gagcatggaagctacgagagttacaGGTATGCT  Intron 4
                         tagaagatgactcaaatcgcccgag <1-----[2481 : 2793]
                         gtccctataacgagaggtttaccaa
    ...truncated
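To keep a report like the one above in a file instead of on the terminal, ordinary shell redirection is enough. The sketch below wraps that in a small function; the name genewise_to_file and the output file name in the usage comment are hypothetical, while the genewise call itself uses only the two positional arguments from the run above.

```shell
# Hypothetical wrapper (not part of Wise2): run genewise on a protein
# file and a genomic DNA file, sending stdout to a report file.
genewise_to_file() {
    protein=$1   # protein sequence file, e.g. road.pep
    genomic=$2   # genomic DNA file, e.g. hngen.fa
    report=$3    # file that receives the stdout of genewise
    genewise "$protein" "$genomic" > "$report"
}

# Usage with the files from the run above:
#
#   genewise_to_file road.pep hngen.fa road_vs_hngen.out
```

Only stdout is redirected here, so anything genewise writes to stderr still appears on the terminal.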
The output is as follows
The pretty alignment shows the protein sequence on the first line, followed by a line indicating the similarity level of the match, followed by 4 lines representing the DNA sequence. In the exons the DNA sequence is written descending in triplets, each triplet being a codon, and the translation of each codon is shown above it. In introns the DNA sequence is not shown, except for the first 7 bases (making up the 5' splice site) and the last 3 bases (the 3' splice site). The intervening sequence is indicated by its coordinates in the square brackets. For phase 1 and 2 introns (ones that split a codon), the implied match between the protein and the conceptual gene is displayed above the intron, with the split codon in square brackets.
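Because the intron coordinates always appear as a bracketed pair such as [1936 : 2083], they can be pulled out of the pretty alignment with standard text tools. The following is a minimal sketch, assuming a grep that supports the -o option (GNU and BSD grep do); the function name intron_ranges is hypothetical and the pattern is based only on the example output in this section.

```shell
# Hypothetical filter (not part of Wise2): read genewise pretty output on
# stdin and print "start end" for every bracketed range like [1936 : 2083].
# Split-codon annotations such as L:I[att] contain no digits and are skipped.
intron_ranges() {
    grep -o '\[[0-9][0-9]* *: *[0-9][0-9]*]' |
        sed 's/[^0-9][^0-9]*/ /g; s/^ *//; s/ *$//'
}

# Usage:
#
#   genewise road.pep hngen.fa | intron_ranges
```

The filter matches on the brackets rather than on the surrounding <0----- and -0> markers, so it also picks up a range whose closing marker is cut off at the end of a line.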
Generally the defaults of the options are reasonably sensible, and for the most part you should trust them until you become familiar with the package.
The following commands show how to run the other programs in a variety of different modes