edu.msu.cme.rdp.classifier.rrnaclassifier
Class TrainingInfo

java.lang.Object
  extended by edu.msu.cme.rdp.classifier.rrnaclassifier.TrainingInfo

public class TrainingInfo
extends java.lang.Object

TrainingInfo holds all the training data and the taxonomy hierarchy information.


Field Summary
private java.util.List genus_wordConditionalProbList
private java.util.List genusNodeList
private HierarchyVersion hierarchyVersion
private boolean isGenusWordProbListDone
private boolean isProbIndexArrDone
private boolean isTreeDone
private boolean isWordPriorArrDone
private float[] logLeaveCountArr
private float[] logWordPriorArr
private static int NUM_OF_WORDS
private HierarchyTree rootTree
private int[] wordConditionalProbIndexArr
private float[] wordPairPriorDiffArr
 
Constructor Summary
TrainingInfo()
          Creates a new TrainingInfo.
 
Method Summary
(package private)  Classifier createClassifier()
          Creates a new Classifier if all the training information has been loaded; throws an exception otherwise.
private  void createGenusNodeList(HierarchyTree root)
          Builds the list of all genus-rank nodes.
(package private)  void createGenusWordProbList(java.io.Reader reader)
          Reads in the index of each genus tree node and the conditional probability that the genus contains a word.
 void createLogWordPriorArr(java.io.Reader reader)
          Reads in the log value of each word's prior probability and saves it to the array logWordPriorArr.
(package private)  void createProbIndexArr(java.io.Reader reader)
          Reads in, for each word, the start index of its genus conditional probabilities and saves these to the array wordConditionalProbIndexArr.
(package private)  void createTree(java.io.Reader reader)
          Reads in the tree information from a reader and creates all the HierarchyTrees.
(package private)  void generateWordPairDiffArr(int[] word, int beginIndex)
          For a given word w1 and its reverse complement w2, calculates the difference between the log word priors of w1 and w2 and saves it to an array.
(package private)  HierarchyTree getGenusNodebyIndex(int i)
          Returns a genus node from the genusNodeList at the specified position.
(package private)  int getGenusNodeListSize()
          Returns the number of the genus nodes.
(package private)  HierarchyVersion getHierarchyInfo()
          Returns the taxonomy hierarchy information from the training file.
(package private)  java.lang.String getHierarchyVersion()
          Returns the version of the taxonomic hierarchy.
(package private)  float getLogLeaveCount(int i)
          Returns the log value of (number of leaves + 1) for a genus.
(package private)  float getLogWordPrior(int wordIndex)
          Returns the log value of the prior probability of a word.
(package private)  HierarchyTree getRootTree()
          Returns the root of the taxonomy tree.
(package private)  int getStartIndex(int wordIndex)
          Returns the start index, in genus_wordConditionalProbList, of the GenusWordConditionalProb entries for the specified wordIndex.
(package private)  int getStopIndex(int wordIndex)
          Returns the stop index, in genus_wordConditionalProbList, of the GenusWordConditionalProb entries for the specified wordIndex.
(package private)  GenusWordConditionalProb getWordConditionalProbObject(int posIndex)
          Returns the GenusWordConditionalProb at the specified position in genus_wordConditionalProbList.
(package private)  float getWordPairPriorDiff(int wordIndex)
          Returns the difference between the log word prior of the given word and that of its reverse complement.
 boolean isSeqReversed(Sequence seq)
          Returns true if the sequence is in reverse orientation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

genusNodeList

private java.util.List genusNodeList

genus_wordConditionalProbList

private java.util.List genus_wordConditionalProbList

NUM_OF_WORDS

private static int NUM_OF_WORDS

wordConditionalProbIndexArr

private int[] wordConditionalProbIndexArr

logLeaveCountArr

private float[] logLeaveCountArr

rootTree

private HierarchyTree rootTree

logWordPriorArr

private float[] logWordPriorArr

wordPairPriorDiffArr

private float[] wordPairPriorDiffArr

isTreeDone

private boolean isTreeDone

isWordPriorArrDone

private boolean isWordPriorArrDone

isProbIndexArrDone

private boolean isProbIndexArrDone

isGenusWordProbListDone

private boolean isGenusWordProbListDone

hierarchyVersion

private HierarchyVersion hierarchyVersion
Constructor Detail

TrainingInfo

public TrainingInfo()
Creates a new TrainingInfo.

Method Detail

createTree

void createTree(java.io.Reader reader)
          throws java.io.IOException,
                 TrainingDataException
Reads in the tree information from a reader and creates all the HierarchyTrees. Note: the tree information must be read after at least one of the other three training files, because the version information has to be set first.

Throws:
java.io.IOException
TrainingDataException

createLogWordPriorArr

public void createLogWordPriorArr(java.io.Reader reader)
                           throws java.io.IOException,
                                  TrainingDataException
Reads in the log value of each word's prior probability and saves it to the array logWordPriorArr.

Throws:
java.io.IOException
TrainingDataException

generateWordPairDiffArr

void generateWordPairDiffArr(int[] word,
                             int beginIndex)
For a given word w1 and its reverse complement w2, calculates the difference between the log word priors of w1 and w2 and saves it to an array. Repeats this for every possible word of length 8.
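The enumeration described above can be sketched as follows. This is a minimal, hypothetical illustration, not the RDP implementation: the 2-bit base encoding (A=0, C=1, G=2, T=3), the base-4 word index with the first base most significant, and the extra logWordPrior/diffArr parameters are all assumptions. Each recursive call fixes one position of word, starting at beginIndex, and the difference is stored once all 8 positions are set.

```java
/**
 * Hypothetical sketch of filling a word-pair prior-difference array.
 * Assumes base codes A=0, C=1, G=2, T=3 (so complement = 3 - code) and a
 * word index equal to the base-4 number formed by the word, first base
 * most significant. Neither detail is taken from the RDP source.
 */
final class WordPairDiffSketch {
    static final int WORD_SIZE = 8;
    static final int NUM_OF_WORDS = 1 << (2 * WORD_SIZE); // 4^8 = 65536

    /** Base-4 index of a word, first base most significant. */
    static int wordIndex(int[] word) {
        int index = 0;
        for (int base : word) {
            index = (index << 2) | base;
        }
        return index;
    }

    /** Index of the reverse complement: read the word backwards, complementing each base. */
    static int reverseComplementIndex(int[] word) {
        int index = 0;
        for (int i = word.length - 1; i >= 0; i--) {
            index = (index << 2) | (3 - word[i]);
        }
        return index;
    }

    /**
     * Recursively enumerates every word by choosing a base for each position
     * from beginIndex onward; when the word is complete, stores
     * logWordPrior[w1] - logWordPrior[w2] at index w1.
     */
    static void generateWordPairDiffArr(int[] word, int beginIndex,
                                        float[] logWordPrior, float[] diffArr) {
        if (beginIndex == word.length) {
            int w1 = wordIndex(word);
            int w2 = reverseComplementIndex(word);
            diffArr[w1] = logWordPrior[w1] - logWordPrior[w2];
            return;
        }
        for (int base = 0; base < 4; base++) {
            word[beginIndex] = base;
            generateWordPairDiffArr(word, beginIndex + 1, logWordPrior, diffArr);
        }
    }

    /** Convenience entry point: starts the recursion with an empty 8-base word. */
    static float[] buildDiffArr(float[] logWordPrior) {
        float[] diffArr = new float[NUM_OF_WORDS];
        generateWordPairDiffArr(new int[WORD_SIZE], 0, logWordPrior, diffArr);
        return diffArr;
    }
}
```

With this encoding the complement of a base is simply 3 minus its code, so the reverse complement index needs no lookup table. Because reverse-complementing twice returns the original word, the stored values satisfy diff[w] = -diff[rc(w)], and reverse-palindromic words such as ACGTACGT get 0.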


createGenusWordProbList

void createGenusWordProbList(java.io.Reader reader)
                       throws java.io.IOException,
                              TrainingDataException
Reads in the index of each genus tree node and the conditional probability that the genus contains a word. Saves the data to the list genus_wordConditionalProbList.

Throws:
java.io.IOException
TrainingDataException

createProbIndexArr

void createProbIndexArr(java.io.Reader reader)
                  throws java.io.IOException,
                         TrainingDataException
Reads in, for each word, the start index of its genus conditional probabilities and saves these to the array wordConditionalProbIndexArr.

Throws:
java.io.IOException
TrainingDataException

createClassifier

Classifier createClassifier()
Creates a new Classifier if all the training information has been loaded; throws an exception otherwise.
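The completeness check can be pictured with the four private flags listed in the field summary. The sketch below is hypothetical: the class name, the missingParts helper, and the choice of IllegalStateException are assumptions, since the exception type thrown by createClassifier() is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the check performed before a classifier is created.
 * The four flags mirror the private fields of TrainingInfo; the exception
 * type and message are assumptions.
 */
final class TrainingStateSketch {
    boolean isTreeDone;
    boolean isWordPriorArrDone;
    boolean isProbIndexArrDone;
    boolean isGenusWordProbListDone;

    /** Names of the training inputs that have not been loaded yet. */
    List<String> missingParts() {
        List<String> missing = new ArrayList<>();
        if (!isTreeDone) missing.add("tree");
        if (!isWordPriorArrDone) missing.add("word prior");
        if (!isProbIndexArrDone) missing.add("probability index");
        if (!isGenusWordProbListDone) missing.add("genus word probability list");
        return missing;
    }

    /** Stand-in for the guard in createClassifier(): fails unless every part is loaded. */
    void checkComplete() {
        List<String> missing = missingParts();
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Training information incomplete, missing: " + missing);
        }
    }
}
```

Reporting every missing part at once, rather than stopping at the first, makes a misconfigured training directory easier to diagnose.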


getRootTree

HierarchyTree getRootTree()
Returns the root of the taxonomy tree.


getGenusNodeListSize

int getGenusNodeListSize()
Returns the number of the genus nodes.


getGenusNodebyIndex

HierarchyTree getGenusNodebyIndex(int i)
Returns a genus node from the genusNodeList at the specified position.


getLogWordPrior

float getLogWordPrior(int wordIndex)
Returns the log value of the prior probability of a word.


getWordPairPriorDiff

float getWordPairPriorDiff(int wordIndex)
Returns the difference between the log word prior of the given word and that of its reverse complement.


getLogLeaveCount

float getLogLeaveCount(int i)
Returns the log value of (number of leaves + 1) for a genus.


getStartIndex

int getStartIndex(int wordIndex)
Returns the start index, in genus_wordConditionalProbList, of the GenusWordConditionalProb entries for the specified wordIndex.


getStopIndex

int getStopIndex(int wordIndex)
Returns the stop index, in genus_wordConditionalProbList, of the GenusWordConditionalProb entries for the specified wordIndex.
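The start/stop pair describes a flat, word-indexed layout: all (genus, probability) entries sit in one list, and an offset array marks where each word's run begins. The following is a hypothetical sketch of that layout. GenusProb stands in for GenusWordConditionalProb, and treating the stop index as exclusive (offsets[wordIndex + 1]) is an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a flat layout consistent with getStartIndex and
 * getStopIndex. The exclusive stop index (offsets[wordIndex + 1]) is an
 * assumption, not taken from the RDP source.
 */
final class WordProbIndexSketch {
    /** Stand-in for GenusWordConditionalProb: a genus index and a probability. */
    static final class GenusProb {
        final int genusIndex;
        final float prob;

        GenusProb(int genusIndex, float prob) {
            this.genusIndex = genusIndex;
            this.prob = prob;
        }
    }

    private final int[] offsets;           // role of wordConditionalProbIndexArr, length numWords + 1
    private final List<GenusProb> entries; // role of genus_wordConditionalProbList

    WordProbIndexSketch(int[] offsets, List<GenusProb> entries) {
        this.offsets = offsets;
        this.entries = entries;
    }

    int getStartIndex(int wordIndex) {
        return offsets[wordIndex];
    }

    int getStopIndex(int wordIndex) {
        return offsets[wordIndex + 1];
    }

    /** Genus indices of the entries stored for one word, in list order. */
    List<Integer> genusIndicesForWord(int wordIndex) {
        List<Integer> result = new ArrayList<>();
        for (int i = getStartIndex(wordIndex); i < getStopIndex(wordIndex); i++) {
            result.add(entries.get(i).genusIndex);
        }
        return result;
    }
}
```

For example, offsets {0, 2, 2, 3} give word 0 the entries at positions 0 and 1, word 1 no entries, and word 2 the entry at position 2.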


getWordConditionalProbObject

GenusWordConditionalProb getWordConditionalProbObject(int posIndex)
Returns the GenusWordConditionalProb at the specified position in genus_wordConditionalProbList.


getHierarchyVersion

java.lang.String getHierarchyVersion()
Returns the version of the taxonomic hierarchy.


getHierarchyInfo

HierarchyVersion getHierarchyInfo()
Returns the taxonomy hierarchy information from the training file.


createGenusNodeList

private void createGenusNodeList(HierarchyTree root)
Builds the list of all genus-rank nodes. Starting from the root, it searches the tree for genus nodes and places each one in genusNodeList at the position given by its genusIndex.
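A hypothetical sketch of that search and placement. Node stands in for HierarchyTree, the sketch returns the list instead of writing a field, and it assumes the genus indices run contiguously from 0 to n-1 so that each genus can be dropped directly into its slot.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch of collecting genus nodes and ordering them by genusIndex.
 * Node is a stand-in for HierarchyTree; contiguous indices 0..n-1 are assumed.
 */
final class GenusListSketch {
    static final class Node {
        final String name;
        final boolean isGenus;
        final int genusIndex; // meaningful only when isGenus is true
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isGenus, int genusIndex) {
            this.name = name;
            this.isGenus = isGenus;
            this.genusIndex = genusIndex;
        }

        /** Adds a child and returns this node, so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first search from root; each genus lands at the slot given by its genusIndex. */
    static List<Node> createGenusNodeList(Node root) {
        List<Node> found = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.isGenus) {
                found.add(node);
            }
            for (Node child : node.children) {
                stack.push(child);
            }
        }
        // Traversal order is irrelevant: the final position comes from genusIndex.
        Node[] ordered = new Node[found.size()];
        for (Node genus : found) {
            ordered[genus.genusIndex] = genus;
        }
        return Arrays.asList(ordered);
    }
}
```

Placing by index rather than sorting keeps the step linear, and getGenusNodebyIndex(i) can then be a plain list lookup.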


isSeqReversed

public boolean isSeqReversed(Sequence seq)
Returns true if the sequence is in reverse orientation. Sums, over all overlapping words of the query sequence, the difference between the log word prior of each word and that of its reverse complement. If the sum is less than zero, the query sequence is in reverse orientation.
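A minimal sketch of this orientation test, under stated assumptions: bases are encoded A=0, C=1, G=2, T=3, each word index is built from the last 8 bases with the first base most significant, and any window containing another character is skipped. The class and method names are hypothetical; wordPairPriorDiff plays the role of wordPairPriorDiffArr, and a String replaces the Sequence type.

```java
/**
 * Hypothetical sketch of the orientation test: slide an 8-base window over
 * the query, add the precomputed prior difference of each word, and report
 * reverse orientation when the total is negative. Base encoding and the
 * handling of non-ACGT characters are assumptions.
 */
final class OrientationSketch {
    static final int WORD_SIZE = 8;
    static final int MASK = (1 << (2 * WORD_SIZE)) - 1; // keeps only the last 8 bases

    static int encode(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: return -1; // not a plain nucleotide
        }
    }

    static boolean isSeqReversed(String seq, float[] wordPairPriorDiff) {
        float sum = 0f;
        int word = 0;
        int validBases = 0; // consecutive valid bases ending at the current position
        for (int i = 0; i < seq.length(); i++) {
            int code = encode(seq.charAt(i));
            if (code < 0) {
                validBases = 0; // every window containing this character is skipped
                continue;
            }
            word = ((word << 2) | code) & MASK; // rolling update: drop oldest base, append newest
            validBases++;
            if (validBases >= WORD_SIZE) {
                sum += wordPairPriorDiff[word];
            }
        }
        return sum < 0;
    }
}
```

The rolling update keeps the cost at one shift and one mask per base, and stale bits from before a skipped character are flushed after 8 further shifts, which is exactly when validBases allows the next lookup. A sequence shorter than 8 bases contributes nothing, so its sum is 0 and it is reported as forward.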