edu.msu.cme.rdp.classifier.train
Class TreeFactory

java.lang.Object
  extended by edu.msu.cme.rdp.classifier.train.TreeFactory

public class TreeFactory
extends java.lang.Object

A TreeFactory reads the raw taxonomic information and creates the RawHierarchyTree nodes.


Field Summary
private  java.util.List genus_wordConditionalProbList
           
private  java.util.List genusNodeList
           
private  float[] logArr
           
private  int ROOT_DEPTH
          The depth of the root of the RawHierarchyTree, which is set to 0.
private  RawHierarchyTree rootTree
           
private  java.util.Map<java.lang.Integer,Taxonomy> taxidMap
           
private  java.util.Map<java.lang.String,java.util.List> taxnameMap
           
private  int totalSequences
           
private  java.lang.String trainingVersion
           
private  java.io.BufferedWriter treeFile
           
private  float WF1
          A factor for probability correction
private  float WF2
          A factor for probability correction
private  float[] wordPriorArr
           
private  int[] wordProbPointerArr
           
 
Constructor Summary
TreeFactory(java.io.Reader taxReader, int trainsetNo, java.lang.String version, java.lang.String modification)
          Creates new TreeFactory.
 
Method Summary
(package private)  void addSequence(ParsedRawSequence pSeq)
           
private  void addSequencewithLineage(ParsedRawSequence pSeq)
          For the given sequence name, its ancestors, and the sequence string, creates a HierarchyTree for each ancestor. If the root does not exist, creates the root with a null parent.
private  void addSequencewithTaxid(ParsedRawSequence pSeq)
          For the given sequence name, its assigned taxid, and the sequence string, creates a HierarchyTree for each ancestor. If the root does not exist, creates the root with a null parent.
(package private)  void createGenusWordConditionalProb()
          This method does all the setup work for wordPrior and word conditional probability.
(package private)  void createNodeList(RawHierarchyTree root, java.lang.String level, java.util.List nodeList)
          Gets all the lowest level nodes in the given hierarchy level, starting from the given root.
private  void creatTaxidMap(java.io.Reader taxReader)
          Reads a file containing the taxonomy information for all the nodes.
private  void displayTrainingTree(RawHierarchyTree root)
          Writes the phylogenetic taxonomic information of the given root and all the descendant nodes to a file.
(package private)  java.util.List getGenusNodeList()
          Returns the list of genus nodes.
(package private)  float getLogLeaveCount(int i)
          Returns the log value of (number of leaves plus 1).
(package private)  float getLogWordPrior(int wordIndex)
          Returns the log value for word prior probability for the given word index.
(package private)  RawHierarchyTree getRoot()
          Gets the root of the tree.
(package private)  int getStartIndex(int wordIndex)
          Returns the start index of RawGenusWordConditionalProb in the array for the given wordIndex.
(package private)  int getStopIndex(int wordIndex)
          Returns the stop index of RawGenusWordConditionalProb in the array for the given wordIndex.
private  Taxonomy getTaxonomy(ParsedRawSequence pSeq, int pid, int index)
          Gets the Taxonomy for the tree node in the ancestor list.
(package private)  RawGenusWordConditionalProb getWordConditionalProb(int posIndex)
          Returns a RawGenusWordConditionalProb from the array, given the position.
(package private)  void printGenusIndex_WordProbArr(java.lang.String outdir)
          Writes the indices of genus nodes and the conditional probabilities of the words occurring in these genus nodes to a file.
(package private)  void printTrainingFiles(java.lang.String outdir)
          Writes the entire phylogenetic taxonomic information to a file.
(package private)  void printWordConditionalProbIndexArr(java.lang.String outdir)
          Writes the indices of words and the start indices of conditional probability of the genera containing these words to a file.
(package private)  void printWordPriors(java.lang.String outdir)
          Writes the log values of the word prior probabilities to a file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

genusNodeList

private java.util.List genusNodeList

genus_wordConditionalProbList

private java.util.List genus_wordConditionalProbList

wordProbPointerArr

private int[] wordProbPointerArr

logArr

private float[] logArr

rootTree

private RawHierarchyTree rootTree

taxnameMap

private java.util.Map<java.lang.String,java.util.List> taxnameMap

taxidMap

private java.util.Map<java.lang.Integer,Taxonomy> taxidMap

wordPriorArr

private float[] wordPriorArr

ROOT_DEPTH

private int ROOT_DEPTH
The depth of the root of the RawHierarchyTree, which is set to 0.


totalSequences

private int totalSequences

WF1

private final float WF1
A factor for probability correction

See Also:
Constant Field Values

WF2

private final float WF2
A factor for probability correction

See Also:
Constant Field Values

treeFile

private java.io.BufferedWriter treeFile

trainingVersion

private java.lang.String trainingVersion

Constructor Detail

TreeFactory

public TreeFactory(java.io.Reader taxReader,
                   int trainsetNo,
                   java.lang.String version,
                   java.lang.String modification)
            throws java.io.IOException
Creates new TreeFactory.

Throws:
java.io.IOException

Method Detail

creatTaxidMap

private void creatTaxidMap(java.io.Reader taxReader)
                    throws java.io.IOException
Reads a file containing the taxonomy information for all the nodes. Each line holds taxid (int), taxname (string), parentid (int), depth (int), and hierarchy level (string), separated by *. The information is kept in a HashMap whose key is the taxname and whose value is a list of Taxonomy entries (taxid, parentid, depth, and hierarchy rank level). Note: the depth of the root is 0.

Throws:
java.io.IOException
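
The line format described above can be illustrated with a minimal sketch. It is not the TreeFactory implementation: the class TaxonomyLineSketch, the TaxEntry type (a stand-in for the package's Taxonomy class), and both method names are hypothetical, and the sketch assumes exactly five fields per line with no * inside a taxon name.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names here are illustrative and not part of the RDP API.
final class TaxonomyLineSketch {

    /** Stand-in for the package's Taxonomy class (assumed fields). */
    static final class TaxEntry {
        final int taxid;
        final String name;
        final int parentId;
        final int depth;
        final String rank;

        TaxEntry(int taxid, String name, int parentId, int depth, String rank) {
            this.taxid = taxid;
            this.name = name;
            this.parentId = parentId;
            this.depth = depth;
            this.rank = rank;
        }
    }

    /** Parses one line of the form taxid*taxname*parentid*depth*rank. */
    static TaxEntry parseLine(String line) {
        // '*' is a regex metacharacter, so it must be escaped for split().
        String[] f = line.split("\\*", -1);
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + line);
        }
        return new TaxEntry(
                Integer.parseInt(f[0].trim()),
                f[1].trim(),
                Integer.parseInt(f[2].trim()),
                Integer.parseInt(f[3].trim()),
                f[4].trim());
    }

    /** Groups entries by taxon name; one name may map to several entries. */
    static Map<String, List<TaxEntry>> readByName(Reader in) {
        Map<String, List<TaxEntry>> byName = new HashMap<>();
        new BufferedReader(in).lines()
                .map(String::trim)
                .filter(l -> !l.isEmpty())      // skip blank lines
                .map(TaxonomyLineSketch::parseLine)
                .forEach(e -> byName.computeIfAbsent(e.name, k -> new ArrayList<>()).add(e));
        return byName;
    }
}
```

A made-up line such as 1*Bacteria*0*1*domain would parse to an entry with taxid 1, parent 0, depth 1, and rank "domain". Keeping a list per name allows for the same taxon name appearing at more than one place in the hierarchy.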

addSequence

void addSequence(ParsedRawSequence pSeq)
           throws java.io.IOException
Throws:
java.io.IOException

addSequencewithTaxid

private void addSequencewithTaxid(ParsedRawSequence pSeq)
                           throws java.io.IOException
For the given sequence name, its assigned taxid, and the sequence string, creates a HierarchyTree for each ancestor. If the root does not exist, creates the root with a null parent. If the root already exists, checks the ParsedRawSequence to see whether its highest rank ancestor is the same as the existing root.

Throws:
java.io.IOException

addSequencewithLineage

private void addSequencewithLineage(ParsedRawSequence pSeq)
                             throws java.io.IOException
For the given sequence name, its ancestors, and the sequence string, creates a HierarchyTree for each ancestor. If the root does not exist, creates the root with a null parent. If the root already exists, checks the ParsedRawSequence to see whether its highest rank ancestor is the same as the existing root.

Throws:
java.io.IOException
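
The root handling described for addSequencewithTaxid and addSequencewithLineage can be sketched as follows. This is only an illustration under stated assumptions: LineageTreeSketch, its Node type, and addLineage are hypothetical names, the lineage is passed as a list of names ordered from the highest rank down, and rejecting a mismatched root with an exception is an assumption about how the check is reported.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the RawHierarchyTree implementation.
final class LineageTreeSketch {

    static final class Node {
        final String name;
        final Node parent;                       // null only for the root
        final Map<String, Node> children = new LinkedHashMap<>();
        int sequenceCount;                       // sequences attached at this node

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    private Node root;

    /** Adds one sequence whose ancestors are listed from highest rank to lowest. */
    void addLineage(List<String> ancestors) {
        if (ancestors.isEmpty()) {
            throw new IllegalArgumentException("empty lineage");
        }
        String top = ancestors.get(0);
        if (root == null) {
            root = new Node(top, null);          // first sequence creates the root
        } else if (!root.name.equals(top)) {
            // Assumed behavior: a different highest-rank ancestor is rejected.
            throw new IllegalArgumentException(
                    "root mismatch: expected " + root.name + " but got " + top);
        }
        Node current = root;
        for (String name : ancestors.subList(1, ancestors.size())) {
            final Node parent = current;
            // Reuse an existing child or create it on first sight.
            current = parent.children.computeIfAbsent(name, n -> new Node(n, parent));
        }
        current.sequenceCount++;
    }

    Node getRoot() {
        return root;
    }
}
```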

getTaxonomy

private Taxonomy getTaxonomy(ParsedRawSequence pSeq,
                             int pid,
                             int index)
Gets the Taxonomy for the tree node in the ancestor list.


getRoot

RawHierarchyTree getRoot()
Gets the root of the tree.


createGenusWordConditionalProb

void createGenusWordConditionalProb()
This method does all the setup work for the word priors and the word conditional probabilities. First, it calculates the prior for each word and keeps the values in an array. Second, for each word, it calculates the conditional probability for every genus in which the word occurs at least once, and keeps the values in an array.
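
The two quantities can be sketched as below. This page does not give the formulas or the values of WF1 and WF2, so the sketch assumes the estimates published for the RDP naive Bayesian classifier: prior = (n + 0.5) / (N + 1) and conditional = (m + prior) / (M + 1), with WF1 = 0.5 and WF2 = 1 as assumed values. The class and method names are hypothetical.

```java
// Hypothetical sketch; constants and formulas are assumptions, see the text above.
final class WordProbabilitySketch {

    static final double WF1 = 0.5;  // assumed value of the first correction factor
    static final double WF2 = 1.0;  // assumed value of the second correction factor

    /**
     * Log prior of a word that occurs in seqsWithWord of the totalSeqs
     * training sequences.
     */
    static double logWordPrior(int seqsWithWord, int totalSeqs) {
        return Math.log((seqsWithWord + WF1) / (totalSeqs + WF2));
    }

    /**
     * Log conditional probability of a word within one genus, where
     * genusSeqsWithWord of the genusSeqs sequences in that genus contain the word.
     */
    static double logConditional(int genusSeqsWithWord, int genusSeqs, double logPrior) {
        double prior = Math.exp(logPrior);
        // log((m + prior) / (M + 1)), written as a difference of logs.
        return Math.log(genusSeqsWithWord + prior) - Math.log(genusSeqs + 1.0);
    }
}
```

Storing only the genera with a non-zero count keeps the conditional probability array sparse; for a genus that never contains the word, the same formula with m = 0 can be evaluated on demand from the prior and the genus size.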


getLogWordPrior

float getLogWordPrior(int wordIndex)
Returns the log value for word prior probability for the given word index.


getGenusNodeList

java.util.List getGenusNodeList()
Returns the list of genus nodes.


getLogLeaveCount

float getLogLeaveCount(int i)
Returns the log value of (number of leaves plus 1).


getStartIndex

int getStartIndex(int wordIndex)
Returns the start index of RawGenusWordConditionalProb in the array for the given wordIndex.


getStopIndex

int getStopIndex(int wordIndex)
Returns the stop index of RawGenusWordConditionalProb in the array for the given wordIndex.
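
One common way to support start and stop lookups like these is a pointer array over a flat list of per-word entries. The sketch below shows that layout only; whether TreeFactory builds wordProbPointerArr exactly this way, and whether its stop index is exclusive, are assumptions. WordPointerSketch is a hypothetical name.

```java
// Hypothetical sketch of a pointer-array layout; not the TreeFactory code.
final class WordPointerSketch {

    // pointer[w] is the position of the first entry for word w in the flat
    // list; pointer[w + 1] is one past its last entry.
    private final int[] pointer;

    /** entriesPerWord[w] is the number of genera that contain word w. */
    WordPointerSketch(int[] entriesPerWord) {
        pointer = new int[entriesPerWord.length + 1];
        for (int w = 0; w < entriesPerWord.length; w++) {
            pointer[w + 1] = pointer[w] + entriesPerWord[w];
        }
    }

    int getStartIndex(int wordIndex) {
        return pointer[wordIndex];
    }

    /** Exclusive stop index (assumed convention). */
    int getStopIndex(int wordIndex) {
        return pointer[wordIndex + 1];
    }
}
```

With this layout a caller iterates positions from getStartIndex(w) up to, but not including, getStopIndex(w); a word that occurs in no genus has equal start and stop indices.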


getWordConditionalProb

RawGenusWordConditionalProb getWordConditionalProb(int posIndex)
Returns a RawGenusWordConditionalProb from the array, given the position.


createNodeList

void createNodeList(RawHierarchyTree root,
                    java.lang.String level,
                    java.util.List nodeList)
Gets all the lowest level nodes in the given hierarchy level, starting from the given root.
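
A traversal of this kind can be sketched as a depth-first walk that appends matching nodes to the caller's list. The Node type and the method names below are hypothetical, and the sketch assumes that a node matches when its rank equals the requested level and that the walk does not descend below a match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Node stands in for RawHierarchyTree.
final class NodeListSketch {

    static final class Node {
        final String name;
        final String rank;
        final List<Node> children = new ArrayList<>();

        Node(String name, String rank) {
            this.name = name;
            this.rank = rank;
        }

        /** Adds a child and returns this node so that calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Appends every node at the given rank under root to out, depth first. */
    static void collect(Node root, String level, List<Node> out) {
        if (root == null) {
            return;
        }
        if (level.equalsIgnoreCase(root.rank)) {
            out.add(root);
            return;                 // assumed: nothing of the same rank below a match
        }
        for (Node child : root.children) {
            collect(child, level, out);
        }
    }

    /** Convenience wrapper that returns the names of the collected nodes. */
    static List<String> namesAt(Node root, String level) {
        List<Node> nodes = new ArrayList<>();
        collect(root, level, nodes);
        List<String> names = new ArrayList<>();
        for (Node n : nodes) {
            names.add(n.name);
        }
        return names;
    }
}
```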


printTrainingFiles

void printTrainingFiles(java.lang.String outdir)
                  throws java.io.IOException
Writes the entire phylogenetic taxonomic information to a file.

Throws:
java.io.IOException

displayTrainingTree

private void displayTrainingTree(RawHierarchyTree root)
                          throws java.io.IOException
Writes the phylogenetic taxonomic information of the given root and all the descendant nodes to a file. For each node, writes the index and the name. For each sequence, writes the name and the description.

Throws:
java.io.IOException

printWordPriors

void printWordPriors(java.lang.String outdir)
               throws java.io.IOException
Writes the log values of the word prior probabilities to a file.

Throws:
java.io.IOException

printWordConditionalProbIndexArr

void printWordConditionalProbIndexArr(java.lang.String outdir)
                                throws java.io.IOException
Writes the indices of words and the start indices of conditional probability of the genera containing these words to a file.

Throws:
java.io.IOException

printGenusIndex_WordProbArr

void printGenusIndex_WordProbArr(java.lang.String outdir)
                           throws java.io.IOException
Writes the indices of genus nodes and the conditional probabilities of the words occurring in these genus nodes to a file.

Throws:
java.io.IOException