edu.msu.cme.rdp.classifier.train
Class ClassifierTraineeMaker

java.lang.Object
  extended by edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker

public class ClassifierTraineeMaker
extends java.lang.Object

A command-line class that creates training information from the raw data.


Constructor Summary
ClassifierTraineeMaker(java.lang.String taxFile, java.lang.String seqFile, int trainset_no, java.lang.String version, java.lang.String modification, java.lang.String outdir)
          Creates a new ClassifierTraineeMaker.
 
Method Summary
static void main(java.lang.String[] args)
          This is the main method to create training files from raw taxonomic information.
static void printLicense()
          Prints the license information to standard error.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClassifierTraineeMaker

public ClassifierTraineeMaker(java.lang.String taxFile,
                              java.lang.String seqFile,
                              int trainset_no,
                              java.lang.String version,
                              java.lang.String modification,
                              java.lang.String outdir)
                       throws java.io.FileNotFoundException,
                              java.io.IOException
Creates a new ClassifierTraineeMaker.

Parameters:
taxFile - contains the hierarchical taxonomy information in the following format: "taxid*taxon name*parent taxid*depth*rank". The taxid, parent taxid, and depth should be integers. The depth is the distance from the root taxon.
seqFile - contains the raw training sequences in FASTA format. Each header line starts with ">", followed by the sequence name, whitespace, and a list of taxon names separated by ';', with the highest-rank taxon first. For example: >seq1 ROOT;Ph1;Fam1;G1;
Note: a sequence can be assigned only to the lowest-rank taxon.
trainset_no - the training set number, used to mark the generated training files.
version - indicates the version of the hierarchical taxonomy.
modification - holds the modification information of the taxonomy if any.
outdir - specifies the output directory. The parsed training information will be saved into four files in the given output directory.
Throws:
java.io.FileNotFoundException
java.io.IOException
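
Example: the two input formats described for taxFile and seqFile can be illustrated with a small standalone parser. This is a minimal sketch, not the classifier's own parsing code. The class TrainingInputFormatSketch, the holder TaxonEntry, and both methods are hypothetical names introduced only for illustration, and the sample line "3*G1*2*3*genus" is made-up data that follows the documented field order.

```java
import java.util.ArrayList;
import java.util.List;

final class TrainingInputFormatSketch {

    /** Hypothetical holder for one taxonomy line; not part of the RDP API. */
    static final class TaxonEntry {
        final int taxid;
        final String name;
        final int parentTaxid;
        final int depth;
        final String rank;

        TaxonEntry(int taxid, String name, int parentTaxid, int depth, String rank) {
            this.taxid = taxid;
            this.name = name;
            this.parentTaxid = parentTaxid;
            this.depth = depth;
            this.rank = rank;
        }
    }

    /** Splits "taxid*taxon name*parent taxid*depth*rank" into its five fields. */
    static TaxonEntry parseTaxonLine(String line) {
        String[] fields = line.split("\\*", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 '*'-separated fields: " + line);
        }
        // taxid, parent taxid and depth are documented as integers.
        return new TaxonEntry(
                Integer.parseInt(fields[0].trim()),
                fields[1].trim(),
                Integer.parseInt(fields[2].trim()),
                Integer.parseInt(fields[3].trim()),
                fields[4].trim());
    }

    /** Returns the taxon names from a FASTA header, highest rank first. */
    static List<String> parseLineage(String header) {
        if (!header.startsWith(">")) {
            throw new IllegalArgumentException("FASTA header must start with '>': " + header);
        }
        // The sequence name ends at the first run of whitespace;
        // everything after it is the ';'-separated taxon list.
        String[] parts = header.substring(1).trim().split("\\s+", 2);
        List<String> lineage = new ArrayList<>();
        if (parts.length < 2) {
            return lineage;
        }
        for (String name : parts[1].split(";")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                lineage.add(trimmed);
            }
        }
        return lineage;
    }

    public static void main(String[] args) {
        TaxonEntry entry = parseTaxonLine("3*G1*2*3*genus");
        System.out.println(entry.name + " at depth " + entry.depth);
        System.out.println(parseLineage(">seq1 ROOT;Ph1;Fam1;G1;"));
    }
}
```

In this sketch the last name in the parsed lineage is the lowest-rank taxon, which is the one the sequence is assigned to.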
Method Detail

printLicense

public static void printLicense()
Prints the license information to standard error.


main

public static void main(java.lang.String[] args)
                 throws java.io.FileNotFoundException,
                        java.io.IOException
This is the main method to create training files from raw taxonomic information.

Usage: java ClassifierTraineeMaker tax_file rawseq.fa trainsetNo version version_modification output_directory

See the ClassifierTraineeMaker constructor for details on each argument.

Parameters:
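args - the command-line arguments, in the order given in the usage above: tax_file, rawseq.fa, trainsetNo, version, version_modification, output_directory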
Throws:
java.io.FileNotFoundException
java.io.IOException
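
Example: the sketch below prepares a tiny pair of input files and assembles the six arguments in the order the usage line gives. It does not call ClassifierTraineeMaker itself, so it runs without the classifier on the classpath. The class TraineeMakerArgsSketch, the file names, the rank labels, the root's parent taxid of -1, and the values "1", "1.0" and "none" are all assumptions chosen for illustration, not values required by the tool.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class TraineeMakerArgsSketch {

    /** Writes sample inputs to a temp directory and returns the six arguments in usage order. */
    static String[] prepareArgs() {
        try {
            Path workDir = Files.createTempDirectory("trainee-demo");
            Path taxFile = workDir.resolve("tax_file.txt");
            Path seqFile = workDir.resolve("rawseq.fa");
            Path outDir = Files.createDirectories(workDir.resolve("output_directory"));

            // Format: taxid*taxon name*parent taxid*depth*rank.
            // The -1 parent for the root and the rank labels are assumptions.
            List<String> taxonomy = Arrays.asList(
                    "0*ROOT*-1*0*norank",
                    "1*Ph1*0*1*phylum",
                    "2*Fam1*1*2*family",
                    "3*G1*2*3*genus");
            Files.write(taxFile, taxonomy, StandardCharsets.UTF_8);

            // Header: ">" + sequence name + whitespace + taxon names, highest rank first.
            List<String> fasta = Arrays.asList(
                    ">seq1 ROOT;Ph1;Fam1;G1;",
                    "ACGTACGTACGTACGT");
            Files.write(seqFile, fasta, StandardCharsets.UTF_8);

            return new String[] {
                taxFile.toString(),  // tax_file
                seqFile.toString(),  // rawseq.fa
                "1",                 // trainsetNo
                "1.0",               // version
                "none",              // version_modification
                outDir.toString()    // output_directory
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] makerArgs = prepareArgs();
        // Print the command line that could then be run against the classifier jar.
        System.out.println("java ClassifierTraineeMaker " + String.join(" ", makerArgs));
    }
}
```

With the classifier on the classpath, the returned array could be passed straight to ClassifierTraineeMaker.main, which the documentation above says may throw FileNotFoundException or IOException.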