edu.msu.cme.rdp.classifier.train
Class RawSequenceParser

java.lang.Object
  extended by edu.msu.cme.rdp.classifier.train.RawSequenceParser

public class RawSequenceParser
extends java.lang.Object

Parses raw sequences from a reader.


Field Summary
private  ParsedRawSequence curSeq
           
static java.lang.String delimiter
           
private  java.util.regex.Matcher matcher
           
private  ParsedRawSequence onDeck
           
private  java.util.regex.Pattern pattern
           
private  java.io.BufferedReader reader
           
private  java.lang.String regexFasta
           
 
Constructor Summary
RawSequenceParser(java.io.Reader in)
          Creates a new RawSequenceParser to parse the input FASTA file.
 
Method Summary
 void close()
          Closes the reader.
private  java.util.List decomposeHeader(java.lang.String s)
          Accepts two header formats. In the old format, the header is a string of ancestors separated by the delimiter (";" in our case).
private  ParsedRawSequence getNextElement()
          Reads from the input stream and returns a parsed sequence.
 boolean hasNext()
          Returns true if there is a parsed sequence available.
private  java.lang.String modifySequence(java.lang.String s)
          Modifies the sequence.
 ParsedRawSequence next()
          Returns the next parsed sequence.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

delimiter

public static final java.lang.String delimiter
See Also:
Constant Field Values

pattern

private java.util.regex.Pattern pattern

matcher

private java.util.regex.Matcher matcher

regexFasta

private java.lang.String regexFasta

reader

private java.io.BufferedReader reader

onDeck

private ParsedRawSequence onDeck

curSeq

private ParsedRawSequence curSeq

Constructor Detail

RawSequenceParser

public RawSequenceParser(java.io.Reader in)
Creates a new RawSequenceParser to parse the input FASTA file.

Method Detail

close

public void close()
           throws java.io.IOException
Closes the reader.

Throws:
java.io.IOException

hasNext

public boolean hasNext()
                throws java.io.IOException
Returns true if there is a parsed sequence available.

Throws:
java.io.IOException

next

public ParsedRawSequence next()
                       throws java.util.NoSuchElementException,
                              java.io.IOException
Returns the next parsed sequence.

Throws:
java.util.NoSuchElementException
java.io.IOException
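
The hasNext/next pair, together with the private onDeck field, suggests a one-element lookahead. The following is a minimal sketch of that pattern, not the class's actual implementation: OnDeckSketch and its Supplier-based source are hypothetical names, and the convention that the source returns null when exhausted is an assumption of this sketch.

```java
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical illustration of a one-element lookahead ("on deck") iterator.
final class OnDeckSketch<T> {
    // Assumption of this sketch: the source returns null once it is exhausted.
    private final Supplier<T> source;
    // The element fetched ahead of time, or null if nothing is buffered.
    private T onDeck;

    OnDeckSketch(Supplier<T> source) {
        this.source = source;
    }

    // Fetches one element ahead if none is buffered; safe to call repeatedly.
    boolean hasNext() {
        if (onDeck == null) {
            onDeck = source.get();
        }
        return onDeck != null;
    }

    // Hands out the buffered element and clears the buffer.
    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T current = onDeck;
        onDeck = null;
        return current;
    }
}
```

With the documented public methods, a caller would presumably loop with `while (parser.hasNext()) { ParsedRawSequence seq = parser.next(); ... }` and call `parser.close()` when done.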

getNextElement

private ParsedRawSequence getNextElement()
                                  throws java.io.IOException
Reads from the input stream and returns a parsed sequence. Header format: seqID, followed by a tab, followed by a list of ancestor nodes. Reads one line for the header and decomposes it into a list of ancestors, then reads the following lines for the sequence string and modifies the sequence string.

Throws:
java.io.IOException
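
The record layout described above can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: FastaRecordReaderSketch, readRecord, and parseAll are hypothetical names, header lines are assumed to start with ">", and the sequence is returned as read, without the cleanup that modifySequence performs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for records of the form ">seqID<TAB>ancestors" + sequence lines.
final class FastaRecordReaderSketch {
    private final BufferedReader reader;
    // Header line consumed while scanning the previous record's sequence lines.
    private String pendingHeader;

    FastaRecordReaderSketch(Reader in) {
        this.reader = new BufferedReader(in);
    }

    // Returns {seqId, lineage, sequence}, or null when the input is exhausted.
    String[] readRecord() throws IOException {
        String header = pendingHeader;
        pendingHeader = null;
        while (header == null) {
            String line = reader.readLine();
            if (line == null) {
                return null;
            }
            if (line.startsWith(">")) { // assumption: ">" marks a header line
                header = line;
            }
        }
        // Split the header at the first tab into seqID and the ancestor text.
        String body = header.substring(1);
        int tab = body.indexOf('\t');
        String seqId = tab < 0 ? body.trim() : body.substring(0, tab).trim();
        String lineage = tab < 0 ? "" : body.substring(tab + 1).trim();

        // Concatenate sequence lines until the next header or end of input.
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(">")) {
                pendingHeader = line;
                break;
            }
            seq.append(line.trim());
        }
        return new String[] { seqId, lineage, seq.toString() };
    }

    // Convenience wrapper for in-memory text; rethrows IOException unchecked.
    static List<String[]> parseAll(String text) {
        FastaRecordReaderSketch r = new FastaRecordReaderSketch(new StringReader(text));
        List<String[]> records = new ArrayList<>();
        try {
            String[] rec;
            while ((rec = r.readRecord()) != null) {
                records.add(rec);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Because the next header can only be recognized by reading it, the sketch keeps that line in pendingHeader so the following call starts from it.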

decomposeHeader

private java.util.List decomposeHeader(java.lang.String s)
Accepts two header formats. In the old format, the header is a string of ancestors separated by the delimiter (";" in our case). In the new format, the header is the taxid of the immediate parent taxon. Returns a list of ancestors with the root ancestor first.
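
A minimal sketch of the old-format decomposition, assuming the delimiter is ";" as the description indicates. HeaderDecomposeSketch, DELIMITER, and decompose are hypothetical names, and trimming whitespace and skipping empty entries are choices of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the header decomposition step.
final class HeaderDecomposeSketch {
    // Assumed value; the documentation only says the delimiter is ";" "in our case".
    static final String DELIMITER = ";";

    // Splits the ancestor text on the delimiter and keeps the input order,
    // so the root ancestor stays first. A header without any delimiter
    // (for example a single parent taxid) yields a one-element list.
    static List<String> decompose(String lineage) {
        List<String> ancestors = new ArrayList<>();
        for (String part : lineage.split(Pattern.quote(DELIMITER))) {
            String name = part.trim();
            if (!name.isEmpty()) {
                ancestors.add(name);
            }
        }
        return ancestors;
    }
}
```

For example, `decompose("Root;Bacteria;Firmicutes")` returns the three names in that order, and `decompose("1234")` returns a list containing only "1234".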


modifySequence

private java.lang.String modifySequence(java.lang.String s)
Modifies the sequence by removing the characters - and ~. Returns the modified string.
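
The removal described above can be sketched as below. SequenceCleanupSketch and removeGaps are hypothetical names; the sketch only drops '-' and '~' and leaves every other character unchanged, which may be narrower than what the real method does.

```java
// Hypothetical stand-in for the sequence cleanup step.
final class SequenceCleanupSketch {
    // Copies every character except '-' and '~' into the result.
    static String removeGaps(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '-' && c != '~') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeGaps("AC-GT~~ACG-")); // prints ACGTACG
    }
}
```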