Two IUPAC alphabets are accepted: protein and DNA.
The protein alphabet contains twenty characters for amino acids ("ACDEFGHIKLMNPQRSTVWY"), and is augmented by four more ambiguous characters ("BUXZ").
Symbol | Meaning |
---|---|
A | alanine |
B | aspartate or asparagine |
C | cysteine |
D | aspartate |
E | glutamate |
F | phenylalanine |
G | glycine |
H | histidine |
I | isoleucine |
K | lysine |
L | leucine |
M | methionine |
N | asparagine |
P | proline |
Q | glutamine |
R | arginine |
S | serine |
T | threonine |
U | any |
V | valine |
W | tryptophan |
Y | tyrosine |
Z | glutamate or glutamine |
X | any |
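
The protein alphabet above can be written down as a simple membership check. This is a minimal sketch, not MEME Suite code: the names `PROTEIN_ALPHABET` and `illegal_protein_symbols` are hypothetical, and the check assumes upper-case input because case handling is not described here.

```python
# Twenty standard amino acids plus the four ambiguous symbols from the table.
PROTEIN_CORE = "ACDEFGHIKLMNPQRSTVWY"
PROTEIN_AMBIGUOUS = "BUXZ"
PROTEIN_ALPHABET = frozenset(PROTEIN_CORE + PROTEIN_AMBIGUOUS)


def illegal_protein_symbols(seq):
    """Return (position, character) pairs for symbols outside the alphabet.

    Assumption: the sequence is upper-case; lower-case letters are reported
    as illegal by this sketch.
    """
    return [(i, ch) for i, ch in enumerate(seq) if ch not in PROTEIN_ALPHABET]


if __name__ == "__main__":
    # "*" and "J" are not part of the protein alphabet.
    print(illegal_protein_symbols("MKV*LJ"))
```

An empty result means every symbol in the sequence belongs to the protein alphabet.
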
The four-character DNA alphabet ("ACGT") is augmented by eleven ambiguous characters ("RYKMSWBDHVN"). The RNA code for uracil ("U") is also accepted, but it is converted to T for processing.
Symbol | Meaning |
---|---|
A | adenosine |
C | cytidine |
G | guanosine |
T | thymidine |
U | uracil (MEME Suite converts to T) |
R | G A (purine) |
Y | T C (pyrimidine) |
K | G T (keto) |
M | A C (amino) |
S | G C (strong) |
W | A T (weak) |
B | G T C |
D | G A T |
H | A C T |
V | G C A |
N | A G C T (any) |
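
The DNA table above maps naturally onto a lookup from each symbol to the set of bases it can stand for. The sketch below is illustrative only and is not taken from MEME Suite; `DNA_CODES`, `normalize_dna`, and `expand` are hypothetical names, and upper-case input is assumed.

```python
from itertools import product

# Each symbol maps to the concrete bases it may represent (from the table).
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def normalize_dna(seq):
    """Convert the RNA symbol U to T, as described for processing."""
    return seq.replace("U", "T")


def expand(seq):
    """List every concrete ACGT sequence an ambiguous sequence can match."""
    choices = [DNA_CODES[symbol] for symbol in normalize_dna(seq)]
    return ["".join(bases) for bases in product(*choices)]


if __name__ == "__main__":
    print(normalize_dna("ACGU"))  # ACGT
    print(expand("RY"))           # ['AC', 'AT', 'GC', 'GT']
```

The number of expansions grows multiplicatively with each ambiguous symbol, so `expand` is only practical for short sequences such as motif consensus strings.
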
Any other characters (digits, punctuation, etc.) are illegal. MEME Suite programs typically skip any section of a sequence that contains such characters.
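
One way to picture the handling of illegal characters is to split a sequence into maximal runs of legal symbols. This is only one possible reading of "skip", sketched under stated assumptions: it is not how any particular MEME Suite program is implemented, the names `DNA_ALPHABET` and `legal_runs` are hypothetical, and upper-case DNA input is assumed.

```python
import re

# The four bases, U, and the eleven ambiguous symbols from the DNA table.
DNA_ALPHABET = "ACGTURYKMSWBDHVN"
_LEGAL_RUN = re.compile("[" + DNA_ALPHABET + "]+")


def legal_runs(seq):
    """Return (start, run) pairs for each maximal stretch of legal symbols.

    Assumption: anything outside DNA_ALPHABET (digits, punctuation,
    lower-case letters, ...) is treated as a break between runs.
    """
    return [(m.start(), m.group()) for m in _LEGAL_RUN.finditer(seq)]


if __name__ == "__main__":
    # "-" and "12" are illegal, so the sequence splits into three runs.
    print(legal_runs("ACGT-NNRY12GATTACA"))
```

Keeping the start offset of each run makes it possible to report positions relative to the original sequence.
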