The MEME Suite

Motif-based sequence analysis tools

Alphabets

MEME Suite programs accept sequences written in either of the two IUPAC alphabets: protein or DNA.

Protein Alphabet

The protein alphabet contains twenty characters for the amino acids ("ACDEFGHIKLMNPQRSTVWY") and is augmented by four ambiguous characters ("BUXZ").

Symbol	Meaning
A	alanine
B	aspartate or asparagine
C	cysteine
D	aspartate
E	glutamate
F	phenylalanine
G	glycine
H	histidine
I	isoleucine
K	lysine
L	leucine
M	methionine
N	asparagine
P	proline
Q	glutamine
R	arginine
S	serine
T	threonine
U	any
V	valine
W	tryptophan
Y	tyrosine
Z	glutamate or glutamine
X	any
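
The protein table above can be written down as a small lookup. The following is a minimal Python sketch, not MEME Suite code: the names PROTEIN_CORE, PROTEIN_AMBIGUOUS, and expand_protein_symbol are hypothetical, and it assumes uppercase input because the text does not say how case is handled.

```python
# Illustrative sketch only; these names are not part of the MEME Suite.
# Assumes uppercase, single-character symbols.

PROTEIN_CORE = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Ambiguous symbols mapped to the amino acids they may stand for,
# following the table above.
PROTEIN_AMBIGUOUS = {
    "B": frozenset("DN"),   # aspartate or asparagine
    "Z": frozenset("EQ"),   # glutamate or glutamine
    "U": PROTEIN_CORE,      # any
    "X": PROTEIN_CORE,      # any
}


def expand_protein_symbol(symbol):
    """Return the set of core amino acids that one symbol can represent."""
    if symbol in PROTEIN_CORE:
        return frozenset(symbol)
    if symbol in PROTEIN_AMBIGUOUS:
        return PROTEIN_AMBIGUOUS[symbol]
    raise ValueError(f"{symbol!r} is not in the protein alphabet")
```

For example, expand_protein_symbol("B") gives the two residues D and N, while a character outside the alphabet, such as "J", raises ValueError.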

DNA Alphabet

The four-character DNA alphabet ("ACGT") is augmented by eleven ambiguous characters ("RYKMSWBDHVN"). The RNA code for uracil ("U") is also accepted, but it is converted to T for processing.

Symbol	Meaning
A	adenine
C	cytosine
G	guanine
T	thymine
U	uracil (MEME Suite converts to T)
R	G A (purine)
Y	T C (pyrimidine)
K	G T (keto)
M	A C (amino)
S	G C (strong)
W	A T (weak)
B	G T C
D	G A T
H	A C T
V	G C A
N	A G C T (any)
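
The DNA table above, together with the U-to-T conversion, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the MEME Suite implementation: DNA_CODES, normalize_dna, and expand_dna are hypothetical names, and the input is assumed to be uppercase.

```python
# Illustrative sketch only; these names are not part of the MEME Suite.
# Assumes uppercase input.

# Each symbol mapped to the bases it may stand for, following the table above.
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def normalize_dna(sequence):
    """Replace the RNA symbol U with T, as the text describes."""
    return sequence.replace("U", "T")


def expand_dna(sequence):
    """Return, for each position, the set of bases its symbol allows.

    Raises KeyError for any character outside the DNA alphabet.
    """
    return [set(DNA_CODES[symbol]) for symbol in normalize_dna(sequence)]
```

Here expand_dna("RU") yields two sets, one holding G and A for the purine code and one holding only T for the converted uracil.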

Other

Any other characters (digits, punctuation, etc.) are illegal. MEME Suite programs typically skip any section of a sequence that contains them.
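
Before running a tool, it can help to check a sequence for illegal characters. The Python sketch below only reports where they occur; it does not reproduce how any MEME Suite program skips sequence sections, which the text does not specify. The names DNA_ALPHABET, PROTEIN_ALPHABET, and illegal_positions are hypothetical, and uppercase input is assumed.

```python
# Illustrative sketch only; these names are not part of the MEME Suite.
# Assumes uppercase input.

# Legal symbols for each alphabet, taken from the sections above.
DNA_ALPHABET = frozenset("ACGT" "RYKMSWBDHVN" "U")
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY" "BUXZ")


def illegal_positions(sequence, alphabet):
    """Return (index, character) pairs for characters outside the alphabet."""
    return [(i, ch) for i, ch in enumerate(sequence) if ch not in alphabet]
```

An empty result means every character belongs to the chosen alphabet. For instance, illegal_positions("AC-GT1", DNA_ALPHABET) flags the hyphen at index 2 and the digit at index 5.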