Alphabets

MEME Suite programs accept sequences written in either of the two IUPAC alphabets: protein or DNA.

Protein Alphabet

The protein alphabet contains twenty characters for amino acids ("ACDEFGHIKLMNPQRSTVWY"), and is augmented by four more ambiguous characters ("BUXZ").

Symbol  Meaning
A       alanine
B       aspartate or asparagine
C       cysteine
D       aspartate
E       glutamate
F       phenylalanine
G       glycine
H       histidine
I       isoleucine
K       lysine
L       leucine
M       methionine
N       asparagine
P       proline
Q       glutamine
R       arginine
S       serine
T       threonine
U       any
V       valine
W       tryptophan
Y       tyrosine
Z       glutamate or glutamine
X       any
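
As an illustration of the protein alphabet described above, here is a minimal Python sketch that reports which characters of a sequence fall outside it. The function name illegal_protein_symbols is hypothetical and is not part of MEME Suite. Upper-casing the input is an assumption, since this section does not say how letter case is handled.

```python
# Symbol sets copied from the protein alphabet table above.
PROTEIN_CORE = set("ACDEFGHIKLMNPQRSTVWY")  # the twenty amino acids
PROTEIN_AMBIGUOUS = set("BUXZ")             # the four ambiguous characters
PROTEIN_ALPHABET = PROTEIN_CORE | PROTEIN_AMBIGUOUS


def illegal_protein_symbols(sequence):
    """Return the sorted, de-duplicated characters not in the protein alphabet.

    Assumption: the input is upper-cased before checking; this section
    does not state how MEME Suite treats lower-case letters.
    """
    return sorted(set(sequence.upper()) - PROTEIN_ALPHABET)


if __name__ == "__main__":
    print(illegal_protein_symbols("MKVLAX"))  # all characters are legal
    print(illegal_protein_symbols("MKJ1"))    # J and 1 are not in the alphabet
```

An empty result means every character belongs to the protein alphabet.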

DNA Alphabet

The four-character DNA alphabet ("ACGT") is augmented by eleven ambiguous characters ("RYKMSWBDHVN"). The RNA code for uracil ("U") is also accepted, but it is converted to T for processing.

Symbol  Meaning
A       adenosine
C       cytidine
G       guanine
T       thymidine
U       uracil (MEME Suite converts to T)
R       G A (purine)
Y       T C (pyrimidine)
K       G T (keto)
M       A C (amino)
S       G C (strong)
W       A T (weak)
B       G T C
D       G A T
H       A C T
V       G C A
N       A G C T (any)
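
The DNA table above can be expressed as a lookup from each symbol to the bases it stands for, together with the U-to-T conversion. The following Python sketch is only an illustration; the names DNA_CODES, normalize_dna and expand are hypothetical and do not describe MEME Suite's internal implementation.

```python
# Each symbol maps to the bases it can represent, as listed in the table above.
DNA_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}


def normalize_dna(sequence):
    """Replace the RNA code U with T, mirroring the conversion described above."""
    return sequence.replace("U", "T")


def expand(symbol):
    """Return the set of bases that a single DNA symbol can stand for."""
    return set(DNA_CODES[symbol])


if __name__ == "__main__":
    print(normalize_dna("ACGU"))
    print(sorted(expand("R")))
```

A symbol that is missing from DNA_CODES raises KeyError in expand, which is one simple way to surface an illegal character.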

Other

Any other characters (digits, punctuation, etc.) are illegal. The typical behavior of MEME Suite programs is to skip any sections of sequence that contain these.
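
This section does not define how large a skipped section is, so the following Python sketch shows only one possible reading: illegal characters are dropped, and the maximal runs of legal characters between them are kept. The function name legal_runs and the run-splitting strategy are assumptions made for illustration, not MEME Suite's documented behavior.

```python
import re

# DNA alphabet from this document: four bases, U, and eleven ambiguous codes.
DNA_ALPHABET = "ACGTURYKMSWBDHVN"


def legal_runs(sequence, alphabet=DNA_ALPHABET):
    """Return the maximal runs of characters that all belong to `alphabet`.

    Assumption: a "section" is taken to be a run of consecutive legal
    characters; the document itself does not specify this.
    """
    pattern = re.compile("[" + re.escape(alphabet) + "]+")
    return pattern.findall(sequence)


if __name__ == "__main__":
    # The dash and the digits are illegal, so they split the sequence.
    print(legal_runs("ACGT-NNAC12GT"))
```

Passing a different alphabet string, such as the 24 protein characters, applies the same check to protein sequences.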