The Ribosomal Database Project (RDP) Classifier, a naïve Bayesian classifier, can rapidly and accurately classify bacterial 16S rRNA sequences into the new higher-order taxonomy proposed in Bergey's Taxonomic Outline of the Prokaryotes (2nd ed., release 5.0, Springer-Verlag, New York, NY, 2004). It provides taxonomic assignments from domain to genus, with confidence estimates for each assignment. The majority of classifications (98%) were of high estimated confidence (>95%) and high accuracy (98%). In addition to being tested with the corpus of 5,014 type strain sequences from Bergey's outline, the RDP Classifier was tested with a corpus of 23,095 rRNA sequences as assigned by the NCBI into their alternative higher-order taxonomy. The results from leave-one-out testing on both corpora show that the overall accuracies at all levels of confidence for near-full-length and 400-base segments were 89% or above down to the genus level, and the majority of the classification errors appear to be due to anomalies in the current taxonomies. For shorter rRNA segments, such as those that might be generated by pyrosequencing, the error rate varied greatly over the length of the 16S rRNA gene, with segments around the V2 and

Starting in the mid-1980s, Carl Woese revolutionized the field of microbiology with his rRNA-based phylogenetic comparisons delineating the three main branches of life (28). Today, rRNA-based analysis remains a central method in microbiology, used not only to explore microbial diversity but also as a day-to-day method for bacterial identification. Identification methods are conceptually easier to interpret than molecular phylogenetic analyses and are often preferred when the groups are well understood. Most rRNA identification (classification) methods, as opposed to phylogenetic (clustering) methods, have been nearest-neighbor-based classification schemes (10, 18; however, see reference 4).
In part, this was due to the lack of a consistent, higher-level bacterial classification structure (taxonomy). Several recent events have helped change this situation. In 2002, an ad hoc committee for the reevaluation of species definition in bacteriology (24) advised that all new bacterial species descriptions include an rRNA sequence from the type strain, and in 2001, Bergey's Trust published a revised higher-order taxonomy attempting to reconcile bacterial taxonomy with rRNA-based phylogeny (12, 13).

The naïve Bayesian classification method is simple yet can be extremely efficient. "Naïve" refers to the (naïve) assumption that data attributes are independent. Domingos and Pazzani (11) showed that the Bayesian method can still be optimal even when this independence assumption is violated. The method has also been reported to perform well on problems similar to the classification of sequence data, such as the classification of text documents, that have a high-dimensional feature space and sparse data (16).

The Ribosomal Database Project II (RDP) provides data, tools, and services related to rRNA sequences to the rese...