Overview - The QOD project aims to design a multiple genome comparison tool.


2 Overview

The qod project aims to compare genomes and to transfer annotations from a central genome to compared genomes. In addition to the command-line program, a graphical user interface that runs under several operating systems is available.

The qod program and its graphical user interface were written by Alban Mancheron, in collaboration with Eric Rivals and Raluca Uricaru.

The program qod implements a novel approach to compare multiple genomes. In each run, qod compares one genome against k other genomes. We call the analyzed genome the central genome, or more simply the center, and we call the genomes it is compared to the compared genomes. This terminology serves only to distinguish them in the text.

Qod considers that the center shares a region with a compared genome if this region can be locally aligned with some region of that compared genome. When extended to k compared genomes, we say a center region is common to all genomes if this region is shared between the center and each of the compared genomes. Hence, the input of qod consists of k files, one per compared genome, each containing all the local alignments between the center and that compared genome. Note that a region corresponds to an interval in a genome sequence, and any local alignment links together one interval from the center with one from the compared genome.

Given these k collections of local alignments, qod computes all possible Maximum Common Intervals (mcis) of the center, where a common interval of the center is maximal if it cannot be extended by even one base pair, either to the right or to the left, on the center. Although this definition seems simple, it takes some thought to become familiar with it. The mcis may overlap on the central genome, but none can include another. Computing all mcis yields a segmentation of the central genome into regions that are covered by zero, one, or more mcis.

Once qod has finished the segmentation, it partitions the central genome into non-overlapping fragments according to the subset of mcis covering these fragments. Here, fragment means region or interval; we use this term to avoid confusion with the intervals of the segmentation. This gives a classification of fragments into three classes: 1) unshared, those that cannot be aligned with all other genomes; 2) common, with a unique possible alignment; 3) common, with several possible alignments with at least one other genome.
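The segmentation and partition described above can be illustrated with a short Python sketch. It is not qod's actual implementation: it assumes that a common interval must lie inside a single local alignment for each compared genome, that intervals on the center are 0-based and half-open, and that a fragment's class can be approximated by the number of mcis covering it. The function names and the class labels are hypothetical, and the brute-force enumeration is only suitable for tiny examples.

```python
from itertools import product

def maximal_common_intervals(alignments_per_genome):
    """Return the maximal common intervals of the center.

    alignments_per_genome: list of k lists, one per compared genome, each
    holding (start, end) half-open intervals of the center covered by a
    local alignment (assumed representation, not qod's input format).
    """
    candidates = set()
    # Brute force: pick one alignment per compared genome and intersect them.
    for combo in product(*alignments_per_genome):
        start = max(s for s, _ in combo)
        end = min(e for _, e in combo)
        if start < end:
            candidates.add((start, end))
    # Keep only intervals that are not included in another candidate.
    return sorted(
        c for c in candidates
        if not any(o != c and o[0] <= c[0] and c[1] <= o[1] for o in candidates)
    )

def partition(center_length, mcis):
    """Cut the center at every mci boundary and label each fragment."""
    cuts = sorted({0, center_length, *(p for iv in mcis for p in iv)})
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        covering = sum(1 for s, e in mcis if s <= start and end <= e)
        if covering == 0:
            label = "unshared"
        elif covering == 1:
            label = "common-unique"
        else:
            label = "common-multiple"
        fragments.append((start, end, label))
    return fragments

# Toy input: a center of 100 bp compared to k = 2 genomes.
genome_a = [(10, 50), (40, 90)]
genome_b = [(0, 60), (70, 100)]
mcis = maximal_common_intervals([genome_a, genome_b])
for fragment in partition(100, mcis):
    print(fragment)
```

On this toy input the mcis (10, 50) and (40, 60) overlap without either including the other, so the fragment (40, 50) is covered by two mcis and lands in the third class.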

Once the partition and classification are computed, it is easy to intersect an annotated genomic feature with the fragments to determine which class of fragment it falls in and, if it is in class 2 or 3, to see how it is aligned between the center and each compared genome. Qod performs this computation if the user provides it with an annotation file for the center. For each fragment, it even distinguishes features that are entirely included in the fragment from those that only partly overlap it. Once this is done, qod can transfer annotations to one or more compared genomes of your choice. You can interactively select which types of annotations (features) to transfer, control their level of similarity, and even insert personal annotations before exporting those potential annotations in a standard format (Genbank/Embl/Sequin).
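As a rough illustration of the intersection step, the following Python sketch reports, for one feature, every fragment it touches and whether the feature is entirely included in that fragment or only partly overlaps it. It is a sketch under assumptions rather than qod's code: fragments are (start, end, class) tuples with 0-based half-open coordinates, the class labels are made up for the example, and locate_feature is a hypothetical helper name.

```python
def locate_feature(feature, fragments):
    """List the fragments touched by a feature and how it relates to each.

    feature: (start, end) half-open interval on the center.
    fragments: list of (start, end, label) tuples, assumed non-overlapping.
    """
    f_start, f_end = feature
    hits = []
    for start, end, label in fragments:
        # Two half-open intervals overlap when each starts before the other ends.
        if f_start < end and start < f_end:
            included = start <= f_start and f_end <= end
            relation = "included" if included else "overlapping"
            hits.append((start, end, label, relation))
    return hits

# Hypothetical fragments and two features on the center.
fragments = [
    (0, 10, "unshared"),
    (10, 40, "common-unique"),
    (40, 50, "common-multiple"),
]
print(locate_feature((12, 30), fragments))  # lies inside a single fragment
print(locate_feature((35, 45), fragments))  # straddles two fragments
```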

The computation of the segmentation, the partition, and the classification are fast; however, reading and parsing the input files (at launch time), as well as displaying the results on the Graphical User Interface (gui) for all mcis, fragments, and features may take longer.