The algorithm description requires a formal statement of the problem. We are given a central genome T, which we need to compare with k other compared genomes: G_1, \cdots, G_k. For each compared genome G_j with 1 \leq j \leq k, we compute between T and G_j all local pairwise similarities whose statistical significance lies above a user-defined threshold. Each local alignment represents a pair of genomic intervals, one from T and one from G_j, that are aligned with each other. We consider the intervals on T of all those local pairwise similarities: they can be disjoint, overlap, or even include each other on T. The intervals corresponding to the T versus G_j comparison form the \cal C_j collection of base intervals on T. We assume that the n_j intervals of \cal C_j are ordered first by increasing beginning position in T, second by decreasing length. For 1 \leq i \leq n_j, the i^th interval of \cal C_j is denoted I_i^j (the superscript indicates the collection, and the subscript the interval index). The beginning and end positions of an interval I are denoted by b(I) and e(I) respectively. From the k input collections, \cal C_j with 1 \leq j \leq k, we want to compute all Maximal Common Intervals (mci), which we define below.
Definitions
An interval J is common to all \cal C_1 \leq j \leq k if and only if for any collection \cal C_j there exists an interval say I_i^j from \cal C_j such that J \subseteq I_i^j. Assume J = [p,q] with p<q is a common interval. J is said to be a maximal if neither [p-1,q] nor [p,q+1] are common intervals.
Basic properties
Several properties of mci underlie our algorithm.