Similar to intersectBed, closestBed searches for overlapping features in A and B. If no feature in B overlaps the current feature in A, closestBed reports the closest feature in B, that is, the one with the least genomic distance from the start or end of A. For example, one might want to find the gene closest to a significant GWAS polymorphism. Note that closestBed reports an overlapping feature as the closest; it does not restrict the search to the closest non-overlapping feature.
Usage:
```
closestBed [OPTIONS] -a <BED/GFF/VCF> -b <BED/GFF/VCF>
```
Option | Description |
---|---|
-s | Force strandedness. That is, only consider features in B that are on the same strand as A when finding the closest feature. By default, this is disabled. |
-d | In addition to the closest feature in B, report its distance to A as an extra column. The reported distance for overlapping features will be 0. |
-t | How ties for the closest feature are handled. A tie occurs when two features in B have exactly the same overlap with a feature in A. Accepted values are all, first, and last. By default (all), all such features in B are reported. |
closestBed first searches for features in B that overlap a feature in A. If overlaps are found, the feature in B that overlaps the highest fraction of A is reported. If no overlaps are found, closestBed looks for the feature in B that is closest to A, that is, the one with the least genomic distance to the start or end of A. For example, in the figure below, feature B1 would be reported as the closest feature to A1.
[Figure: a chromosome track showing one feature in BED File A, two features in BED File B, and a Result row marking the B feature reported as closest.]
For example:
```
$ cat A.bed
chr1 100 200

$ cat B.bed
chr1 500 1000
chr1 1300 2000

$ closestBed -a A.bed -b B.bed
chr1 100 200 chr1 500 1000
```
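The selection rule described above (prefer the overlap covering the largest fraction of A, otherwise take the smallest distance to A's start or end) can be sketched in a few lines of Python. This is a minimal illustration, not closestBed's implementation: the helper names `overlap`, `gap`, and `closest` are hypothetical, intervals are assumed to be plain `(chrom, start, end)` tuples in 0-based half-open BED coordinates, and the sketch simply returns an empty list when B has nothing on A's chromosome.

```python
from typing import List, Tuple

# Hypothetical BED interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b (0 if disjoint or on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def gap(a: Interval, b: Interval) -> int:
    """Bases separating a and b on the same chromosome (0 if they touch or overlap)."""
    return max(b[1] - a[2], a[1] - b[2], 0)


def closest(a: Interval, b_features: List[Interval]) -> List[Interval]:
    """Return the B feature(s) closest to a, keeping all ties in input order."""
    same_chrom = [b for b in b_features if b[0] == a[0]]
    if not same_chrom:
        return []  # simplification: no attempt to mimic closestBed's output here
    overlapping = [b for b in same_chrom if overlap(a, b) > 0]
    if overlapping:
        # a's length is fixed, so the largest overlap in bases is also
        # the largest fraction of a.
        best = max(overlap(a, b) for b in overlapping)
        return [b for b in overlapping if overlap(a, b) == best]
    # No overlap: take the smallest distance to the start or end of a.
    best = min(gap(a, b) for b in same_chrom)
    return [b for b in same_chrom if gap(a, b) == best]


a_bed = [("chr1", 100, 200)]
b_bed = [("chr1", 500, 1000), ("chr1", 1300, 2000)]
for a in a_bed:
    for b in closest(a, b_bed):
        print(*a, *b)  # chr1 100 200 chr1 500 1000
```

Run on the A.bed and B.bed contents from the example above, the sketch picks the same B feature that closestBed reports.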
The -s option behaves the same as the -s option for intersectBed while scanning for the closest (overlapping or not) feature in B. See the discussion in the intersectBed section for details.
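As a rough illustration of what -s changes, the Python sketch below filters B down to features on A's strand before picking the closest one. It is a simplified model under stated assumptions: the `closest_stranded` and `distance` helpers and the `(chrom, start, end, strand)` tuples are hypothetical, candidates are ranked by distance only (overlapping features count as distance 0), and the overlap-fraction ranking described earlier is left out to keep the strand filter in focus.

```python
from typing import List, Optional, Tuple

# Hypothetical stranded feature: (chrom, start, end, strand), 0-based half-open.
Feature = Tuple[str, int, int, str]


def distance(a: Feature, b: Feature) -> int:
    """0 for overlapping or adjacent features, otherwise the gap between them."""
    return max(b[1] - a[2], a[1] - b[2], 0)


def closest_stranded(a: Feature, b_features: List[Feature], same_strand: bool) -> Optional[Feature]:
    """Closest B feature to a; with same_strand=True only B features on a's strand count."""
    candidates = [
        b for b in b_features
        if b[0] == a[0] and (not same_strand or b[3] == a[3])
    ]
    if not candidates:
        return None
    # min() returns the first of any equally close candidates.
    return min(candidates, key=lambda b: distance(a, b))


a_feat = ("chr1", 100, 200, "+")
b_feats = [("chr1", 150, 250, "-"), ("chr1", 400, 500, "+")]
print(closest_stranded(a_feat, b_feats, same_strand=False))  # ('chr1', 150, 250, '-')
print(closest_stranded(a_feat, b_feats, same_strand=True))   # ('chr1', 400, 500, '+')
```

Without the strand restriction, the overlapping minus-strand feature wins; with it, the farther plus-strand feature is chosen instead.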
When two or more features in B overlap the same fraction of A, closestBed will, by default, report all of them. Imagine that feature A is a SNP and file B contains genes. It is common for two gene annotations in B (e.g., on opposite strands) to overlap the SNP. As mentioned, the default behavior is to report both such genes. However, the -t option lets one instead report just the first or the last such feature in B, in terms of where it occurs in the input file, not its chromosome position.
For example (note the difference between -t first and -t last):
```
$ cat A.bed
chr1 100 101 rs1234

$ cat B.bed
chr1 0 1000 geneA 100 +
chr1 0 1000 geneB 100 -

$ closestBed -a A.bed -b B.bed
chr1 100 101 rs1234 chr1 0 1000 geneA 100 +
chr1 100 101 rs1234 chr1 0 1000 geneB 100 -

$ closestBed -a A.bed -b B.bed -t all
chr1 100 101 rs1234 chr1 0 1000 geneA 100 +
chr1 100 101 rs1234 chr1 0 1000 geneB 100 -

$ closestBed -a A.bed -b B.bed -t first
chr1 100 101 rs1234 chr1 0 1000 geneA 100 +

$ closestBed -a A.bed -b B.bed -t last
chr1 100 101 rs1234 chr1 0 1000 geneB 100 -
```
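The three -t policies amount to a simple choice over the tied features, kept in the order they appear in the B file. The Python sketch below models that choice with a hypothetical `resolve_ties` helper; it assumes the ties have already been found and does not reproduce closestBed's own code.

```python
from typing import List, Tuple

# Hypothetical 6-column BED feature: (chrom, start, end, name, score, strand).
Feature = Tuple[str, int, int, str, int, str]


def resolve_ties(tied: List[Feature], mode: str = "all") -> List[Feature]:
    """Apply an all/first/last tie policy to B features that tie for closest.

    `tied` must be in B-file order: first/last refer to the order in the
    input file, not to chromosome position.
    """
    if not tied:
        return []
    if mode == "all":
        return list(tied)
    if mode == "first":
        return [tied[0]]
    if mode == "last":
        return [tied[-1]]
    raise ValueError(f"unknown tie mode: {mode!r}")


# The two genes from B.bed, both overlapping rs1234 by the same amount.
tied = [
    ("chr1", 0, 1000, "geneA", 100, "+"),
    ("chr1", 0, 1000, "geneB", 100, "-"),
]
for mode in ("all", "first", "last"):
    print(mode, [f[3] for f in resolve_ties(tied, mode)])
# all ['geneA', 'geneB']
# first ['geneA']
# last ['geneB']
```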
closestBed will optionally report the distance to the closest feature in the B file using the -d option. When a feature in B overlaps a feature in A, a distance of 0 is reported.
```
$ cat A.bed
chr1 100 200
chr1 500 600

$ cat B.bed
chr1 500 1000
chr1 1300 2000

$ closestBed -a A.bed -b B.bed -d
chr1 100 200 chr1 500 1000 300
chr1 500 600 chr1 500 1000 0
```
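The distances in this example are consistent with measuring the gap directly between half-open BED coordinates (500 - 200 = 300) and returning 0 for overlaps. The Python sketch below computes the distance that way; the `bed_distance` name is hypothetical, and the formula is an assumption chosen to reproduce the values shown above rather than a statement of how every version of closestBed counts distance.

```python
from typing import Tuple

# Hypothetical BED interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def bed_distance(a: Interval, b: Interval) -> int:
    """0 if a and b overlap, otherwise the gap between their half-open coordinates."""
    if a[0] != b[0]:
        raise ValueError("features are on different chromosomes")
    if b[1] >= a[2]:   # b starts at or after the end of a
        return b[1] - a[2]
    if a[1] >= b[2]:   # b ends at or before the start of a
        return a[1] - b[2]
    return 0           # the intervals share at least one base


b = ("chr1", 500, 1000)
print(bed_distance(("chr1", 100, 200), b))  # 300
print(bed_distance(("chr1", 500, 600), b))  # 0
```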