The standard Bioconductor packages support basic
coordinate strings like IRanges("1000-2000") and GRanges("chr1:1000-2000").
The GenomicCoordinates package extends these string parsing capabilities
to support a wide variety of genomic coordinate string formats. In particular,
it support automatic detection to coerce string coordinates into the
most appropriate Bioconductor object type (GRanges, GPos, GInteractions,
or IRanges), based on the string format.
In addition to the standard string format CHR:START-END(:STRAND),
GenomicCoordinates adds support for:
"chr1:1,000-2,000",
"chr1:100,000-200,000";"chr1 1000 2000";"chr1 1000 2000", "chr1: 1-10 | chr2: 20-30";"chr1 1 10:+ | chr2:20,000-30,000".Key GenomicCoordinates benefits include:
# Install from Bioconductor (when available)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("GenomicCoordinates")
# Or install development version from GitHub
BiocManager::install("js2264/GenomicCoordinates")
library(GenomicCoordinates)
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: 'generics'
#> The following objects are masked from 'package:base':
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: Seqinfo
The main function GenomicCoordinates() (and its alias GCoordinates())
automatically detects the most appropriate object type and returns the
corresponding Bioconductor object:
# Standard genomic ranges
GenomicCoordinates("chr1:1000-2000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Single positions (returns GPos)
GenomicCoordinates("chr1:1000:-")
#> UnstitchedGPos object with 1 position and 0 metadata columns:
#> seqnames pos strand
#> <Rle> <integer> <Rle>
#> [1] chr1 1000 -
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Genomic interactions (returns GInteractions)
GenomicCoordinates("chr1:1-10:-|chr2:20-30:+")
#> GInteractions object with 1 interaction and 0 metadata columns:
#> seqnames1 ranges1 strand1 seqnames2 ranges2 strand2
#> <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle>
#> [1] chr1 1-10 - --- chr2 20-30 +
#> -------
#> regions: 2 ranges and 0 metadata columns
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
# Pure numeric ranges (returns IRanges)
GenomicCoordinates("1,000-2,000")
#> IRanges object with 1 range and 0 metadata columns:
#> start end width
#> <integer> <integer> <integer>
#> [1] 1000 2000 1001
One of the key limitations of standard Bioconductor string parsing is the lack of support for comma-separated numbers, which are commonly used in genomic databases and publications.
# GenomicCoordinates handles comma-separated numbers seamlessly
GenomicCoordinates("chr1:100,000-200,000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 100000-200000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Works with single positions too
GenomicCoordinates("chr1:1,234,567")
#> UnstitchedGPos object with 1 position and 0 metadata columns:
#> seqnames pos strand
#> <Rle> <integer> <Rle>
#> [1] chr1 1234567 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# And with strand information
GenomicCoordinates("chr1:100,000-200,000:+")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 100000-200000 +
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
Many genomic file formats and databases use space-delimited
coordinates. GenomicCoordinates supports these formats with flexible spacing:
# Basic space-delimited format
GenomicCoordinates("chr1 1000 2000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Irregular spacing is handled automatically
GenomicCoordinates("chr1 1000 2000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Mixed format with strand
GenomicCoordinates("chr1 1000 2000:+")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-2000 +
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Single positions with spaces
GenomicCoordinates("chr1 1000")
#> UnstitchedGPos object with 1 position and 0 metadata columns:
#> seqnames pos strand
#> <Rle> <integer> <Rle>
#> [1] chr1 1000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
Genome assemblies often contain complex chromosome names that standard parsers struggle with:
# Chromosome names with spaces (common in some organisms)
GenomicCoordinates("chr I:1000-2000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr I 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Scaffold and contig names
GenomicCoordinates("scaffold_123:1000-2000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] scaffold_123 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
GenomicCoordinates("GL000001.1:1000-2000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] GL000001.1 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Organism-specific naming conventions
GenomicCoordinates("2L:1000-2000") # Drosophila
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] 2L 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
GenomicCoordinates("chrUn_GL000220v1:1000-2000") # Unplaced scaffolds
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chrUn_GL000220v1 1000-2000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
One of the most powerful features of GenomicCoordinates is its ability
to automatically detect the most appropriate Bioconductor object type
based on the input format.
# Single positions automatically return GPos objects
pos_result <- GenomicCoordinates("chr1:1000")
class(pos_result)
#> [1] "UnstitchedGPos"
#> attr(,"package")
#> [1] "GenomicRanges"
pos(pos_result)
#> [1] 1000
# Ranges return GRanges objects
range_result <- GenomicCoordinates("chr1:1000-2000")
class(range_result)
#> [1] "GRanges"
#> attr(,"package")
#> [1] "GenomicRanges"
start(range_result)
#> [1] 1000
end(range_result)
#> [1] 2000
Strings containing the pipe character (|) are
automatically parsed as genomic interactions:
# Simple interaction
interaction <- GenomicCoordinates("chr1:1-10|chr2:20-30")
class(interaction)
#> [1] "GInteractions"
#> attr(,"package")
#> [1] "InteractionSet"
# Check the anchor points
InteractionSet::anchors(interaction, "first")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1-10 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
InteractionSet::anchors(interaction, "second")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr2 20-30 *
#> -------
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
# Works with complex and formatting too
GenomicCoordinates("chr1 1000 10000 | chr2:20,000-30,000")
#> GInteractions object with 1 interaction and 0 metadata columns:
#> seqnames1 ranges1 strand1 seqnames2 ranges2 strand2
#> <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-10000 * --- chr2 20000-30000 *
#> -------
#> regions: 2 ranges and 0 metadata columns
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
When no chromosome information is present, the parser returns IRanges objects:
# Pure numeric ranges
numeric_range <- GenomicCoordinates("1000-2000")
class(numeric_range)
#> [1] "IRanges"
#> attr(,"package")
#> [1] "IRanges"
# Works with comma-separated numbers
GenomicCoordinates("1,000-2,000")
#> IRanges object with 1 range and 0 metadata columns:
#> start end width
#> <integer> <integer> <integer>
#> [1] 1000 2000 1001
# And space-delimited format
GenomicCoordinates("1000 2000")
#> IRanges object with 1 range and 0 metadata columns:
#> start end width
#> <integer> <integer> <integer>
#> [1] 1000 2000 1001
Sometimes you may want to force a specific object type, regardless of the automatic detection:
# Force a single position to be a GRanges instead of GPos
GenomicCoordinates("chr1:1000", force_class = "GRanges")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Force a single position to be a GPos
# instead of GRanges (removes the `end` coordinate)
GenomicCoordinates("chr1:1000-2000", force_class = "GPos")
#> UnstitchedGPos object with 1 position and 0 metadata columns:
#> seqnames pos strand
#> <Rle> <integer> <Rle>
#> [1] chr1 1000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Force a genomic range to be an IRanges (extracts just coordinates)
GenomicCoordinates("chr1:1000-2000", force_class = "IRanges")
#> IRanges object with 1 range and 0 metadata columns:
#> start end width
#> <integer> <integer> <integer>
#> [1] 1000 2000 1001
# Force a single range to be a GInteractions (creates self-interaction)
GenomicCoordinates("chr1:1000-2000", force_class = "GInteractions")
#> GInteractions object with 1 interaction and 0 metadata columns:
#> seqnames1 ranges1 strand1 seqnames2 ranges2 strand2
#> <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-2000 * --- chr1 1000-2000 *
#> -------
#> regions: 1 ranges and 0 metadata columns
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
GenomicCoordinates efficiently handles vectors of coordinate strings:
# Mixed vector of single positions and ranges
mixed_coords <- c("chr1:1000", "chr2:2000-3000", "chr3:5000")
GenomicCoordinates(mixed_coords)
#> GRanges object with 3 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 1000 *
#> [2] chr2 2000-3000 *
#> [3] chr3 5000 *
#> -------
#> seqinfo: 3 sequences from an unspecified genome; no seqlengths
# All single positions return GPos
single_positions <- c("chr1:1000", "chr2:2000", "chr3:3000")
GenomicCoordinates(single_positions)
#> UnstitchedGPos object with 3 positions and 0 metadata columns:
#> seqnames pos strand
#> <Rle> <integer> <Rle>
#> [1] chr1 1000 *
#> [2] chr2 2000 *
#> [3] chr3 3000 *
#> -------
#> seqinfo: 3 sequences from an unspecified genome; no seqlengths
# Vector of interactions
interactions <- c("chr1:1-10|chr2:20-30", "chr3:100-200|chr4:300-400")
GenomicCoordinates(interactions)
#> GInteractions object with 2 interactions and 0 metadata columns:
#> seqnames1 ranges1 strand1 seqnames2 ranges2 strand2
#> <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle>
#> [1] chr1 1-10 * --- chr2 20-30 *
#> [2] chr3 100-200 * --- chr4 300-400 *
#> -------
#> regions: 4 ranges and 0 metadata columns
#> seqinfo: 4 sequences from an unspecified genome; no seqlengths
GenomicCoordinates provides explicit conversion functions that work with
the enhanced string parsing capabilities:
# Convert to GRanges
as_granges("chr1:100,000-200,000")
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 100000-200000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Convert to GPos
as_gpos("chr1:1,234,567")
#> UnstitchedGPos object with 1 position and 0 metadata columns:
#> seqnames pos strand
#> <Rle> <integer> <Rle>
#> [1] chr1 1234567 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
# Convert to GInteractions
as_ginteractions("chr1:1,000-10,000|chr2:20,000-30,000")
#> GInteractions object with 1 interaction and 0 metadata columns:
#> seqnames1 ranges1 strand1 seqnames2 ranges2 strand2
#> <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle>
#> [1] chr1 1000-10000 * --- chr2 20000-30000 *
#> -------
#> regions: 2 ranges and 0 metadata columns
#> seqinfo: 2 sequences from an unspecified genome; no seqlengths
# Convert to IRanges
as_iranges("1,000-2,000")
#> IRanges object with 1 range and 0 metadata columns:
#> start end width
#> <integer> <integer> <integer>
#> [1] 1000 2000 1001
The detect_genomic_class() function allows you to determine what object
type would be returned without actually performing the parsing:
# Detect classes for various input types
inputs <- c(
"chr1:1000-2000",
"chr1:1000",
"chr1:1-10|chr2:20-30",
"1000-2000"
)
detect_genomic_class(inputs)
#> [1] "GRanges" "GPos" "GInteractions" "IRanges"
This is useful for:
sessionInfo()
#> R version 4.6.0 alpha (2026-03-30 r89742)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] GenomicCoordinates_0.99.3 GenomicRanges_1.63.1
#> [3] Seqinfo_1.1.0 IRanges_2.45.0
#> [5] S4Vectors_0.49.0 BiocGenerics_0.57.0
#> [7] generics_0.1.4 BiocStyle_2.39.0
#>
#> loaded via a namespace (and not attached):
#> [1] sass_0.4.10 bitops_1.0-9
#> [3] SparseArray_1.11.13 lattice_0.22-9
#> [5] digest_0.6.39 magrittr_2.0.4
#> [7] evaluate_1.0.5 grid_4.6.0
#> [9] bookdown_0.46 fastmap_1.2.0
#> [11] jsonlite_2.0.0 Matrix_1.7-5
#> [13] cigarillo_1.1.0 restfulr_0.0.16
#> [15] BiocManager_1.30.27 httr_1.4.8
#> [17] plyranges_1.31.7 XML_3.99-0.23
#> [19] Biostrings_2.79.5 codetools_0.2-20
#> [21] jquerylib_0.1.4 abind_1.4-8
#> [23] cli_3.6.5 crayon_1.5.3
#> [25] rlang_1.1.7 XVector_0.51.0
#> [27] Biobase_2.71.0 cachem_1.1.0
#> [29] DelayedArray_0.37.1 yaml_2.3.12
#> [31] otel_0.2.0 S4Arrays_1.11.1
#> [33] parallel_4.6.0 tools_4.6.0
#> [35] BiocParallel_1.45.0 dplyr_1.2.0
#> [37] Rsamtools_2.27.1 InteractionSet_1.39.0
#> [39] SummarizedExperiment_1.41.1 curl_7.0.0
#> [41] vctrs_0.7.2 R6_2.6.1
#> [43] BiocIO_1.21.0 matrixStats_1.5.0
#> [45] lifecycle_1.0.5 rtracklayer_1.71.3
#> [47] pkgconfig_2.0.3 bslib_0.10.0
#> [49] pillar_1.11.1 glue_1.8.0
#> [51] Rcpp_1.1.1 GenomicAlignments_1.47.0
#> [53] xfun_0.57 tibble_3.3.1
#> [55] tidyselect_1.2.1 MatrixGenerics_1.23.0
#> [57] knitr_1.51 rjson_0.2.23
#> [59] htmltools_0.5.9 plyinteractions_1.9.2
#> [61] rmarkdown_2.31 compiler_4.6.0
#> [63] RCurl_1.98-1.18