Given a set of un-filtered coverage data (see fullCoverage), create candidate regions by applying a cutoff on the coverage values, and obtain a count matrix where the number of rows corresponds to the number of candidate regions and the number of columns corresponds to the number of samples. The values are the mean coverage for a given sample for a given region.

regionMatrix(
  fullCov,
  cutoff = 5,
  L,
  totalMapped = 8e+07,
  targetSize = 8e+07,
  runFilter = TRUE,
  returnBP = TRUE,
  ...
)

Arguments

fullCov

A list where each element is the result from loadCoverage used with returnCoverage = TRUE. Can be generated using fullCoverage. If runFilter = FALSE, then returnMean = TRUE must have been used.

cutoff

The base-pair level cutoff to use. It's behavior is controlled by filter.

L

The width of the reads used. Either a vector of length 1 or length equal to the number of samples.

totalMapped

A vector with the total number of reads mapped for each sample. The vector should be in the same order as the samples in fullCov. Providing this argument adjusts the coverage to reads in targetSize library prior to filtering. See getTotalMapped for calculating this vector.

targetSize

The target library size to adjust the coverage to. Used only when totalMapped is specified. By default, it adjusts to libraries with 80 million reads.

runFilter

This controls whether to run filterData or not. If set to FALSE then returnMean = TRUE must have been used to create each element of fullCov and the data must have been normalized (totalMapped equal to targetSize).

returnBP

If TRUE, returns $bpCoverage explained below.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

verbose

If TRUE basic status updates will be printed along the way.

chrsStyle

Default: UCSC. Passed to extendedMapSeqlevels via getRegionCoverage.

species

Default: homo_sapiens. Passed to extendedMapSeqlevels via getRegionCoverage.

currentStyle

Default: NULL. Passed to extendedMapSeqlevels via getRegionCoverage.

Passed to filterData, findRegions and define_cluster.

Note that filterData is used internally by loadCoverage (and hence fullCoverage) and has the important arguments totalMapped and targetSize which can be used to normalize the coverage by library size. If you already used these arguments when creating the fullCov object, then don't specify them a second time in regionMatrix. If you have not used these arguments, we recommend using them to normalize the mean coverage.

Value

A list with one entry per chromosome. Then per chromosome, a list with three components.

regions

A set of regions based on the coverage filter cutoff as returned by findRegions.

bpCoverage

A list with one element per region. Each element is a matrix with numbers of rows equal to the number of base pairs in the region and number of columns equal to the number of samples. It contains the base-level coverage information for the regions. Only returned when returnBP = TRUE.

coverageMatrix

A matrix with the mean coverage by sample for each candidate region.

Details

This function uses several other derfinder-package functions. Inspect the code if interested.

You should use at most one core per chromosome.

Author

Leonardo Collado-Torres

Examples

## Create some toy data
library("IRanges")
x <- Rle(round(runif(1e4, max = 10)))
y <- Rle(round(runif(1e4, max = 10)))
z <- Rle(round(runif(1e4, max = 10)))
fullCov <- list("chr21" = DataFrame(x, y, z))

## Calculate a proxy of library size
libSize <- sapply(fullCov$chr21, sum)

## Run region matrix normalizing the coverage
regionMat <- regionMatrix(
    fullCov = fullCov, maxRegionGap = 10L,
    maxClusterGap = 300L, L = 36, totalMapped = libSize, targetSize = 4e4
)
#> 2026-03-31 17:41:30.179974 regionMatrix: processing chr21
#> 2026-03-31 17:41:30.180452 filterData: normalizing coverage
#> 2026-03-31 17:41:30.188146 filterData: done normalizing coverage
#> 2026-03-31 17:41:30.201927 filterData: originally there were 10000 rows, now there are 2586 rows. Meaning that 74.14 percent was filtered.
#> 2026-03-31 17:41:30.202519 findRegions: identifying potential segments
#> 2026-03-31 17:41:30.205043 findRegions: segmenting information
#> 2026-03-31 17:41:30.207878 findRegions: identifying candidate regions
#> 2026-03-31 17:41:30.235778 findRegions: identifying region clusters
#> 2026-03-31 17:41:30.326835 getRegionCoverage: processing chr21
#> 2026-03-31 17:41:30.338839 getRegionCoverage: done processing chr21
#> 2026-03-31 17:41:30.340795 regionMatrix: calculating coverageMatrix
#> 2026-03-31 17:41:30.345507 regionMatrix: adjusting coverageMatrix for 'L'
if (FALSE) { # \dontrun{
## You can alternatively use filterData() on fullCov to reduce the required
## memory before using regionMatrix(). This can be useful when mc.cores > 1
filteredCov <- lapply(fullCov, filterData,
    returnMean = TRUE, filter = "mean",
    cutoff = 5, totalMapped = libSize, targetSize = 4e4
)
regionMat2 <- regionMatrix(filteredCov,
    maxRegionGap = 10L,
    maxClusterGap = 300L, L = 36, runFilter = FALSE
)
} # }

## regionMatrix() can work with multiple chrs as shown below.
fullCov2 <- list("chr21" = DataFrame(x, y, z), "chr22" = DataFrame(x, y, z))
regionMat2 <- regionMatrix(
    fullCov = fullCov2, maxRegionGap = 10L,
    maxClusterGap = 300L, L = 36, totalMapped = libSize, targetSize = 4e4
)
#> 2026-03-31 17:41:30.358592 regionMatrix: processing chr21
#> 2026-03-31 17:41:30.359063 filterData: normalizing coverage
#> 2026-03-31 17:41:30.365977 filterData: done normalizing coverage
#> 2026-03-31 17:41:30.37999 filterData: originally there were 10000 rows, now there are 2586 rows. Meaning that 74.14 percent was filtered.
#> 2026-03-31 17:41:30.380639 findRegions: identifying potential segments
#> 2026-03-31 17:41:30.383179 findRegions: segmenting information
#> 2026-03-31 17:41:30.386006 findRegions: identifying candidate regions
#> 2026-03-31 17:41:30.414129 findRegions: identifying region clusters
#> 2026-03-31 17:41:30.506982 getRegionCoverage: processing chr21
#> 2026-03-31 17:41:30.519601 getRegionCoverage: done processing chr21
#> 2026-03-31 17:41:30.521843 regionMatrix: calculating coverageMatrix
#> 2026-03-31 17:41:30.526427 regionMatrix: adjusting coverageMatrix for 'L'
#> 2026-03-31 17:41:30.527132 regionMatrix: processing chr22
#> 2026-03-31 17:41:30.52767 filterData: normalizing coverage
#> 2026-03-31 17:41:30.535239 filterData: done normalizing coverage
#> 2026-03-31 17:41:30.550214 filterData: originally there were 10000 rows, now there are 2586 rows. Meaning that 74.14 percent was filtered.
#> 2026-03-31 17:41:30.55097 findRegions: identifying potential segments
#> 2026-03-31 17:41:30.553921 findRegions: segmenting information
#> 2026-03-31 17:41:30.557235 findRegions: identifying candidate regions
#> 2026-03-31 17:41:30.587377 findRegions: identifying region clusters
#> 2026-03-31 17:41:30.704902 getRegionCoverage: processing chr22
#> 2026-03-31 17:41:30.71903 getRegionCoverage: done processing chr22
#> 2026-03-31 17:41:30.721328 regionMatrix: calculating coverageMatrix
#> 2026-03-31 17:41:30.725962 regionMatrix: adjusting coverageMatrix for 'L'

## Combine results from multiple chromosomes
library("GenomicRanges")

## First extract the data
regs <- unlist(GRangesList(lapply(regionMat2, "[[", "regions")))
covMat <- do.call(rbind, lapply(regionMat2, "[[", "coverageMatrix"))
covBp <- do.call(c, lapply(regionMat2, "[[", "bpCoverage"))
## Force the names to match
names(regs) <- rownames(covMat) <- names(covBp) <- seq_len(length(regs))
## Combine into a list (not really needed)
mergedRegionMat <- list(
    "regions" = regs, "coverageMatrix" = covMat,
    "bpCoverage" = covBp
)