Extract coverage information for a set of regions

This function extracts the raw coverage information calculated by fullCoverage at each base for a set of regions found with calculatePvalues. It can further calculate the mean coverage per sample for each region.

getRegionCoverage(
  fullCov = NULL,
  regions,
  totalMapped = NULL,
  targetSize = 8e+07,
  files = NULL,
  ...
)

Arguments

fullCov

A list where each element is the result from loadCoverage used with returnCoverage = TRUE. Can be generated using fullCoverage. Alternatively, specify files to extract the coverage information from the regions of interest. This can be helpful if you do not wish to store fullCov for memory reasons.

regions

The $regions output from calculatePvalues. It is important that the seqlengths information is provided.

totalMapped

The total number of reads mapped for each sample. Providing this data adjusts the coverage to reads in targetSize library. By default, to reads per 80 million reads.

targetSize

The target library size to adjust the coverage to. Used only when totalMapped is specified.

files

A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing files. files can also be a BamFileList object created with BamFileList or a BigWigFileList object created with BigWigFileList.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

verbose: If TRUE basic status updates will be printed along the way.

Passed to extendedMapSeqlevels and define_cluster.

When fullCov is NULL, ... has the advanced argument protectWhich (default 30000) from loadCoverage. Also ... is passed to fullCoverage for loading the data on the fly. This can be useful for loading the data from a specific region (or small sets of regions) without having to load in memory the output the coverage information from all the genome.

Value

a list of data.frame where each data.frame has the coverage information (nrow = width of region, ncol = number of samples) for a given region. The names of the list correspond to the region indexes in regions

Details

When fullCov is the output of loadCoverage with cutoff non-NULL, getRegionCoverage assumes that the regions come from the same data. Meaning that filterData was not used again. This ensures that the regions are a subset of the data available in fullCov.

If fullCov is NULL and files is specified, this function will attempt to read the coverage from the files. Note that if you used 'totalMapped' and 'targetSize' before, you will have to specify them again to get the same results.

You should use at most one core per chromosome.

Author

Andrew Jaffe, Leonardo Collado-Torres

Examples

## Obtain fullCov object
fullCov <- list("21" = genomeDataRaw$coverage)

## Assign chr lengths using hg19 information, use only first two regions
library("GenomicRanges")
#> Loading required package: GenomeInfoDb
regions <- genomeRegions$regions[1:2]
seqlengths(regions) <- seqlengths(getChromInfoFromUCSC("hg19",
    as.Seqinfo = TRUE
))[
    mapSeqlevels(names(seqlengths(regions)), "UCSC")
]

## Finally, get the region coverage
regionCov <- getRegionCoverage(fullCov = fullCov, regions = regions)
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> 2024-12-13 15:13:15.078148 getRegionCoverage: processing chr21
#> 2024-12-13 15:13:15.105094 getRegionCoverage: done processing chr21

Arguments

Value

Details

See also

Author

Examples