R/getRegionCoverage.R
getRegionCoverage.Rd
This function extracts the raw coverage information calculated by fullCoverage at each base for a set of regions found with calculatePvalues. It can further calculate the mean coverage per sample for each region.
getRegionCoverage(
fullCov = NULL,
regions,
totalMapped = NULL,
targetSize = 8e+07,
files = NULL,
...
)
A list where each element is the result from
loadCoverage used with returnCoverage = TRUE
. Can be generated
using fullCoverage. Alternatively, specify files
to extract
the coverage information from the regions of interest. This can be
helpful if you do not wish to store fullCov
for memory reasons.
The $regions
output from calculatePvalues. It
is important that the seqlengths information is provided.
The total number of reads mapped for each sample.
Providing this data adjusts the coverage to reads in targetSize
library. By default, to reads per 80 million reads.
The target library size to adjust the coverage to. Used
only when totalMapped
is specified.
A character vector with the full path to the sample BAM files
(or BigWig files).
The names are used for the column names of the DataFrame. Check
rawFiles for constructing files
. files
can also be a
BamFileList
object created with BamFileList or a
BigWigFileList
object created with BigWigFileList.
Arguments passed to other methods and/or advanced arguments. Advanced arguments:
If TRUE
basic status updates will be printed along
the way.
Passed to extendedMapSeqlevels and define_cluster.
When fullCov
is NULL
, ...
has the advanced argument
protectWhich
(default 30000) from loadCoverage. Also
...
is passed to fullCoverage for loading the data on the fly.
This can be useful for loading the data from a specific region (or small
sets of regions) without having to load in memory the output the coverage
information from all the genome.
a list of data.frame where each data.frame has the coverage
information (nrow = width of region, ncol = number of samples) for a given
region. The names of the list correspond to the region indexes in
regions
When fullCov
is the output of loadCoverage with
cutoff
non-NULL, getRegionCoverage assumes that the regions
come from the same data. Meaning that filterData was not used again.
This ensures that the regions are a subset of the data available in
fullCov
.
If fullCov
is NULL
and files
is specified, this function
will attempt to read the coverage from the files. Note that if you used
'totalMapped' and 'targetSize' before, you will have to specify them again
to get the same results.
You should use at most one core per chromosome.
## Obtain fullCov object
fullCov <- list("21" = genomeDataRaw$coverage)
## Assign chr lengths using hg19 information, use only first two regions
library("GenomicRanges")
#> Loading required package: GenomeInfoDb
regions <- genomeRegions$regions[1:2]
seqlengths(regions) <- seqlengths(getChromInfoFromUCSC("hg19",
as.Seqinfo = TRUE
))[
mapSeqlevels(names(seqlengths(regions)), "UCSC")
]
## Finally, get the region coverage
regionCov <- getRegionCoverage(fullCov = fullCov, regions = regions)
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> 2023-05-07 06:01:21.321599 getRegionCoverage: processing chr21
#> 2023-05-07 06:01:21.359046 getRegionCoverage: done processing chr21