This function extracts the coverage information calculated by fullCoverage for a set of exons determined by makeGenomicState. The underlying code is similar to getRegionCoverage with additional tweaks for calculating RPKM values.
coverageToExon(
  fullCov = NULL,
  genomicState,
  L = NULL,
  returnType = "raw",
  files = NULL,
  ...
)
fullCov
A list where each element is the result from
loadCoverage used with returnCoverage = TRUE. Can be generated
using fullCoverage. Alternatively, specify files to extract
the coverage information from the regions of interest. This can be
helpful if you do not wish to store fullCov for memory reasons.
genomicState
A GRanges object created with makeGenomicState.
It can be either the genomicState$fullGenome or
genomicState$codingGenome component.
L
The width of the reads used. Either a vector of length 1 or of length equal to the number of samples.
returnType
If raw, then the raw coverage information per exon
is returned. If rpkm, RPKM values are calculated for each exon.
files
A character vector with the full paths to the sample BAM files
(or BigWig files).
The names are used for the column names of the DataFrame. Check
rawFiles for constructing files. files can also be a
BamFileList object created with BamFileList or a
BigWigFileList object created with BigWigFileList.
...
Arguments passed to other methods and/or advanced arguments. Advanced arguments:
If TRUE basic status updates will be printed along
the way.
A BPPARAM object to use for the strand step. If
not specified, then strandCores specifies the number of cores to use
for the strand step. The actual number of cores used is the minimum of
strandCores, mc.cores and the number of strands in the data.
A BPPARAM object to use for the chr step. If not
specified, then mc.cores specifies the number of cores to use for
the chr step. The actual number of cores used is the minimum of
mc.cores and the number of samples.
Passed to extendedMapSeqlevels and define_cluster.
A matrix (nrow = number of exons in genomicState
corresponding to the chromosomes in fullCov, ncol = number of
samples) with the number of reads (or RPKM) per exon. The row names
correspond to the row indexes of the GRanges object supplied as
genomicState, that is, genomicState$fullGenome or
genomicState$codingGenome.
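The relationship between the raw and rpkm return types can be sketched in base R with toy numbers. This is only an illustration under stated assumptions, not the package's code: it assumes a read count is approximated as the summed per-base coverage of an exon divided by the read length L, and it applies the standard RPKM definition (reads per kilobase of exon per million mapped reads). The objects cov_sum, read_len, exon_width and total_mapped are hypothetical.

```r
# Toy summed per-base coverage: rows are exons, columns are samples
cov_sum <- matrix(
    c(7200, 1800, 10000, 5000),
    nrow = 2,
    dimnames = list(c("exon1", "exon2"), c("sampleA", "sampleB"))
)

# Read length per sample (L may be one value or one value per sample)
read_len <- c(sampleA = 36, sampleB = 50)

# Assumption: reads per exon ~ summed coverage / read length
counts <- sweep(cov_sum, 2, read_len, "/")

# Hypothetical exon widths (bp) and total mapped reads per sample
exon_width <- c(exon1 = 1000, exon2 = 500)
total_mapped <- c(sampleA = 1e6, sampleB = 2e6)

# Standard RPKM: reads / (exon width in kb) / (mapped reads in millions)
per_kb <- counts / (exon_width / 1000)
rpkm <- sweep(per_kb, 2, total_mapped / 1e6, "/")
```

Here counts holds the approximate reads per exon and rpkm the normalized values, both with one row per exon and one column per sample, mirroring the shape of the matrix described above.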
Parallelization is used twice: first by strand, and then for processing
the exons by chromosome. There is therefore no gain in using mc.cores
greater than the maximum of the number of strands and the number of
chromosomes.
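The capping rule described here can be illustrated with a small base R helper. effective_workers is a hypothetical name used only for this sketch and is not part of the package; it simply takes the minimum of the requested cores and the number of independent tasks.

```r
# Hypothetical helper (not a derfinder function): workers beyond the
# number of independent tasks would sit idle, so cap the request.
effective_workers <- function(requested, n_tasks) {
    max(1L, min(as.integer(requested), as.integer(n_tasks)))
}

# Toy setting: 8 cores requested, data on 2 strands and 1 chromosome
mc_cores <- 8L
strand_workers <- effective_workers(mc_cores, n_tasks = 2L)
chr_workers <- effective_workers(mc_cores, n_tasks = 1L)
```

In this toy setting only two workers would be useful for the strand step and one for the chromosome step, so requesting eight cores would bring no benefit.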
If fullCov is NULL and files is specified, this function
will attempt to read the coverage from the files. Note that if you used
'totalMapped' and 'targetSize' before, you will have to specify them again
to get the same results.
## Load the package, which provides genomeDataRaw and genomicState
library("derfinder")

## Obtain fullCov object
fullCov <- list("21" = genomeDataRaw$coverage)

## Use only the first two exons
smallGenomicState <- genomicState
smallGenomicState$fullGenome <- smallGenomicState$fullGenome[
    which(smallGenomicState$fullGenome$theRegion == "exon")[1:2]
]

## Finally, get the coverage information for each exon
exonCov <- coverageToExon(
    fullCov = fullCov,
    genomicState = smallGenomicState$fullGenome,
    L = 36
)
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> class: SerialParam
#> bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
#> bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#> bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#> bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
#> bpfallback: FALSE
#> bplogdir: NA
#> bpresultdir: NA
#> class: SerialParam
#> bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
#> bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#> bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#> bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
#> bpfallback: FALSE
#> bplogdir: NA
#> bpresultdir: NA
#> 2024-12-13 15:13:10.811874 coverageToExon: processing chromosome chr21