This function extracts the coverage information calculated by fullCoverage for a set of exons determined by makeGenomicState. The underlying code is similar to getRegionCoverage with additional tweaks for calculating RPKM values.
coverageToExon(
    fullCov = NULL,
    genomicState,
    L = NULL,
    returnType = "raw",
    files = NULL,
    ...
)
fullCov: A list where each element is the result from loadCoverage used with returnCoverage = TRUE. Can be generated using fullCoverage. Alternatively, specify files to extract the coverage information from the regions of interest. This can be helpful if you do not wish to store fullCov for memory reasons.
genomicState: A GRanges object created with makeGenomicState. It can be either the genomicState$fullGenome or genomicState$codingGenome component.
L: The width of the reads used. Either a vector of length 1 or of length equal to the number of samples.
returnType: If raw, then the raw coverage information per exon is returned. If rpkm, RPKM values are calculated for each exon.
files: A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing files. files can also be a BamFileList object created with BamFileList or a BigWigFileList object created with BigWigFileList.
...: Arguments passed to other methods and/or advanced arguments. Advanced arguments:
verbose: If TRUE, basic status updates will be printed along the way.
BPPARAM.strandStep: A BPPARAM object to use for the strand step. If not specified, then strandCores specifies the number of cores to use for the strand step. The actual number of cores used is the minimum of strandCores, mc.cores and the number of strands in the data.
BPPARAM.chrStep: A BPPARAM object to use for the chr step. If not specified, then mc.cores specifies the number of cores to use for the chr step. The actual number of cores used is the minimum of mc.cores and the number of samples.
Passed to extendedMapSeqlevels and define_cluster.
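As a small illustration of the shapes described for files and L, the sketch below builds a named character vector of paths and expands a single read width to one value per sample. It uses base R only. The paths and sample names are hypothetical placeholders, and L_per_sample is a helper variable introduced here for illustration, not something the package requires.

```r
## Hypothetical paths; replace with real BAM or BigWig files.
## The names become the column names of the result.
files <- c(
    sampleA = "/path/to/sampleA.bam",
    sampleB = "/path/to/sampleB.bam"
)

## L may be a single value or one value per sample
L <- 36

## Illustration only: expand L so there is one read width per sample
L_per_sample <- rep(L, length.out = length(files))
names(L_per_sample) <- names(files)

## Basic sanity checks on the documented shapes
stopifnot(
    is.character(files),
    !is.null(names(files)),
    length(L) %in% c(1L, length(files))
)
```

Checking the names up front helps because unnamed paths would leave the output columns without sample labels.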
A matrix (nrow = number of exons in genomicState corresponding to the chromosomes in fullCov, ncol = number of samples) with the number of reads (or RPKM) per exon. The row names correspond to the row indexes of genomicState$fullGenome (if fullOrCoding='full') or genomicState$codingGenome (if fullOrCoding='coding').
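To make the relationship between the two return types concrete, the following sketch applies the standard RPKM definition (reads per kilobase of exon per million mapped reads) to a toy exon-by-sample matrix. This is not the package's internal code. The counts, exon widths and mapped-read totals are made-up values, and the variable names are assumptions introduced for this example.

```r
## Toy raw counts: 2 exons (rows) by 2 samples (columns); values are made up
raw_counts <- matrix(
    c(100, 50, 200, 25),
    nrow = 2,
    dimnames = list(c("1", "2"), c("sampleA", "sampleB"))
)

## Made-up exon widths in bp (one per row) and mapped reads (one per column)
exon_width <- c(1000, 500)
total_mapped <- c(sampleA = 1e6, sampleB = 2e6)

## RPKM = reads / (exon length in kb) / (mapped reads in millions)
## The width vector is recycled down each column, so it scales by row
per_kb <- raw_counts / (exon_width / 1000)

## sweep() divides each column by that sample's library size in millions
rpkm <- sweep(per_kb, 2, total_mapped / 1e6, "/")
rpkm
```

The result keeps the same layout as the raw matrix: one row per exon and one column per sample.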
Parallelization is used twice. First, it is used by strand. Second, for processing the exons by chromosome. So there is no gain in using mc.cores greater than the maximum of the number of strands and number of chromosomes.
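The core-count rules stated on this page can be written as simple arithmetic. The sketch below is illustrative only: strand_step_cores and useful_cores_cap are hypothetical helpers, not functions exported by the package, and they restate the documented rules rather than reproduce the internal scheduling.

```r
## Hypothetical helper: cores used in the strand step, following the
## documented rule min(strandCores, mc.cores, number of strands)
strand_step_cores <- function(strandCores, mc.cores, n_strands) {
    min(strandCores, mc.cores, n_strands)
}

## Hypothetical helper: the largest mc.cores value that can still help,
## following the documented rule max(number of strands, number of chromosomes)
useful_cores_cap <- function(n_strands, n_chr) {
    max(n_strands, n_chr)
}

## Two strands in the data, so only 2 of the requested cores are used
strand_step_cores(strandCores = 4, mc.cores = 8, n_strands = 2)

## With 2 strands and 22 chromosomes, more than 22 cores gives no benefit
useful_cores_cap(n_strands = 2, n_chr = 22)
```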
If fullCov is NULL and files is specified, this function will attempt to read the coverage from the files. Note that if you used 'totalMapped' and 'targetSize' before, you will have to specify them again to get the same results.
## Obtain fullCov object
fullCov <- list("21" = genomeDataRaw$coverage)

## Use only the first two exons
smallGenomicState <- genomicState
smallGenomicState$fullGenome <- smallGenomicState$fullGenome[
    which(smallGenomicState$fullGenome$theRegion == "exon")[1:2]
]

## Finally, get the coverage information for each exon
exonCov <- coverageToExon(
    fullCov = fullCov,
    genomicState = smallGenomicState$fullGenome, L = 36
)
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> class: SerialParam
#> bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
#> bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#> bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#> bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
#> bpfallback: FALSE
#> bplogdir: NA
#> bpresultdir: NA
#> class: SerialParam
#> bpisup: FALSE; bpnworkers: 1; bptasks: 0; bpjobname: BPJOB
#> bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
#> bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
#> bpexportglobals: FALSE; bpexportvariables: FALSE; bpforceGC: FALSE
#> bpfallback: FALSE
#> bplogdir: NA
#> bpresultdir: NA
#> 2023-05-07 06:01:15.331878 coverageToExon: processing chromosome chr21