For a group of samples this function reads the coverage information for several chromosomes directly from the BAM files. Per chromosome, it merges the unfiltered coverage by sample into a DataFrame. The end result is a list with one such DataFrame objects per chromosome.

fullCoverage(
  files,
  chrs,
  bai = NULL,
  chrlens = NULL,
  outputs = NULL,
  cutoff = NULL,
  ...
)

Arguments

files

A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing files. files can also be a BamFileList object created with BamFileList or a BigWigFileList object created with BigWigFileList.

chrs

The chromosome of the files to read. The format has to match the one used in the input files.

bai

The full path to the BAM index files. If NULL it is assumed that the BAM index files are in the same location as the BAM files and that they have the .bai extension. Ignored if files is a BamFileList object.

chrlens

The chromosome lengths in base pairs. If it's NULL, the chromosome length is extracted from the BAM files. Otherwise, it should have the same length as chrs.

outputs

This argument is passed to the output argument of loadCoverage. If NULL or 'auto' it is then recycled.

cutoff

This argument is passed to filterData.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

verbose

If TRUE basic status updates will be printed along the way.

mc.cores

How many cores to use for reading the chromosome information. There's no benefit of using a number greater than the number of chromosomes. Also note that your harddisk speed will limit how much you get from using a higher mc.cores value.

mc.cores.load

Controls the number of cores to be used per chr for loading the data which is only useful in the scenario that you are loading in genome tiles. If not supplied, it uses mc.cores for loadCoverage. Default: 1. If you are using genome tiles, the total number of cores you'll use will be mc.cores times mc.cores.load.

Passed to loadCoverage, define_cluster and extendedMapSeqlevels. Note that filterData is used internally by loadCoverage (and hence fullCoverage) and has the important arguments totalMapped and targetSize which can be used to normalize the coverage by library size. See getTotalMapped for calculating totalMapped.

Value

A list with one element per chromosome.

produced by loadCoverage.

Author

Leonardo Collado-Torres

Examples

datadir <- system.file("extdata", "genomeData", package = "derfinder")
files <- rawFiles(
    datadir = datadir, samplepatt = "*accepted_hits.bam$",
    fileterm = NULL
)
## Shorten the column names
names(files) <- gsub("_accepted_hits.bam", "", names(files))

## Read and filter the data, only for 1 file
fullCov <- fullCoverage(files = files[1], chrs = c("21", "22"))
#> 2023-05-07 06:01:19.264883 fullCoverage: processing chromosome 21
#> 2023-05-07 06:01:19.273236 loadCoverage: finding chromosome lengths
#> 2023-05-07 06:01:19.293643 loadCoverage: loading BAM file /__w/_temp/Library/derfinder/extdata/genomeData/ERR009101_accepted_hits.bam
#> 2023-05-07 06:01:19.353115 loadCoverage: applying the cutoff to the merged data
#> 2023-05-07 06:01:19.367733 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
#> 2023-05-07 06:01:19.368349 fullCoverage: processing chromosome 22
#> 2023-05-07 06:01:19.371603 loadCoverage: finding chromosome lengths
#> 2023-05-07 06:01:19.383013 loadCoverage: loading BAM file /__w/_temp/Library/derfinder/extdata/genomeData/ERR009101_accepted_hits.bam
#> 2023-05-07 06:01:19.43013 loadCoverage: applying the cutoff to the merged data
#> 2023-05-07 06:01:19.44397 filterData: originally there were 51304566 rows, now there are 51304566 rows. Meaning that 0 percent was filtered.
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
fullCov
#> $chr21
#> DataFrame with 48129895 rows and 1 column
#>          ERR009101
#>              <Rle>
#> 1                0
#> 2                0
#> 3                0
#> 4                0
#> 5                0
#> ...            ...
#> 48129891         0
#> 48129892         0
#> 48129893         0
#> 48129894         0
#> 48129895         0
#> 
#> $chr22
#> DataFrame with 51304566 rows and 1 column
#>          ERR009101
#>              <Rle>
#> 1                0
#> 2                0
#> 3                0
#> 4                0
#> 5                0
#> ...            ...
#> 51304562         0
#> 51304563         0
#> 51304564         0
#> 51304565         0
#> 51304566         0
#> 
if (FALSE) {
## You can then use filterData() to filter the data if you want to.
## Use bplapply() if you want to do so with multiple cores as shown below.
library("BiocParallel")
p <- SnowParam(2L)
bplapply(fullCov, function(x) {
    library("derfinder")
    filterData(x, cutoff = 0)
}, BPPARAM = p)
}