For a group of samples this function reads the coverage information for a specific chromosome directly from the BAM files. It then merges them into a DataFrame and removes the bases that do not pass the cutoff.

  cutoff = NULL,
  filter = "one",
  chrlen = NULL,
  output = NULL,
  bai = NULL,



A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing files. files can also be a BamFileList, BamFile, BigWigFileList, or BigWigFile object.


Chromosome to read. Should be in the format matching the one used in the raw data.


This argument is passed to filterData.


Has to be either 'one' (default) or 'mean'. In the first case, at least one sample has to have coverage above cutoff. In the second case, the mean coverage has to be greater than cutoff.


The chromosome length in base pairs. If it's NULL, the chromosome length is extracted from the BAM files.


If NULL then no output is saved in disk. If auto then an automatic name is constructed using UCSC names (chrXCovInfo.Rdata for example). If another character is specified, then that name is used for the output file.


The full path to the BAM index files. If NULL it is assumed that the BAM index files are in the same location as the BAM files and that they have the .bai extension. Ignored if files is a BamFileList object or if inputType=='BigWig'.


Arguments passed to other methods and/or advanced arguments. Advanced arguments:


If TRUE basic status updates will be printed along the way.


Has to be either bam or BigWig. It specifies the format of the raw data files. By default it's set to bam before trying to guess it from the file names.


When specified, tileGenome is used to break up the chromosome into chunks. We don't recommend this for BAM files as the coverage in the borders of the chunks might be slightly off.


NULL by default. When a GRanges is specified, this specific region of the genome is loaded instead of the full chromosome.


The naming style of the chromosomes in the input files. If the global option chrsStyle is not set, the naming style won't be changed. This option is useful when you want to use a specific naming style but the raw files use another style.


When not NULL and which is specified, this argument specifies by how much the ranges in which will be expanded. This helps get the same base level coverage you would get from reading the coverage for a full chromosome from BAM files. Otherwise some reads might be excluded and thus the base level coverage will be lower. NULL by default.


Whether to drop the bases with 'D' in the CIGAR strings or to include them. Only used with BAM files. FALSE by default.


Column names to be used the samples. By default it's names(files).

Passed to extendedMapSeqlevels, define_cluster, scanBamFlag and filterData. Note that filterData is used internally by loadCoverage and has the important arguments totalMapped and targetSize which can be used to normalize the coverage by library size. See getTotalMapped for calculating totalMapped.


A list with two components.


is a DataFrame object where each column represents a sample. The number of rows depends on the number of base pairs that passed the cutoff and the information stored is the coverage at that given base.


is a logical Rle with the positions of the chromosome that passed the cutoff.


The ... argument can be used to control which alignments to consider when reading from BAM files. See scanBamFlag.

Parallelization for loading the data in chunks is used only used when tilewidth is specified. You may use up to one core per tile.

If you set the advanced argument drop.D = TRUE, bases with CIGAR string "D" (deletion from reference) will be excluded from the base-level coverage calculation.

If you are working with data from an organism different from 'Homo sapiens' specify so by setting the global 'species' and 'chrsStyle' options. For example: options(species = 'arabidopsis_thaliana') options(chrsStyle = 'NCBI')


Leonardo Collado-Torres, Andrew Jaffe


datadir <- system.file("extdata", "genomeData", package = "derfinder")
files <- rawFiles(
    datadir = datadir, samplepatt = "*accepted_hits.bam$",
    fileterm = NULL
## Shorten the column names
names(files) <- gsub("_accepted_hits.bam", "", names(files))

## Read and filter the data, only for 2 files
dataSmall <- loadCoverage(files = files[1:2], chr = "21", cutoff = 0)
#> 2023-05-07 06:01:21.817745 loadCoverage: finding chromosome lengths
#> 2023-05-07 06:01:21.830585 loadCoverage: loading BAM file /__w/_temp/Library/derfinder/extdata/genomeData/ERR009101_accepted_hits.bam
#> 2023-05-07 06:01:21.879642 loadCoverage: loading BAM file /__w/_temp/Library/derfinder/extdata/genomeData/ERR009102_accepted_hits.bam
#> 2023-05-07 06:01:21.927497 loadCoverage: applying the cutoff to the merged data
#> 2023-05-07 06:01:21.94533 filterData: originally there were 48129895 rows, now there are 454 rows. Meaning that 100 percent was filtered.
if (FALSE) {
## Export to BigWig files
createBw(list("chr21" = dataSmall))

## Load data from BigWig files <- loadCoverage(c(
    ERR009101 = "", ERR009102 =
), chr = "chr21")

## Compare them
mapply(function(x, y) {
    x - y
}, dataSmall$coverage,$coverage)

## Note that the only difference is the type of Rle (integer vs numeric) used
## to store the data.