Rail (available at http://rail.bio) generates coverage BigWig files. These files are faster to load in R and to process. Rail creates an un-adjusted coverage BigWig file per sample and an adjusted summary coverage BigWig file by chromosome (median or mean). railMatrix reads in the mean (or median) coverage BigWig file and applies a threshold cutoff to identify expressed regions (ERs). Then it goes back to the sample coverage BigWig files and extracts the base level coverage for each sample. Finally it summarizes this information in a matrix with one row per ERs and one column per sample. This function is similar to regionMatrix but is faster thanks to the advantages presented by BigWig files.

railMatrix(
  chrs,
  summaryFiles,
  sampleFiles,
  L,
  cutoff = NULL,
  maxClusterGap = 300L,
  totalMapped = NULL,
  targetSize = 4e+07,
  file.cores = 1L,
  chrlens = NULL,
  ...
)

Arguments

chrs

A character vector with the names of the chromosomes.

summaryFiles

A character vector (or BigWigFileList) with the paths to the summary BigWig files created by Rail. Either mean or median files. These are library size adjusted by default in Rail. The order of the files in this vector must match the order in chrs.

sampleFiles

A character vector with the paths to the BigWig files by sample. These files are unadjusted for library size.

L

The width of the reads used. Either a vector of length 1 or length equal to the number of samples.

cutoff

The base-pair level cutoff to use. It's behavior is controlled by filter.

maxClusterGap

This determines the maximum gap between candidate ERs.

totalMapped

A vector with the total number of reads mapped for each sample. The vector should be in the same order as the samples in data. Providing this data adjusts the coverage to reads in targetSize library prior to filtering. See getTotalMapped for calculating this vector.

targetSize

The target library size to adjust the coverage to. Used only when totalMapped is specified. By default, it adjusts to libraries with 40 million reads, which matches the default used in Rail.

file.cores

Number of cores used for loading the BigWig files per chr.

chrlens

The chromosome lengths in base pairs. If it's NULL, the chromosome length is extracted from the BAM files. Otherwise, it should have the same length as chrs.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

verbose

If TRUE basic status updates will be printed along the way. Default: TRUE.

verbose.load

If TRUE basic status updates will be printed along the way when loading data. Default: TRUE.

BPPARAM.railChr

A BPPARAM object to use for the chr step. Set to SerialParam when file.cores = 1 and SnowParam otherwise.

chunksize

Chunksize to use. Default: 1000.

Passed to filterData, findRegions and define_cluster.

Value

A list with one entry per chromosome. Then per chromosome, a list with two components.

regions

A set of regions based on the coverage filter cutoff as returned by findRegions.

coverageMatrix

A matrix with the mean coverage by sample for each candidate region.

Details

Given a set of un-filtered coverage data (see fullCoverage), create candidate regions by applying a cutoff on the coverage values, and obtain a count matrix where the number of rows corresponds to the number of candidate regions and the number of columns corresponds to the number of samples. The values are the mean coverage for a given sample for a given region.

This function uses several other derfinder-package functions. Inspect the code if interested.

You should use at most one core per chromosome.

Author

Leonardo Collado-Torres

Examples


## BigWig files are not supported in Windows
if (.Platform$OS.type != "windows") {
    ## Get data
    library("derfinderData")

    ## Identify sample files
    sampleFiles <- rawFiles(system.file("extdata", "AMY",
        package =
            "derfinderData"
    ), samplepatt = "bw", fileterm = NULL)
    names(sampleFiles) <- gsub(".bw", "", names(sampleFiles))

    ## Create the mean bigwig file. This file is normally created by Rail
    ## but in this example we'll create it manually.
    library("GenomicRanges")
    fullCov <- fullCoverage(files = sampleFiles, chrs = "chr21")
    meanCov <- Reduce("+", fullCov$chr21) / ncol(fullCov$chr21)
    createBw(list("chr21" = DataFrame("meanChr21" = meanCov)),
        keepGR =
            FALSE
    )

    summaryFile <- "meanChr21.bw"

    ## Get the regions
    regionMat <- railMatrix(
        chrs = "chr21", summaryFiles = summaryFile,
        sampleFiles = sampleFiles, L = 76, cutoff = 5.1,
        maxClusterGap = 3000L
    )

    ## Explore results
    names(regionMat$chr21)
    regionMat$chr21$regions
    dim(regionMat$chr21$coverageMatrix)
}
#> 2023-05-07 06:01:28.380766 fullCoverage: processing chromosome chr21
#> 2023-05-07 06:01:28.386188 loadCoverage: finding chromosome lengths
#> 2023-05-07 06:01:28.396699 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB113.bw
#> 2023-05-07 06:01:28.561579 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB123.bw
#> 2023-05-07 06:01:28.751334 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB126.bw
#> 2023-05-07 06:01:28.867704 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB130.bw
#> 2023-05-07 06:01:29.022374 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB135.bw
#> 2023-05-07 06:01:29.127725 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB136.bw
#> 2023-05-07 06:01:29.248534 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB145.bw
#> 2023-05-07 06:01:29.370049 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB153.bw
#> 2023-05-07 06:01:29.512037 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB159.bw
#> 2023-05-07 06:01:29.642779 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB178.bw
#> 2023-05-07 06:01:29.777023 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB92.bw
#> 2023-05-07 06:01:29.94113 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB97.bw
#> 2023-05-07 06:01:30.111075 loadCoverage: applying the cutoff to the merged data
#> 2023-05-07 06:01:30.137966 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
#> 2023-05-07 06:01:31.162333 coerceGR: coercing sample meanChr21
#> 2023-05-07 06:01:31.265424 createBwSample: exporting bw for sample meanChr21
#> 2023-05-07 06:01:32.210556 loadCoverage: finding chromosome lengths
#> 2023-05-07 06:01:32.217945 loadCoverage: loading BigWig file meanChr21.bw
#> 2023-05-07 06:01:32.641408 loadCoverage: applying the cutoff to the merged data
#> 2023-05-07 06:01:34.626802 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
#> 2023-05-07 06:01:34.836375 filterData: originally there were 48129895 rows, now there are 15927 rows. Meaning that 99.97 percent was filtered.
#> 2023-05-07 06:01:34.839652 findRegions: identifying potential segments
#> 2023-05-07 06:01:34.842824 findRegions: segmenting information
#> 2023-05-07 06:01:34.843155 .getSegmentsRle: segmenting with cutoff(s) 5.1
#> 2023-05-07 06:01:34.849387 findRegions: identifying candidate regions
#> 2023-05-07 06:01:34.888655 findRegions: identifying region clusters
#> 2023-05-07 06:01:34.977567 railMatrix: processing regions 1 to 204
#> 2023-05-07 06:01:34.982901 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB113.bw
#> 2023-05-07 06:01:35.069295 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB123.bw
#> 2023-05-07 06:01:35.155306 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB126.bw
#> 2023-05-07 06:01:35.23842 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB130.bw
#> 2023-05-07 06:01:35.32326 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB135.bw
#> 2023-05-07 06:01:35.406814 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB136.bw
#> 2023-05-07 06:01:35.491823 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB145.bw
#> 2023-05-07 06:01:35.578539 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB153.bw
#> 2023-05-07 06:01:35.666823 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB159.bw
#> 2023-05-07 06:01:35.753545 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB178.bw
#> 2023-05-07 06:01:35.839366 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB92.bw
#> 2023-05-07 06:01:35.92916 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB97.bw
#> [1] 204  12