R/railMatrix.R
railMatrix.Rd
Rail (available at http://rail.bio) generates coverage BigWig files. These files are faster to load in R and to process. Rail creates an un-adjusted coverage BigWig file per sample and an adjusted summary coverage BigWig file by chromosome (median or mean). railMatrix reads in the mean (or median) coverage BigWig file and applies a threshold cutoff to identify expressed regions (ERs). Then it goes back to the sample coverage BigWig files and extracts the base level coverage for each sample. Finally it summarizes this information in a matrix with one row per ERs and one column per sample. This function is similar to regionMatrix but is faster thanks to the advantages presented by BigWig files.
railMatrix(
chrs,
summaryFiles,
sampleFiles,
L,
cutoff = NULL,
maxClusterGap = 300L,
totalMapped = NULL,
targetSize = 4e+07,
file.cores = 1L,
chrlens = NULL,
...
)
A character vector with the names of the chromosomes.
A character vector (or BigWigFileList) with the paths to
the summary BigWig files created by Rail. Either mean or median files. These
are library size adjusted by default in Rail. The order of the files in this
vector must match the order in chrs
.
A character vector with the paths to the BigWig files by sample. These files are unadjusted for library size.
The width of the reads used. Either a vector of length 1 or length equal to the number of samples.
The base-pair level cutoff to use. It's behavior is controlled
by filter
.
This determines the maximum gap between candidate ERs.
A vector with the total number of reads mapped for each
sample. The vector should be in the same order as the samples in data
.
Providing this data adjusts the coverage to reads in targetSize
library prior to filtering. See getTotalMapped for
calculating this vector.
The target library size to adjust the coverage to. Used
only when totalMapped
is specified. By default, it adjusts to
libraries with 40 million reads, which matches the default used in Rail.
Number of cores used for loading the BigWig files per chr.
The chromosome lengths in base pairs. If it's NULL
,
the chromosome length is extracted from the BAM files. Otherwise, it should
have the same length as chrs
.
Arguments passed to other methods and/or advanced arguments. Advanced arguments:
If TRUE
basic status updates will be printed along
the way. Default: TRUE
.
If TRUE
basic status updates will be printed
along the way when loading data. Default: TRUE
.
A BPPARAM object to use for the chr step. Set to
SerialParam when file.cores = 1
and
SnowParam otherwise.
Chunksize to use. Default: 1000.
Passed to filterData, findRegions and define_cluster.
A list with one entry per chromosome. Then per chromosome, a list with two components.
A set of regions based on the coverage filter cutoff as returned by findRegions.
A matrix with the mean coverage by sample for each candidate region.
Given a set of un-filtered coverage data (see fullCoverage), create candidate regions by applying a cutoff on the coverage values, and obtain a count matrix where the number of rows corresponds to the number of candidate regions and the number of columns corresponds to the number of samples. The values are the mean coverage for a given sample for a given region.
This function uses several other derfinder-package functions. Inspect the code if interested.
You should use at most one core per chromosome.
## BigWig files are not supported in Windows
if (.Platform$OS.type != "windows") {
## Get data
library("derfinderData")
## Identify sample files
sampleFiles <- rawFiles(system.file("extdata", "AMY",
package =
"derfinderData"
), samplepatt = "bw", fileterm = NULL)
names(sampleFiles) <- gsub(".bw", "", names(sampleFiles))
## Create the mean bigwig file. This file is normally created by Rail
## but in this example we'll create it manually.
library("GenomicRanges")
fullCov <- fullCoverage(files = sampleFiles, chrs = "chr21")
meanCov <- Reduce("+", fullCov$chr21) / ncol(fullCov$chr21)
createBw(list("chr21" = DataFrame("meanChr21" = meanCov)),
keepGR =
FALSE
)
summaryFile <- "meanChr21.bw"
## Get the regions
regionMat <- railMatrix(
chrs = "chr21", summaryFiles = summaryFile,
sampleFiles = sampleFiles, L = 76, cutoff = 5.1,
maxClusterGap = 3000L
)
## Explore results
names(regionMat$chr21)
regionMat$chr21$regions
dim(regionMat$chr21$coverageMatrix)
}
#> 2024-12-13 15:13:20.031066 fullCoverage: processing chromosome chr21
#> 2024-12-13 15:13:20.035075 loadCoverage: finding chromosome lengths
#> 2024-12-13 15:13:20.042265 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB113.bw
#> 2024-12-13 15:13:20.148698 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB123.bw
#> 2024-12-13 15:13:20.276117 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB126.bw
#> 2024-12-13 15:13:20.365579 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB130.bw
#> 2024-12-13 15:13:20.467202 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB135.bw
#> 2024-12-13 15:13:20.537279 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB136.bw
#> 2024-12-13 15:13:20.617349 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB145.bw
#> 2024-12-13 15:13:20.70036 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB153.bw
#> 2024-12-13 15:13:20.79787 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB159.bw
#> 2024-12-13 15:13:20.88582 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB178.bw
#> 2024-12-13 15:13:20.975366 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB92.bw
#> 2024-12-13 15:13:21.087801 loadCoverage: loading BigWig file /__w/_temp/Library/derfinderData/extdata/AMY/HSB97.bw
#> 2024-12-13 15:13:21.204752 loadCoverage: applying the cutoff to the merged data
#> 2024-12-13 15:13:21.220716 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
#> 2024-12-13 15:13:22.018486 coerceGR: coercing sample meanChr21
#> 2024-12-13 15:13:22.087631 createBwSample: exporting bw for sample meanChr21
#> 2024-12-13 15:13:22.849372 loadCoverage: finding chromosome lengths
#> 2024-12-13 15:13:22.854736 loadCoverage: loading BigWig file meanChr21.bw
#> 2024-12-13 15:13:23.140509 loadCoverage: applying the cutoff to the merged data
#> 2024-12-13 15:13:24.223293 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
#> 2024-12-13 15:13:24.367652 filterData: originally there were 48129895 rows, now there are 15927 rows. Meaning that 99.97 percent was filtered.
#> 2024-12-13 15:13:24.369988 findRegions: identifying potential segments
#> 2024-12-13 15:13:24.372375 findRegions: segmenting information
#> 2024-12-13 15:13:24.372736 .getSegmentsRle: segmenting with cutoff(s) 5.1
#> 2024-12-13 15:13:24.377439 findRegions: identifying candidate regions
#> 2024-12-13 15:13:24.40342 findRegions: identifying region clusters
#> 2024-12-13 15:13:24.463958 railMatrix: processing regions 1 to 204
#> 2024-12-13 15:13:24.468004 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB113.bw
#> 2024-12-13 15:13:24.527805 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB123.bw
#> 2024-12-13 15:13:24.582681 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB126.bw
#> 2024-12-13 15:13:24.635165 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB130.bw
#> 2024-12-13 15:13:24.689157 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB135.bw
#> 2024-12-13 15:13:24.742755 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB136.bw
#> 2024-12-13 15:13:24.797394 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB145.bw
#> 2024-12-13 15:13:24.852351 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB153.bw
#> 2024-12-13 15:13:24.907203 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB159.bw
#> 2024-12-13 15:13:24.960832 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB178.bw
#> 2024-12-13 15:13:25.014801 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB92.bw
#> 2024-12-13 15:13:25.071318 railMatrix: processing file /__w/_temp/Library/derfinderData/extdata/AMY/HSB97.bw
#> [1] 204 12