This function merges the results from running analyzeChr on several chromosomes and assigns genomic states using annotateRegions. It re-calculates the p-values and q-values using the pooled areas from the null regions from all chromosomes. Once the results have been merged, derfinderReport::generateReport can be used to generate an HTML report of the results. The derfinderReport package is available at https://github.com/lcolladotor/derfinderReport.

mergeResults(
  chrs = c(seq_len(22), "X", "Y"),
  prefix = ".",
  significantCut = c(0.05, 0.1),
  genomicState,
  minoverlap = 20,
  mergePrep = FALSE,
  ...
)

Arguments

chrs

The chromosomes of the files to be merged.

prefix

The main data directory path, which can be useful if analyzeChr is used for several parameters and the results are saved in different directories.

significantCut

A vector of length two specifiying the cutoffs used to determine significance. The first element is used to determine significance for the P-values and FWER adjusted P-values, while the second element is used for the Q-values (FDR adjusted P-values) similar to calculatePvalues.

genomicState

A GRanges object created with makeGenomicState. It can be either the genomicState$fullGenome or genomicState$codingGenome component.

minoverlap

Determines the mininum overlap needed when annotating regions with annotateRegions.

mergePrep

If TRUE the output from preprocessCoverage is merged.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

verbose

If TRUE basic status updates will be printed along the way.

optionsStats

The options used in analyzeChr. By default NULL and will be inferred from the output files.

cutoffFstatUsed

The actual F-statistic cutoff used. This can be obtained from the logs or from the output of analyzeChr. If NULL then this function will attempt to re-calculate it.

Passed to annotateRegions and extendedMapSeqlevels.

Value

Seven Rdata files.

fullFstats.Rdata

Full F-statistics from all chromosomes in a list of Rle objects.

fullTime.Rdata

Timing information from all chromosomes.

fullNullSummary.Rdata

A DataFrame with the null region information: statistic, width, chromosome and permutation identifier. It's ordered by the statistics

fullRegions.Rdata

GRanges object with regions found and with full annotation from matchGenes. Note that the column strand from matchGenes is renamed to annoStrand to comply with GRanges specifications.

fullCoveragePrep.Rdata

A list with the pre-processed coverage data from all chromosomes.

fullAnnotatedRegions.Rdata

A list as constructed in annotateRegions with the assigned genomic states.

optionsMerge.Rdata

A list with the options used when merging the results. Used in derfinderReport::generateReport.

Details

If you want to calculate the FWER adjusted P-values, supply optionsStats which is produced by analyzeChr.

Author

Leonardo Collado-Torres

Examples

## The output will be saved in the 'generateReport-example' directory
dir.create("generateReport-example", showWarnings = FALSE, recursive = TRUE)

## For convenience, the derfinder output has been pre-computed
file.copy(system.file(file.path("extdata", "chr21"),
    package = "derfinder",
    mustWork = TRUE
), "generateReport-example", recursive = TRUE)
#> [1] TRUE

## Merge the results from the different chromosomes. In this case, there's
## only one: chr21
mergeResults(
    chrs = "21", prefix = "generateReport-example",
    genomicState = genomicState$fullGenome
)
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> 2023-05-07 06:01:26.968032 mergeResults: Saving options used
#> 2023-05-07 06:01:26.968683 Loading chromosome chr21
#> Neither 'cutoffFstatUsed' nor 'optionsStats' were supplied, so the FWER calculation step will be skipped.
#> 2023-05-07 06:01:27.012941 mergeResults: Saving fullNullSummary
#> 2023-05-07 06:01:27.013651 mergeResults: Re-calculating the p-values
#> 2023-05-07 06:01:27.07809 mergeResults: Saving fullRegions
#> 2023-05-07 06:01:27.079335 mergeResults: assigning genomic states
#> 2023-05-07 06:01:27.187405 annotateRegions: counting
#> 2023-05-07 06:01:27.258744 annotateRegions: annotating
#> 2023-05-07 06:01:27.285194 mergeResults: Saving fullAnnotatedRegions
#> 2023-05-07 06:01:27.286116 mergeResults: Saving fullFstats
#> 2023-05-07 06:01:27.286791 mergeResults: Saving fullTime
if (FALSE) {
## You can then explore the wallclock time spent on each step
load(file.path("generateReport-example", "fullRegions.Rdata"))

## Process the time info
time <- lapply(fullTime, function(x) data.frame(diff(x)))
time <- do.call(rbind, time)
colnames(time) <- "sec"
time$sec <- as.integer(round(time$sec))
time$min <- time$sec / 60
time$chr <- paste0("chr", gsub("\\..*", "", rownames(time)))
time$step <- gsub(".*\\.", "", rownames(time))
rownames(time) <- seq_len(nrow(time))

## Make plot
library("ggplot2")
ggplot(time, aes(x = step, y = min, colour = chr)) +
    geom_point() +
    labs(title = "Wallclock time by step") +
    scale_colour_discrete(limits = chrs) +
    scale_x_discrete(limits = names(fullTime[[1]])[-1]) +
    ylab("Time (min)") +
    xlab("Step")
}