This is a major wrapper for running several key functions from this package. It is meant to be used after loadCoverage has been used for a specific chromosome. The steps run include makeModels, preprocessCoverage, calculateStats, calculatePvalues and annotating with annotateTranscripts and matchGenes.
analyzeChr(
chr,
coverageInfo,
models,
cutoffPre = 5,
cutoffFstat = 1e-08,
cutoffType = "theoretical",
nPermute = 1,
seeds = as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute),
groupInfo,
txdb = NULL,
writeOutput = TRUE,
runAnnotation = TRUE,
lowMemDir = file.path(chr, "chunksDir"),
smooth = FALSE,
weights = NULL,
smoothFunction = bumphunter::locfitByCluster,
...
)
Used for naming the output files when writeOutput=TRUE and
the resulting GRanges object.
A list containing a DataFrame –$coverage– with
the coverage data and a logical Rle –$position– with the positions
that passed the cutoff. This object is generated using loadCoverage.
You should have specified a cutoff value for loadCoverage unless
you are using colsubset, which will force a filtering step
with filterData when running preprocessCoverage.
The output from makeModels.
This argument is passed to preprocessCoverage
(cutoff).
This is used to determine the cutoff argument of
calculatePvalues and its behaviour is determined by
cutoffType.
If set to empirical, the cutoffFstat
(example: 0.99) quantile is used via quantile. If set to
theoretical, the theoretical cutoffFstat (example: 1e-08) is
calculated via qf. If set to manual, cutoffFstat is
passed to calculatePvalues without any other calculation.
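The three cutoffType behaviours can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper name cutoffFromType, the simulated fstats vector, and the degrees of freedom (df1, df0, n) are hypothetical, and the sketch assumes the theoretical cutoff is the upper-tail quantile of an F distribution.

```r
## Hypothetical helper, for illustration only (not part of derfinder).
## fstats: observed F-statistics; df1/df0: number of columns in the
## alternative/null model matrices; n: number of samples.
cutoffFromType <- function(cutoffFstat, cutoffType, fstats, df1, df0, n) {
    switch(cutoffType,
        ## Empirical: quantile of the observed F-statistics
        empirical = unname(quantile(fstats, probs = cutoffFstat)),
        ## Theoretical: assumed here to be the upper-tail F quantile
        theoretical = qf(cutoffFstat, df1 - df0, n - df1, lower.tail = FALSE),
        ## Manual: the value is used as-is
        manual = cutoffFstat,
        stop("unknown cutoffType: ", cutoffType)
    )
}

## Simulated stand-in data (assumption, not real coverage results)
set.seed(1)
fstats <- rf(1000, df1 = 1, df2 = 20)

cutEmpirical <- cutoffFromType(0.99, "empirical", fstats, df1 = 3, df0 = 2, n = 23)
cutTheoretical <- cutoffFromType(1e-08, "theoretical", fstats, df1 = 3, df0 = 2, n = 23)
cutManual <- cutoffFromType(1, "manual", fstats, df1 = 3, df0 = 2, n = 23)
```

Note how the meaning of cutoffFstat changes with cutoffType: a quantile probability for empirical, a tail probability for theoretical, and an F-statistic value for manual.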
The number of permutations. Note that for a full chromosome,
a small number (10) of permutations is sufficient. If set to 0, no
permutations are performed and thus no null regions are used; however, the
$regions component is still created.
An integer vector of length nPermute specifying the
seeds to be used for each permutation. If NULL no seeds are used.
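The default value of seeds shown in the usage derives one seed per permutation from the current date. The sketch below reproduces that expression with a fixed date (the variable runDate is introduced here only so the result is reproducible):

```r
## Reproduce the default seeds expression with a fixed date
nPermute <- 3
runDate <- as.Date("2026-03-31") # stand-in for Sys.Date()
seeds <- as.integer(gsub("-", "", format(runDate))) + seq_len(nPermute)
seeds
#> [1] 20260332 20260333 20260334
```

Because the default depends on the date, two runs on different days use different seeds. Pass an explicit integer vector if the permutations need to be reproducible across days.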
A factor specifying the group membership of each sample
that can later be used with the plotting functions in the
derfinderPlot package.
This argument is passed to
annotateTranscripts. If NULL,
TxDb.Hsapiens.UCSC.hg19.knownGene
is used.
If TRUE, output Rdata files are created at each
step inside a directory with the chromosome name (example: 'chr21' if
chr='21'). One Rdata file is created for each component described
in the return section.
If TRUE annotateTranscripts
and matchGenes are run. Otherwise these steps are skipped.
If specified, each chunk is saved into a separate Rdata
file under lowMemDir and later loaded in
fstats.apply when
running calculateStats and calculatePvalues. Using this option
helps reduce the memory load as each fork in bplapply
loads only the data needed for the chunk processing. The downside is a
slightly longer computation time due to input/output.
Whether to smooth the F-statistics (fstats) or not. This
is by default FALSE. For RNA-seq data we recommend using FALSE.
Weights used by the smoother as described in smoother.
A function to be used for smoothing the F-statistics.
Two functions are provided by the bumphunter package:
loessByCluster and runmedByCluster. If
you are using your own custom function, it has to return a named list with
an element called $fitted that contains the smoothed F-statistics and
an element called $smoothed that is a logical vector indicating
whether the F-statistics were smoothed or not. If they are not smoothed, the
original values will be used.
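A custom smoothing function satisfying the return contract described above could look like the following sketch. The function name runmedSketch and its argument list (y, x, cluster, weights, k) are assumptions modelled loosely on the bumphunter smoothers; check the signature the package actually expects before using something like this.

```r
## Hypothetical custom smoother: running median within each cluster.
## Returns a named list with $fitted and $smoothed, as required.
runmedSketch <- function(y, x = NULL, cluster, weights = NULL, k = 5, ...) {
    y <- as.matrix(y)
    fitted <- y # unsmoothed entries keep their original values
    smoothed <- rep(FALSE, nrow(y))
    for (idx in split(seq_len(nrow(y)), cluster)) {
        ## Only smooth clusters with at least k points (k must be odd)
        if (length(idx) >= k) {
            for (j in seq_len(ncol(y))) {
                fitted[idx, j] <- stats::runmed(y[idx, j], k = k, endrule = "constant")
            }
            smoothed[idx] <- TRUE
        }
    }
    list(fitted = fitted, smoothed = smoothed)
}

## Toy input: a 5-point cluster with one spike and a 2-point cluster
toyF <- c(1, 100, 1, 1, 1, 2, 3)
toyCluster <- c(1, 1, 1, 1, 1, 2, 2)
res <- runmedSketch(toyF, cluster = toyCluster)
```

In this toy case the spike in the first cluster is replaced by the running median, while the second cluster is too short to smooth and is flagged FALSE in $smoothed.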
Arguments passed to other methods and/or advanced arguments. Advanced arguments:
If TRUE basic status updates will be printed along
the way. Default TRUE.
This argument is passed to preprocessCoverage.
This argument is passed to preprocessCoverage.
If TRUE, it returns a list with the results
from each step. Otherwise, it returns NULL. Default: the opposite of
writeOutput.
Passed to extendedMapSeqlevels, preprocessCoverage, calculateStats, calculatePvalues, annotateTranscripts, matchGenes, and define_cluster.
If returnOutput=TRUE, a list with six components:
The wallclock timing information for each step.
The main options used when running this function.
The output from preprocessCoverage.
The output from calculateStats.
The output from calculatePvalues.
The output from matchGenes.
These are the same components that are written to Rdata files if
writeOutput=TRUE.
If you are working with data from an organism other than 'Homo sapiens',
specify so by setting the global 'species' and 'chrsStyle' options. For
example:
options(species = 'arabidopsis_thaliana')
options(chrsStyle = 'NCBI')
## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
verbose = TRUE
)
#> 2026-03-31 17:41:05.336433 collapseFullCoverage: Sorting fullCov
#> 2026-03-31 17:41:05.3414 collapseFullCoverage: Collapsing chromosomes information by sample
## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull,
probs = c(0.5), nonzero = TRUE,
verbose = TRUE
)
#> 2026-03-31 17:41:05.343899 sampleDepth: Calculating sample quantiles
#> 2026-03-31 17:41:05.351555 sampleDepth: Calculating sample adjustments
## Build the models
groupInfo <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = groupInfo, adjustvars = adjustvars)
## Analyze the chromosome
results <- analyzeChr(
chr = "21", coverageInfo = genomeData, models = models,
cutoffFstat = 1, cutoffType = "manual", groupInfo = groupInfo, mc.cores = 1,
writeOutput = FALSE, returnOutput = TRUE, method = "regular",
runAnnotation = FALSE
)
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> 2026-03-31 17:41:05.402314 analyzeChr: Pre-processing the coverage data
#> 2026-03-31 17:41:05.813228 analyzeChr: Calculating statistics
#> 2026-03-31 17:41:05.820719 calculateStats: calculating the F-statistics
#> 2026-03-31 17:41:05.892198 analyzeChr: Calculating pvalues
#> 2026-03-31 17:41:05.89282 analyzeChr: Using the following manual cutoff for the F-statistics 1
#> 2026-03-31 17:41:05.894103 calculatePvalues: identifying data segments
#> 2026-03-31 17:41:05.90402 findRegions: segmenting information
#> 2026-03-31 17:41:05.91164 findRegions: identifying candidate regions
#> 2026-03-31 17:41:05.982945 findRegions: identifying region clusters
#> 2026-03-31 17:41:06.08079 calculatePvalues: calculating F-statistics for permutation 1 and seed 20260332
#> 2026-03-31 17:41:06.121574 findRegions: segmenting information
#> 2026-03-31 17:41:06.125389 findRegions: identifying candidate regions
#> 2026-03-31 17:41:06.205825 calculatePvalues: calculating the p-values
#> 2026-03-31 17:41:06.230881 calculatePvalues: skipping q-value calculation.
#> 2026-03-31 17:41:06.247136 analyzeChr: Annotating regions
names(results)
#> [1] "timeinfo" "optionsStats" "coveragePrep" "fstats" "regions"
#> [6] "annotation"