First, this function finds the regions of interest according to specified cutoffs. Then it permutes the samples and re-calculates the F-statistics. The area of the statistics from these segments are then used to calculate p-values for the original regions.

```
calculatePvalues(
coveragePrep,
models,
fstats,
nPermute = 1L,
seeds = as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute),
chr,
cutoff = quantile(fstats, 0.99, na.rm = TRUE),
significantCut = c(0.05, 0.1),
lowMemDir = NULL,
smooth = FALSE,
weights = NULL,
smoothFunction = bumphunter::locfitByCluster,
...
)
```

- coveragePrep
A list with

`$coverageProcessed`

,`$mclapplyIndex`

, and`$position`

normally generated using preprocessCoverage.- models
A list with

`$mod`

and`$mod0`

normally generated using makeModels.- fstats
A numerical Rle with the F-statistics normally generated using calculateStats.

- nPermute
The number of permutations. Note that for a full chromosome, a small amount (10) of permutations is sufficient. If set to 0, no permutations are performed and thus no null regions are used, however, the

`$regions`

component is created.- seeds
An integer vector of length

`nPermute`

specifying the seeds to be used for each permutation. If`NULL`

no seeds are used.- chr
A single element character vector specifying the chromosome name. This argument is passed to findRegions.

- cutoff
F-statistic cutoff to use to determine segments.

- significantCut
A vector of length two specifiying the cutoffs used to determine significance. The first element is used to determine significance for the P-values, while the second element is used for the Q-values (FDR adjusted P-values).

- lowMemDir
The directory where the processed chunks are saved when using preprocessCoverage with a specified

`lowMemDir`

.- smooth
Whether to smooth the F-statistics (

`fstats`

) or not. This is by default`FALSE`

. For RNA-seq data we recommend using`FALSE`

.- weights
Weights used by the smoother as described in smoother.

- smoothFunction
A function to be used for smoothing the F-statistics. Two functions are provided by the

`bumphunter`

package: loessByCluster and runmedByCluster. If you are using your own custom function, it has to return a named list with an element called`$fitted`

that contains the smoothed F-statistics and an element claled`$smoothed`

that is a logical vector indicating whether the F-statistics were smoothed or not. If they are not smoothed, the original values will be used.- ...
Arguments passed to other methods and/or advanced arguments. Advanced arguments:

- verbose
If

`TRUE`

basic status updates will be printed along the way.- scalefac
This argument is passed to fstats.apply and should be the same as the one used in preprocessCoverage. Default: 32.

- method
Has to be either 'Matrix' (default), 'Rle' or 'regular'. See details in fstats.apply.

- adjustF
A single value to adjust that is added in the denominator of the F-stat calculation. Useful when the Residual Sum of Squares of the alternative model is very small. Default: 0.

- writeOutput
If

`TRUE`

then the regions are saved before calculating q-values, and then overwritten once the q-values are written. This argument was introduced to save the results from the permutations (can take some time) to investigate the problem described at https://support.bioconductor.org/p/62026/- maxRegionGap
Passed to internal functions of findRegions. Default: 0.

Passed to findRegions,

`smoothFunction`

and define_cluster.

A list with four components:

- regions
is a GRanges with metadata columns given by findRegions with the additional metadata column

`pvalues`

: p-value of the region calculated via permutations of the samples;`qvalues`

: the qvalues calculated using qvalue;`significant`

: whether the p-value is less than 0.05 (by default);`significantQval`

: whether the q-value is less than 0.10 (by default). It also includes the mean coverage of the region (mean from the mean coverage at each base calculated in preprocessCoverage). Furthermore, if`groupInfo`

was not`NULL`

in preprocessCoverage, then the group mean coverage is calculated as well as the log 2 fold change (using group 1 as the reference).- nullStats
is a numeric Rle with the mean of the null statistics by segment.

- nullWidths
is a numeric Rle with the length of each of the segments in the null distribution. The area can be obtained by multiplying the absolute

`nullstats`

by the corresponding lengths.- nullPermutation
is a Rle with the permutation number from which the null region originated from.

```
## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
verbose = TRUE
)
#> 2023-05-07 06:01:07.083085 collapseFullCoverage: Sorting fullCov
#> 2023-05-07 06:01:07.08555 collapseFullCoverage: Collapsing chromosomes information by sample
## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), verbose = TRUE)
#> 2023-05-07 06:01:07.087735 sampleDepth: Calculating sample quantiles
#> 2023-05-07 06:01:07.093398 sampleDepth: Calculating sample adjustments
## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)
## Preprocess the data
## Automatic chunksize used to then compare 1 vs 4 cores in the 'do not run'
## section
prep <- preprocessCoverage(genomeData,
groupInfo = group, cutoff = 0,
scalefac = 32, chunksize = NULL, colsubset = NULL, mc.cores = 4
)
## Get the F statistics
fstats <- genomeFstats
## We recommend determining the cutoff to use based on the F-distribution
## although you could also based it on the observed F-statistics.
## In this example we use a low cutoff used for illustrative purposes
cutoff <- 1
## Calculate the p-values and define the regions of interest.
regsWithP <- calculatePvalues(prep, models, fstats,
nPermute = 1, seeds = 1,
chr = "chr21", cutoff = cutoff, mc.cores = 1, method = "regular"
)
#> 2023-05-07 06:01:07.420015 calculatePvalues: identifying data segments
#> 2023-05-07 06:01:07.423844 findRegions: segmenting information
#> 2023-05-07 06:01:07.427514 findRegions: identifying candidate regions
#> 2023-05-07 06:01:07.472595 findRegions: identifying region clusters
#> 2023-05-07 06:01:07.572004 calculatePvalues: calculating F-statistics for permutation 1 and seed 1
#> 2023-05-07 06:01:07.844803 findRegions: segmenting information
#> 2023-05-07 06:01:07.848173 findRegions: identifying candidate regions
#> 2023-05-07 06:01:07.897063 calculatePvalues: calculating the p-values
#> 2023-05-07 06:01:07.919936 calculatePvalues: skipping q-value calculation.
regsWithP
#> $regions
#> GRanges object with 33 ranges and 14 metadata columns:
#> seqnames ranges strand | value area indexStart
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <integer>
#> up chr21 47409522-47409560 * | 15.05364 587.092 128
#> up chr21 47411924-47411986 * | 7.09670 447.092 618
#> up chr21 47417352-47417397 * | 8.77845 403.808 1292
#> up chr21 47412662-47412724 * | 5.67122 357.287 843
#> up chr21 47408998-47409026 * | 10.68525 309.872 69
#> .. ... ... ... . ... ... ...
#> up chr21 47417666-47417672 * | 1.21552 8.50863 1390
#> up chr21 47409685-47409688 * | 1.21585 4.86342 192
#> up chr21 47417335-47417337 * | 1.59162 4.77487 1275
#> up chr21 47418068 * | 3.99766 3.99766 1434
#> up chr21 47410311-47410312 * | 1.14363 2.28725 246
#> indexEnd cluster clusterL meanCoverage meanCEU meanYRI
#> <integer> <Rle> <Rle> <numeric> <numeric> <numeric>
#> up 166 3 167 0.415219 0.612943 0.0000000
#> up 680 6 389 0.584741 0.837491 0.0539683
#> up 1337 10 338 0.364656 0.533126 0.0108696
#> up 905 7 63 0.501280 0.739985 0.0000000
#> up 97 2 232 0.195773 0.288998 0.0000000
#> .. ... ... ... ... ... ...
#> up 1396 10 338 0.2258065 0.333333 0.0
#> up 195 3 167 0.4758065 0.654762 0.1
#> up 1277 10 338 0.1290323 0.190476 0.0
#> up 1434 11 1 0.0322581 0.047619 0.0
#> up 247 4 159 0.7258065 1.071429 0.0
#> log2FoldChangeYRIvsCEU pvalues significant qvalues significantQval
#> <numeric> <numeric> <factor> <logical> <logical>
#> up -Inf 0.0175439 TRUE <NA> <NA>
#> up -3.95589 0.0175439 TRUE <NA> <NA>
#> up -5.61611 0.0526316 FALSE <NA> <NA>
#> up -Inf 0.1578947 FALSE <NA> <NA>
#> up -Inf 0.1929825 FALSE <NA> <NA>
#> .. ... ... ... ... ...
#> up -Inf 0.859649 FALSE <NA> <NA>
#> up -2.71097 0.894737 FALSE <NA> <NA>
#> up -Inf 0.894737 FALSE <NA> <NA>
#> up -Inf 0.894737 FALSE <NA> <NA>
#> up -Inf 0.929825 FALSE <NA> <NA>
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
#>
#> $nullStats
#> numeric-Rle of length 56 with 54 runs
#> Lengths: 1 1 1 1 ... 1 1 1 1
#> Values : 2.25473 3.63681 4.12769 9.73262 ... 1.16174 8.10124 5.20438 1.16174
#>
#> $nullWidths
#> integer-Rle of length 56 with 48 runs
#> Lengths: 1 1 1 1 2 2 1 1 2 1 1 ... 1 1 1 1 2 1 1 1 1 1
#> Values : 9 20 54 45 27 54 38 21 36 3 8 ... 63 45 36 38 36 3 1 47 39 1
#>
#> $nullPermutation
#> integer-Rle of length 56 with 1 run
#> Lengths: 56
#> Values : 1
#>
if (FALSE) {
## Calculate again, but with 10 permutations instead of just 1
regsWithP <- calculatePvalues(prep, models, fstats,
nPermute = 10, seeds = 1:10,
chr = "chr21", cutoff = cutoff, mc.cores = 2, method = "regular"
)
## Check that they are the same as the previously calculated regions
library(testthat)
expect_that(regsWithP, equals(genomeRegions))
## Histogram of the theoretical p-values by region
hist(pf(regsWithP$regions$value, df1 - df0, n - df1), main = "Distribution
original p-values by region", freq = FALSE)
## Histogram of the permutted p-values by region
hist(regsWithP$regions$pvalues, main = "Distribution permutted p-values by
region", freq = FALSE)
## MA style plot
library("ggplot2")
ma <- data.frame(
mean = regsWithP$regions$meanCoverage,
log2FoldChange = regsWithP$regions$log2FoldChangeYRIvsCEU
)
ggplot(ma, aes(x = log2(mean), y = log2FoldChange)) +
geom_point() +
ylab("Fold Change (log2)") +
xlab("Mean coverage (log2)") +
labs(title = "MA style plot")
## Annotate the results
library("bumphunter")
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene)
annotation <- matchGenes(regsWithP$regions, genes)
head(annotation)
}
```