Average signal across multiple samples on a common binned grid
Source: R/helpers_averaging.R
average_signal.Rd
Creates a uniform grid of bins across the specified region, computes signal scores per bin for each input sample, and then summarizes (e.g. mean) across all samples. This solves the problem of different bigWig/bedGraph files having different internal bin boundaries, which makes direct cross-sample arithmetic impossible without a shared coordinate system.
Usage
average_signal(
inputs,
region,
bin_width = 50,
summary_fun = c("mean", "median", "max", "min", "sum"),
nans_to_zeros = TRUE
)
Arguments
- inputs
A character vector of file paths (bigWig, bedGraph, etc.) or a list of data frames, each with columns seqnames, start, end, score.
- region
A genomic region string ("chr:start-end") or a GRanges object.
- bin_width
Bin size in base pairs (default: 50).
- summary_fun
Summary function to apply across samples. One of "mean", "median", "max", "min", "sum" (default: "mean").
- nans_to_zeros
Logical. Convert NaN/NA values to zero before summarizing (default: TRUE). Recommended for bigWig files that may have missing data in some regions.
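To illustrate how summary_fun and nans_to_zeros interact, here is a minimal base-R sketch. It is not the package's implementation: the helper summarize_bins() and the scores matrix are hypothetical, and the sketch assumes the per-bin scores of each sample have already been placed on the same grid (rows are bins, columns are samples).

```r
# Hypothetical per-bin scores: rows are bins, columns are samples
scores <- cbind(
  s1 = c(1, 2, NaN),
  s2 = c(3, NA, 5)
)

# Hypothetical helper: summarize each bin (row) across samples (columns)
summarize_bins <- function(mat, summary_fun = "mean", nans_to_zeros = TRUE) {
  fun <- match.fun(summary_fun)
  if (nans_to_zeros) {
    mat[is.na(mat)] <- 0  # is.na() is TRUE for both NA and NaN
  }
  apply(mat, 1, fun)
}

with_zeros    <- summarize_bins(scores, "mean", nans_to_zeros = TRUE)
without_zeros <- summarize_bins(scores, "mean", nans_to_zeros = FALSE)
```

In this sketch, replacing missing values with zero lets every bin produce a finite summary, at the cost of pulling the mean toward zero wherever a sample has no data. Leaving them in place propagates NA/NaN into any bin where at least one sample is missing.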
Value
A data frame with columns seqnames, start, end, score
representing the summarized signal on the uniform grid.
Details
For bigWig inputs, signals are queried efficiently via
rtracklayer::summary() which computes per-bin statistics directly from the
indexed file. For data frame inputs, an overlap-based weighted average is
calculated against the uniform bins.
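The data frame path described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's actual code: make_bins() and bin_scores() are hypothetical helpers, coordinates are assumed to be 1-based and inclusive, the weighted average is normalized by the number of covered base pairs in each bin, and bins with no overlapping data are returned as NaN.

```r
# Hypothetical helper: uniform bins covering [region_start, region_end]
make_bins <- function(region_start, region_end, bin_width = 50) {
  starts <- seq(region_start, region_end, by = bin_width)
  data.frame(start = starts, end = pmin(starts + bin_width - 1, region_end))
}

# Hypothetical helper: overlap-weighted mean of interval scores per bin
bin_scores <- function(df, bins) {
  vapply(seq_len(nrow(bins)), function(i) {
    # Overlap width (in bp) between bin i and every interval in df
    ov <- pmin(df$end, bins$end[i]) - pmax(df$start, bins$start[i]) + 1
    ov <- pmax(ov, 0)
    if (sum(ov) == 0) return(NaN)  # no data falls in this bin
    sum(df$score * ov) / sum(ov)
  }, numeric(1))
}

# Toy input whose interval boundaries do not line up with the 50 bp grid
df <- data.frame(
  seqnames = "chr1",
  start = c(1, 31, 101),
  end   = c(30, 100, 150),
  score = c(2, 4, 10)
)

bins <- make_bins(1, 200, bin_width = 50)
bins$score <- bin_scores(df, bins)
```

The first bin (1-50) overlaps 30 bp of the interval scored 2 and 20 bp of the interval scored 4, so its weighted average is (2 * 30 + 4 * 20) / 50 = 2.8. The last bin (151-200) has no coverage and stays NaN, which is the case nans_to_zeros is meant to handle.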
Examples
if (FALSE) { # \dontrun{
# Average two bigWig files
avg_df <- average_signal(
c("sample1.bw", "sample2.bw"),
region = "chr1:1000000-2000000",
bin_width = 100
)
# Use with ez_coverage for plotting
ez_coverage(avg_df, "chr1:1000000-2000000")
# Average data frames
avg_df <- average_signal(
list(df1, df2, df3),
region = "chr1:1000000-2000000",
summary_fun = "median"
)
} # }