Creates a uniform grid of bins across the specified region, computes a signal score per bin for each input sample, and then summarizes those scores across all samples with the chosen function (mean by default). A shared grid is needed because different bigWig/bedGraph files can have different internal bin boundaries, so their scores cannot be combined directly without first placing them on a common coordinate system.

Usage

average_signal(
  inputs,
  region,
  bin_width = 50,
  summary_fun = c("mean", "median", "max", "min", "sum"),
  nans_to_zeros = TRUE
)

Arguments

inputs

A character vector of file paths (bigWig, bedGraph, etc.) or a list of data frames, each with columns seqnames, start, end, score.

region

A genomic region string ("chr:start-end") or a GRanges object.

bin_width

Bin size in base pairs (default: 50).

summary_fun

Summary function to apply across samples. One of "mean", "median", "max", "min", "sum" (default: "mean").

nans_to_zeros

Logical. Convert NaN/NA values to zero before summarizing (default: TRUE). Recommended for bigWig files that may have missing data in some regions.
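
The effect of zero-filling on a summary can be seen with plain base R, independent of this package. The sketch below is only an illustration of the arithmetic. The vector bin_scores is a made-up example of one bin's scores across three samples, and the comparison with na.rm = TRUE is shown for contrast, not as a description of what average_signal() does when nans_to_zeros = FALSE.

```r
# Hypothetical scores for a single bin across three samples;
# the second sample has no data in this bin.
bin_scores <- c(2, NA, 4)

# Base R propagates the missing value through the mean.
mean_with_na <- mean(bin_scores)

# Zero-filling first, as the nans_to_zeros = TRUE description suggests.
# is.na() is TRUE for both NA and NaN.
zero_filled <- ifelse(is.na(bin_scores), 0, bin_scores)
mean_zero_filled <- mean(zero_filled)            # 2

# Dropping the missing sample instead gives a different answer.
mean_dropped <- mean(bin_scores, na.rm = TRUE)   # 3
```

Zero-filling treats missing data as true absence of signal, which pulls the mean toward zero. That is usually reasonable for coverage tracks, where uncovered regions are often simply not stored.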

Value

A data frame with columns seqnames, start, end, score representing the summarized signal on the uniform grid.

Details

For bigWig inputs, signals are queried efficiently via rtracklayer::summary(), which computes per-bin statistics directly from the indexed file. For data frame inputs, an overlap-based weighted average is calculated against the uniform bins.
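
The data frame path described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: rebin_weighted() is a hypothetical helper, coordinates are treated as 1-based and inclusive, each bin's score is averaged over the bases actually covered by input intervals, and bins with no coverage get NA.

```r
# Hypothetical helper (not part of the package): re-bin one sample's
# intervals onto a uniform grid using an overlap-weighted average.
rebin_weighted <- function(df, region_start, region_end, bin_width) {
  bin_start <- seq(region_start, region_end, by = bin_width)
  bin_end <- pmin(bin_start + bin_width - 1, region_end)
  score <- vapply(seq_along(bin_start), function(i) {
    # Number of bases each input interval shares with bin i (0 if disjoint)
    ov <- pmax(0, pmin(df$end, bin_end[i]) - pmax(df$start, bin_start[i]) + 1)
    # Assumption: average over covered bases only; NA if nothing overlaps
    if (sum(ov) == 0) NA_real_ else sum(ov * df$score) / sum(ov)
  }, numeric(1))
  data.frame(start = bin_start, end = bin_end, score = score)
}

# Two toy samples whose interval boundaries do not line up
df1 <- data.frame(seqnames = "chr1", start = c(1, 31), end = c(30, 100),
                  score = c(2, 4))
df2 <- data.frame(seqnames = "chr1", start = c(1, 51), end = c(50, 100),
                  score = c(1, 3))

# Put both samples on the same 50 bp grid over positions 1-100
binned <- lapply(list(df1, df2), rebin_weighted,
                 region_start = 1, region_end = 100, bin_width = 50)

# Bins in rows, samples in columns; then summarize across samples
score_matrix <- do.call(cbind, lapply(binned, `[[`, "score"))
avg <- data.frame(seqnames = "chr1",
                  binned[[1]][c("start", "end")],
                  score = rowMeans(score_matrix))
```

In this toy case the first bin of df1 mixes 30 bases at score 2 with 20 bases at score 4, giving 2.8, so the cross-sample means are 1.9 and 3.5 for the two bins. Once every sample shares the same grid, swapping rowMeans() for another row-wise summary gives the other summary_fun choices.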

Examples

if (FALSE) { # \dontrun{
# Average two bigWig files
avg_df <- average_signal(
  c("sample1.bw", "sample2.bw"),
  region = "chr1:1000000-2000000",
  bin_width = 100
)

# Use with ez_coverage for plotting
ez_coverage(avg_df, "chr1:1000000-2000000")

# Average data frames
avg_df <- average_signal(
  list(df1, df2, df3),
  region = "chr1:1000000-2000000",
  summary_fun = "median"
)
} # }