This function processes Hi-C data from various input formats for visualization with geom_hic or geom_hic_triangle. It handles dense matrices, sparse data frames, and file paths.
Usage
process_hic_data(
data,
region = NULL,
resolution = 10000,
upper_triangle = FALSE,
symmetric = TRUE
)
Arguments
- data
Input data. Can be:
A matrix: Dense contact matrix where rows and columns represent bins
A data frame with columns (bin1, bin2, score) or (pos1, pos2, score): Sparse format
A file path: Tab-delimited matrix file with row/column headers
- region
Genomic region to display (e.g., "chr1:1000000-2000000"). Required for file input, optional for data frame/matrix if coordinates are already genomic.
- resolution
Resolution of the Hi-C data in base pairs (default: 10000). Used to convert bin indices to genomic coordinates for matrix input.
- upper_triangle
Logical. If TRUE, return only the upper triangle (pos1 <= pos2). Useful for triangle visualization. Default: FALSE.
- symmetric
Logical. If TRUE and data is a matrix, the matrix is assumed to be symmetric and only its upper triangle is extracted. Default: TRUE.
Value
A data frame with columns:
pos1: Genomic position of first bin
pos2: Genomic position of second bin
score: Contact frequency/count
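To make the matrix-to-data-frame conversion concrete, the following is a minimal sketch of how a dense matrix could be reshaped into the pos1/pos2/score layout described above. It is not the package's implementation: matrix_to_long is a hypothetical helper, and the mapping it uses (bin i starts at region_start + (i - 1) * resolution) is an assumption about how bin indices relate to genomic coordinates.

```r
# Hypothetical helper, not part of the package. Assumes bin i of the matrix
# starts at region_start + (i - 1) * resolution.
matrix_to_long <- function(mat, region_start, resolution = 10000,
                           upper_only = TRUE) {
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat))

  # Row/column index pairs of every non-missing cell
  idx <- which(!is.na(mat), arr.ind = TRUE)

  # Optionally keep only the upper triangle (row index <= column index)
  if (upper_only) {
    idx <- idx[idx[, 1] <= idx[, 2], , drop = FALSE]
  }

  data.frame(
    pos1  = region_start + (idx[, 1] - 1) * resolution,
    pos2  = region_start + (idx[, 2] - 1) * resolution,
    score = mat[idx]
  )
}

mat <- matrix(1:9, nrow = 3)
long_df <- matrix_to_long(mat, region_start = 1e6, resolution = 10000)
```

With upper_only = TRUE, a 3 x 3 matrix yields six rows (the diagonal plus the cells above it), which is the shape a triangle-style plot would typically need.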
Examples
if (FALSE) { # \dontrun{
# From a matrix
mat <- matrix(runif(100), nrow = 10)
hic_df <- process_hic_data(mat, "chr1:1000000-1100000", resolution = 10000)
# From a sparse data frame
sparse_df <- data.frame(pos1 = c(1e6, 1e6), pos2 = c(1e6, 1.01e6), score = c(100, 50))
hic_df <- process_hic_data(sparse_df)
# From a file
hic_df <- process_hic_data("contacts.matrix", "chr1:1000000-2000000")
} # }
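The file example presumes that a tab-delimited matrix file already exists. As a rough sketch, such a file with row and column headers can be written and read back using base R only. The choice of bin start positions as header labels is an assumption made for illustration; the exact header convention expected by process_hic_data is not specified here.

```r
# Bin start positions used as row/column labels (an illustrative assumption)
bins <- format(seq(1e6, by = 10000, length.out = 3),
               scientific = FALSE, trim = TRUE)

# Small symmetric contact matrix
mat <- matrix(c(10, 4, 1,
                 4, 12, 5,
                 1, 5, 9),
              nrow = 3, dimnames = list(bins, bins))

# Write a tab-delimited file; col.names = NA adds an empty header cell
# above the row-name column so header and data rows line up
path <- tempfile(fileext = ".matrix")
write.table(mat, path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back to confirm the layout round-trips
back <- as.matrix(read.table(path, header = TRUE, sep = "\t",
                             row.names = 1, check.names = FALSE))
```

Setting check.names = FALSE keeps the numeric column labels as written, so they are not prefixed with "X" when the file is read back.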