
Converts Hi-C contact data from several input formats into a data frame suitable for visualization with geom_hic or geom_hic_triangle. Accepted inputs are dense matrices, sparse data frames, and file paths.

Usage

process_hic_data(
  data,
  region = NULL,
  resolution = 10000,
  upper_triangle = FALSE,
  symmetric = TRUE
)

Arguments

data

Input data. Can be:

  • A matrix: Dense contact matrix where rows and columns represent bins

  • A data frame with columns (bin1, bin2, score) or (pos1, pos2, score): Sparse format

  • A file path: Tab-delimited matrix file with row/column headers
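As an illustration of the file input, the sketch below writes a small dense matrix to a tab-delimited file with row and column headers and reads it back with base R. The bin labels (`bin1`, `bin2`, ...) and the `.matrix` extension are assumptions made for this example; the exact header layout that process_hic_data expects is not specified here.

```r
# A tiny 3 x 3 contact matrix with hypothetical bin labels as dimnames.
mat <- matrix(
  1:9, nrow = 3,
  dimnames = list(paste0("bin", 1:3), paste0("bin", 1:3))
)

# Write it tab-delimited; col.names = NA leaves a blank header cell
# above the row-name column so rows and columns stay aligned.
path <- tempfile(fileext = ".matrix")
write.table(mat, path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back, using the first column as row names.
back <- as.matrix(
  read.table(path, sep = "\t", header = TRUE, row.names = 1, check.names = FALSE)
)
```

A file produced this way is one plausible instance of a "tab-delimited matrix file with row/column headers"; check it against your own data before relying on it.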

region

Genomic region to display (e.g., "chr1:1000000-2000000"). Required for file input, optional for data frame/matrix if coordinates are already genomic.

resolution

Resolution of the Hi-C data in base pairs (default: 10000). Used to convert bin indices to genomic coordinates for matrix input.
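The conversion from bin indices to genomic coordinates can be sketched as follows. This is a minimal illustration, not the package's implementation: `bin_to_pos` is a hypothetical helper, and it assumes 1-based bin indices where bin 1 starts at the region start.

```r
# Hypothetical helper (not part of the package): map 1-based bin indices
# to genomic start coordinates, assuming bin 1 begins at region_start.
bin_to_pos <- function(bin, region_start, resolution = 10000) {
  region_start + (bin - 1) * resolution
}

# First three bins of a region starting at 1 Mb, at 10 kb resolution.
pos <- bin_to_pos(1:3, region_start = 1000000)
```

Under these assumptions, a 10 x 10 matrix at 10 kb resolution spans 100 kb, which matches a region such as "chr1:1000000-1100000".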

upper_triangle

Logical. If TRUE, return only the upper triangle (rows where pos1 <= pos2). Useful for triangle visualization. Default: FALSE.
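The pos1 <= pos2 condition amounts to a row filter on the sparse representation. The sketch below applies that filter by hand to a made-up data frame; it illustrates the condition only and is not the function's internal code.

```r
# Made-up sparse contacts, including a mirrored pair and a diagonal entry.
sparse_df <- data.frame(
  pos1  = c(1000000, 1010000, 1000000),
  pos2  = c(1010000, 1000000, 1000000),
  score = c(50, 50, 100)
)

# Keep only rows in the upper triangle (diagonal included).
upper_only <- sparse_df[sparse_df$pos1 <= sparse_df$pos2, ]
```

The mirrored row (pos1 = 1010000, pos2 = 1000000) is dropped, while the diagonal entry is kept.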

symmetric

Logical. If TRUE and data is a matrix, assume the matrix is symmetric and extract only its upper triangle. Default: TRUE.
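For a symmetric matrix, the upper triangle carries all the information, so the lower half can be skipped. One way to extract it in base R is shown below; this is a sketch using `upper.tri()`, and the column names `bin1` and `bin2` are chosen for the example rather than taken from the function's internals.

```r
# A small symmetric contact matrix.
mat <- matrix(c(5, 2, 1,
                2, 6, 3,
                1, 3, 7), nrow = 3, byrow = TRUE)

# Row/column indices of the upper triangle, diagonal included.
idx <- which(upper.tri(mat, diag = TRUE), arr.ind = TRUE)

# Sparse representation: one row per retained matrix cell.
upper_df <- data.frame(
  bin1  = idx[, "row"],
  bin2  = idx[, "col"],
  score = mat[idx]
)
```

An n x n matrix yields n * (n + 1) / 2 rows this way, so the 3 x 3 example produces 6 rows.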

Value

A data frame with columns:

  • pos1: Genomic position of first bin

  • pos2: Genomic position of second bin

  • score: Contact frequency/count

Examples

if (FALSE) { # \dontrun{
# From a matrix
mat <- matrix(runif(100), nrow = 10)
hic_df <- process_hic_data(mat, "chr1:1000000-1100000", resolution = 10000)

# From a sparse data frame
sparse_df <- data.frame(
  pos1 = c(1e6, 1e6),
  pos2 = c(1e6, 1.01e6),
  score = c(100, 50)
)
hic_df <- process_hic_data(sparse_df)

# From a file
hic_df <- process_hic_data("contacts.matrix", "chr1:1000000-2000000")
} # }