Looks up a gene by symbol in a TxDb object and returns a region string covering the gene body plus optional padding. This lets users specify genes by name rather than by coordinates when calling ez_* functions.

Usage

gene_to_region(
  gene_name,
  txdb,
  org_db = NULL,
  extend = 0.1,
  extend_type = c("proportion", "bp")
)

Arguments

gene_name

Character string. Gene symbol to look up (e.g., "PTPRC", "TP53").

txdb

A TxDb object containing gene annotations (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).

org_db

Optional OrgDb object for mapping gene symbols to ENTREZ IDs. If NULL (default), auto-detects available OrgDb packages.

extend

Numeric. Amount by which to extend the region on each side of the gene body. Interpretation depends on extend_type. Default: 0.1 (10% of gene length on each side).

extend_type

Character. How to interpret extend:

  • "proportion": extend is a proportion of gene length (default)

  • "bp": extend is an absolute number of base pairs

Value

A character string in region format "chr:start-end" suitable for use with ez_* functions.
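As a rough illustration of the "chr:start-end" format, the sketch below builds and parses such a string in base R. The helpers format_region() and parse_region() are hypothetical and not part of this package, and the coordinates are made up.

```r
# Hypothetical helpers, not part of the package: build and parse "chr:start-end".
format_region <- function(chrom, start, end) {
  # %d keeps large coordinates out of scientific notation
  sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
}

parse_region <- function(region) {
  # Capture chromosome, start, and end; m[1] is the full match
  m <- regmatches(region, regexec("^([^:]+):([0-9]+)-([0-9]+)$", region))[[1]]
  if (length(m) != 4) stop("Not a 'chr:start-end' region: ", region)
  list(chrom = m[2], start = as.integer(m[3]), end = as.integer(m[4]))
}

region <- format_region("chr1", 99001, 111000)  # made-up coordinates
parsed <- parse_region(region)
```

Parsing a region back into its parts can be handy for checking the width of the returned interval before plotting.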

Details

The function maps gene symbols to ENTREZ IDs using the OrgDb, then queries the TxDb for gene coordinates. If multiple genes match (e.g., same symbol on different chromosomes), a warning is issued and the first match (sorted by chromosome, then start position) is used.
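The tie-breaking rule described above can be sketched on toy data. This is not the package's implementation: pick_first_match() is a hypothetical helper, the data frame stands in for whatever the TxDb lookup returns, and the plain character ordering of chromosome names and the error on zero matches are assumptions.

```r
# Toy stand-in for coordinates returned by a TxDb lookup; all values are made up.
matches <- data.frame(
  gene_id = c("111", "222"),
  chrom   = c("chrX", "chr1"),
  start   = c(5000, 20000),
  end     = c(9000, 26000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: warn on multiple matches, then keep the first row
# after ordering by chromosome and start position.
pick_first_match <- function(matches, gene_name) {
  if (nrow(matches) == 0) stop("No gene found for symbol: ", gene_name)  # assumed behavior
  if (nrow(matches) > 1) {
    warning("Multiple matches for '", gene_name, "'; using the first one.")
  }
  # Radix ordering sorts chromosome names as plain strings (an assumption)
  matches[order(matches$chrom, matches$start, method = "radix"), ][1, ]
}

chosen <- suppressWarnings(pick_first_match(matches, "GENE1"))
```

Here the chr1 entry is selected because "chr1" sorts before "chrX", even though it appears second in the input.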

The padding extends the region on both sides of the gene body. For example, with extend = 0.1 (default) and a gene of length 10 kb, the region extends 1 kb upstream and 1 kb downstream (total region: 12 kb).
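The padding arithmetic can be sketched in base R as follows. pad_interval() is a hypothetical helper, not the package's code; the 1-based inclusive coordinates, the rounding of proportional padding, and the clamping of the start at 1 are all assumptions made for illustration.

```r
# Hypothetical helper illustrating the two extend_type modes.
pad_interval <- function(start, end, extend = 0.1,
                         extend_type = c("proportion", "bp")) {
  extend_type <- match.arg(extend_type)   # first choice ("proportion") is the default
  gene_length <- end - start + 1          # 1-based, inclusive coordinates (assumed)
  pad <- if (extend_type == "proportion") {
    round(gene_length * extend)           # rounding is an assumption
  } else {
    extend                                # absolute number of base pairs
  }
  # Clamp the start at 1 so padding never runs off the chromosome (assumed)
  c(start = max(1, start - pad), end = end + pad)
}

padded <- pad_interval(100001, 110000)              # 10 kb gene, default 10%
fixed  <- pad_interval(100001, 110000, 5000, "bp")  # 5 kb on each side
```

With the made-up 10 kb interval above, the default setting adds 1,000 bp to each side for a 12 kb region, matching the example in the text.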

Examples

if (FALSE) { # \dontrun{
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

# Get region for PTPRC gene with 10% padding (default)
region <- gene_to_region("PTPRC", TxDb.Hsapiens.UCSC.hg38.knownGene)

# Use in ez_coverage (signal_data is a placeholder for your own input data)
ez_coverage(signal_data, region)

# With 5kb fixed padding on each side
region <- gene_to_region("TP53", TxDb.Hsapiens.UCSC.hg38.knownGene,
                         extend = 5000, extend_type = "bp")
} # }