This function looks up a gene by name in a TxDb object and returns a region string representing the gene body with optional padding. This allows users to specify genes by name instead of coordinates when using ez_* functions.
Usage
gene_to_region(
  gene_name,
  txdb,
  org_db = NULL,
  extend = 0.1,
  extend_type = c("proportion", "bp")
)
Arguments
- gene_name
Character string. Gene symbol to look up (e.g., "PTPRC", "TP53").
- txdb
A TxDb object containing gene annotations (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).
- org_db
Optional OrgDb object for mapping gene symbols to ENTREZ IDs. If NULL (default), auto-detects available OrgDb packages.
- extend
Numeric. Amount to extend the region beyond the gene body. Interpretation depends on extend_type. Default: 0.1 (10% of gene length).
- extend_type
Character. How to interpret extend:
  "proportion": extend is a proportion of gene length (default)
  "bp": extend is an absolute number of base pairs
Details
The function maps gene symbols to ENTREZ IDs using the OrgDb, then queries the TxDb for gene coordinates. If multiple genes match (e.g., same symbol on different chromosomes), a warning is issued and the first match (sorted by chromosome, then start position) is used.
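The tie-breaking rule for multiple matches can be sketched in base R. This is only an illustration, not the package's implementation: the helper pick_first_match, the example table, and its column names (gene_id, chrom, start) are hypothetical, and plain lexicographic chromosome ordering is an assumption.

```r
# Hypothetical table of candidate matches for one gene symbol.
# Column names and values are made up for illustration.
matches <- data.frame(
  gene_id = c("222", "111"),
  chrom   = c("chrX", "chr1"),
  start   = c(5000, 20000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: warn on ambiguity, then keep the first row after
# sorting by chromosome and start position.
pick_first_match <- function(matches, gene_name) {
  if (nrow(matches) > 1) {
    warning("Multiple matches for '", gene_name, "'; using the first.")
  }
  # method = "radix" sorts strings in C-locale order, independent of the
  # session locale. Lexicographic chromosome order is an assumption here.
  ordered <- matches[order(matches$chrom, matches$start, method = "radix"), ]
  ordered[1, ]
}

chosen <- suppressWarnings(pick_first_match(matches, "GENE1"))
```

Here chosen holds the chr1 row, because "chr1" sorts before "chrX" regardless of the start positions.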
The padding extends the region on both sides of the gene body. For example,
with extend = 0.1 (default) and a gene of length 10kb, the region will
extend 1kb upstream and 1kb downstream (total region: 12kb).
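The padding arithmetic above can be checked with a small base R sketch. The helper pad_region is hypothetical and not part of the package; rounding the proportional padding and clamping the start at position 1 are assumptions, and the coordinates are made up.

```r
# Hypothetical helper illustrating the padding arithmetic only.
pad_region <- function(start, end, extend = 0.1,
                       extend_type = c("proportion", "bp")) {
  extend_type <- match.arg(extend_type)  # first choice is the default
  gene_length <- end - start + 1
  pad <- if (extend_type == "proportion") {
    round(gene_length * extend)          # rounding is an assumption
  } else {
    extend
  }
  # Clamping at 1 is an assumption so the region never starts before base 1.
  c(start = max(1, start - pad), end = end + pad)
}

# A 10 kb gene with the default 10% padding: 1 kb is added on each side.
padded <- pad_region(100001, 110000)

# The same gene with a fixed 5 kb of padding on each side.
padded_bp <- pad_region(100001, 110000, extend = 5000, extend_type = "bp")
```

With the default settings, padded spans 99001 to 111000, which is 12 kb in total and matches the example in the text.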
Examples
if (FALSE) { # \dontrun{
  library(TxDb.Hsapiens.UCSC.hg38.knownGene)
  library(org.Hs.eg.db)

  # Get region for PTPRC gene with 10% padding (default)
  region <- gene_to_region("PTPRC", TxDb.Hsapiens.UCSC.hg38.knownGene)

  # Use in ez_coverage
  ez_coverage(signal_data, region)

  # With 5kb fixed padding on each side
  region <- gene_to_region("TP53", TxDb.Hsapiens.UCSC.hg38.knownGene,
                           extend = 5000, extend_type = "bp")
} # }