This function creates a gene track visualization from genomic annotations. It accepts several input formats, including GTF/GFF files, TxDb objects, and data frames, and draws gene structure (exons, introns, and strand direction) automatically.

By default, when y = "strand", genes are colored by strand: plus strand uses "darkgreen" and minus strand uses "orange2". To use uniform colors instead, explicitly set exon_color, exon_fill, and intron_color.

Usage

ez_gene(
  data,
  region = NULL,
  gene = NULL,
  gene_db = NULL,
  org_db = NULL,
  extend = 0.1,
  extend_type = c("proportion", "bp"),
  exon_height = 0.2,
  intron_width = 0.6,
  exon_color = NULL,
  exon_fill = NULL,
  intron_color = NULL,
  gene_id = "gene_id",
  gene_name = "gene_name",
  y = "strand",
  label = "gene_name",
  label_size = 3,
  label_color = NULL,
  label_style = c("auto", "simple", "repel", "none"),
  max_labels = NULL,
  label_priority = "length",
  repel_args = list(),
  border = FALSE,
  label_chr = TRUE,
  ...
)

Arguments

data

Input data source, which can be:

  • A file path to a GTF/GFF file

  • A TxDb object from the GenomicFeatures package

  • A data frame with gene annotation data

region

Genomic region to display in the format "chr:start-end". Example: "chr1:1000000-2000000". Either region or gene (with gene_db) must be provided.
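
As an illustration of the "chr:start-end" format, the sketch below splits a region string into its parts using base R. The helper parse_region() is hypothetical and is not part of the package; how ez_gene() parses the string internally is not documented here.

```r
# Hypothetical helper (not part of the package): split "chr:start-end"
# into chromosome, start, and end.
parse_region <- function(region) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  m <- regmatches(region, regexec(pattern, region))[[1]]
  # On a match, m holds the full match plus three capture groups.
  if (length(m) != 4) {
    stop("region must look like 'chr:start-end'")
  }
  start <- as.numeric(m[3])
  end <- as.numeric(m[4])
  if (start > end) {
    stop("start must not exceed end")
  }
  list(chr = m[2], start = start, end = end)
}

region <- parse_region("chr1:1000000-2000000")
str(region)
```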

gene

Gene name/symbol to look up (e.g., "PTPRC", "TP53"). When provided, the region is automatically determined from the gene coordinates in gene_db. Either region or gene must be provided.

gene_db

TxDb object used to look up gene coordinates when the gene parameter is set. Can be the same object as data if data is also a TxDb.

org_db

Optional OrgDb object for gene symbol mapping. If NULL (default), auto-detects available OrgDb packages.

extend

Numeric. Amount to extend the region beyond the gene body when using the gene parameter. Default: 0.1 (10% of gene length on each side).

extend_type

How to interpret extend: "proportion" (relative to gene length) or "bp" (absolute base pairs). Default: "proportion".
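
The arithmetic behind extend and extend_type can be sketched in base R as follows. The helper extend_region() is hypothetical. It assumes gene length is taken as end - start and that the window is clamped at position 1; the package's own computation may differ in both respects.

```r
# Hypothetical helper (not part of the package): pad a gene's coordinates
# on both sides, either by a proportion of its length or by a fixed
# number of base pairs.
extend_region <- function(start, end, extend = 0.1,
                          extend_type = c("proportion", "bp")) {
  extend_type <- match.arg(extend_type)
  pad <- if (extend_type == "proportion") {
    round((end - start) * extend)  # assumption: length = end - start
  } else {
    extend
  }
  # Assumption: never extend below the first base of the chromosome.
  c(start = max(1, start - pad), end = end + pad)
}

# A 100 kb gene with the default 10% padding on each side
extend_region(100000, 200000)
# start 90000, end 210000

# The same gene padded by a fixed 5 kb on each side
extend_region(100000, 200000, extend = 5000, extend_type = "bp")
# start 95000, end 205000
```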

exon_height

Relative height of exons (0 to 1). Default: 0.2

intron_width

Line width for introns. Default: 0.6

exon_color

Border color for exons. Default: NULL (uses strand-based colors when y = "strand", otherwise "gray50")

exon_fill

Fill color for exons. Default: NULL (uses strand-based colors when y = "strand", otherwise "gray50")

intron_color

Color for intron lines. Default: NULL (uses strand-based colors when y = "strand", otherwise "gray50")

gene_id

Column name for gene identifiers. Default: "gene_id"

gene_name

Column name for gene symbols/names. Default: "gene_name"

y

Column name for the y-axis grouping variable. Default: "strand"

label

Column name to use for text labels. Default: "gene_name". Set to NULL to display no labels, or to another column name to label genes by that column.

label_size

Size of text labels. Default: 3

label_color

Color of text labels. If NULL (default), uses strand-based colors when y = "strand", otherwise uses exon_fill color.

label_style

Strategy for handling overlapping labels. Options:

  • "auto" (default): Uses ggrepel if installed, otherwise hides overlapping labels via check_overlap

  • "simple": Standard geom_text with no overlap handling

  • "repel": Forces use of ggrepel (errors if not installed)

  • "none": No labels displayed

max_labels

Maximum number of labels to display. NULL (default) shows all. When set, labels are filtered based on label_priority.

label_priority

Priority criterion for filtering labels when max_labels is set. Options: "length" (default, prioritizes longer genes), "name" (alphabetical), or a column name in the data to sort by.

repel_args

Named list of additional arguments passed to geom_text_repel() when label_style = "repel" or "auto" (with ggrepel installed). Default behavior uses horizontal-only repositioning (direction = "x") with no connecting lines (segment.color = NA). To show connecting lines, use list(segment.color = "gray50"). Override direction with list(direction = "both") for vertical repositioning too. Other useful options: max.overlaps, force, box.padding, point.padding.

border

Logical. If TRUE, adds a black border around the plot panel. Default: FALSE

label_chr

Logical. If TRUE (default), labels the x-axis with the chromosome name (e.g., "Chr1"). Set to FALSE to suppress the x-axis label.

...

Additional arguments passed to geom_gene(). Note that color and colour arguments are ignored; use exon_color, exon_fill, and intron_color instead.

Value

A ggplot2 object representing the gene track.

Details

The function automatically processes different input types:

  • For GTF/GFF files: Uses rtracklayer to import and process the data

  • For TxDb objects: Extracts gene models using GenomicFeatures

  • For data frames: Expects columns for chromosome, start, end, strand, and type
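
For the data frame case, a minimal input might look like the sketch below. The coordinate column names (seqnames, start, end, strand, type) are an assumption modeled on what as.data.frame() returns for a GTF imported with rtracklayer; the exact names ez_gene() requires are not stated here. The gene_id and gene_name columns match the defaults of the gene_id and gene_name arguments. All values are made-up toy data.

```r
# Toy annotation table; the coordinate column names are an assumption.
genes_df <- data.frame(
  seqnames  = "chr1",
  start     = c(1000, 1000, 3000, 6000, 6000),
  end       = c(5000, 1500, 5000, 9000, 9000),
  strand    = c("+", "+", "+", "-", "-"),
  type      = c("gene", "exon", "exon", "gene", "exon"),
  gene_id   = c("G1", "G1", "G1", "G2", "G2"),
  gene_name = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  stringsAsFactors = FALSE
)

# Basic sanity checks before plotting
required <- c("seqnames", "start", "end", "strand", "type",
              "gene_id", "gene_name")
missing_cols <- setdiff(required, names(genes_df))
stopifnot(length(missing_cols) == 0)
stopifnot(all(genes_df$start <= genes_df$end))
```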

The visualization includes:

  • Exons as filled rectangles

  • Introns as connecting lines

  • Strand information with arrowheads

  • Automatic y-axis separation by the specified y variable

Label Overlap Handling:

The function provides flexible strategies for managing overlapping gene labels:

  • label_style = "auto": Automatically uses ggrepel if installed, otherwise applies check_overlap to hide overlapping labels

  • label_style = "simple": Standard text labels with no overlap handling

  • label_style = "repel": Uses ggrepel to reposition labels. By default, labels are repositioned horizontally only (direction = "x") with no connecting lines (segment.color = NA) to maintain a clean appearance while keeping labels horizontally aligned. This can be changed via repel_args.

  • label_style = "none": Disables all labels

When many genes are present, use max_labels to limit the number of labels shown, prioritized by label_priority (gene length by default).
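
The filtering that max_labels and label_priority describe can be sketched in base R as below. The helper select_labels() is hypothetical and only illustrates the idea. In particular, sorting a custom column in decreasing order is an assumption; the package may order it differently.

```r
# Hypothetical helper (not part of the package): choose which gene names
# keep their labels when max_labels is set.
select_labels <- function(genes, max_labels = NULL,
                          label_priority = "length") {
  if (is.null(max_labels) || nrow(genes) <= max_labels) {
    return(genes$gene_name)
  }
  ord <- if (label_priority == "length") {
    order(genes$end - genes$start, decreasing = TRUE)  # longest first
  } else if (label_priority == "name") {
    order(genes$gene_name)                             # alphabetical
  } else {
    # Assumption: larger values in a custom column rank first.
    order(genes[[label_priority]], decreasing = TRUE)
  }
  genes$gene_name[head(ord, max_labels)]
}

# Toy data: lengths are A = 300, B = 250, C = 500, D = 100
genes <- data.frame(
  gene_name = c("A", "B", "C", "D"),
  start     = c(100, 500, 900, 1500),
  end       = c(400, 750, 1400, 1600),
  score     = c(2, 9, 5, 7),
  stringsAsFactors = FALSE
)

select_labels(genes, max_labels = 2)                            # "C" "A"
select_labels(genes, max_labels = 2, label_priority = "name")   # "A" "B"
select_labels(genes, max_labels = 2, label_priority = "score")  # "B" "D"
```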

Examples

# From a data frame
data(example_genes)
ez_gene(example_genes, "chr1:11869-14409")
#> Warning: Ignoring unknown parameters: `arrow_length`, `arrow_type`, `exon_colour`,
#> `intron_colour`, and `clip_to_region`
#> Warning: Ignoring unknown aesthetics: fill
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Warning: Vectorized input to `element_text()` is not officially supported.
#>  Results may be unexpected or may change in future versions of ggplot2.
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_text_repel()`).


# Limit labels to the 5 longest genes
ez_gene(example_genes, "chr1:42100000-42700000", max_labels = 5)
#> Warning: Ignoring unknown parameters: `arrow_length`, `arrow_type`, `exon_colour`,
#> `intron_colour`, and `clip_to_region`
#> Warning: Ignoring unknown aesthetics: fill
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Warning: Vectorized input to `element_text()` is not officially supported.
#>  Results may be unexpected or may change in future versions of ggplot2.
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_text_repel()`).


# Use ggrepel for smart label positioning (if installed)
if (FALSE) { # \dontrun{
# Default: horizontal-only repositioning, no connecting lines
ez_gene(example_genes, "chr1:42100000-42700000", label_style = "repel")

# Show connecting lines to original position
ez_gene(example_genes, "chr1:42100000-42700000",
        label_style = "repel",
        repel_args = list(segment.color = "gray50"))

# Allow both horizontal and vertical repositioning
ez_gene(example_genes, "chr1:42100000-42700000",
        label_style = "repel",
        repel_args = list(direction = "both"))

# Custom repel settings for denser regions
ez_gene(example_genes, "chr1:42100000-42700000",
        label_style = "repel",
        repel_args = list(max.overlaps = 30, force = 3))
} # }

# Handle overlapping labels automatically (ggrepel if installed, otherwise check_overlap)
ez_gene(example_genes, "chr1:42100000-42700000", label_style = "auto")
#> Warning: Ignoring unknown parameters: `arrow_length`, `arrow_type`, `exon_colour`,
#> `intron_colour`, and `clip_to_region`
#> Warning: Ignoring unknown aesthetics: fill
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Warning: Vectorized input to `element_text()` is not officially supported.
#>  Results may be unexpected or may change in future versions of ggplot2.
#> Warning: Removed 5 rows containing missing values or values outside the scale range
#> (`geom_text_repel()`).


# No labels
ez_gene(example_genes, "chr1:11869-14409", label_style = "none")
#> Warning: Ignoring unknown parameters: `arrow_length`, `arrow_type`, `exon_colour`,
#> `intron_colour`, and `clip_to_region`
#> Warning: Ignoring unknown aesthetics: fill
#> Warning: Vectorized input to `element_text()` is not officially supported.
#>  Results may be unexpected or may change in future versions of ggplot2.


if (FALSE) { # \dontrun{
# Using gene name for region lookup
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
ez_gene(txdb, gene = "PTPRC", gene_db = txdb)
} # }
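
The description notes that the strand-based colors can be replaced by setting exon_color, exon_fill, and intron_color explicitly. The sketch below shows one way to do that using only those documented arguments. The color values are arbitrary choices, and the call is guarded so that it runs only when ez_gene() and its example data are available in the session.

```r
# Uniform colors for every gene, regardless of strand.
# The specific colors are arbitrary illustrative choices.
uniform <- list(
  exon_color   = "steelblue4",
  exon_fill    = "steelblue",
  intron_color = "steelblue4"
)

# Only call ez_gene() if the package providing it is attached.
if (exists("ez_gene", mode = "function")) {
  data(example_genes)
  p <- do.call(
    ez_gene,
    c(list(example_genes, "chr1:11869-14409"), uniform)
  )
  print(p)
}
```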