Creates a Manhattan plot from GWAS (Genome-Wide Association Study) data, the standard way to visualize p-values across the genome. The function supports both genome-wide and regional (LocusZoom-style) modes; the mode is detected automatically from the data or forced by specifying a region explicitly.

Usage

ez_manhattan(
  input,
  region = NULL,
  chr = NULL,
  bp = NULL,
  p = NULL,
  snp = NULL,
  track_labels = NULL,
  group_var = NULL,
  logp = TRUE,
  size = 0.5,
  color = "grey50",
  lead_snp = NULL,
  r2 = NULL,
  colors = NULL,
  highlight_snps = NULL,
  highlight_color = "purple",
  threshold_p = NULL,
  threshold_color = "red",
  threshold_linetype = 2,
  color_by = "auto",
  y_axis_style = c("none", "simple", "full"),
  y_axis_label = expression(paste("-log"[10], "(P)")),
  facet_label_position = c("top", "left"),
  ...
)

Arguments

input

A data frame or named list of data frames containing GWAS results with columns for chromosome, position, p-values, and optionally SNP names. Supports both GWAS-style (CHR, BP, P) and GRanges-style (seqnames, start, pvalue) column naming conventions.

region

Optional genomic region string (e.g., "chr1:1000000-2000000") to force regional mode. When provided, data is filtered to this region and the plot uses coordinate-based x-axis consistent with ez_coverage and ez_gene.

chr

Character string specifying the column name for chromosome numbers. Default: "CHR". Also accepts "seqnames", "chrom", etc.

bp

Character string specifying the column name for base pair positions. Default: "BP". Also accepts "start", "pos", "position", etc.

p

Character string specifying the column name for p-values. Default: "P". Also accepts "pvalue", "p.value", etc.

snp

Character string specifying the column name for SNP identifiers. Default: "SNP". Also accepts "rsid", "variant_id", etc.

track_labels

Optional vector of track labels (used for unnamed list input). Default: NULL.

group_var

Column name for grouping data within a single data frame. Default: NULL.

logp

Logical indicating whether to plot -log10(p-values). Default: TRUE.

size

Numeric value for point size in the plot. Default: 0.5.

color

Default point color for regional mode when color_by is not "r2". Default: "grey50".

lead_snp

Character string or vector of SNP IDs to highlight as the lead variant(s). Default: NULL.

r2

Numeric vector of r² values for coloring points by linkage disequilibrium (LD) with the lead variant. Must have the same length as the number of rows in the data. Default: NULL.

colors

Vector of colors for coloring points. Usage depends on color_by:

  • For discrete columns: colors are recycled/mapped to factor levels

  • For continuous columns: colors define a gradient (default: viridis-like palette)

  • For multi-track or grouped plots: colors for each track/group

Default: NULL (appropriate defaults are chosen automatically).

highlight_snps

Character vector of SNP IDs to highlight. Default: NULL.

highlight_color

Color for highlighting significant or lead SNPs. Default: "purple".

threshold_p

Numeric p-value threshold for drawing a significance line. If NULL, no line is drawn. Default: NULL.

threshold_color

Color for the significance threshold line. Default: "red".

threshold_linetype

Linetype for the significance threshold line. Default: 2 (dashed).

color_by

How points should be colored. Can be:

  • A column name in the data (e.g., "CHR", "gene", "maf"): Colors by that column. Use colors to specify a custom palette. For chromosome coloring, use color_by = "CHR" (or your chr column name) with colors = c("grey", "skyblue").

  • "r2": LD-based gradient coloring (requires r2 parameter)

  • "none": Single color specified by color parameter

  • "auto" (default): Uses "r2" if r2 is provided, otherwise "none"

Note: In grouped/multi-track plots, color_by is handled differently.

y_axis_style

Y-axis style: "none", "simple", or "full" (default: "none"). Only applies in regional mode.

y_axis_label

Label for the y-axis. Default: expression(paste("-log"[10], "(P)")).

facet_label_position

Position of facet labels: "top" or "left" (default: "top").

...

Additional arguments passed to geom_manhattan().

Value

A ggplot2 object containing the Manhattan plot.

Details

ez_manhattan() is a wrapper around geom_manhattan() that provides a flexible interface with support for grouping and multiple tracks.

The plot shows genomic position on the x-axis and, when logp = TRUE, -log10(p-values) on the y-axis. The plot mode is determined automatically:

  • Regional mode: When region is provided OR when data contains only one chromosome, the plot uses genomic coordinate formatting consistent with ez_coverage and ez_gene, making it suitable for stacking with other tracks via vstack_plot(). This is ideal for LocusZoom-style regional association plots.

  • Genome-wide mode: When data contains multiple chromosomes and no region is specified, chromosomes are displayed with alternating colors and cumulative positions.
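The two rules above can be sketched in base R. The helpers detect_mode() and cumulative_bp() below are hypothetical names written for illustration; they are not part of the package, and the package's actual implementation may differ. The sketch assumes the default CHR and BP column names and approximates each chromosome's length by its largest observed position.

```r
# Hypothetical helper: mirrors the documented mode rule
# (region supplied OR a single chromosome => regional mode).
detect_mode <- function(df, region = NULL, chr = "CHR") {
  if (!is.null(region) || length(unique(df[[chr]])) == 1) {
    "regional"
  } else {
    "genome-wide"
  }
}

# Hypothetical helper: one common way to build cumulative x positions.
# Each chromosome is shifted by the summed lengths of the chromosomes
# before it; length is approximated by the largest BP seen.
# Note: chromosomes are ordered by sort order of the chr column, so
# character labels such as "chr10" would sort before "chr2".
cumulative_bp <- function(df, chr = "CHR", bp = "BP") {
  chr_max <- tapply(df[[bp]], df[[chr]], max)
  offsets <- c(0, cumsum(as.numeric(chr_max))[-length(chr_max)])
  names(offsets) <- names(chr_max)
  unname(df[[bp]] + offsets[as.character(df[[chr]])])
}

toy <- data.frame(
  CHR = rep(1:3, each = 2),
  BP  = c(100, 200, 50, 150, 10, 300)
)

detect_mode(toy)                          # several chromosomes, no region
detect_mode(toy[toy$CHR == 1, ])          # single chromosome
detect_mode(toy, region = "chr1:100-200") # region forces regional mode
cumulative_bp(toy)
```

For the toy data, positions on chromosome 2 are shifted by 200 (the largest position on chromosome 1) and positions on chromosome 3 by 350.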

For LD-based coloring (LocusZoom style), provide r2 values; the default color_by = "auto" then selects "r2" coloring, or you can set color_by = "r2" explicitly.
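A minimal sketch of an LD-colored regional plot follows. The data are simulated: the r2 values are random numbers made up for illustration, whereas real values would come from an LD reference panel. The call is guarded with exists() so the snippet also runs where the package providing ez_manhattan() is not attached; only arguments listed under Usage are used.

```r
set.seed(42)
n <- 50

# Single-chromosome data, so regional mode is selected automatically
locus <- data.frame(
  CHR = 1,
  BP  = round(seq(1e6, 2e6, length.out = n)),
  P   = runif(n, 1e-8, 1),
  SNP = paste0("rs", seq_len(n))
)

# Lead variant: the SNP with the smallest p-value
lead <- locus$SNP[which.min(locus$P)]

# Simulated r2 with the lead variant (made up; one value per row of `locus`)
r2_sim <- runif(n)
r2_sim[locus$SNP == lead] <- 1

if (exists("ez_manhattan")) {
  p_ld <- ez_manhattan(locus, lead_snp = lead, r2 = r2_sim, color_by = "r2")
}
```

The r2 vector is matched to the data by row, so it has to be built in the same row order as the input data frame.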

For multiple tracks (via named list), plots are stacked vertically using facets. For grouped data (via group_var), colors distinguish different groups within tracks.
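The two input shapes can be sketched as follows. The track names ("Discovery", "Replication"), the cohort column, and the make_gwas() helper are hypothetical and exist only for this illustration. The calls are guarded with exists() so the snippet also runs where the package is not attached.

```r
set.seed(1)

# Hypothetical helper: simulate a small GWAS result table
make_gwas <- function(n_per_chr = 20) {
  data.frame(
    CHR = rep(1:3, each = n_per_chr),
    BP  = rep(seq_len(n_per_chr), 3) * 1000,
    P   = runif(3 * n_per_chr, 1e-4, 1)
  )
}

# Multiple tracks: a named list of data frames, one element per track
tracks <- list(Discovery = make_gwas(), Replication = make_gwas())

# Grouped data: one data frame with a column identifying the group
grouped <- rbind(
  cbind(make_gwas(), cohort = "A"),
  cbind(make_gwas(), cohort = "B")
)

if (exists("ez_manhattan")) {
  p_tracks  <- ez_manhattan(tracks)                        # stacked facets
  p_grouped <- ez_manhattan(grouped, group_var = "cohort") # colored by group
}
```

For an unnamed list, track_labels can supply the labels that the list names provide here.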

Examples

# Basic genome-wide Manhattan plot
set.seed(1)  # make the simulated p-values reproducible
df <- data.frame(
  CHR = rep(1:3, each = 20),
  BP = rep(1:20, 3) * 1000,
  P = runif(60, 0.0001, 1),
  SNP = paste0("rs", 1:60)
)
ez_manhattan(df)
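A further sketch adds a significance line and highlighted SNPs using the threshold_p and highlight_snps arguments. The Bonferroni cutoff (0.05 divided by the number of tests) is one possible choice, not a package default; real genome-wide studies conventionally use 5e-8. The data frame gwas is simulated, and the call is guarded with exists() so the snippet also runs where the package is not attached.

```r
set.seed(7)
gwas <- data.frame(
  CHR = rep(1:3, each = 20),
  BP  = rep(1:20, 3) * 1000,
  P   = runif(60, 1e-4, 1),
  SNP = paste0("rs", 1:60)
)

# Bonferroni-corrected threshold for the number of SNPs tested
bonferroni <- 0.05 / nrow(gwas)

# With logp = TRUE the threshold corresponds to this y-axis value
threshold_y <- -log10(bonferroni)

# Highlight the three SNPs with the smallest p-values
top_snps <- head(gwas$SNP[order(gwas$P)], 3)

if (exists("ez_manhattan")) {
  p_thr <- ez_manhattan(
    gwas,
    threshold_p    = bonferroni,
    highlight_snps = top_snps
  )
}
```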