This function creates a Manhattan plot, a standard way to visualize p-values across the genome, from GWAS (Genome-Wide Association Study) data. It supports both genome-wide and regional (LocusZoom-style) modes, selected automatically from the data content or set explicitly via a region specification.
Usage
ez_manhattan(
input,
region = NULL,
chr = NULL,
bp = NULL,
p = NULL,
snp = NULL,
track_labels = NULL,
group_var = NULL,
logp = TRUE,
size = 0.5,
color = "grey50",
lead_snp = NULL,
r2 = NULL,
colors = NULL,
highlight_snps = NULL,
highlight_color = "purple",
threshold_p = NULL,
threshold_color = "red",
threshold_linetype = 2,
color_by = "auto",
y_axis_style = c("none", "simple", "full"),
y_axis_label = expression(paste("-log"[10], "(P)")),
facet_label_position = c("top", "left"),
...
)
Arguments
- input
A data frame or named list of data frames containing GWAS results with columns for chromosome, position, p-values, and optionally SNP names. Supports both GWAS-style (CHR, BP, P) and GRanges-style (seqnames, start, pvalue) column naming conventions.
- region
Optional genomic region string (e.g., "chr1:1000000-2000000") to force regional mode. When provided, data is filtered to this region and the plot uses a coordinate-based x-axis consistent with ez_coverage and ez_gene.
- chr
Character string specifying the column name for chromosome numbers. Default: "CHR". Also accepts "seqnames", "chrom", etc.
- bp
Character string specifying the column name for base pair positions. Default: "BP". Also accepts "start", "pos", "position", etc.
- p
Character string specifying the column name for p-values. Default: "P". Also accepts "pvalue", "p.value", etc.
- snp
Character string specifying the column name for SNP identifiers. Default: "SNP". Also accepts "rsid", "variant_id", etc.
- track_labels
Optional vector of track labels (used for unnamed list input). Default: NULL.
- group_var
Column name for grouping data within a single data frame. Default: NULL.
- logp
Logical indicating whether to plot -log10(p-values). Default: TRUE.
- size
Numeric value for point size in the plot. Default: 0.5.
- color
Default point color for regional mode when color_by is not "r2". Default: "grey50".
- lead_snp
Character string or vector of SNP IDs to highlight as the lead variant(s). Default: NULL.
- r2
Numeric vector of r² values for coloring points by linkage disequilibrium (LD) with lead variant. Must be same length as number of rows in data. Default: NULL.
- colors
Vector of colors for coloring points. Usage depends on color_by:
For discrete columns: colors are recycled/mapped to factor levels
For continuous columns: colors define a gradient (default: viridis-like palette)
For multi-track or grouped plots: colors for each track/group
Default: NULL (appropriate defaults are chosen automatically).
- highlight_snps
Character vector of SNP IDs to highlight. Default: NULL.
- highlight_color
Color for highlighting significant or lead SNPs. Default: "purple".
- threshold_p
Numeric p-value threshold for drawing a significance line. If NULL, no line is drawn. Default: NULL.
- threshold_color
Color for the significance threshold line. Default: "red".
- threshold_linetype
Linetype for the significance threshold line. Default: 2 (dashed).
- color_by
How points should be colored. Can be:
A column name in the data (e.g., "CHR", "gene", "maf"): Colors by that column. Use colors to specify a custom palette. For chromosome coloring, use color_by = "CHR" (or your chr column name) with colors = c("grey", "skyblue").
"r2": LD-based gradient coloring (requires the r2 parameter)
"none": Single color specified by the color parameter
"auto" (default): Uses "r2" if r2 is provided, otherwise "none"
Note: In grouped/multi-track plots, color_by is handled differently.
- y_axis_style
Y-axis style: "none", "simple", or "full" (default: "none"). Only applies in regional mode.
- y_axis_label
Label for the y-axis. Default: expression(paste("-log"[10], "(P)")).
- facet_label_position
Position of facet labels: "top" or "left" (default: "top").
- ...
Additional arguments passed to geom_manhattan().
Details
ez_manhattan is a wrapper around geom_manhattan that provides a flexible interface with support for grouping and multiple tracks.
The plot shows chromosomes (or genomic coordinates, in regional mode) on the x-axis and -log10(p-values) on the y-axis. The plot mode is determined automatically:
Regional mode: When region is provided OR when data contains only one chromosome, the plot uses genomic coordinate formatting consistent with ez_coverage and ez_gene, making it suitable for stacking with other tracks via vstack_plot(). This is ideal for LocusZoom-style regional association plots.
Genome-wide mode: When data contains multiple chromosomes and no region is specified, chromosomes are displayed with alternating colors and cumulative positions.
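The mode rule and the idea of cumulative positions can be sketched in base R. This is only an illustration of the behavior described above, not the package's implementation: detect_mode() and cumulative_bp() are hypothetical helpers, and the offset scheme (each chromosome shifted by the summed maximum positions of the chromosomes before it) is one common way to lay chromosomes end to end.

```r
# Hypothetical helper: mirrors the documented rule (region given OR one chromosome)
detect_mode <- function(df, region = NULL, chr = "CHR") {
  if (!is.null(region) || length(unique(df[[chr]])) == 1) {
    "regional"
  } else {
    "genome-wide"
  }
}

# Hypothetical helper: cumulative x positions for a genome-wide layout.
# Each chromosome is offset by the sum of the maximum positions of the
# chromosomes that precede it (in sorted order of the chromosome column).
cumulative_bp <- function(df, chr = "CHR", bp = "BP") {
  chr_max <- tapply(df[[bp]], df[[chr]], max)
  offsets <- c(0, head(cumsum(as.numeric(chr_max)), -1))
  names(offsets) <- names(chr_max)
  unname(df[[bp]] + offsets[as.character(df[[chr]])])
}

toy <- data.frame(
  CHR = rep(1:3, each = 2),
  BP  = c(100, 200, 50, 150, 10, 20)
)

mode_all <- detect_mode(toy)                  # three chromosomes, no region
mode_one <- detect_mode(toy[toy$CHR == 1, ])  # a single chromosome
cum_pos  <- cumulative_bp(toy)
```

With this toy data, chromosome 2 is shifted by 200 and chromosome 3 by 350, so positions never overlap across chromosomes.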
For LD-based coloring (LocusZoom style), provide r2 values and set color_by = "r2".
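A minimal sketch of an LD-colored regional plot, using only arguments listed above. The data and the r2 values are simulated for illustration; in practice r2 would come from an LD reference panel. The call is guarded so that the data setup still runs where ez_manhattan is not attached.

```r
set.seed(1)

# Simulated single-chromosome data, so regional mode is detected automatically
regional <- data.frame(
  CHR = 1,
  BP  = seq(1000000, 2000000, length.out = 50),
  P   = runif(50, 1e-8, 1),
  SNP = paste0("rs", 1:50)
)

# Treat the SNP with the smallest p-value as the lead variant
lead <- regional$SNP[which.min(regional$P)]

# Made-up r2 values, one per row of the data; the lead variant gets r2 = 1
r2_vals <- runif(nrow(regional))
r2_vals[regional$SNP == lead] <- 1

if (exists("ez_manhattan")) {
  ez_manhattan(
    regional,
    lead_snp = lead,
    r2 = r2_vals,
    color_by = "r2"
  )
}
```

Because color_by defaults to "auto", passing r2 alone should select the same coloring; setting color_by = "r2" explicitly makes the intent visible.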
For multiple tracks (via named list), plots are stacked vertically using facets. For grouped data (via group_var), colors distinguish different groups within tracks.
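The two input shapes described here can be sketched as follows. make_gwas() is a hypothetical helper that simulates data, and the track names and the cohort column are invented for the example. The calls are guarded so that the setup runs even where ez_manhattan is not attached.

```r
set.seed(42)

# Hypothetical helper: simulated GWAS results across three chromosomes
make_gwas <- function(n_per_chr = 20) {
  data.frame(
    CHR = rep(1:3, each = n_per_chr),
    BP  = rep(seq_len(n_per_chr), 3) * 1000,
    P   = runif(3 * n_per_chr, 1e-4, 1),
    SNP = paste0("rs", seq_len(3 * n_per_chr))
  )
}

# Multiple tracks: a named list of data frames, one facet per list element
tracks <- list(Discovery = make_gwas(), Replication = make_gwas())

# Grouped data: a single data frame with a grouping column
grouped <- rbind(
  cbind(tracks$Discovery,   cohort = "Discovery"),
  cbind(tracks$Replication, cohort = "Replication")
)

if (exists("ez_manhattan")) {
  print(ez_manhattan(tracks, facet_label_position = "left"))
  print(ez_manhattan(grouped, group_var = "cohort"))
}
```

For an unnamed list, track_labels would supply the facet labels instead of the list names.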
Examples
# Basic genome-wide Manhattan plot
df <- data.frame(
CHR = rep(1:3, each = 20),
BP = rep(1:20, 3) * 1000,
P = runif(60, 0.0001, 1),
SNP = paste0("rs", 1:60)
)
ez_manhattan(df)
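A further sketch combining chromosome coloring, a significance line, and highlighted SNPs, using only the arguments documented above. The threshold of 1e-3 is chosen to suit the simulated p-values (5e-8 is the conventional genome-wide choice), and the assumption that the line is drawn at -log10(threshold_p) when logp = TRUE follows from the y-axis scale rather than from a statement in this page.

```r
set.seed(123)

df <- data.frame(
  CHR = rep(1:3, each = 20),
  BP  = rep(1:20, 3) * 1000,
  P   = runif(60, 0.0001, 1),
  SNP = paste0("rs", 1:60)
)

# Illustrative threshold; on the -log10 scale this corresponds to a height of 3
threshold_p <- 1e-3
line_height <- -log10(threshold_p)

# Highlight the three SNPs with the smallest p-values
top_snps <- df$SNP[order(df$P)][1:3]

if (exists("ez_manhattan")) {
  ez_manhattan(
    df,
    color_by = "CHR",
    colors = c("grey", "skyblue"),
    threshold_p = threshold_p,
    highlight_snps = top_snps
  )
}
```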