Meta-Analysis for Beginners: Understanding Effect Sizes Across Studies
Meta-analysis is the statistical component of a systematic review: the stage where extracted effect sizes are pooled into a summary estimate. Students often find it intimidating, and fair enough — the math is dense. But the conceptual logic is accessible, and most applied meta-analyses rely on a small set of well-understood techniques. This primer covers what you need to understand before running one or reading one critically.
What meta-analysis actually does
Meta-analysis estimates a summary effect by taking a weighted average of effect sizes across studies. Studies with more precision (larger samples, less variance) get more weight. The output is a single pooled effect size with a confidence interval, supplemented by heterogeneity statistics and usually a forest plot.
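The weighted average is simple enough to compute by hand. Here is a minimal sketch of inverse-variance pooling in Python; the effect sizes and variances are made up for illustration:

```python
import math

# Illustrative inverse-variance (fixed-effect) pooling.
# Effect sizes and sampling variances below are hypothetical.
effects = [0.30, 0.45, 0.10, 0.52]     # e.g., Hedges' g per study
variances = [0.04, 0.09, 0.02, 0.12]   # sampling variance per study

weights = [1.0 / v for v in variances]  # more precise studies weigh more
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))      # standard error of the pooled effect
ci = (pooled - 1.96 * se, pooled + 1.96 * se)

print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note how the third study, with the smallest variance, pulls the pooled estimate toward its own small effect.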
See our meta-analysis review type guide for context on when to choose this method. The canonical methodological reference is the Cochrane Handbook (Part 2, Chapter 10) and Borenstein, Hedges, Higgins, and Rothstein's Introduction to Meta-Analysis (2009).
Effect sizes: the raw material
A meta-analysis needs a comparable effect size from every study. Common effect sizes:
- Standardized mean difference (SMD) — Cohen's d, Hedges' g. Used for continuous outcomes measured on different scales.
- Mean difference (MD) — for continuous outcomes on the same scale (e.g., all studies used the Maslach Burnout Inventory).
- Risk ratio (RR) and odds ratio (OR) — for dichotomous outcomes.
- Hazard ratio (HR) — for time-to-event outcomes.
- Correlation (r, Fisher's z) — for association studies.
Use one effect size type per meta-analysis. If studies report different metrics, convert to a common one using published formulas (Borenstein et al., 2009).
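The standard conversion formulas (as given in Borenstein et al., 2009) are short enough to code directly. A sketch, using hypothetical values; the `a = 4` constant in the d-to-r conversion assumes equal group sizes:

```python
import math

# Illustrative effect-size conversions (formulas from Borenstein et al., 2009).

def log_odds_ratio_to_d(log_or: float) -> float:
    """Convert a log odds ratio to Cohen's d."""
    return log_or * math.sqrt(3) / math.pi

def d_to_r(d: float, a: float = 4.0) -> float:
    """Convert Cohen's d to a correlation; a = 4 assumes equal group sizes."""
    return d / math.sqrt(d * d + a)

def r_to_fisher_z(r: float) -> float:
    """Fisher's z transformation stabilizes the variance of r for pooling."""
    return 0.5 * math.log((1 + r) / (1 - r))

d = log_odds_ratio_to_d(math.log(2.5))  # a hypothetical OR of 2.5
r = d_to_r(d)
z = r_to_fisher_z(r)
print(f"d = {d:.3f}, r = {r:.3f}, z = {z:.3f}")
```

Remember that each conversion also requires converting the variance, so the study keeps an appropriate weight in the pooled analysis.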
Fixed-effect vs random-effects models
A fundamental choice:
- Fixed-effect model assumes one true effect size. Any difference between studies is sampling error. Rarely justified in applied research.
- Random-effects model assumes each study estimates a slightly different true effect drawn from a distribution. Acknowledges real heterogeneity.
Default to random effects. Fixed-effect is appropriate only for homogeneous sets of very similar studies — uncommon outside tightly-controlled trial programs.
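The difference between the two models is easiest to see in code. A sketch of fixed-effect pooling next to DerSimonian-Laird random-effects pooling, using hypothetical data chosen to be heterogeneous:

```python
# Fixed-effect vs DerSimonian-Laird random-effects pooling (sketch).
# y = hypothetical effect sizes, v = their within-study variances.
y = [0.80, 0.45, 0.02, 0.90, -0.10]
v = [0.04, 0.09, 0.02, 0.12, 0.03]

w = [1 / vi for vi in v]
fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# DerSimonian-Laird estimate of the between-study variance tau^2
q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(y) - 1)) / c)

# Random-effects weights add tau^2 to every study's variance,
# which evens out the weights across studies.
w_re = [1 / (vi + tau2) for vi in v]
random_eff = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

print(f"fixed = {fixed:.3f}, tau^2 = {tau2:.3f}, random = {random_eff:.3f}")
```

With heterogeneous data like these, the two models give visibly different answers, because the fixed-effect model lets the one very precise study dominate while the random-effects model does not.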
Heterogeneity
Studies always differ. The question is how much. Key statistics:
- Q (Cochran's Q) — a significance test for heterogeneity. Underpowered in small meta-analyses.
- I² — the proportion of total variation attributable to between-study heterogeneity rather than sampling error. 0% means perfectly homogeneous; above 50% suggests substantial heterogeneity.
- τ² (tau-squared) — the between-study variance. Interpret on the scale of the effect size.
Cochrane rules of thumb: I² under 40% may be "not important"; 30–60% "moderate"; 50–90% "substantial"; above 75% "considerable." These are guides, not rules.
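All three statistics fall out of the same weighted sums. A sketch with hypothetical data (Q and τ² here use the DerSimonian-Laird approach):

```python
# Cochran's Q, I^2, and tau^2 from hypothetical effect sizes (y)
# and within-study variances (v).
y = [0.80, 0.45, 0.02, 0.90, -0.10]
v = [0.04, 0.09, 0.02, 0.12, 0.03]

w = [1 / vi for vi in v]
pooled = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

df = len(y) - 1
q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
i2 = max(0.0, (q - df) / q) * 100                         # I^2, in percent
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)                             # between-study variance

print(f"Q = {q:.2f} (df = {df}), I^2 = {i2:.0f}%, tau^2 = {tau2:.3f}")
```

For these made-up data I² comes out near 78%, which the Cochrane rules of thumb would call considerable heterogeneity.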
High heterogeneity means your pooled estimate may be misleading. Consider subgroup analyses, meta-regression, or narrative synthesis (per SWiM).
The forest plot
A forest plot visually summarizes a meta-analysis:
- Each row is a study, showing its effect size and confidence interval
- Each study's point estimate is drawn as a grey square whose area is proportional to the study's weight
- The diamond at the bottom shows the pooled effect and its CI
- A line of no effect (0 for MD/SMD, 1 for RR/OR) runs vertically
A good forest plot is the single clearest artifact a meta-analysis produces. Read one carefully: note the spread of effect sizes, the overlap of confidence intervals, and the position of the pooled diamond relative to zero.
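A bare-bones forest plot is straightforward to draw by hand. The sketch below uses matplotlib with hypothetical study estimates and an illustrative pooled diamond:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical study estimates on the SMD scale.
studies = ["Study A", "Study B", "Study C", "Study D"]
effects = [0.30, 0.45, 0.10, 0.52]
ci_low = [0.05, 0.10, -0.05, 0.15]
ci_high = [0.55, 0.80, 0.25, 0.89]
pooled, pooled_low, pooled_high = 0.27, 0.12, 0.42  # illustrative pooled values

fig, ax = plt.subplots(figsize=(6, 3))
ys = range(len(studies), 0, -1)                # one row per study, top down
for yi, eff, lo, hi in zip(ys, effects, ci_low, ci_high):
    ax.plot([lo, hi], [yi, yi], color="grey")  # confidence interval
    ax.plot(eff, yi, "s", color="grey")        # square at the point estimate
# Diamond for the pooled effect, spanning its confidence interval
ax.fill([pooled_low, pooled, pooled_high, pooled], [0, 0.2, 0, -0.2],
        color="black")
ax.axvline(0, linestyle="--", color="black")   # line of no effect (SMD scale)
ax.set_yticks(list(ys))
ax.set_yticklabels(studies)
ax.set_xlabel("Standardized mean difference")
fig.savefig("forest.png", dpi=100)
```

Real packages (metafor's `forest()`, RevMan) add weights, labels, and heterogeneity statistics, but the geometry is exactly this.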
Publication bias
Negative and null results are under-published, inflating pooled effects. Assess publication bias with:
- Funnel plots — effect size vs standard error; asymmetry suggests bias
- Egger's regression test — formal test for funnel asymmetry
- Trim-and-fill — imputes missing studies and recalculates the estimate
Publication bias is real and often non-trivial. Always search grey literature (see our grey literature post) and contact authors for unpublished data.
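Egger's test, for instance, is just an ordinary regression of the standardized effect (effect divided by its standard error) on precision (one over the standard error); an intercept far from zero signals funnel asymmetry. A sketch with hypothetical data built to show the classic small-study pattern:

```python
import math

# Egger's regression test (sketch). Data are hypothetical: the most
# precise study (se = 0.10) shows a near-zero effect while imprecise
# studies show large effects, the classic small-study pattern.
y  = [0.80, 0.45, 0.02, 0.90, 0.62, 0.35]
se = [0.30, 0.22, 0.10, 0.35, 0.25, 0.15]

z = [yi / si for yi, si in zip(y, se)]  # standardized effects
p = [1 / si for si in se]               # precisions

n = len(z)
mean_p, mean_z = sum(p) / n, sum(z) / n
sxx = sum((pi - mean_p) ** 2 for pi in p)
slope = sum((pi - mean_p) * (zi - mean_z) for pi, zi in zip(p, z)) / sxx
intercept = mean_z - slope * mean_p     # the quantity Egger's test examines

# t statistic for the intercept from the OLS residuals
resid = [zi - (intercept + slope * pi) for zi, pi in zip(z, p)]
s2 = sum(r * r for r in resid) / (n - 2)
se_int = math.sqrt(s2 * (1 / n + mean_p ** 2 / sxx))
t = intercept / se_int
print(f"Egger intercept = {intercept:.2f} (t = {t:.2f})")
```

Here the intercept is large and its t statistic well above conventional cutoffs, so the test flags asymmetry, as it should for data constructed this way.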
Software
- RevMan (Cochrane) — free for Cochrane reviews; limited statistics
- R: meta, metafor packages — free, flexible, what most methodologists use
- Stata: meta suite — clean syntax, widely available
- CMA (Comprehensive Meta-Analysis) — paid, GUI-driven, good for beginners
For a doctoral student learning meta-analysis, R with metafor is the best long-term investment.
Reporting your meta-analysis
PRISMA 2020 items 12–15 cover effect measures and synthesis reporting. You must report:
- Effect measure (item 12)
- Methods for handling missing summary statistics (item 13b)
- Statistical synthesis methods (item 13d) — model, software, heterogeneity statistics
- Subgroup and sensitivity analyses (items 13e–13f)
- Assessment of reporting bias, including publication bias (item 14)
- Certainty of evidence, typically GRADE (item 15)
The forest plot goes in the main text; the funnel plot and subgroup plots as supplements.
Five beginner mistakes
- Pooling when you should not — studies too clinically diverse, I² over 75%, incompatible outcome definitions
- Using fixed-effect by default
- Ignoring publication bias
- Running a meta-analysis with three studies (technically possible; usually uninformative)
- Reporting the pooled estimate without its confidence interval
Before you run it
Ask: is pooling scientifically reasonable? If studies differ substantially in population, intervention dose, or outcome measure, pooling produces a number that does not describe any real-world situation. Narrative synthesis with SWiM may serve better.
Once you have clean extracted data, running a random-effects meta-analysis in metafor takes twenty minutes. Getting to clean extracted data — through protocol, screening, and extraction — is the other 95% of the work.