Flow Cytometry Data Analysis in R: Free Step-by-Step Tutorial

You don't need FlowJo, FCS Express, or any paid software to do serious flow cytometry data analysis. R gives you a complete, reproducible, free workflow — from loading your FCS files to publication-ready figures. This tutorial walks you through every step with real code.

What you'll learn: Install the right packages → load FCS files → QC → compensation → transformation → gating → extract statistics → visualize results. All free, all in R.

Why Analyze Flow Cytometry Data in R?

The standard tools (FlowJo, FCS Express) are powerful but expensive — licenses can cost thousands of dollars per year. R offers a fully functional alternative that is free, open-source, and crucially, reproducible. Every gate, every transformation, every statistical test is written in code you can share, version-control, and re-run on new data.

For research labs with multiple users, R also scales much better: a single script can process hundreds of FCS files overnight with no manual intervention. That's not something you can do clicking through a GUI.

Step 1 — Install the Essential Packages

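Most of the packages you need are available through Bioconductor, the standard repository for bioinformatics R packages. CytoExploreR is the exception: at the time of writing it is distributed through GitHub rather than Bioconductor, so it is installed separately.

# Install Bioconductor manager once
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Core flow cytometry packages (Bioconductor)
BiocManager::install(c(
  "flowCore",      # read/write FCS files, core data structures
  "flowViz",       # lattice-based plotting for flow data
  "ggcyto",        # ggplot2-based visualization
  "openCyto",      # automated gating pipelines
  "flowWorkspace"  # gating set management
))

# Interactive gating interface (GitHub; needs the remotes package)
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
remotes::install_github("DillonHammill/CytoExploreR")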

flowCore: Foundation. Read FCS files, transformations, basic statistics.

ggcyto: ggplot2 extension. Beautiful, publication-ready flow plots.

openCyto: Automated gating using data-driven algorithms.

flowWorkspace: Manage gating hierarchies and populations.

CytoExploreR: Interactive gating, the closest thing to FlowJo in R.

cytofast: Our package. Fast visualization of FlowSOM results.

Step 2 — Load Your FCS Files

The flowCore package handles FCS 2.0, 3.0, and 3.1 files. The basic unit is a flowFrame (one FCS file) or a flowSet (multiple files).

library(flowCore)

# Load a single FCS file
ff <- read.FCS("sample_001.fcs", transformation = FALSE)

# See what's inside
summary(ff)
colnames(ff)       # channel names (e.g., "FITC-A", "PE-A")
pData(parameters(ff))  # full channel descriptions

# Load all FCS files from a folder into a flowSet
fs <- read.flowSet(
  path = "./fcs_files/",
  pattern = "\\.fcs$",   # regular expression, not a shell glob
  transformation = FALSE
)

length(fs)          # number of samples
sampleNames(fs)    # file names

Important: always load with transformation = FALSE first. Apply transformations yourself in a controlled way — letting flowCore auto-transform can give unexpected results with some instruments.

Step 3 — Quality Control

Before any analysis, check each file for time-based anomalies such as instrument clogs and pressure drops. The flowAI and PeacoQC packages automate this check, but a manual look is always useful first. Debris and dead cells are removed later by gating (Step 6).

library(ggcyto)

# Quick look at FSC vs SSC to assess overall quality
autoplot(fs[[1]], x = "FSC-A", y = "SSC-A", bins = 64)

# Time-based QC: check for signal drift
autoplot(fs[[1]], x = "Time", y = "FSC-A")

# Automated QC with flowAI
BiocManager::install("flowAI")
library(flowAI)
fs_clean <- flow_auto_qc(fs, output = 1)
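
To make the idea of a time-based check concrete, here is a minimal base R sketch on simulated data. It is not the algorithm used by flowAI or PeacoQC; the number of bins, the 10% tolerance, and all variable names are illustrative assumptions.

```r
# Simulated acquisition: 10,000 events, with a signal drop in the last 20% of the run
set.seed(1)
n <- 10000
time <- sort(runif(n, 0, 100))
fsc <- rnorm(n, mean = 100000, sd = 5000)
fsc[time > 80] <- fsc[time > 80] * 0.7   # simulated clog / pressure drop

# Split the run into 20 equal-width time bins and take the median signal per bin
bin <- cut(time, breaks = 20, labels = FALSE)
bin_median <- tapply(fsc, bin, median)

# Flag bins whose median deviates more than 10% from the overall median
overall <- median(fsc)
bad_bins <- which(abs(bin_median - overall) / overall > 0.10)

# Keep only events from bins that were not flagged
keep <- !(bin %in% bad_bins)
fsc_clean <- fsc[keep]
```

Plotting bin_median against the bin index gives the same picture as the Time vs FSC-A plot above, reduced to one point per time slice.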

Step 4 — Compensation

Spectral spillover is unavoidable in multicolor flow cytometry. If your FCS files already contain a compensation matrix embedded by the cytometer software, you can apply it directly. Otherwise, you calculate it from single-stain controls.

# Extract the embedded compensation (spillover) matrix.
# The keyword name varies by instrument: "$SPILLOVER" and "SPILL" are also common.
comp_matrix <- keyword(fs[[1]])[["$SPILL"]]
comp <- compensation(comp_matrix)

# Apply to the whole flowSet
fs_comp <- compensate(fs_clean, comp)

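# If you have single-stain controls, compute from scratch
# spillover() estimates the matrix from single-color tubes
comp_manual <- spillover(
  x = single_stains_flowset,    # flowSet holding the single stains and the unstained control
  unstained = "unstained.fcs",  # name (or index) of the unstained sample in x; placeholder name
  fsc = "FSC-A",
  ssc = "SSC-A"
)
# patt is an optional regular expression that selects channel names, not file names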
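
The arithmetic behind compensation is a matrix inversion. The following base R sketch uses a made-up two-color spillover matrix and three made-up events; the numbers and names are assumptions chosen for illustration, not values from any instrument.

```r
# Spillover matrix: rows = fluorochrome, columns = detector.
# Here 15% of the FITC signal spills into the PE detector and 2% of PE into FITC.
S <- matrix(c(1.00, 0.15,
              0.02, 1.00),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("FITC", "PE"), c("FITC-A", "PE-A")))

# True (unmixed) signal for three events
true_signal <- matrix(c(1000,   0,
                           0, 500,
                         800, 400),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, c("FITC", "PE")))

# What the detectors record: each dye contributes to every detector
observed <- true_signal %*% S

# Compensation multiplies the observed data by the inverse of the spillover matrix
compensated <- observed %*% solve(S)
```

The first event carries no PE at all, yet the PE detector records 150 units of FITC spillover; after compensation that value returns to zero.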

Step 5 — Transformation

Fluorescence data is rarely analyzed on a linear scale: intensities span several decades and include negative values after compensation. The two standard transformations are logicle (biexponential) and arcsinh. Scatter channels (FSC, SSC) and Time are normally left untransformed.

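# Logicle transform, the standard for fluorescence channels
# estimateLogicle() picks parameters per channel from the data it is given
# Transform fluorescence channels only; leave scatter and Time linear
fluor_channels <- grep("^(FSC|SSC|Time)", colnames(fs_comp),
                       invert = TRUE, value = TRUE)
trans <- estimateLogicle(fs_comp[[1]], channels = fluor_channels)
fs_trans <- transform(fs_comp, trans)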

# Alternative: arcsinh, preferred for mass cytometry (CyTOF), cofactor = 5
# (this overwrites fs_trans, so use either logicle or arcsinh, not both)
asinhTrans <- arcsinhTransform(transformationId = "arcsinh", a = 0, b = 1/5, c = 0)
channels <- colnames(fs_comp)[4:30]  # adjust to your channel range
translist <- transformList(channels, asinhTrans)
fs_trans <- transform(fs_comp, translist)
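
What the arcsinh transform does to the numbers can be seen without any flow package. This is a small base R sketch with made-up intensities; the cofactor of 5 follows the CyTOF convention mentioned above.

```r
# Raw intensities spanning several decades, including a negative value
x <- c(-50, 0, 5, 50, 500, 5000, 50000)

cofactor <- 5
y <- asinh(x / cofactor)   # same formula as arcsinhTransform(a = 0, b = 1/5, c = 0)

# Print the transformed values
round(y, 2)
```

Near zero the transform is roughly linear (about x / cofactor), and for large values it behaves like a logarithm, which is why negative and very bright events can share one axis.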

Step 6 — Gating

Gating is where most of the analytical decisions happen. In R you can gate manually (by defining polygon or rectangle gates), use statistical methods, or use fully automated pipelines via openCyto.

Manual gating with flowWorkspace

library(flowWorkspace)
library(ggcyto)

# Create a GatingSet from your flowSet
gs <- GatingSet(fs_trans)

# Define a lymphocyte gate (FSC/SSC)
lymph_gate <- polygonGate(
  filterId = "Lymphocytes",
  .gate = matrix(c(
    50000, 20000,   # FSC-A, SSC-A corner 1
    200000, 20000,  # corner 2
    200000, 80000,  # corner 3
    50000, 80000    # corner 4
  ), ncol = 2, byrow = TRUE,   # byrow: each pair above is one (FSC-A, SSC-A) vertex
  dimnames = list(NULL, c("FSC-A", "SSC-A")))
)

# Add gate to the gating hierarchy
gs_pop_add(gs, lymph_gate, parent = "root")
recompute(gs)

# Gate on CD3+CD4+ T cells from lymphocyte parent
cd4_gate <- rectangleGate(
  filterId = "CD4 T cells",
  "BV421-A" = c(2, Inf),   # CD3
  "PE-A" = c(2, Inf)        # CD4
)
gs_pop_add(gs, cd4_gate, parent = "Lymphocytes")
recompute(gs)
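
Conceptually, a rectangle gate is a set of intervals, and an event passes if it falls inside all of them. The base R sketch below illustrates that logic on simulated, already-transformed values. It is a simplification for intuition, not how flowWorkspace evaluates gates internally, and the data are random.

```r
set.seed(2)
# Simulated transformed intensities for 1,000 parent (lymphocyte) events
events <- cbind("BV421-A" = runif(1000, 0, 4),   # CD3
                "PE-A"    = runif(1000, 0, 4))   # CD4

# A rectangle gate is one [min, max] interval per channel
gate <- list("BV421-A" = c(2, Inf), "PE-A" = c(2, Inf))

# An event is inside the gate if it is inside every interval
inside <- rep(TRUE, nrow(events))
for (ch in names(gate)) {
  inside <- inside & events[, ch] >= gate[[ch]][1] & events[, ch] <= gate[[ch]][2]
}

count <- sum(inside)                             # events in the gate
percent_of_parent <- 100 * count / nrow(events)  # frequency relative to the parent
```

The two summary values correspond to the count and percent statistics extracted in Step 7.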

Automated gating with openCyto

library(openCyto)

# openCyto uses a gating template (CSV) to define the hierarchy
# then applies data-driven algorithms to each gate

gt <- gatingTemplate("gating_template.csv")
gt_gating(gt, gs)

# The template CSV looks like this (multi-channel dims are comma-separated,
# so they are quoted inside the CSV):
# alias,pop,parent,dims,gating_method,gating_args
# Lymphocytes,+,root,"FSC-A,SSC-A",flowClust.2d,K=1
# Live,+,Lymphocytes,Viability-A,mindensity,
# CD4 T cells,+,Live,"CD3-A,CD4-A",cytokine,
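
To show what a data-driven gate means in practice, here is a conceptual base R sketch of a minimum-density cut on one simulated marker. It is only an illustration of the idea behind a method like mindensity, not openCyto's implementation; the search window between 1 and 3 is an assumption that fits this simulated data.

```r
set.seed(3)
# Simulated 1D marker: a negative and a positive population
x <- c(rnorm(3000, mean = 1, sd = 0.3), rnorm(2000, mean = 3, sd = 0.3))

# Kernel density estimate of the marker distribution
d <- density(x)

# Restrict the search to the region between the two expected peaks,
# then take the x position where the estimated density is lowest
between <- d$x > 1 & d$x < 3
cut_point <- d$x[between][which.min(d$y[between])]

# Events above the cut point form the "+" population
positive <- x > cut_point
mean(positive)   # fraction of events called positive
```

Because the cut point is estimated from each sample's own distribution, the gate can follow shifts in staining intensity between samples, which a fixed manual threshold cannot.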

Step 7 — Extract Cell Population Statistics

Once your gating hierarchy is built, extracting counts and percentages is a single function call.

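# Get population statistics for all samples
# path = "auto" returns short population names instead of full gating paths
stats <- gs_pop_get_stats(gs, type = "percent", path = "auto")
head(stats)   # columns: sample, pop, percent

# Get absolute counts
counts <- gs_pop_get_stats(gs, type = "count", path = "auto")

# Reshape to wide format for downstream stats
library(tidyr)
library(dplyr)

# sample_metadata is a data frame you supply, with columns `name` and `group`
stats_wide <- stats |>
  pivot_wider(names_from = pop, values_from = percent) |>
  left_join(sample_metadata, by = c("sample" = "name"))

# Now run your statistics
wilcox.test(`CD4 T cells` ~ group, data = stats_wide)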
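
The reshape-and-test step can be tried without a GatingSet. The sketch below uses base R and a small hypothetical table; the sample names, group labels, and percentages are invented for illustration, and the long format only mimics one row per sample and population.

```r
# Hypothetical long-format statistics: one row per sample and population
stats <- data.frame(
  sample  = rep(paste0("s", 1:6), each = 2),
  pop     = rep(c("Lymphocytes", "CD4 T cells"), times = 6),
  percent = c(62, 31, 60, 29, 65, 33, 58, 18, 61, 20, 59, 17)
)
sample_metadata <- data.frame(
  sample = paste0("s", 1:6),
  group  = rep(c("control", "treated"), each = 3)
)

# Attach group labels, then go from long to wide: one row per sample
stats_long <- merge(stats, sample_metadata, by = "sample")
stats_wide <- reshape(stats_long, idvar = c("sample", "group"),
                      timevar = "pop", direction = "wide")
names(stats_wide) <- sub("^percent\\.", "", names(stats_wide))

# Compare the CD4 T cell frequency between the two groups
res <- wilcox.test(`CD4 T cells` ~ group, data = stats_wide)
res$p.value
```

With three samples per group and complete separation, the exact two-sided Wilcoxon p-value is 0.1, the smallest value this test can return at that sample size. It is worth checking group sizes before interpreting a non-significant result.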

Step 8 — Visualization

The ggcyto package extends ggplot2 to understand flow cytometry data structures. You get the full power of the ggplot2 grammar — facets, themes, scales — applied directly to your gating sets.

library(ggcyto)

# Overlay gate on a dot plot
autoplot(gs, gate = "Lymphocytes", x = "FSC-A", y = "SSC-A",
         bins = 64, strip.text = "gate")

# Multi-panel: one panel per sample, CD3 vs CD4
# Channel names match the gate defined in Step 6 (BV421-A = CD3, PE-A = CD4).
# The data in gs is already logicle-transformed, so no extra axis scale is applied.
ggcyto(gs, aes(x = "BV421-A", y = "PE-A"), subset = "Lymphocytes") +
  geom_hex(bins = 64) +
  geom_gate("CD4 T cells") +
  geom_stats() +
  facet_wrap(~name) +
  theme_bw()

# Box plot comparing populations across groups
library(ggplot2)
ggplot(stats_wide, aes(x = group, y = `CD4 T cells`, fill = group)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.1) +
  labs(y = "CD4+ T cells (%)", x = NULL) +
  theme_bw()

Going Further: CytoFAST for High-Throughput Visualization

When you work with FlowSOM clustering results and need to visualize many clusters across many samples quickly, CytoFAST was built exactly for that. It reads FlowSOM output and produces heatmaps and summary plots in seconds — something that becomes critical when you're analyzing 50+ FCS files simultaneously.

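# cytofast: rapid visualization of FlowSOM cluster results
# cytofast is distributed through Bioconductor, not CRAN
BiocManager::install("cytofast")
library(cytofast)
# See help(package = "cytofast") for the full function index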

# Read FlowSOM cluster labels
cfList <- readCytof(
  ffList = ff_list,
  clust = "SOM_label",
  samples = "sampleID",
  ...
)

# Summary heatmap: clusters × markers
cytoHeat(cfList, key = "clust", legend = TRUE)

# Stacked bar chart: cluster frequencies per sample
cytoBar(cfList)

Recommended Workflow Summary

| Step | Package | Key function |
| --- | --- | --- |
| Load FCS files | flowCore | `read.flowSet()` |
| Quality control | flowAI / PeacoQC | `flow_auto_qc()` |
| Compensation | flowCore | `compensate()` |
| Transformation | flowCore | `estimateLogicle()` |
| Gating | flowWorkspace / openCyto | `gs_pop_add()`, `gt_gating()` |
| Statistics | flowWorkspace | `gs_pop_get_stats()` |
| Visualization | ggcyto | `ggcyto()`, `autoplot()` |
| Clustering | FlowSOM / cytofast | `FlowSOM()`, `cytoHeat()` |

Common Pitfalls and How to Avoid Them

Applying transformation before compensation: always compensate first, then transform. Compensation is a linear correction, so it is only valid on untransformed intensities.
Reusing logicle parameters without checking them: estimateLogicle() fits parameters to the sample it is given, so parameters estimated from one sample can suit others poorly if acquisition conditions varied. Inspect the transformed distributions for each sample and re-estimate where they look wrong.
Forgetting to recompute after adding gates: always call recompute(gs) after modifying the gating hierarchy, or your statistics will be stale.
Not setting a random seed before clustering: FlowSOM uses random initialization. Call set.seed(42) before running it to ensure reproducibility.
Excluding negative events entirely: negative values after compensation are real data, not artifacts. The logicle transform handles them correctly; don't gate them out.
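
Two of these pitfalls are easy to demonstrate in base R. The sketch below uses a made-up spillover matrix and simulated data, and it uses kmeans() purely as a stand-in for a randomly initialized clustering method; it does not call FlowSOM.

```r
# Pitfall: order of compensation and transformation
S <- matrix(c(1.00, 0.15,
              0.02, 1.00), nrow = 2, byrow = TRUE)   # made-up spillover matrix
true_signal <- matrix(c(1000, 0), nrow = 1)          # event with dye 1 only
observed <- true_signal %*% S                        # detector 2 sees spillover

right <- asinh((observed %*% solve(S)) / 5)   # compensate, then transform
wrong <- asinh(observed / 5) %*% solve(S)     # transform, then compensate

right[2]   # essentially zero: the spillover is removed
wrong[2]   # clearly above zero: a false positive signal remains

# Pitfall: random seed before clustering
set.seed(1)
x <- rbind(matrix(rnorm(400, mean = 0), ncol = 2),
           matrix(rnorm(400, mean = 4), ncol = 2))

set.seed(42); run_a <- kmeans(x, centers = 2)$cluster
set.seed(42); run_b <- kmeans(x, centers = 2)$cluster
identical(run_a, run_b)   # same seed, same cluster assignments
```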

Resources and Further Reading

flowCore documentation on Bioconductor
Hahne et al. (2009) — "flowCore: A Bioconductor package for high throughput flow cytometry" — BMC Bioinformatics
Van Gassen et al. (2015) — "FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data" — Cytometry Part A
Nowicka et al. (2019) — "CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets" — F1000Research
