You don't need FlowJo, FCS Express, or any paid software to do serious flow cytometry data analysis. R gives you a complete, reproducible, free workflow — from loading your FCS files to publication-ready figures. This tutorial walks you through every step with real code.
What you'll learn: Install the right packages → load FCS files → QC → compensation → transformation → gating → extract statistics → visualize results. All free, all in R.
Why Analyze Flow Cytometry Data in R?
The standard tools (FlowJo, FCS Express) are powerful but expensive — licenses can cost thousands of dollars per year. R offers a fully functional alternative that is free, open-source, and crucially, reproducible. Every gate, every transformation, every statistical test is written in code you can share, version-control, and re-run on new data.
For research labs with multiple users, R also scales much better: a single script can process hundreds of FCS files overnight with no manual intervention. That's not something you can do clicking through a GUI.
Step 1 — Install the Essential Packages
Most of the packages you need are available through Bioconductor, the standard repository for bioinformatics R packages. CytoExploreR is the exception: it is distributed through GitHub rather than Bioconductor.
    # Install the Bioconductor manager once
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")

    # Core flow cytometry packages
    BiocManager::install(c(
      "flowCore",       # read/write FCS files, core data structures
      "flowViz",        # base plotting for flow data
      "ggcyto",         # ggplot2-based visualization
      "openCyto",       # automated gating pipelines
      "flowWorkspace"   # gating set management
    ))

    # Interactive gating interface (installed from GitHub)
    if (!requireNamespace("remotes", quietly = TRUE))
      install.packages("remotes")
    remotes::install_github("DillonHammill/CytoExploreR")
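Before moving on, it can help to confirm that the installation actually worked. The following is a minimal sketch in base R; the names `pkgs`, `installed`, and `missing_pkgs` are illustrative, and the package list should be trimmed to what you plan to use.

```r
# Packages used in this tutorial (edit to match your own setup)
pkgs <- c("flowCore", "flowViz", "ggcyto", "openCyto", "flowWorkspace")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs installing
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```

Running this at the top of an analysis script gives a clear message up front instead of a failure halfway through a long batch.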
Step 2 — Load Your FCS Files
The flowCore package handles FCS 2.0, 3.0, and 3.1 files. The basic unit is a flowFrame (one FCS file) or a flowSet (multiple files).
    library(flowCore)

    # Load a single FCS file
    ff <- read.FCS("sample_001.fcs", transformation = FALSE)

    # See what's inside
    summary(ff)
    colnames(ff)            # channel names (e.g., "FITC-A", "PE-A")
    pData(parameters(ff))   # full channel descriptions

    # Load all FCS files from a folder into a flowSet
    # (pattern is a regular expression, not a shell wildcard)
    fs <- read.flowSet(
      path = "./fcs_files/",
      pattern = "\\.fcs$",
      transformation = FALSE
    )
    length(fs)        # number of samples
    sampleNames(fs)   # file names
Important: always load with transformation = FALSE first. Apply transformations yourself in a controlled way — letting flowCore auto-transform can give unexpected results with some instruments.
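The file pattern used when loading a folder is treated as a regular expression, which is easy to confuse with a shell wildcard. The sketch below uses base R and made-up file names to show how an anchored expression selects only true `.fcs` files.

```r
# Hypothetical file names, for illustration only
files <- c("sample_001.fcs", "sample_002.fcs", "notes_fcs.txt", "sample_003.fcs.bak")

# A regular expression anchored at the end of the name matches only .fcs files
is_fcs <- grepl("\\.fcs$", files)
files[is_fcs]

# glob2rx() converts a shell-style wildcard into the equivalent regular expression
glob2rx("*.fcs")
```

If you prefer to think in wildcards, passing the result of `glob2rx()` as the pattern is one way to keep the two notations straight.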
Step 3 — Quality Control
Before any analysis, check for time-based anomalies (instrument clogs, pressure drops) and take a first look at debris and dead cells, which are removed later by gating. The flowAI and PeacoQC packages automate the time-based cleanup, but a manual check is always useful first.
    library(ggcyto)

    # Quick look at FSC vs SSC to assess overall quality
    autoplot(fs[[1]], x = "FSC-A", y = "SSC-A", bins = 64)

    # Time-based QC: check for signal drift
    autoplot(fs[[1]], x = "Time", y = "FSC-A")

    # Automated QC with flowAI
    BiocManager::install("flowAI")
    library(flowAI)
    fs_clean <- flow_auto_qc(fs, output = 1)
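To make the idea of time-based QC concrete, here is a simplified sketch in base R on simulated data. It is not the algorithm used by flowAI or PeacoQC; it only illustrates the principle of binning events by acquisition time and flagging bins whose median signal is an outlier. The bin width and the 5-MAD threshold are arbitrary choices made for this example.

```r
set.seed(1)

# Simulated acquisition: 10,000 events with a signal drop in the final 10% of the run
n <- 10000
time <- sort(runif(n, 0, 100))
fsc <- rnorm(n, mean = 100000, sd = 5000)
fsc[time > 90] <- fsc[time > 90] - 30000   # simulated clog or pressure drop

# Split the run into 20 equal time bins and take the median signal per bin
bins <- cut(time, breaks = seq(0, 100, by = 5), include.lowest = TRUE)
bin_median <- tapply(fsc, bins, median)

# Flag bins whose median deviates from the overall median by more than 5 MADs
centre <- median(bin_median)
spread <- mad(bin_median)
flagged <- abs(bin_median - centre) > 5 * spread
names(bin_median)[flagged]
```

Events that fall into flagged bins would then be excluded before compensation and gating.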
Step 4 — Compensation
Spectral spillover is unavoidable in multicolor flow cytometry. If your FCS files already contain a compensation matrix embedded by the cytometer software, you can apply it directly. Otherwise, you calculate it from single-stain controls.
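    # Extract the embedded spillover matrix
    # (the keyword may be "SPILL", "$SPILL", or "$SPILLOVER" depending on the cytometer)
    comp_matrix <- keyword(fs[[1]])[["$SPILL"]]
    comp <- compensation(comp_matrix)

    # Apply to the whole flowSet
    fs_comp <- compensate(fs_clean, comp)

    # If you have single-stain controls, compute the matrix from scratch.
    # spillover() estimates it from a flowSet of single-color tubes
    # that also contains the unstained control.
    comp_manual <- spillover(
      x = single_stains_flowset,
      unstained = unstained_sample,   # name or index of the unstained control in x
      patt = "-A$",                   # regex for the channel names (not file names); adjust to your panel
      fsc = "FSC-A",
      ssc = "SSC-A"
    )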
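Underneath, compensation is a linear algebra step: the recorded signal is the true signal mixed by the spillover matrix, and compensating multiplies by the inverse of that matrix. The base R sketch below uses an invented two-color spillover matrix and three invented events to show the round trip.

```r
# Hypothetical 2-color spillover matrix: rows = fluorochrome, columns = detector
S <- matrix(c(1.00, 0.15,
              0.05, 1.00),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("FITC", "PE"), c("FITC-A", "PE-A")))

# True (unmixed) signal for three events
true_signal <- matrix(c(1000,    0,
                           0, 2000,
                         500,  500),
                      ncol = 2, byrow = TRUE,
                      dimnames = list(NULL, c("FITC", "PE")))

# What the detectors record: each dye leaks into the neighboring channel
observed <- true_signal %*% S

# Compensation multiplies by the inverse of the spillover matrix
compensated <- observed %*% solve(S)

round(compensated, 6)
```

Because the correction is linear, it only recovers the true signal when applied to untransformed intensities.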
Step 5 — Transformation
Fluorescence data is rarely analyzed on a linear scale: intensities span several decades and include negative values after compensation. Scatter channels (FSC, SSC) and Time are usually left linear. The two standard transformations are logicle (biexponential) and arcsinh.
    # Transform fluorescence channels only; leave scatter and Time linear
    fluor <- grep("^(FSC|SSC|Time)", colnames(fs_comp),
                  value = TRUE, invert = TRUE, ignore.case = TRUE)

    # Logicle transform, the standard for flow cytometry
    # estimateLogicle() picks parameters per channel from the data
    trans <- estimateLogicle(fs_comp[[1]], channels = fluor)
    fs_trans <- transform(fs_comp, trans)

    # Alternative: arcsinh, preferred for mass cytometry (CyTOF), cofactor = 5
    # arcsinhTransform() computes asinh(a + b * x) + c, so b = 1 / cofactor
    asinhTrans <- arcsinhTransform(transformationId = "arcsinh", a = 0, b = 1/5, c = 0)
    channels <- colnames(fs_comp)[4:30]   # adjust to your channel range
    translist <- transformList(channels, asinhTrans)
    fs_asinh <- transform(fs_comp, translist)
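To see what the arcsinh transformation does to the numbers, the following base R sketch applies it by hand with a cofactor of 5. The helper name `asinh_cofactor` and the example intensities are made up for illustration.

```r
# Arcsinh transform with a cofactor: asinh(x / cofactor)
asinh_cofactor <- function(x, cofactor = 5) asinh(x / cofactor)

# Example intensities, including negative values left by compensation
x <- c(-50, -5, 0, 5, 50, 500, 5000, 50000)

transformed <- asinh_cofactor(x, cofactor = 5)
round(transformed, 2)

# The inverse recovers the original values
back <- sinh(transformed) * 5
all.equal(back, x)
```

The transform is symmetric around zero and roughly linear for small values, which is why negative events stay on the plot, while large values are compressed much like a logarithm.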
Step 6 — Gating
Gating is where most of the analytical decisions happen. In R you can gate manually (by defining polygon or rectangle gates), use statistical methods, or use fully automated pipelines via openCyto.
Manual gating with flowWorkspace
    library(flowWorkspace)
    library(ggcyto)

    # Create a GatingSet from your flowSet
    gs <- GatingSet(fs_trans)

    # Define a lymphocyte gate (FSC/SSC), one row per polygon corner
    lymph_gate <- polygonGate(
      filterId = "Lymphocytes",
      .gate = matrix(c(
         50000, 20000,   # FSC-A, SSC-A corner 1
        200000, 20000,   # corner 2
        200000, 80000,   # corner 3
         50000, 80000    # corner 4
      ), ncol = 2, byrow = TRUE,   # byrow = TRUE keeps each FSC/SSC pair on one row
      dimnames = list(NULL, c("FSC-A", "SSC-A")))
    )

    # Add gate to the gating hierarchy
    gs_pop_add(gs, lymph_gate, parent = "root")
    recompute(gs)

    # Gate on CD3+CD4+ T cells from lymphocyte parent
    cd4_gate <- rectangleGate(
      filterId = "CD4 T cells",
      "BV421-A" = c(2, Inf),   # CD3
      "PE-A"    = c(2, Inf)    # CD4
    )
    gs_pop_add(gs, cd4_gate, parent = "Lymphocytes")
    recompute(gs)
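Conceptually, a rectangle gate is just a lower and an upper bound on each channel. The base R sketch below applies such a gate by hand to simulated, already transformed intensities; the channel names and thresholds mirror the example above but the data are invented.

```r
set.seed(2)

# Simulated, already transformed intensities for 1,000 events
events <- cbind("BV421-A" = rnorm(1000, mean = 2, sd = 1),
                "PE-A"    = rnorm(1000, mean = 2, sd = 1))

# A rectangle gate is a lower and an upper bound per channel
gate <- list("BV421-A" = c(2, Inf), "PE-A" = c(2, Inf))

# An event is inside the gate when it lies within the bounds on every channel
inside <- rep(TRUE, nrow(events))
for (ch in names(gate)) {
  inside <- inside & events[, ch] >= gate[[ch]][1] & events[, ch] <= gate[[ch]][2]
}

# Count and fraction of events in the gate
c(count = sum(inside), fraction = mean(inside))
```

A gating hierarchy repeats this test level by level, each time restricted to the events that passed the parent gate.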
Automated gating with openCyto
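    library(openCyto)

    # openCyto uses a gating template (CSV) to define the hierarchy,
    # then applies data-driven algorithms to each gate.
    # Run it on a GatingSet that does not already contain these populations.
    gt <- gatingTemplate("gating_template.csv")
    gt_gating(gt, gs)

    # The template CSV looks like this. Two-channel dims are comma-separated
    # and quoted; the full template also has collapseDataForGating, groupBy,
    # preprocessing_method, and preprocessing_args columns, which can be left empty.
    # alias,       pop, parent,      dims,          gating_method, gating_args
    # Lymphocytes, +,   root,        "FSC-A,SSC-A", flowClust.2d,  K=1
    # Live,        +,   Lymphocytes, Viability-A,   mindensity,
    # CD4 T cells, +,   Live,        "CD3-A,CD4-A", cytokine,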
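To give a feel for what a data-driven gate does, here is a simplified stand-in for a minimum-density cut, written in base R on simulated data. It is not openCyto's implementation; it only shows the idea of placing a one-dimensional threshold at the valley between a negative and a positive population. The search window is known here only because the data are simulated.

```r
set.seed(3)

# Simulated bimodal marker: a negative and a positive population
marker <- c(rnorm(3000, mean = 1, sd = 0.3), rnorm(2000, mean = 3, sd = 0.3))

# Kernel density estimate of the marker distribution
d <- density(marker, n = 512)

# Search for the lowest density between the two population centers
window <- d$x > 1 & d$x < 3
cut_point <- d$x[window][which.min(d$y[window])]

# Fraction of events above the cut
pos_fraction <- mean(marker > cut_point)
c(cut_point = cut_point, pos_fraction = pos_fraction)
```

Because the threshold is derived from each sample's own distribution, it adapts to shifts in staining intensity between samples in a way a fixed manual gate cannot.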
Step 7 — Extract Cell Population Statistics
Once your gating hierarchy is built, extracting counts and percentages is a single function call.
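    # Get population statistics for all samples
    # ("percent" is returned as a fraction of the parent population, 0 to 1;
    #  path = "auto" gives short population names instead of full paths)
    stats <- gs_pop_get_stats(gs, type = "percent", path = "auto")
    head(stats)

    # Get absolute counts
    counts <- gs_pop_get_stats(gs, type = "count", path = "auto")

    # Reshape to wide format for downstream stats
    # sample_metadata: one row per sample, with columns `name` and `group`
    library(tidyr)
    library(dplyr)
    stats_wide <- stats |>
      pivot_wider(names_from = pop, values_from = percent) |>
      left_join(sample_metadata, by = c("sample" = "name"))

    # Now run your statistics
    wilcox.test(`CD4 T cells` ~ group, data = stats_wide)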
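The reshape-and-test step can be tried out without any cytometry data. The sketch below uses base R only and a simulated long-format table with `sample`, `pop`, and `percent` columns; the sample names, group labels, and values are all invented.

```r
set.seed(4)

# Simulated long-format statistics: one row per sample and population
stats_long <- expand.grid(sample = paste0("s", 1:8),
                          pop = c("Lymphocytes", "CD4 T cells"),
                          stringsAsFactors = FALSE)
stats_long$percent <- runif(nrow(stats_long), 0.2, 0.6)

# Hypothetical sample annotation: four control and four treated samples
sample_metadata <- data.frame(name = paste0("s", 1:8),
                              group = rep(c("control", "treated"), each = 4))

# Long to wide: one row per sample, one column per population
stats_wide <- reshape(stats_long, idvar = "sample", timevar = "pop",
                      direction = "wide")

# Attach the group labels, then strip the "percent." prefix added by reshape()
stats_wide <- merge(stats_wide, sample_metadata, by.x = "sample", by.y = "name")
names(stats_wide) <- sub("^percent\\.", "", names(stats_wide))

# Compare the two groups with a Wilcoxon rank-sum test
result <- wilcox.test(`CD4 T cells` ~ group, data = stats_wide)
result$p.value
```

With random input the p-value carries no meaning; the point is the shape of the table that the test expects.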
Step 8 — Visualization
The ggcyto package extends ggplot2 to understand flow cytometry data structures. You get the full power of the ggplot2 grammar — facets, themes, scales — applied directly to your gating sets.
    library(ggcyto)

    # Overlay gate on a dot plot
    autoplot(gs, gate = "Lymphocytes", x = "FSC-A", y = "SSC-A",
             bins = 64, strip.text = "gate")

    # Multi-panel: one panel per sample, CD3 vs CD4
    # (x and y must be the channels the gate was defined on; the data are
    #  already logicle-transformed, so no logicle axis scale is added)
    ggcyto(gs, aes(x = "CD3-A", y = "CD4-A"), subset = "Lymphocytes") +
      geom_hex(bins = 64) +
      geom_gate("CD4 T cells") +
      geom_stats() +
      facet_wrap(~name) +
      theme_bw()

    # Box plot comparing populations across groups
    # (statistics are fractions of the parent, so multiply by 100 for percent)
    library(ggplot2)
    ggplot(stats_wide, aes(x = group, y = 100 * `CD4 T cells`, fill = group)) +
      geom_boxplot(alpha = 0.7) +
      geom_jitter(width = 0.1) +
      labs(y = "CD4+ T cells (%)", x = NULL) +
      theme_bw()
Going Further: CytoFAST for High-Throughput Visualization
When you work with FlowSOM clustering results and need to visualize many clusters across many samples quickly, CytoFAST was built exactly for that. It reads FlowSOM output and produces heatmaps and summary plots in seconds — something that becomes critical when you're analyzing 50+ FCS files simultaneously.
    # CytoFAST: rapid visualization of FlowSOM cluster results
    # (cytofast is distributed through Bioconductor; confirm function and
    #  argument names with help(package = "cytofast") for your installed version)
    BiocManager::install("cytofast")
    library(cytofast)

    # Read FlowSOM cluster labels
    cfList <- readCytof(
      ffList = ff_list,
      clust = "SOM_label",
      samples = "sampleID",
      ...
    )

    # Summary heatmap: clusters × markers
    cytoHeat(cfList, key = "clust", legend = TRUE)

    # Stacked bar chart: cluster frequencies per sample
    cytoBar(cfList)
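The numbers behind a stacked bar chart of cluster frequencies are simple to compute by hand. The base R sketch below starts from a simulated per-cell table of sample IDs and cluster labels (both invented) and derives the within-sample frequencies. It does not use cytofast.

```r
set.seed(5)

# Simulated per-cell table: a sample ID and a cluster label for each cell
cells <- data.frame(
  sampleID = sample(c("s1", "s2", "s3"), 3000, replace = TRUE),
  cluster  = sample(paste0("cl", 1:4), 3000, replace = TRUE,
                    prob = c(0.4, 0.3, 0.2, 0.1))
)

# Cells per cluster in each sample
counts <- table(cells$sampleID, cells$cluster)

# Convert to within-sample frequencies (each row sums to 1)
freqs <- prop.table(counts, margin = 1)
round(freqs, 3)
```

Normalizing within each sample matters because samples rarely contain the same number of cells, so raw counts are not comparable across samples.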
Recommended Workflow Summary
| Step | Package | Key function |
|---|---|---|
| Load FCS files | flowCore | read.flowSet() |
| Quality control | flowAI / PeacoQC | flow_auto_qc() |
| Compensation | flowCore | compensate() |
| Transformation | flowCore | estimateLogicle() |
| Gating | flowWorkspace / openCyto | gs_pop_add(), gt_gating() |
| Statistics | flowWorkspace | gs_pop_get_stats() |
| Visualization | ggcyto | ggcyto(), autoplot() |
| Clustering | FlowSOM / cytofast | FlowSOM(), cytoHeat() |
Common Pitfalls and How to Avoid Them
- Applying transformation before compensation: always compensate first, then transform. Transforming first distorts the spillover calculation.
- Using the same logicle parameters for all samples: estimateLogicle() per sample gives better results than a fixed parameter set, especially if acquisition conditions varied.
- Forgetting to recompute after adding gates: always call recompute(gs) after modifying the gating hierarchy, or your statistics will be stale.
- Not setting a random seed before clustering: FlowSOM uses random initialization. Set set.seed(42) before running to ensure reproducibility.
- Excluding negative events entirely: negative values after compensation are real data, not artifacts. The logicle transform handles them correctly; don't gate them out.
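The effect of fixing the seed can be demonstrated with any randomly initialized clustering method. The sketch below uses base R's `kmeans()` as a stand-in for FlowSOM, on simulated two-marker data, so it shows the principle and not FlowSOM itself.

```r
# Simulated two-marker data for 500 cells
set.seed(42)
cells <- matrix(rnorm(1000), ncol = 2)

# Same seed before each run: identical cluster assignments
set.seed(42)
run1 <- kmeans(cells, centers = 4)$cluster
set.seed(42)
run2 <- kmeans(cells, centers = 4)$cluster

identical(run1, run2)
```

Without the repeated `set.seed()` call, the two runs may start from different random centers and label the same cells differently.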
Resources and Further Reading
- flowCore documentation on Bioconductor
- Hahne et al. (2009) — "flowCore: A Bioconductor package for high throughput flow cytometry" — BMC Bioinformatics
- Van Gassen et al. (2015) — "FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data" — Cytometry Part A
- Nowicka et al. (2019) — "CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets" — F1000Research
- Clustering algorithms for mass cytometry — FlowSOM, PhenoGraph, UMAP comparison