| TL;DR | An influential framework (Nelson 2020) combining unsupervised topic modeling with qualitative reading — but one that rests on three flawed assumptions, per Carlsen & Ralund's critique. Their replacement, CALM, keeps humans as the interpretive ground truth. |
|---|---|
The original framework (Nelson 2020)
Nelson coined “computational grounded theory” (CGT) to align unsupervised machine learning with grounded theory’s inductive, data-driven spirit. Three steps:
- Pattern discovery — unsupervised models (typically LDA topic models) find clusters in the corpus without researcher-imposed categories
- Pattern refinement — researcher reads paradigmatic documents per topic and interprets what the model found
- Pattern confirmation — indirect validation by correlating topic measures with external variables
The appeal: it scales, it looks inductive, and it demands minimal manual reading. CGT became popular in computational social science.
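A minimal sketch of what steps 1 and 2 look like in practice, assuming a plain list of document strings (the toy corpus, topic count, and top-word cutoff are illustrative choices, not Nelson's actual pipeline):

```python
# Sketch of CGT steps 1-2: fit an LDA topic model, then surface the
# "paradigmatic" document per topic for the researcher to read.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "wages strike union labor organizing",
    "wages contract negotiation labor dispute",
    "climate emissions policy carbon tax",
    "carbon policy renewable energy subsidy",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Step 1, pattern discovery: the researcher must fix the topic count
# up front, one of the assumptions Carlsen & Ralund criticize.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic proportions

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    # Step 2, pattern refinement: read the document with the highest
    # proportion of this topic.
    paradigmatic = docs[doc_topics[:, k].argmax()]
    print(f"topic {k}: {top} | paradigmatic doc: {paradigmatic!r}")
```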
Why it doesn’t work (Carlsen & Ralund 2022)
Carlsen & Ralund (carlsen-ralund-computational-grounded-theory-2022) systematically critique all three steps:
Discovery fails: LDA assumes roughly uniform topic sizes and requires fixing the number of topics in advance; real corpora rarely satisfy either assumption. The result is fused, duplicated, and unstable topics. Word co-occurrence does not reliably map to semantic meaning, so you cannot trust unsupervised models to find the right categories.
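One way to see the instability directly is to refit the same model under different random seeds and compare the top words. Continuing the toy corpus above (results here are illustrative; on real corpora topics routinely shuffle, fuse, or split between runs):

```python
# Refit the same LDA with different seeds; topic order and content
# can shift even though the data and hyperparameters are identical.
for seed in (0, 1, 2):
    lda = LatentDirichletAllocation(n_components=2, random_state=seed)
    lda.fit(X)
    for k, weights in enumerate(lda.components_):
        top = [terms[i] for i in weights.argsort()[::-1][:3]]
        print(f"seed {seed}, topic {k}: {top}")
```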
Minimal immersion fails: Reading only model-selected “paradigmatic” documents cannot qualify a researcher to interpret meaning within a community. Paradigmatic cases are only meaningful relative to extensive prior reading. CGT shortcuts the very immersion that makes grounded theory valid.
Indirect validation fails: Correlating a topic measure with another variable cannot detect systematic measurement error. If the model systematically miscodes documents, correlations with external variables can survive intact, so the check passes while the measurement is wrong. Only direct validation, human coding of a random sample, catches this.
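A toy simulation (mine, not from the paper) makes the failure mode concrete: a classifier that systematically over-codes one category still correlates strongly with an external covariate, while direct comparison against human codes exposes both the error rate and the inflated prevalence estimate:

```python
# Systematic miscoding survives indirect validation but not direct.
import numpy as np

rng = np.random.default_rng(0)
true_label = rng.binomial(1, 0.3, size=5000)           # human-coded truth
external = true_label + rng.normal(0, 0.5, size=5000)  # correlated covariate

# Systematic error: flip 30% of true negatives to positive.
predicted = true_label.copy()
flip = (true_label == 0) & (rng.random(5000) < 0.3)
predicted[flip] = 1

# Indirect check still "passes": a healthy-looking correlation.
print("corr(predicted, external):",
      round(float(np.corrcoef(predicted, external)[0, 1]), 2))
# Direct check fails: accuracy and prevalence are visibly off.
print("accuracy vs human coding:",
      round(float((predicted == true_label).mean()), 2))
print("true prevalence:", round(float(true_label.mean()), 2),
      "| measured prevalence:", round(float(predicted.mean()), 2))
```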
CALM: the replacement framework
Carlsen & Ralund’s Computer Assisted Learning and Measurement (CALM) restructures the workflow into five stages:
| Stage | What happens | Who leads |
|---|---|---|
| Discovery | HSBM/word clustering generates candidate categories and search terms | Computer proposes; human decides |
| Interpretation | Search terms retrieve documents; researcher reads extensively, writes memos, builds coding scheme | Human |
| Classification | Coding scheme applied; supervised ML scales it | Human codes examples; computer scales |
| Validation | Direct validation against human-coded test set | Human |
| Measurement | Validated classifier applied to full corpus | Computer |
The key move: the computer solves the rarity and scale problem (finding enough relevant cases for saturation, then scaling classification), but the human remains the interpretive ground truth throughout.
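A compressed sketch of the classification, validation, and measurement stages, assuming human-coded examples are available as (text, label) pairs (the placeholder data, features, and classifier are my illustrative choices, not the authors' implementation):

```python
# CALM stages 3-5: human codes examples, computer scales, and the
# classifier is directly validated against held-out human codes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline

human_coded = [("wages and the strike committee", 1),
               ("carbon tax rollout timeline", 0)] * 50  # placeholder data
texts, labels = zip(*human_coded)

# Hold out a human-coded test set for DIRECT validation.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X_train, y_train)

# Validation: compare predictions to human codes, not to a covariate.
print(classification_report(y_test, clf.predict(X_test)))

# Measurement: only a validated classifier is run on the full corpus.
full_corpus = ["new contract negotiations stall", "solar subsidy expanded"]
print(clf.predict(full_corpus))
```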
Relationship to LLM-assisted research
CALM was designed before the LLM era, but its logic applies directly. Modern LLM-based approaches (like bijker-chatgpt-qca-2024) that have LLMs generate coding schemes and code data wholesale are essentially the computer-led paradigm that CGT represents, and they inherit the same epistemological risks. The CALM corrective: use LLMs to surface candidate themes and assist discovery, while insisting on human grounding and direct validation.
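A hedged sketch of that division of labor; `ask_llm` is a hypothetical stand-in for whatever chat-completion client you use, not a real library call:

```python
# The LLM only PROPOSES candidates (discovery assist); humans read,
# build the coding scheme, and directly validate as sketched above.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your own LLM client here")  # hypothetical

def propose_candidate_themes(sample_docs: list[str]) -> str:
    # Computer proposes; output feeds extensive human reading and
    # memoing, never wholesale machine coding.
    prompt = ("List candidate themes in these documents, with example "
              "quotes:\n\n" + "\n---\n".join(sample_docs))
    return ask_llm(prompt)
```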
See also
- carlsen-ralund-computational-grounded-theory-2022 — the source paper
- nelson-computational-grounded-theory-2020 — the original CGT framework being critiqued
- llm-qualitative-research — the broader context
- epistemic-flattening — the risk of letting computers lead interpretation
- intercoder-agreement — why direct validation matters more than indirect
- brailas-ai-qualitative-research-2025 — parallel epistemological argument
- human-ai-collaboration — CALM as the definitive framework for correct division of labor
- validity-trustworthiness — direct vs. indirect validation in detail
- contested-claims — whether AI can genuinely “discover” new patterns (Claim 6)
What links here
- Carlsen & Ralund (2022) — Computational Grounded Theory Revisited: From Computer-Led to Computer-Assisted Text Analysis
- Costa et al. (2025) — AI as a Co-researcher in the Qualitative Research Workflow: Transforming Human-AI Collaboration
- Epistemic Flattening
- Epistemology — Stances Across the Literature
- Fischer & Biemann (2024) — Exploring Large Language Models for Qualitative Data Analysis
- Friese (2026) — From Coding to Conversation: A New Methodological Framework for AI-Assisted Qualitative Analysis
- AI in Qualitative Research
- Human-AI Collaboration — Frameworks and Models
- Index
- LLMs for Qualitative Research
- Nelson (2020) — Computational Grounded Theory: A Methodological Framework
- Nguyen-Trung (2025) — ChatGPT in Thematic Analysis: GAITA and the ACTOR Framework
- Qualitative AI Methods — A Living Taxonomy
- Sinha et al. (2024) — The Role of Generative AI in Qualitative Research: GPT-4's Contributions to a Grounded Theory Analysis
- Übellacker (2024) — AcademiaOS: Automating Grounded Theory Development with Large Language Models
- Validity and Trustworthiness