Source
url: https://doi.org/10.1177/20539517221080146
raw: raw/carlsen-ralund-2022-computational-grounded-theory-revisited-from-computer-led-to-computer-assisted-text-analysis.pdf

TL;DR: The most technically grounded methodological critique in the corpus. Carlsen & Ralund demonstrate — through simulation and principled argument — that the dominant computer-led approach to text analysis (Nelson’s computational grounded theory) fails at each of its three core stages. Their replacement, CALM (Computer Assisted Learning and Measurement), keeps humans as the interpretive ground truth and uses computers to solve the scale problem.

Problem

Large-scale text analysis in the social sciences faces a genuine dilemma. Manual qualitative analysis is rigorous but cannot handle corpora of millions of documents. Purely quantitative approaches (word counts, co-occurrence statistics) handle scale but lose interpretive depth. Nelson’s (2020) computational grounded theory (CGT) promised a resolution: let unsupervised machine learning discover patterns, then let qualitative reading interpret them.

Carlsen & Ralund’s problem: CGT’s resolution is illusory. Each stage of the framework rests on an assumption that doesn’t hold in practice. The paper’s contribution is to demonstrate this rigorously — not through abstract critique but through simulation, empirical argument, and principled methodological reformulation.

Approach

The paper’s first move is a simulation study: using artificially constructed text corpora with known planted topics, the authors test whether LDA topic models can recover those topics. They cannot do so reliably. LDA enforces a fixed number of topics and assumes a roughly uniform distribution of topic sizes across the corpus, whereas real corpora have wildly unequal topic sizes (some topics appear frequently, others rarely). The result: fused topics (two genuine topics merged into one model topic), duplicate topics, and instability across model runs. This is not a problem with any specific implementation; it is structural to LDA.
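To make the setup concrete, here is a minimal sketch of that kind of planted-topic experiment, assuming a bag-of-words generative model with illustrative parameter values; it is not the authors’ simulation code, only the same logic: plant topics of wildly unequal prevalence, fit LDA with the correct number of topics, and compare what comes back against what was planted.

```python
# Minimal planted-topic simulation (illustrative parameters, not the paper's setup).
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_topics, vocab_size, n_docs, doc_len = 5, 200, 2000, 80

# Planted topic-word distributions: each topic concentrates on a word block that
# partially overlaps its neighbour, over a small uniform background.
true_topics = np.full((n_topics, vocab_size), 0.1)
for k in range(n_topics):
    true_topics[k, k * 30:k * 30 + 60] += 5.0
true_topics /= true_topics.sum(axis=1, keepdims=True)

# Wildly unequal topic sizes: one dominant topic, several rare ones.
topic_prevalence = np.array([0.70, 0.15, 0.08, 0.05, 0.02])

# Generate a document-term matrix from the planted model.
doc_topics = rng.choice(n_topics, size=n_docs, p=topic_prevalence)
X = np.vstack([rng.multinomial(doc_len, true_topics[k]) for k in doc_topics])

# Fit LDA with the *correct* number of topics and match recovered topics to the
# planted ones via cosine similarity.
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
est = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
sim = (est / np.linalg.norm(est, axis=1, keepdims=True)) @ \
      (true_topics / np.linalg.norm(true_topics, axis=1, keepdims=True)).T

print("best cosine match per planted topic:", sim.max(axis=0).round(2))
print("estimated topic sizes:", lda.transform(X).mean(axis=0).round(2))
# Low best-match scores for the rare planted topics indicate fusion; comparing
# runs with different random_state values probes the run-to-run instability the
# authors report.
```

The point of this construction is that the ground truth is known, so failures of recovery are unambiguous rather than a matter of interpretation.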

The second move is principled argument: even if a model found the right clusters, minimal reading of model-selected “paradigmatic” documents cannot qualify the researcher to interpret meaning within a community. Paradigmatic cases are only meaningful relative to having read many other cases — the very immersion that CGT’s efficiency promise circumvents.

The third move is methodological demonstration: indirect validation (correlating topic measures with external variables) cannot detect systematic measurement error. If a model consistently miscodes documents from a particular community, predictive correlations with external variables can still appear — the mismatch is invisible to the validation strategy CGT recommends.
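As a toy numerical illustration of this logic (mine, not the paper’s): a measure that ignores the content of every document from one community can still correlate strongly with an external outcome, so indirect validation passes while the measurement fails exactly where it matters.

```python
# Toy sketch: systematic miscoding of one community is invisible to
# correlation-based (indirect) validation. Illustrative values throughout.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Two communities; community B is the smaller one the model systematically miscodes.
community = rng.choice(["A", "B"], size=n, p=[0.8, 0.2])
true_label = rng.binomial(1, 0.5, size=n)           # what the documents actually express
outcome = 2.0 * true_label + rng.normal(size=n)     # external variable driven by the true label

# Systematic measurement error: perfect coding in community A, content-blind coding in B.
measured = np.where(community == "A", true_label, 0)

corr = np.corrcoef(measured, outcome)[0, 1]
acc_a = (measured == true_label)[community == "A"].mean()
acc_b = (measured == true_label)[community == "B"].mean()
print(f"indirect validation, corr(measured, outcome): {corr:.2f}")   # still clearly positive
print(f"coding accuracy in community A: {acc_a:.2f}, in community B: {acc_b:.2f}")
# The external correlation looks reassuring even though community-B documents are
# coded without any reference to their content.
```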

CALM — Computer Assisted Learning and Measurement — is the response framework. Five stages: Discovery, Interpretation, Classification, Validation, Measurement. The key division of labour: computational models surface candidate categories and find enough cases for saturation; human readers interpret, develop coding schemes, and serve as ground truth; supervised machine learning scales the validated classification to the full corpus.
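A minimal sketch of how the last three stages might look in code, assuming a bag-of-words classifier (the paper does not prescribe a particular model, and the function and variable names here are illustrative): human codes from the Interpretation stage serve as ground truth, the classifier is validated against held-out human judgments, and only then is it used to measure the full corpus.

```python
# Illustrative CALM stages 3-5: Classification, Validation, Measurement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def calm_classify_validate_measure(hand_coded_texts, hand_coded_labels, full_corpus):
    """Scale a human-validated coding scheme to an unread corpus."""
    # Hold out part of the hand-coded sample so validation is against human
    # judgments the model never saw during training.
    train_x, test_x, train_y, test_y = train_test_split(
        hand_coded_texts, hand_coded_labels,
        test_size=0.3, random_state=0, stratify=hand_coded_labels)

    # Classification stage: a supervised model trained on the human-coded sample.
    clf = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
    clf.fit(train_x, train_y)

    # Validation stage: per-class performance, so rare categories are not hidden
    # behind an overall accuracy figure.
    print(classification_report(test_y, clf.predict(test_x)))

    # Measurement stage: apply the validated coding scheme to the full corpus.
    return clf.predict(full_corpus)
```

The design choice that matters here is that validation compares model output with held-out human codes, not with correlations against external variables.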

AI’s role

AI (specifically unsupervised and supervised machine learning, not LLMs as such) is positioned as a scale instrument: it finds cases the human reader could not find through random sampling, and it scales validated classifications to large corpora. But it is emphatically not an interpretive agent. The justification for any category comes from the qualified human reader, not from computational patterns.

This is a more conservative position than most LLM-era papers. CALM was developed before ChatGPT, but its logic translates directly: use AI to retrieve cases and to scale validated classifications, but never let AI decide what a document means.

Epistemological stance

Interpretivist with a post-positivist measurement ambition. Carlsen & Ralund take seriously both the interpretivist argument (meaning is contextual, community-specific, and requires qualified human understanding) and the measurement aspiration (we want to produce valid, reliable classifications that can be used in quantitative analysis). The tension between these is productive rather than paralyzing: CALM is designed to honor both.

The paper draws on linguistic philosophy (Malinowski’s “context of situation”) to ground its rejection of the assumption that word co-occurrence captures meaning. Words are effective indexes of meaning only within specific contexts — and what elements of context are relevant must be learned through immersion, not inferred from statistical patterns.

Rigor and trustworthiness

The simulation study is methodologically elegant: by constructing a corpus with known planted topics, the authors create a ground truth against which LDA performance can be directly evaluated. The results are unambiguous within this controlled setting.

The principled arguments (minimal immersion, indirect validation) are more discursive but closely reasoned. The CALM framework has been applied in the authors’ own work on Facebook political discourse — it is not purely theoretical.

Limitations

CALM is demanding. The interpretation and grounding stage requires extensive deep reading and memo-writing — essentially the full qualitative immersion that CGT promised to shortcut. For researchers who genuinely cannot do that at scale, CALM may not offer a practical alternative to more automated approaches.

The paper pre-dates LLMs. The CALM framework’s logic extends to LLM-assisted work, but the paper does not address how LLMs change (or don’t change) the fundamental problems it identifies. ubellacker-academiaos-2024 represents one answer to what “computer-led GT with LLMs” looks like; CALM’s critique applies to it directly.

Connections