| TL;DR | Approaches to using AI in qualitative research range from conservative human-led assistance to fully automated pipelines. The field has moved rapidly from "does it work?" (2022–2024) toward "how should it work?" (2024–2026), with new frameworks emerging that take epistemological commitments seriously rather than treating AI as a drop-in coder. A growing counter-response to the Jowsey et al. categorical rejection — including a 100+ signatory response letter ([[friese-et-al-beyond-binary-2026]]) — is consolidating a third position: neither uncritical adoption nor blanket prohibition, but critically reflexive integration grounded in pluralist epistemological frameworks. |
|---|---|
Why a taxonomy?
The phrase “AI-assisted qualitative research” covers a wide range of practices — from a researcher using ChatGPT to brainstorm codes to a fully automated pipeline that produces grounded theory without any human reading the data. These are not variations on a theme; they are epistemologically distinct approaches with different assumptions, different risks, and different validity profiles.
nicmanis-spurrier-ai-guide-2025 identifies the organizing principle: the appropriate AI approach depends on the research’s epistemological commitments. Small-q approaches (oriented toward reliability, replication, and systematic measurement) are compatible with more automation and AI-led pattern discovery. Big-Q approaches (oriented toward interpretation, meaning, and researcher reflexivity) require AI to be positioned as assistant, not analyst. Mapping the taxonomy without this distinction produces confused guidance.
Taxonomy by AI role
1. AI as coder — reliability-focused approaches
The earliest and most empirically studied role: AI applies a coding scheme to text, and reliability is evaluated by comparing AI coding against human coding.
Representative sources: bijker-chatgpt-qca-2024, bennis-ai-thematic-analysis-2025, prescott-ai-thematic-analysis-2024, salazar-gpt4-qualitative-2025, hamilton-ai-qualitative-2023, sakaguchi-chatgpt-japanese-2025
Typical design: Human develops coding scheme → AI codes data → Cohen’s κ or Jaccard index measures agreement → researcher adjudicates discrepancies.
Key finding: Works better for inductive coding (κ 0.72–0.82, Bijker) than deductive (κ 0.58–0.73). Near-perfect concordance achievable under favorable conditions (Jaccard = 1.00, Bennis). Breaks down for low-frequency codes (Salazar, Prescott), culturally embedded themes (Sakaguchi), and unguided AI operating on full documents (Jowsey). ayik-et-al-2026-human-vs-ai-ta-tools extends the comparison to four structurally distinct AI tools (ChatGPT-4o, QInsights, ATLAS.ti AI, MAXQDA AI Assist) vs. validated human TA: MAXQDA achieved 50% exact theme match; zero hallucinations across all tools with explicit prompt constraints; tool design was found to encode epistemological orientation (frequency-based vs. dialogic).
Central limitation: Reliability ≠ validity. High agreement between AI and human coders on a flawed scheme is still wrong. The validity gap — whether coded categories capture what they purport to capture — is unmeasured in most studies. See intercoder-agreement and validity-trustworthiness.
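The agreement step in the typical design above (human scheme → AI coding → κ / Jaccard → adjudication) can be sketched in a few lines. A minimal sketch with hypothetical codes and segments, not the instrumentation any cited study used; real studies compute these over hundreds of segments and many codes:

```python
from collections import Counter

def cohens_kappa(human, ai):
    """Chance-corrected agreement between two coders over the same segments."""
    assert len(human) == len(ai)
    n = len(human)
    observed = sum(h == a for h, a in zip(human, ai)) / n
    # Expected agreement under independence, from each coder's marginal rates.
    ph, pa = Counter(human), Counter(ai)
    expected = sum(ph[c] / n * pa[c] / n for c in set(human) | set(ai))
    return (observed - expected) / (1 - expected)

def jaccard(human_codes, ai_codes):
    """Overlap of the code sets applied to one segment (1.0 = identical sets)."""
    h, a = set(human_codes), set(ai_codes)
    return len(h & a) / len(h | a)

# Hypothetical example: 10 segments, one code each, three-code scheme.
human = ["barrier", "barrier", "support", "cost", "support",
         "barrier", "cost", "support", "barrier", "cost"]
ai    = ["barrier", "support", "support", "cost", "support",
         "barrier", "cost", "support", "barrier", "barrier"]

print(round(cohens_kappa(human, ai), 2))                  # agreement beyond chance
print(jaccard({"barrier", "cost"}, {"barrier", "cost"}))  # identical sets -> 1.0
```

Note that both metrics only quantify coder agreement; neither says anything about whether the scheme itself is valid, which is exactly the gap the central limitation below describes.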
2. AI as pattern discoverer — computational approaches
AI identifies patterns, themes, or categories from text without a predefined scheme. The researcher then interprets what the computer found.
Representative sources: nelson-computational-grounded-theory-2020, carlsen-ralund-computational-grounded-theory-2022, ubellacker-academiaos-2024, sinha-gpt4-grounded-theory-2024
The original version: CGT (Nelson 2020) used LDA topic models for pattern discovery, then had researchers read “paradigmatic” documents to interpret model output. Influential, widely adopted.
The problem: carlsen-ralund-computational-grounded-theory-2022 demonstrates through simulation that LDA-led discovery is structurally unreliable — fused topics, duplicates, instability. Reading paradigmatic documents doesn’t constitute the immersion required for genuine interpretive competence. See computational-grounded-theory.
The LLM update: ubellacker-academiaos-2024's AcademiaOS automates the full Gioia GT pipeline using LLMs. sinha-gpt4-grounded-theory-2024 uses GPT-4 for open coding. Both inherit the fundamental question: can AI-led discovery produce qualitatively valid categories, or does it only produce statistically probable ones? epistemic-flattening names this risk precisely.
CALM alternative: Carlsen & Ralund replace AI-led discovery with human-led interpretation using AI for scale: HSBM generates candidate search terms → human reads extensively → human develops coding scheme → ML scales classification → human validates. Human is interpretive ground truth throughout.
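The "ML scales classification" step of the CALM pipeline can be illustrated with a toy supervised classifier. This is a stand-in sketch only: CALM itself uses HSBM for candidate terms and a different classifier, and the seed documents, codes, and naive Bayes model here are hypothetical. The point is the division of labor, where the human-coded seed set is the interpretive ground truth and the model merely extends it to the unread remainder:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes text classifier with add-one smoothing."""
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    class_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return classes, word_counts, class_counts, vocab, len(docs)

def classify(model, doc):
    """Return the most probable human-defined code for an unseen document."""
    classes, word_counts, class_counts, vocab, n = model
    scores = {}
    for c in classes:
        total = sum(word_counts[c].values())
        score = math.log(class_counts[c] / n)
        for tok in doc.lower().split():
            score += math.log((word_counts[c][tok] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical human-coded seed set (the human-developed coding scheme step).
seed_docs = [
    "the clinic fees were too high to afford",
    "travel costs made every visit expensive",
    "my family encouraged me at every step",
    "friends offered support and encouragement",
]
seed_labels = ["cost", "cost", "support", "support"]

model = train_nb(seed_docs, seed_labels)

# Scale to the unread remainder of the corpus; a human then validates a sample.
unlabeled = ["the fees were expensive", "my friends offered encouragement"]
print([classify(model, d) for d in unlabeled])
```

The final human-validation step has no code equivalent, which is the point: the classifier only propagates categories a human already built and remains auditable against human judgment.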
2.5. AI-in-the-loop analysis
Source: wise-et-al-2026-ai-not-the-enemy
Position: AI incorporated into human-led processes not for efficiency but to deepen core interpretivist commitments that are structurally difficult to sustain alone.
The most technically grounded framework in the corpus. Wise et al. argue that the problem with most AI-assisted qualitative research is that it uses AI to speed up or automate tasks — ceding interpretive ground in the process. AI-in-the-loop inverts this: LLM architectural properties are enlisted specifically to help researchers enact their commitments (close attention, data immersion, contextual grounding, positionality, multiple perspectives). See human-ai-collaboration for the full table mapping LLM properties to qualitative commitments.
What distinguishes it: Requires full corpus in active model context (not RAG); uses persona prompting and systematic disconfirmation prompting; treats positionality as a resource (varied temperature/personas) rather than noise. Connects explicitly to Guba & Lincoln trustworthiness criteria. Proposes transparency in analytic decision-making as a new criterion. See validity-trustworthiness and prompt-engineering.
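The persona-variation and disconfirmation-prompting tactics described above can be sketched as prompt construction. Everything here is hypothetical illustration (the personas, template wording, and temperature values are not from Wise et al.), and no model call is made; the sketch only shows how varied vantage points might be systematized rather than treated as noise:

```python
from itertools import product

# Hypothetical personas: varied vantage points treated as a resource.
PERSONAS = [
    "a critical methodologist skeptical of the emerging theme",
    "a community member whose experience the data describes",
    "a reviewer hunting specifically for disconfirming evidence",
]
TEMPERATURES = [0.2, 0.7, 1.0]  # illustrative sampling-diversity settings

DISCONFIRMATION_TEMPLATE = (
    "You are {persona}. The working theme is: '{theme}'.\n"
    "Quote every passage in the transcript below that CONTRADICTS or "
    "complicates this theme. If none exists, say so explicitly.\n\n"
    "Transcript:\n{transcript}"
)

def build_prompt_battery(theme, transcript):
    """One disconfirmation prompt per (persona, temperature) pair."""
    return [
        {"temperature": t,
         "prompt": DISCONFIRMATION_TEMPLATE.format(
             persona=p, theme=theme, transcript=transcript)}
        for p, t in product(PERSONAS, TEMPERATURES)
    ]

battery = build_prompt_battery(
    theme="participants experience cost as the main barrier",
    transcript="[full corpus goes here -- kept in active model context, not RAG]",
)
print(len(battery))  # 3 personas x 3 temperatures = 9 prompts
```

Each prompt asks the model to argue against the researcher's working theme, so divergent outputs across personas and temperatures become material for reflexive comparison rather than a consistency problem to suppress.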
3. AI as dialogic partner — framework-based approaches
Rather than coding or discovering, AI engages in structured dialogue with the researcher, helping develop interpretations through iterative questioning and challenge.
Representative sources: friese-caai-framework-2026, costa-abductivai-2025, brailas-ai-qualitative-research-2025, dahal-genai-qualitative-nepal-2024
CAAI (Friese 2026): Replaces coding entirely. Five-step dialogic process: define research questions → delineate core concepts → systematic dialogic interaction (conceptually driven prompts, not classification) → verification → interpretation. AI does not classify; it helps the researcher think. Rooted in hermeneutic epistemology.
NITA — Narrative-Integrated Thematic Analysis (Nguyen-Trung & Nguyen 2026): The other fully non-coding approach. Six-stage framework that replaces coding with narrative construction: Planning PERFECT → Preparation → Generating candidate themes → Constructing individual narratives → Constructing meta-narratives → Writing up. The PERFECT monitoring framework (seven components: Purpose, Envision, Realize, Formulate, Experiment, Check & Reflect, Tune) structures the researcher’s reflexive audit trail. First three components (PER) are researcher-only; next two (FE) are human-AI collaborative; final two (CT) return to sole researcher evaluation. Proposes “dialogic thinking mode” as a new addition to Freeman’s (2016) taxonomy of qualitative thinking modes. See nguyen-trung-nita-2026. Built on Matrix Analysis, Framework Analysis, and Interpretive Thematic Integration (ITI). Pragmatist/nonpositivist — explicitly not reflexive TA (Braun & Clarke have rejected GenAI). For experienced researchers only.
AbductivAI (Costa 2025): Actor-Network Theory + distributed cognition + Chain-of-Prompting. AI is co-researcher in a distributed analytic system. Abductive reasoning: move from data to unexpected hypotheses, not from hypotheses to confirmatory coding. Tested on 323 conference abstracts.
The abductive argument (Brailas): Use AI to surface what is improbable, contradictory, or unexpected — precisely the opposite of reliability-focused approaches, which value consistency with the statistically probable. See epistemic-flattening for why this matters.
4. AI as research assistant — practical support roles
AI performs bounded tasks within a researcher-controlled workflow: literature review, memo drafting, familiarization summaries, language refinement, theme organization.
Representative sources: dahal-genai-qualitative-nepal-2024, xu-ai-thematic-analysis-2026, christou-ta-through-ai-2024, christou-ai-resource-2023, yang-gpt4-qualitative-guide-2025, goyanes-chatgpt-protocol-2025, naeem-chatgpt-ta-steps-2025
Key position: AI handles mechanical tasks (transcript summarization, literature searching, initial theme organization) while the researcher retains interpretive authority. The most conservative epistemological position consistent with full AI integration.
The practical guides in the corpus (naeem-chatgpt-ta-steps-2025, goyanes-chatgpt-protocol-2025, yang-gpt4-qualitative-guide-2025) mostly occupy this space — they show how to use AI at each phase of existing methods (Braun & Clarke’s 6 phases, ACTOR framework) while insisting human judgment governs all interpretive decisions.
dellafiore-et-al-2025-expert-interviews provides rare empirical evidence of how expert researchers actually use AI: almost universally for transcription, translation, and scientific English writing; rarely for coding or interpretation; often with a sense of shame about the use they do engage in. The technical/interpretive task split that frameworks prescribe is reproduced in expert practice — but not necessarily for methodological reasons.
5. Fully automated pipelines
AI conducts the entire analytic workflow with minimal human input. Currently the frontier of the field.
Representative sources: ubellacker-academiaos-2024
AcademiaOS: Open-source system that automates Gioia’s grounded theory pipeline. User study (n=19) found researchers valued speed and structure; validity of resulting theory remains contested. The most ambitious automation in the corpus.
The unresolved question: Can a fully automated pipeline produce valid qualitative findings? The consensus in the critical literature is no — not because AI can’t identify patterns, but because qualitative validity requires human immersion, reflexivity, and positionality that cannot be automated. jowsey-frankenstein-ai-ta-2025 provides the most alarming empirical evidence: Copilot operating without human input fabricated 58% of quotes and read only the first 2–3 pages.
Taxonomy by epistemological fit
| Approach | Fits small-q? | Fits Big-Q? | Notes |
|---|---|---|---|
| AI as coder (reliability) | Yes | With caveats | Requires human scheme development; reliability ≠ validity |
| AI as pattern discoverer | Partially | No | Discovery validity unresolved; CALM corrects |
| AI as dialogic partner | No | Yes | Designed for interpretive, meaning-focused research |
| AI as research assistant | Yes | Yes | Most broadly compatible; interpretive authority stays human |
| Fully automated | Yes (small-q only) | No | Validity for Big-Q research unresolved |
The spectrum of automation
Running across all five roles is a spectrum from researcher-led to AI-led:
```
Researcher-led ────────────────────────────────────────▶ AI-led

AI as            AI as             AI as              Fully
assistant   →    dialogic     →    pattern        →   automated
(CALM)           partner           discoverer         pipeline
                 (CAAI)            (CGT)              (AcademiaOS)
```
The methodological consensus in the critical literature: interpretive validity requires the researcher to remain on the left side of this spectrum for meaning-making tasks, regardless of how much AI assistance they use for mechanical tasks.
Current controversies
- Whether any AI approach is compatible with Big-Q interpretive research — or whether the researcher’s “co-analysis” with AI fundamentally changes what the research is. See contested-claims.
- Whether QDAS platforms’ marketing of AI as pattern discoverer (“AI generates insights”) misrepresents what both AI and qualitative research do. (paulus-marone-qdas-discourse-2024)
- Whether the efficiency gains from AI justify the methodological risks, or whether “28× faster” (prescott-ai-thematic-analysis-2024) is a marketing figure that ignores validation burden.
- The categorical rejection debate: jowsey-et-al-2025-we-reject (419 signatories including Braun and Clarke) argues that all approaches above are incompatible with Big-Q reflexive research in principle — AI cannot make meaning; reflexive qualitative research must remain distinctly human. de-paoli-reject-rejection-2026 counters that this conflates philosophy of mind with methodology; greenhalgh-2026-beyond-the-binary argues the binary framing itself is the problem. friese-et-al-beyond-binary-2026 (100+ signatories, four co-authors including Friese and David Morgan) is the most theoretically grounded counter-response to date: it draws on assemblage theory (Deleuze & Guattari), distributed cognition (Hutchins), posthumanism (Barad), and sociomateriality (Orlikowski) to contest the premise that meaning-making is exclusively human. It also notes an internal tension: Braun and Clarke's signatures on a categorical rejection letter sit uneasily with their consistent description of reflexive TA as flexible. Whether any framework that includes AI can satisfy Big-Q epistemological commitments is now the field's sharpest open question. See contested-claims Claim 9.
- Cross-disciplinary adoption pressures: Beyond the qualitative methods debate, the broader scholarly world is having a different conversation — one framed by competition and market logic rather than epistemological validity. andrews-progress-or-perish-2026 (IB/management) argues that restrictive AI policies create “false dichotomies” and produce “AI shaming” cultures where researchers conceal use rather than disclose it. This dynamic has been documented empirically in qualitative contexts too (dellafiore-et-al-2025-expert-interviews). The adoption debate is no longer local to qualitative methods.
See also
- human-ai-collaboration — frameworks for the specific division of labor
- epistemology — epistemological stances and their AI implications
- validity-trustworthiness — how different approaches handle rigor
- contested-claims — active debates about what AI can legitimately do
- computational-grounded-theory — the CGT → CALM lineage
- epistemic-flattening — the core risk of AI-led approaches
- llm-qualitative-research — the empirical landscape