| TL;DR | The defining methodological question of the field: who holds interpretive authority, and how should the analytic labor be divided between human and AI? The frameworks that have emerged since 2022 converge on a principle the empirical studies rarely articulate explicitly — the human must remain interpretive ground truth. How they operationalize this principle varies enormously. |
|---|---|
The core question
“AI-assisted qualitative research” encompasses everything from a researcher using ChatGPT to brainstorm codes (AI as notebook) to a fully automated pipeline that generates grounded theory without human reading (AI as analyst). These are not variations on one practice; they are different practices with different epistemological implications.
The central question across the corpus is: who decides what the data means? The answer determines the validity of the research, the ethics of the method, and the epistemological tradition it belongs to. See epistemology for the full mapping.
The most important structural insight comes from carlsen-ralund-computational-grounded-theory-2022: the division of labor between human and computer should be organized around their respective competences. Computers handle scale — finding enough relevant cases, scaling a validated classifier to a large corpus. Humans provide interpretive ground truth — developing coding schemes, reading extensively, validating categories against direct human coding of a random sample. Any framework that puts the computer in the interpretive role has the division of labor backwards.
The main frameworks
CALM — Computer Assisted Learning and Measurement
Source: carlsen-ralund-computational-grounded-theory-2022
Position: Human is ground truth; computer handles scale and rarity.
The most carefully theorized framework in the corpus. CALM organizes analytic labor across five stages:
| Stage | Task | AI role | Human role |
|---|---|---|---|
| Discovery | Find candidate categories and search terms | HSBM generates candidates | Decides which to pursue |
| Interpretation | Read documents, build coding scheme | Retrieves relevant documents | Reads extensively, writes memos, develops scheme |
| Classification | Apply coding scheme | ML scales classification | Provides training examples |
| Validation | Verify classification quality | Applies to test set | Codes random sample; adjudicates |
| Measurement | Apply to full corpus | Runs classifier | Interprets results |
The key move: AI solves the rarity problem (finding enough cases of rare phenomena to understand them) and the scale problem (applying a validated scheme to thousands of documents). The human solves the interpretation problem (what do these patterns mean within this specific social context?).
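The classify–validate–measure loop can be sketched in miniature. The keyword "classifier," the toy sample, and the acceptance threshold below are illustrative assumptions standing in for a trained model and real data:

```python
# Hedged sketch of CALM's classification -> validation -> measurement
# stages (rows 3-5 of the table above). The marker-based classifier is a
# stand-in for a trained ML model; the data and threshold are toy values.

# Classification: the human-developed coding scheme, expressed as markers.
EXCLUSION_MARKERS = {"excluded", "ignored", "unheard", "sidelined"}

def classify(text):
    """Return 1 if the document shows the 'exclusion' code, else 0."""
    return int(any(m in text.lower() for m in EXCLUSION_MARKERS))

# Validation: compare machine codes with direct human coding of a
# random sample -- the human remains interpretive ground truth.
sample = [("our concerns were ignored again", 1),
          ("the minutes were circulated on time", 0),
          ("we felt excluded from planning", 1),
          ("the agenda was published early", 0)]
human = [h for _, h in sample]
machine = [classify(t) for t, _ in sample]

def cohen_kappa(a, b):
    """Agreement corrected for chance, for binary codes."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    p_exp = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0

kappa = cohen_kappa(human, machine)

# Measurement: only a validated classifier is scaled to the full corpus.
full_corpus = ["doc %d ..." % i for i in range(10_000)]
if kappa >= 0.8:  # acceptance threshold is an illustrative assumption
    corpus_codes = [classify(t) for t in full_corpus]
```

The essential property is the direction of authority: the machine is scaled only after its output agrees with direct human coding of the sample.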
What CALM rejects: CGT’s pattern discovery approach, where computer-led topic modeling is supposed to find the right categories. See computational-grounded-theory.
CAAI — Conversational Analysis to the Power of AI
Source: friese-caai-framework-2026
Position: Replace coding with structured dialogic interaction; AI as hermeneutic partner.
CAAI makes the most radical departure from conventional AI coding studies. The framework:
- Define research questions precisely
- Delineate core concepts and their expected textual markers
- Conduct systematic conceptually-driven dialogic interaction with AI
- Verify and cross-check interpretations with additional data queries
- Integrate interpretations into research output
AI never classifies segments. Instead, it engages in structured dialogue, guided by theoretically informed researcher prompts, aimed at developing understanding rather than producing counts. This is analogous to how a researcher uses literature: not to confirm a coding scheme but to think through what the data means.
The epistemological position: Hermeneutic. Understanding is iterative and dialogic. AI participates in the hermeneutic circle — offering interpretations that the researcher evaluates, challenges, refines. The researcher’s theoretical knowledge and cultural competence govern the interaction throughout.
What CAAI rejects: The coding paradigm entirely. Friese argues that treating qualitative analysis as a classification task misrepresents what qualitative understanding is. The discourse analysis pilot demonstrated that CAAI surfaces richer interpretive terrain than coding permits.
AbductivAI
Source: costa-abductivai-2025
Position: AI as co-researcher in a distributed analytic system; abductive reasoning as the organizing logic.
Built on Actor-Network Theory and distributed cognition, AbductivAI positions AI as one node in a network that includes the researcher, the data, prior theory, and methodological conventions. The AI is not a tool wielded by the researcher; it is an actor with its own “inscribed” norms that shape the analytic process.
Chain-of-Prompting: Structured prompting sequences that move from data-driven observation to theoretical hypothesis to confirmation of the unexpected. The chain replicates abductive reasoning: notice something anomalous → generate an explanation → check it against more data.
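The three-step chain can be sketched as a prompt sequence. The prompt wording and the `ask()` interface are assumptions, not the authors' implementation; in practice `ask()` would wrap a chat-completion call:

```python
# Hedged sketch of an abductive Chain-of-Prompting sequence (AbductivAI).
# The chain mirrors the logic described in the text: anomaly -> explanation
# -> check against more data. Wording and interface are assumptions.
CHAIN = [
    # 1. Data-driven observation: surface the anomalous, not the typical.
    "Here is an excerpt: {excerpt}\n"
    "What in it defies expectation, given {theory}?",
    # 2. Theoretical hypothesis: explain the anomaly.
    "Propose a candidate explanation for this anomaly: {anomaly}",
    # 3. Confirmation of the unexpected: check against more data.
    "Does this explanation hold for these further excerpts? {more_data}\n"
    "Explanation under test: {hypothesis}",
]

def run_chain(ask, excerpt, theory, more_data):
    """Run the chain, feeding each step's output into the next prompt."""
    anomaly = ask(CHAIN[0].format(excerpt=excerpt, theory=theory))
    hypothesis = ask(CHAIN[1].format(anomaly=anomaly))
    verdict = ask(CHAIN[2].format(more_data=more_data, hypothesis=hypothesis))
    return {"anomaly": anomaly, "hypothesis": hypothesis, "verdict": verdict}

# Usage with a stub; a real ask() would call an LLM API.
result = run_chain(lambda p: f"[model reply to: {p[:30]}...]",
                   excerpt="...", theory="role strain", more_data="...")
```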
What makes this distinctive: The focus on the unexpected rather than the consistent. AI is designed to surface what defies expectation, not to confirm what is statistically probable. This directly addresses epistemic-flattening.
GAITA — Guided AI Thematic Analysis
Source: nguyen-trung-gaita-2025
Position: Researcher as reflexive leader throughout; AI as structural support for Template Analysis adaptation.
Four stages adapted from Template Analysis (King, 2012):
- Familiarization — researcher reads data; AI assists with organization and initial summaries
- Initial coding — researcher develops provisional template; AI applies to data segments
- Template refinement — researcher refines codes iteratively based on AI-assisted pattern review
- Theme finalization — researcher finalizes themes; AI supports cross-data verification
ACTOR prompting framework: Anchoring (research question context), Chaining (sequential prompting), Tasking (specific tasks per stage), Organizing (structure requirements), Reflecting (prompting AI to explain reasoning).
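One way to picture ACTOR is as a prompt assembled from its five components. The field names mirror the framework; the template layout itself is an illustrative assumption:

```python
# Hedged sketch of assembling a GAITA-stage prompt from the five ACTOR
# components. The section wording is assumed, not taken from the source.
def actor_prompt(anchoring, chaining, tasking, organizing, reflecting=True):
    parts = [
        f"Research context: {anchoring}",        # Anchoring
        f"Previous step produced: {chaining}",   # Chaining
        f"Your task for this stage: {tasking}",  # Tasking
        f"Format your output as: {organizing}",  # Organizing
    ]
    if reflecting:                               # Reflecting
        parts.append("Explain the reasoning behind each decision you make.")
    return "\n".join(parts)

# Hypothetical usage for the initial-coding stage.
prompt = actor_prompt(
    anchoring="RQ: how do nurses describe moral distress?",
    chaining="provisional template v2 (12 codes)",
    tasking="apply the template to the attached segments",
    organizing="a table of segment ID, code, supporting quote",
)
```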
The position on interpretive authority: The researcher guides every stage. AI is explicitly positioned as implementing the researcher’s analytic vision, not developing its own. Reflexive memos at each stage document how AI shaped the process.
AI-in-the-Loop Analysis
Source: wise-et-al-2026-ai-not-the-enemy
Position: AI incorporated into human-led analytic processes to deepen core qualitative commitments — not for efficiency but for interpretive depth.
The most technically detailed framework in the corpus. Wise et al. explicitly reframe the debate: the problem with most AI-assisted qualitative research is that it uses AI for efficiency and automation, ceding interpretive judgment to the machine. AI-in-the-loop inverts this — computational capabilities are enlisted specifically to help researchers enact the commitments that interpretive qualitative analysis requires but structurally struggles to achieve.
Five core qualitative commitments mapped to LLM properties:
| Commitment | LLM property | Practical application |
|---|---|---|
| Close attention + iterative layering | Attention mechanisms; auto-regression | Multiple passes, deepening embeddings |
| Immersion in data with full context | Long-context (128K–1M tokens) | Entire corpus held in model context simultaneously |
| Context surrounding data interpreted | Large-scale pre-training | Culturally grounded interpretation of implicit meaning |
| Positionality as analytic resource | Auto-regression + temperature | Varied responses; persona prompting for multiple stances |
| Multiple perspectives in dialogue | All of above | Iterative exploration of interpretive alternatives |
Trustworthiness connections: Credibility (full-corpus contextualization), dependability (temporal audit of theme stability), confirmability (systematic surfacing of disconfirming instances), transferability (detailed prompt documentation), authenticity (targeted search for underrepresented voices). A new criterion is proposed: transparency in analytic decision-making — full documentation of model, parameters, and iterative prompting choices.
Technical caveats: RAG (Retrieval-Augmented Generation) is explicitly discouraged; the full corpus must be in the model’s active context, not selectively retrieved. WEIRD bias in training data must be interrogated with the same reflexivity researchers apply to their own positionality.
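The positionality and multiple-perspectives rows of the table above can be sketched as persona prompting over the full in-context corpus. The personas and the `chat()` interface are assumptions, not the authors' implementation:

```python
# Hedged sketch of persona prompting + temperature for multiple analytic
# stances. Per the RAG caveat, the whole corpus goes into the prompt;
# no retrieval step selects excerpts. Personas are illustrative.
PERSONAS = [
    "a critical theorist attentive to power relations",
    "a phenomenologist attentive to lived experience",
    "a grounded theorist attentive to emergent process",
]

def multi_stance_readings(chat, corpus, temperature=0.9):
    """One reading per persona; higher temperature widens variation."""
    readings = {}
    for persona in PERSONAS:
        prompt = (f"Read the following corpus as {persona}. "
                  f"Offer one candidate interpretation.\n\n{corpus}")
        readings[persona] = chat(prompt, temperature=temperature)
    return readings  # the researcher adjudicates between the stances

# Usage with a stub; a real chat() would call a long-context model.
readings = multi_stance_readings(
    lambda prompt, temperature: f"[stance at T={temperature}]",
    corpus="...")
```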
CGT (the cautionary case) — Computational Grounded Theory
Source: nelson-computational-grounded-theory-2020 (original); carlsen-ralund-computational-grounded-theory-2022 (critique)
CGT placed AI in the discovery role: unsupervised topic models find patterns; researchers interpret what the model found. This reversal of the human-AI relationship — computer leads discovery, human interprets computer output — is the paradigm case of what the critical literature argues against.
The structural problem: reading model-selected “paradigmatic” documents does not qualify a researcher to interpret meaning across a community. Paradigmatic cases are only intelligible in relation to the full corpus. The researcher who has not read the data extensively is not positioned to judge whether the model’s topics are real, meaningful, or artifacts of LDA’s mathematical assumptions. See computational-grounded-theory.
NITA — Narrative-Integrated Thematic Analysis
Source: nguyen-trung-nita-2026
Position: Non-coding, narrative-centered approach; PERFECT monitoring framework as the structured reflexive audit trail.
NITA is the second fully non-coding AI-assisted TA framework in the corpus (alongside CAAI). The division of labor is organized not just by task type (mechanical vs. interpretive) but by a temporal architecture — the PERFECT monitoring procedure governs when human and AI engage:
- PER (Purpose, Envision, Realize): Researcher-only space. Identity, positionality, and analytic vision established before any AI contact with data.
- FE (Formulate, Experiment): Human-AI collaborative space. Analyst approach developed with LLM assistance; prompts iterated on sample data.
- CT (Check & Reflect, Tune): Researcher returns as sole evaluator. AI outputs interrogated, distortions identified, approach adjusted.
This temporal structure is a contribution: rather than specifying which tasks AI can perform, PERFECT specifies at what stages AI enters the process. The researcher’s interpretive authority is established and documented before AI contact begins. From this secure foundation, the researcher can genuinely evaluate rather than be anchored by AI output.
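PERFECT's gating logic can be sketched as a simple phase guard. The phase grouping follows the description above; the guard mechanics are an illustrative assumption:

```python
# Hedged sketch of PERFECT's temporal gating (NITA): AI contact with the
# data is permitted only in designated phases. Phase names follow the
# text; the enforcement mechanism is assumed for illustration.
PHASES = ["Purpose", "Envision", "Realize",   # PER: researcher-only
          "Formulate", "Experiment",          # FE: human-AI collaborative
          "Check & Reflect", "Tune"]          # CT: researcher as evaluator
AI_ALLOWED = {"Formulate", "Experiment"}      # AI touches data only here

def ai_may_touch_data(phase):
    """True only in the collaborative FE phases."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    return phase in AI_ALLOWED
```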
The dialogic thinking mode: NITA proposes “dialogic thinking” as a new mode in Freeman’s (2016) taxonomy of qualitative thinking modes, alongside categorizing, narrative, visual, and others. In dialogic thinking mode, AI outputs are provocations that the researcher responds to — not evidence to accept or coding to verify. The epistemic unit is the exchange, not the AI output.
What distinguishes NITA from CAAI: CAAI replaces coding with dialogic interaction while still involving the researcher in iterative questioning at every stage. NITA bypasses the interactive stage at theme generation (AI generates candidate themes from the corpus) and shifts researcher involvement to the narrative construction stages that follow. The researcher’s interpretive work happens in building individual and meta-narratives, not in the prompting dialogue.
Reflexive TA with AI (Xu, Wheeler)
Sources: xu-ai-thematic-analysis-2026, wheeler-technological-reflexivity-2026
Rather than proposing a new framework, these papers demonstrate how to practice existing reflexive methods with AI assistance while maintaining the epistemological commitments of those methods.
Xu’s approach: AI as phase-by-phase assistant through Braun & Clarke’s six-phase reflexive TA. Researcher reflexivity documented at every phase. ChatGPT used for data familiarization support, initial theme generation (to be interrogated, not accepted), and cross-data pattern checking. The posthumanist framing positions AI as a thinking tool that extends researcher cognition rather than replacing it.
Wheeler’s technological reflexivity: Developed through empirical comparison of MAXQDA, NVivo, and ChatGPT on 1,300+ climate survey responses. Reflexivity is not just about the researcher’s relationship to data; it includes the researcher’s critical examination of how their tools shape what they see. Prompts are methodological choices. Platform affordances constrain analytic options. Distributed reflexivity: self, tool, context.
The automation spectrum
All frameworks can be positioned on a spectrum from fully human-led to fully AI-led:
Fully human-led ←──────────────────────────────────→ Fully AI-led

From left (fully human-led) to right (fully AI-led):
- Traditional qualitative research
- CAAI; GAITA; Reflexive TA (Xu, Wheeler)
- CALM; AbductivAI
- CGT
- AcademiaOS
Consensus position: The interpretive tasks — developing coding schemes, deciding what themes mean, reflexively examining the analytic process — belong to the left side regardless of how much AI assists with mechanical tasks on the right.
The contested frontier: Whether AI can participate in developing interpretations (CAAI, AbductivAI) or only applying interpretations the human has developed (CALM, GAITA). This is the most active theoretical debate in the 2025–2026 literature.
What makes collaboration legitimate?
Synthesizing across the corpus, six conditions appear repeatedly as necessary (though not always sufficient) for legitimate human-AI collaboration:
- Human scheme development — coding categories and interpretive frameworks are developed by the researcher, not generated autonomously by AI
- Researcher immersion — the researcher has read sufficient data to evaluate AI output from a position of genuine interpretive competence (not just “paradigmatic cases”)
- Direct validation — AI output is validated against human coding of a representative sample, not just correlated with external variables
- Documented prompts — prompts are treated as methodological choices, documented in the audit trail, and reported in the paper
- Reflexive memos — researcher documents how AI shaped the analytic process and where human judgment overrode AI output
- Transparency about AI role — the specific task AI performed, the tool used, and the validation procedure are reported with enough detail for peer scrutiny
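The documentation conditions (documented prompts, reflexive memos, transparency about AI role) suggest a structured audit-trail record. The fields below are an illustrative assumption, not a published schema:

```python
# Hedged sketch of a prompt audit trail covering the documentation
# conditions listed above. Field names are assumptions for illustration.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class PromptRecord:
    stage: str           # analytic stage (e.g. "initial coding")
    model: str           # tool and version used
    prompt: str          # the prompt verbatim -- a methodological choice
    output_summary: str  # what the AI produced
    memo: str            # how the researcher used or overrode the output
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

# Hypothetical entry; each AI interaction appends one record.
trail = [PromptRecord(
    stage="initial coding",
    model="gpt-4o-2024-08-06",
    prompt="Apply the attached template to segments 1-40 ...",
    output_summary="38/40 segments coded; 2 flagged ambiguous",
    memo="Overrode code 'resignation' on segment 12: irony missed.",
)]
report = [asdict(r) for r in trail]  # reportable alongside the paper
```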
Failure modes
The corpus documents several patterns of illegitimate collaboration:
- AI-led discovery without human immersion (CGT paradigm) — reading paradigmatic cases does not constitute qualification (carlsen-ralund-computational-grounded-theory-2022)
- Unvalidated AI coding — high κ with prior human coding does not guarantee validity (validity-trustworthiness)
- Quote acceptance without verification — AI fabricates quotes; 58% fabrication rate documented (jowsey-frankenstein-ai-ta-2025)
- Opaque prompting — treating prompts as private workflow rather than methodological choices requiring documentation (wheeler-technological-reflexivity-2026)
- Speed as success criterion — “28× faster” (prescott-ai-thematic-analysis-2024) ignores validation burden and quality trade-offs
- Undisclosed use — dellafiore-et-al-2025-expert-interviews documents a culture of concealment: 13 of 14 expert researchers use AI, but many initially presented themselves as non-users. Shame around disclosure produces a systematic under-reporting problem with governance implications. andrews-progress-or-perish-2026 names this “AI shaming” (Giray 2024) and documents it as cross-disciplinary; the qualitative methods community’s strong normative signaling around AI use may be contributing to the very dynamic it aims to prevent.
Empirical tests of framework typologies
ayik-et-al-2026-human-vs-ai-ta-tools provides the first empirical test of the framework taxonomy: four structurally distinct AI tools (ChatGPT-4o as an unstructured assistant; QInsights as a CAAI implementation; ATLAS.ti AI and MAXQDA AI Assist as QDAS-integrated) compared against validated human TA. Results:
- Tools designed around dialogic principles (QInsights/MAXQDA) produce interpretivist-oriented output; tools with minimal structure (ChatGPT, ATLAS.ti) produce post-positivist output
- Framework choice is partly made at tool selection
- No hallucinations with explicit constraints — hallucination risk is prompt-design-dependent, not tool-inherent
- MAXQDA (50% exact theme match) came closest; all tools diverged substantially from human TA in thematic structure
See also
- qualitative-ai-methods — taxonomy of AI roles
- epistemology — epistemological foundations of the frameworks; includes post-humanist/ANT stance
- validity-trustworthiness — how each framework handles rigor
- computational-grounded-theory — the cautionary case study
- epistemic-flattening — the structural risk of AI-led meaning-making
- prompt-engineering — the craft of collaborative prompting
- contested-claims — active disputes about AI’s legitimate role; Claim 9 covers the categorical rejection debate
- de-paoli-reject-rejection-2026 — ANT-based argument that withdrawal from AI collaboration cedes the domain to computer scientists
- ayik-et-al-2026-human-vs-ai-ta-tools — empirical test of how tool design encodes framework typology; dialogic tools produce interpretivist output, unstructured tools produce post-positivist output
- dellafiore-et-al-2025-expert-interviews — expert practitioner perspectives; concealment culture; technical/interpretive task split confirmed; “illusion of meaning” as risk
- jowsey-et-al-2025-we-reject — the position that no collaboration framework can satisfy Big-Q validity criteria; the strongest challenge to the legitimacy of all frameworks on this page
- greenhalgh-2026-beyond-the-binary — governance framing; argues that the question of legitimate collaboration requires governance structures, not just methodological frameworks
- friese-et-al-beyond-binary-2026 — 100+ signatory response to Jowsey; theoretical grounding for why collaboration can be legitimate; assemblage/distributed cognition/posthumanism/sociomateriality as the philosophical resources for defending the frameworks on this page
- nguyen-trung-nita-2026 — NITA/PERFECT; the temporal architecture of human-AI division of labor; “dialogic thinking mode” as a new contribution to the collaboration framework vocabulary
What links here
- Ayik et al. (2026) — Human vs. AI: Evaluating TA With ChatGPT, QInsights, ATLAS.ti AI, and MAXQDA AI Assist
- Brailas (2025) — AI in Qualitative Research: Beyond Outsourcing Data Analysis to the Machine
- Computational Grounded Theory
- Contested Claims
- Costa et al. (2025) — AI as a Co-researcher in the Qualitative Research Workflow: Transforming Human-AI Collaboration
- Dellafiore et al. (2025) — Artificial Intelligence in Qualitative Research: Insights From Experts via Reflexive Thematic Analysis
- Empirical Findings
- Epistemology — Stances Across the Literature
- Friese (2026) — From Coding to Conversation: A New Methodological Framework for AI-Assisted Qualitative Analysis
- Friese, Nguyen-Trung, Powell & Morgan (2026) — Beyond Binary Positions
- Greenhalgh (2026) — Reflexive Qualitative Research and Generative AI: A Call to Go Beyond the Binary
- AI in Qualitative Research
- Index
- LLMs for Qualitative Research
- Nguyen-Trung & Nguyen (2026) — Narrative-Integrated Thematic Analysis (NITA)
- Qualitative AI Methods — A Living Taxonomy
- Sinha et al. (2024) — The Role of Generative AI in Qualitative Research: GPT-4's Contributions to a Grounded Theory Analysis
- Übellacker (2024) — AcademiaOS: Automating Grounded Theory Development with Large Language Models
- Validity and Trustworthiness
- Wise et al. (2026) — Why AI is Not the Enemy: Trustworthy AI-in-the-Loop Analysis
- Zhang et al. (2025) — Harnessing the Power of AI in Qualitative Research: Exploring, Using and Redesigning ChatGPT