| url | https://doi.org/10.21061/see.174 |
|---|---|
| raw | raw/Reeping_6867ce179427a.pdf |
TL;DR: Applies the Qualifying Qualitative Research Quality (Q3) framework — a process-oriented quality framework developed for engineering education research — to systematically evaluate LLMs across multiple quality dimensions. Key finding: the quality concerns LLMs introduce are not new, but the scale and opacity are. New forms of reflexivity specifically around AI positionality are required.
Problem
Engineering education research has developed its own quality frameworks for qualitative inquiry, recognizing that generic qualitative standards need adaptation for the specific epistemological and methodological norms of the field. When LLMs arrived, the field faced a question that most adjacent literatures had not addressed systematically: how do LLMs affect each dimension of an established quality framework, and what does that imply for practice?
The broader problem is that much of the AI-TA literature addresses quality concerns in an ad hoc way — flagging issues like hallucination, bias, or lack of transparency as they arise rather than mapping them against a systematic quality framework. Reeping et al. provide a more structured analysis: by working through the Q3 framework dimension by dimension, they identify precisely which quality concerns LLMs introduce, how those concerns manifest, and which mitigation strategies are appropriate.
The paper is addressed primarily to engineering education researchers, but the Q3 framework analysis generalizes readily to other fields that have adopted process-oriented quality frameworks for qualitative research.
Approach
The Q3 framework (Walther et al. 2013) evaluates qualitative research quality across five dimensions: credibility, transferability, dependability, confirmability, and integrity. Reeping et al. use it as an analytical lens, mapping LLM involvement onto each dimension to identify opportunities and concerns.
Credibility — whether findings accurately represent participants’ perspectives. LLMs introduce threats through hallucination (fabricated evidence that appears credible), pattern-flattening that may miss individual voices, and the opacity of how outputs are generated. Mitigation: member checking, triangulation, and documented validation procedures.
Transferability — whether findings can be applied to other contexts. LLMs trained on broad datasets may introduce patterns from outside the research context — effectively contaminating the local analysis with external associations. Mitigation: thick description of context and explicit documentation of AI involvement.
Dependability — whether findings would be consistent if the study were replicated. LLMs are nondeterministic: the same prompt can produce different outputs across runs, and even low-temperature settings reduce rather than eliminate this variation. This is a fundamental challenge for dependability that cannot be fully resolved, only managed by documenting variation across multiple runs.
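The paper stops short of prescribing a procedure for this, but the multiple-run documentation it points toward can be sketched. The harness below is a minimal illustration, not anything from Reeping et al.: `document_runs` repeats the same coding task, logs every raw output with a short digest for the audit trail, and reports how many distinct outputs appeared. The `fake_llm_code` stub standing in for a nondeterministic LLM call is purely hypothetical.

```python
import hashlib
from collections import Counter
from typing import Callable

def document_runs(code_fn: Callable[[str], str], text: str, n_runs: int = 5) -> dict:
    """Run the same coding task n_runs times and record output variation.

    Returns a log suitable for an audit trail: every raw output,
    a short hash of each, and the count of distinct outputs.
    """
    outputs = [code_fn(text) for _ in range(n_runs)]
    digests = [hashlib.sha256(o.encode()).hexdigest()[:8] for o in outputs]
    return {
        "n_runs": n_runs,
        "outputs": outputs,
        "digests": digests,
        "distinct": len(set(outputs)),
        "modal_output": Counter(outputs).most_common(1)[0][0],
    }

# Hypothetical stand-in for a nondeterministic LLM coding call:
# returns a different code on one of three runs.
_responses = iter(["theme: workload", "theme: workload", "theme: belonging"])

def fake_llm_code(text: str) -> str:
    return next(_responses)

log = document_runs(fake_llm_code, "Interview excerpt ...", n_runs=3)
print(log["distinct"])  # 2 distinct outputs across 3 runs
```

Reporting the full log — rather than a single cherry-picked run — is the point: a reader can then judge whether the variation is trivial rephrasing or substantive disagreement between runs.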
Confirmability — whether findings reflect participant data rather than researcher bias. LLMs introduce a new form of “bias” — the training data bias embedded in the model’s associations — that is distinct from researcher bias and cannot be addressed through traditional reflexivity. This is where new forms of reflexivity are required.
Integrity — whether ethical standards are maintained throughout. This dimension covers data privacy, consent, transparency in reporting, and appropriate acknowledgment of AI’s role — connecting directly to ai-research-ethics.
AI’s Role
AI appears in this paper as the object of quality analysis rather than a research tool. The paper evaluates what LLMs do to research quality rather than how to use them well. This meta-level perspective is valuable precisely because it is not invested in making the case for AI adoption.
The paper’s conclusion is measured: LLMs create challenges at every Q3 dimension, but those challenges are not categorically different from challenges that computer-assisted qualitative data analysis software (CAQDAS) has always posed. What is new is scale and opacity — LLMs operate with less transparency and at greater scope than traditional software tools, amplifying existing concerns rather than creating entirely new ones.
Epistemological Stance
Interpretivist / process-oriented, within the engineering education research tradition. The Q3 framework itself is explicitly process-oriented: it evaluates not just whether findings are credible but whether the research process that produced them was rigorous. This process focus aligns naturally with concerns about AI involvement — the question is not just whether AI-assisted findings are accurate but whether the process that generated them is defensible.
The paper takes seriously the interpretivist concern that the researcher is the primary instrument in qualitative inquiry. LLMs do not share the researcher’s contextual understanding, relational knowledge, or reflexive capacity — and these deficits matter for quality, not just efficiency.
Rigor and Trustworthiness
The Q3 framework provides systematic structure that prevents the paper from degenerating into an undifferentiated list of concerns. Each quality dimension receives specific analysis, and the mitigations proposed are connected to the specific concern they address.
The paper’s argument that LLM quality concerns are “by no means new” is both reassuring and provocative: reassuring because existing frameworks can handle them, provocative because it suggests that the field should have been asking these questions about CAQDAS all along.
Limitations
The Q3 framework is specific to engineering education research, though Walther et al.’s dimensions draw on Lincoln and Guba’s widely used trustworthiness criteria. Researchers in other fields may need to translate the framework into their own quality vocabulary.
The paper is more successful at identifying concerns than at providing detailed mitigation procedures. Knowing that LLMs threaten dependability because of nondeterminism is useful; knowing precisely how to document multiple-run variation to satisfy dependability requirements would be more useful still.
The paper does not provide empirical evidence of Q3 dimension failures in actual LLM-assisted engineering education research — the analysis is theoretical. Empirical documentation of quality failures specific to the field would strengthen the argument.
Connections
- llm-qualitative-research — broader landscape; this paper applies a formal framework to the field-level debate
- ai-research-ethics — the integrity dimension directly connects to the ethics literature
- wheeler-technological-reflexivity-2026 — parallel argument that all tools mediate analytic decisions; Reeping et al. ground this in a formal quality framework
- brailas-ai-qualitative-research-2025 — philosophical grounding for the AI positionality reflexivity argument; Reeping et al. operationalize it within the Q3 framework
- validity-trustworthiness — the paper is the most systematic quality-framework analysis in the corpus
- jowsey-frankenstein-ai-ta-2025 — empirical evidence of quality failures that the Q3 analysis predicts theoretically
- intercoder-agreement — the dependability and confirmability dimensions connect to reliability concerns across the corpus