TL;DR: Prompt engineering is the craft of writing instructions that reliably elicit the desired output from an LLM, a non-trivial skill that can make or break LLM-assisted research tasks.

What it means

LLMs interpret natural language instructions, but small changes in wording, structure, or context can substantially affect output quality and consistency. Prompt engineering is the iterative process of designing, testing, and refining those instructions.

For research applications like llm-qualitative-research, good prompts:

  • Specify the task precisely (what to look for, how to format output)
  • Include relevant synonyms and examples to enrich semantic context
  • Set constraints (e.g., word limits for category labels, exclusivity rules)
  • Use step-by-step instructions (chain-of-thought) for complex reasoning tasks
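The four practices above can be sketched as a single prompt template. This is a minimal illustration, not a validated instrument; the category names, synonyms, and wording are hypothetical placeholders.

```python
def build_coding_prompt(text: str, categories: dict[str, list[str]],
                        max_label_words: int = 3) -> str:
    """Assemble a coding prompt with a precise task, synonym-enriched
    category definitions, explicit constraints, and step-by-step
    (chain-of-thought) guidance."""
    category_lines = "\n".join(
        f"- {name} (related terms: {', '.join(synonyms)})"
        for name, synonyms in categories.items()
    )
    return (
        "Task: assign the passage below to exactly ONE category.\n"    # precise task
        f"Categories:\n{category_lines}\n"                             # synonym enrichment
        "Constraints: choose a single best match; category labels are "
        f"at most {max_label_words} words; answer 'none' if nothing "
        "fits.\n"                                                      # explicit constraints
        "Steps: (1) summarize the passage, (2) compare it to each "
        "category, (3) state your final choice on the last line.\n"    # step-by-step reasoning
        f"Passage: {text}\n"
    )

# Hypothetical categories for illustration only.
prompt = build_coding_prompt(
    "Wind turbines near our village split the community.",
    {"community impact": ["social cohesion", "local conflict"],
     "health concerns": ["noise", "sleep disturbance"]},
)
```

Keeping the prompt assembly in one function makes iterative refinement easier: each run can vary one element (constraints, synonyms, step wording) while holding the rest fixed.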

In qualitative research

The study (bijker-chatgpt-qca-2024) demonstrates that thorough prompt engineering is necessary, but not sufficient, for reliable LLM-assisted coding. The authors' prompts went through multiple iterations before reaching acceptable performance. Key lessons:

  • Richer labels → better consistency. Category labels with examples improved intercoder agreement because they gave the model more semantic anchors.
  • Explicit constraints help but don’t guarantee compliance. Even with instructions to code into a single best-matching domain, ChatGPT sometimes assigned data to multiple domains.
  • Prompts generalize across topics. Their final prompts can be adapted for new research areas by substituting topic keywords — but researchers should verify LLM familiarity with any theory or framework used.

Practical techniques

  • Chain-of-thought prompting: ask the model to reason step-by-step before giving a final answer
  • Few-shot examples: include 2–3 examples of desired input/output in the prompt
  • Synonym enrichment: use varied terminology to improve recall on diverse phrasings
  • Iterative refinement: run the prompt, evaluate output, adjust instructions, repeat
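As one concrete illustration, the few-shot technique amounts to prepending two or three worked input/output pairs so the model infers the expected format. A minimal sketch, with hypothetical example pairs:

```python
# Two worked examples (hypothetical) demonstrating the desired
# input/output format for a short coding task.
FEW_SHOT = [
    ("The new policy saved us money.", "economic"),
    ("I worry about noise at night.", "health"),
]

def few_shot_prompt(item: str) -> str:
    """Prepend the worked examples, then leave the final 'Code:' slot
    open for the model to complete."""
    shots = "\n".join(f"Text: {t}\nCode: {c}" for t, c in FEW_SHOT)
    return f"{shots}\nText: {item}\nCode:"

p = few_shot_prompt("Traffic has gotten worse since the roadworks.")
```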

Techniques specific to interpretivist and Big-Q research

wise-et-al-2026-ai-not-the-enemy proposes prompting practices designed not for reliability but for interpretive depth — specifically to support the five qualitative commitments their AI-in-the-loop framework maps to LLM properties:

  • Persona prompting for positionality: instruct the model to adopt different analytic stances (“read this as a researcher concerned with power relations”; “now read as a researcher looking for counter-narratives”). Operationalizes positionality as analytic resource rather than source of bias to be controlled.
  • Systematic disconfirmation prompting: explicitly ask the model to find passages that contradict, complicate, or resist the emerging interpretation. Directly addresses epistemic-flattening — the model’s tendency toward statistically probable outputs.
  • Temporal audit prompts: run the same prompts at multiple stages of analysis and document changes. Supports dependability: the audit trail shows how interpretations evolved over time.
  • Underrepresented voice search: ask the model to identify voices, perspectives, or patterns that appear infrequently but deserve closer attention. Counteracts the dominant-pattern bias.
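The first two techniques above can be sketched as reusable prompt builders. The personas and wording here are illustrative, not Wise et al.'s exact phrasing:

```python
# Hypothetical analytic stances for persona prompting.
PERSONAS = [
    "a researcher concerned with power relations",
    "a researcher looking for counter-narratives",
]

def persona_prompts(corpus: str) -> list[str]:
    """One prompt per analytic stance, treating positionality as an
    analytic resource: each persona rereads the same corpus."""
    return [
        f"Read the corpus below as {p}. Note what stands out and why.\n\n{corpus}"
        for p in PERSONAS
    ]

def disconfirmation_prompt(corpus: str, interpretation: str) -> str:
    """Systematic disconfirmation: ask for passages that push back
    against the emerging interpretation."""
    return (
        f"Working interpretation: {interpretation}\n"
        "Find passages in the corpus below that contradict, complicate, "
        f"or resist this interpretation.\n\n{corpus}"
    )

prompts = persona_prompts("…corpus text…")
challenge = disconfirmation_prompt("…corpus text…", "Residents broadly support the project.")
```

Running the same prompt set at successive stages of analysis and archiving the outputs also gives the temporal audit trail described above.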

Critical requirement (Wise et al.): these techniques require the full corpus in the model’s active context — not RAG (retrieval-augmented generation), which selectively retrieves fragments. Long-context models (128K–1M tokens) make full-corpus prompting feasible. Selective retrieval reintroduces the sampling bias these techniques are designed to prevent.
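Before attempting full-corpus prompting it is worth checking whether the corpus actually fits the model's context window. A rough sketch, using the common (approximate) rule of thumb of about four characters per token rather than an exact tokenizer count:

```python
def fits_in_context(corpus: str, context_tokens: int = 128_000,
                    reserve_tokens: int = 8_000) -> bool:
    """Estimate whether a corpus fits a long-context window, leaving
    headroom (reserve_tokens) for instructions and the model's reply.
    The 4-chars-per-token ratio is a heuristic, not a real count."""
    estimated_tokens = len(corpus) / 4
    return estimated_tokens <= context_tokens - reserve_tokens

small = fits_in_context("x" * 1_000)            # ~250 tokens: fits easily
large = fits_in_context("x" * 600_000)          # ~150K tokens: exceeds 128K window
```

If the corpus does not fit, the remedy consistent with Wise et al.'s requirement is a longer-context model, not selective retrieval.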

Limits

Prompt engineering cannot fully compensate for inherent model limitations: overlapping theoretical frameworks, ambiguous constructs, or data that simply doesn’t map cleanly to predefined categories will produce inconsistent output regardless of prompt quality.

See also