A Case Study on Using AI to Analyze Qualitative Interview Transcripts

Claire Kelley; Deja Logan; Bonnie Solomon

Research Brief | Artificial Intelligence | Aug 26, 2025

This case study describes using AI to conduct qualitative analysis of semi-structured interview transcripts. The interviews were with women who had recently received sexual and reproductive health care services. Our analysis used both deductive and inductive methods to identify key themes in respondents’ answers about their experiences.

As AI-assisted qualitative analysis evolves, there is increasing consensus that AI is best positioned to augment—not replace—traditional qualitative methods. Our study supports this conclusion. Ongoing research is needed to better understand when and how large language models (LLMs) can be most effectively integrated into qualitative workflows, including what balance of human and machine input supports rigor, preserves interpretive depth, and aligns with ethical standards.

In this case study, we first provide context for our original study and some background on the current use of AI in qualitative analysis. Next, we describe our approach to using AI for qualitative analysis and include some example results from our study. We conclude with lessons learned that we hope can inform other researchers.

Study Context

Our case study is based on a project that aimed to help the family planning field improve clients’ experiences of care at Title X-funded health centers by identifying innovative, effective strategies for service delivery and developing actionable recommendations. To identify strategies for improving family planning services, the research team conducted mixed-methods research with family planning clients and providers.

In spring 2025, women with low incomes across the United States who had received sexual and reproductive health care (including family planning services) in the past 12 months participated in 60-minute semi-structured virtual interviews. We conducted interviews until saturation was reached, which produced 17 final interviews that we recorded and transcribed using Microsoft Teams. Transcripts were de-identified prior to analysis. We chose to use AI to support rapid analysis of the interviews and to test an innovative approach. Using AI reduced the time needed between data collection and reporting.

We used the following research questions to guide this analysis:

What interpersonal and logistical experiences do patients associate with positive sexual and reproductive health care (including family planning) visits?
What interpersonal and logistical experiences do patients associate with negative sexual and reproductive health care visits?
What do patients recommend to ensure good, respectful sexual and reproductive health care experiences?

State of the Field for AI-assisted Qualitative Analysis

AI-assisted qualitative analysis is an emerging field shaped by both excitement and critical reflection. Commercial platforms such as NVivo AI, Dovetail, and Quirkos are incorporating LLMs into their toolkits, which can help researchers conduct initial data exploration and code large volumes of qualitative data. However, important concerns accompany these advances.

Serious risks to data security and privacy are introduced when identifiable or sensitive transcripts are uploaded into third-party systems without appropriate safeguards. In terms of analytic rigor, while AI can support consistent and efficient coding, it lacks the contextual understanding needed for more interpretive analysis—especially when data involve humor, sarcasm, or coded language, which may reflect the cultural context or lived experiences of marginalized groups and can be easily misinterpreted. Addressing these risks requires both secure processing environments that protect participant data and workflows that integrate human expertise to ensure appropriate, nuanced interpretation. Together, these safeguards help maintain the ethical and methodological standards essential to high-quality qualitative research.

A growing number of validation studies have assessed how well LLMs perform in qualitative coding tasks, often comparing machine-generated code applications to those produced by human analysts. Results have been mixed: While LLMs may offer consistency and efficiency, researchers have raised concerns about the interpretive validity of machine-generated themes, particularly when nuanced understanding or theoretical grounding is required. Notably, few studies have rigorously evaluated hybrid approaches that combine AI tools with human oversight, despite their growing use in practice. These hybrid approaches vary in structure: Some take a human-in-the-loop form, in which researchers lead the analysis and AI provides support, and others adopt an LLM-in-the-loop model, in which AI generates initial outputs that humans then review or refine.

Our Approach

In our study, we used AI-assisted analysis to rapidly code de-identified interview transcripts. We used Google Gemini 2.5 Pro to identify themes with an AI-assisted inductive approach. We chose this model because it performs well on LLM benchmarks and has shown particular promise for inductive coding. To protect privacy, we removed all identifying information from transcripts (such as names, locations, or clinic names) and obtained IRB approval before conducting any analysis. Our approach to AI-assisted qualitative analysis included the following key steps:

1. The study team de-identified transcripts.
2. We split transcripts so that each individual response (described as a quote in the AI prompt) could be analyzed as a separate unit.
3. Next, we identified which questions from the interview to use to answer each research question and ran the analysis separately for each research question.
4. The study team developed the context-setting prompt to include a brief description of the study and information on the research questions we hoped to answer.
5. We ingested each individual response into the LLM, which used an inductive approach to identify themes.
- For the first response, the model generated a new theme.
- For each subsequent response, the model evaluated whether the existing list of themes already captured the response. If not, it generated a new theme. This allowed the theme list to evolve iteratively based on the content of each new response.
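
As an illustration of the splitting step, the sketch below breaks one transcript into response units. It is a minimal example under stated assumptions, not the study team's code: the "Interviewer:" / "Participant:" turn labels, the `Response` record, and the `split_transcript` function are all hypothetical, and real Microsoft Teams exports may be laid out differently.

```python
from dataclasses import dataclass


@dataclass
class Response:
    transcript_id: str
    question: str  # most recent interviewer turn before this response
    text: str      # the participant's response (the "quote")


def split_transcript(transcript_id: str, raw: str,
                     interviewer: str = "Interviewer",
                     participant: str = "Participant") -> list[Response]:
    """Split one transcript into response units.

    Assumes a hypothetical layout in which every turn starts with
    "<speaker label>: "; unlabeled lines continue the current turn.
    """
    responses: list[Response] = []
    question = ""
    speaker = None
    buffer: list[str] = []

    def flush() -> None:
        # Close the current turn: interviewer turns become the active
        # question, participant turns become response units.
        nonlocal question
        text = " ".join(buffer).strip()
        if not text:
            return
        if speaker == interviewer:
            question = text
        elif speaker == participant:
            responses.append(Response(transcript_id, question, text))

    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        label, sep, rest = line.partition(":")
        if sep and label.strip() in (interviewer, participant):
            flush()
            speaker = label.strip()
            buffer = [rest.strip()]
        else:
            buffer.append(line)
    flush()
    return responses
```

Keeping the transcript ID and the interviewer question with each unit makes it possible to select only the responses relevant to a given research question and to trace quotes back to their source transcript.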

Example prompt (step 5)

You are a helpful research assistant working on a study designed to understand patients’ experiences with sexual and reproductive health care and identify patient recommendations. We are attempting to answer a research question:

What experiences do patients associate with positive sexual and reproductive health care visits? [note that the prompt was varied to include each research question separately during analysis, depending on the research question being answered]

The quote you are about to analyze is a response to a question from a patient during a semi-structured interview. Can you identify the code of this quote in a three-word phrase, using (if possible) this existing list of themes? If the list is empty or if no theme from this list represents the quote well, you can create a new theme name. The quote is: [INSERT QUOTE HERE]
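
The iterative loop in step 5 can be sketched roughly as follows. This is an illustrative sketch, not the study team's implementation: `ask_model` is a hypothetical callable that wraps whichever LLM client is in use, and the way the running theme list is spliced into the prompt ("Existing themes: ...") is an assumption, since the published prompt does not show it.

```python
from typing import Callable

CONTEXT = (
    "You are a helpful research assistant working on a study designed to "
    "understand patients' experiences with sexual and reproductive health "
    "care and identify patient recommendations. "
    "We are attempting to answer a research question:"
)


def build_prompt(research_question: str, themes: list[str], quote: str) -> str:
    """Assemble the context, research question, theme list, and quote."""
    # Assumption: themes are passed as a semicolon-separated list.
    theme_list = "; ".join(themes) if themes else "(empty)"
    return (
        f"{CONTEXT}\n\n{research_question}\n\n"
        "The quote you are about to analyze is a response to a question "
        "from a patient during a semi-structured interview. Can you identify "
        "the code of this quote in a three-word phrase, using (if possible) "
        f"this existing list of themes? Existing themes: {theme_list}\n"
        "If the list is empty or if no theme from this list represents the "
        "quote well, you can create a new theme name. "
        f"The quote is: {quote}"
    )


def build_theme_list(research_question: str, quotes: list[str],
                     ask_model: Callable[[str], str]) -> list[str]:
    """Grow the theme list one quote at a time."""
    themes: list[str] = []
    for quote in quotes:
        answer = ask_model(build_prompt(research_question, themes, quote)).strip()
        known = {t.lower() for t in themes}
        # A theme name not seen before is treated as a new theme.
        if answer and answer.lower() not in known:
            themes.append(answer)
    return themes
```

Passing the model call in as a function keeps the loop independent of any particular vendor API and lets the loop be tested with a stub before any transcript data is sent to a model.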

6. The study team added nuance to AI-generated themes by separating themes that were too broad or nonspecific and adding clarity to the definitions to make them more specific. We then used this edited list of themes as the final themes for deductive coding.
7. We deductively coded all individual responses again using these final themes.
- A subsequent prompt gave specific instructions that allowed a quote to be coded under more than one theme.
8. The study team then manually reviewed a subset of responses coded by the AI to ensure they were accurately coded.
9. Next, we used the AI model to provide a summary of findings and identify representative quotes for each finalized theme.
10. Finally, the study team manually reviewed these results and provided interpretation and contextualization. We will publish full results in a separate brief on the Child Trends website.
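
The deductive coding pass and the manual spot check can be sketched in the same style. Everything here is an assumption made for illustration: the prompt wording, the semicolon-separated answer format, the 20 percent review fraction, and the `ask_model` callable are hypothetical, because the brief does not publish the deductive prompt or the size of the reviewed subset.

```python
import random
from typing import Callable


def code_response(quote: str, themes: list[str],
                  ask_model: Callable[[str], str]) -> list[str]:
    """Ask the model for every final theme that applies to one quote."""
    prompt = (
        "Assign the quote below to every theme that applies. "
        "Answer with theme names from this list only, separated by "
        "semicolons, or NONE if no theme applies.\n"
        f"Themes: {'; '.join(themes)}\n"
        f"Quote: {quote}"
    )
    raw = ask_model(prompt)
    by_lower = {t.lower(): t for t in themes}
    picked: list[str] = []
    for part in raw.split(";"):
        key = part.strip().lower()
        # Keep only answers that match a final theme; anything else
        # (including invented theme names) is discarded.
        if key in by_lower and by_lower[key] not in picked:
            picked.append(by_lower[key])
    return picked


def sample_for_review(coded: dict[str, list[str]], fraction: float = 0.2,
                      seed: int = 0) -> list[str]:
    """Pick a reproducible random subset of response IDs for human review."""
    rng = random.Random(seed)
    n = min(len(coded), max(1, round(len(coded) * fraction)))
    return rng.sample(sorted(coded), n)
```

Discarding answers that are not on the final theme list is one simple guard against the model inventing categories during the deductive pass; the sampled IDs would then go to a researcher for manual checking.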

We found this multi-stage process to be necessary because, in preliminary testing, AI-generated themes were vague or insufficiently supported by evidence from the data. When we iteratively coded individual interview question responses and then generated summaries, model performance improved noticeably.

Example Results

The table below highlights two themes—one related to interpersonal experiences and one related to structural or logistical aspects of care—developed through our broader AI-assisted, researcher-refined analysis. While these examples reflect the final outputs, they represent an iterative process through which we refined both the themes and their definitions for clarity, specificity, and relevance.

Table 1: Examples of AI-generated Theme Definitions and Summaries of Findings

Theme	AI-generated Definition	Summary of Findings
Empathetic and validating care	"Empathetic and validating care" describes patient experiences in which providers showed genuine understanding and respect for their concerns. It encompasses interactions wherein patients felt heard and believed, leading to a strong sense of trust and comfort. This theme also encompasses how providers offered emotional support and tailored treatments to individual needs, which significantly enhances patients’ experience.	Patients overwhelmingly valued empathetic and validating care, emphasizing the importance of providers who actively listened to and took their concerns seriously. Feeling "heard" and believed was crucial for patients and led to a sense of trust and comfort. Providers who tailored treatments to individual needs and offered emotional support were highly praised. Conversely, patients cited dismissive or rushed interactions as significant negative experiences. This highlights the profound impact of compassionate care on patient satisfaction and the importance of building trusting doctor-patient relationships.
Comfortable physical environment	"Comfortable physical environment" refers to how the clinic’s physical setting impacts the patient's experience and sense of well-being. This theme encompasses physical features such as cleanliness, lighting, noise levels, and privacy that create a positive atmosphere.	Interview participants consistently highlighted the importance of a comfortable health care environment. Features like dim lighting, quiet spaces, and private exam rooms were frequently cited as contributing to a positive experience. Clients also mentioned cleanliness and a lack of overcrowding in waiting areas as crucial for their comfort. Conversely, bright fluorescent lights, cramped waiting rooms, and loud noises were associated with negative experiences. Creating a relaxing and welcoming atmosphere was deemed essential for reducing patient anxiety and improving overall satisfaction.

These two themes are part of a larger set developed through our hybrid analytic approach, which combined elements of both human-in-the-loop and LLM-in-the-loop models: Researchers led the design and interpretation, while the AI generated initial outputs that the study team iteratively reviewed and refined to ensure alignment with the data and research questions.

In the full set of analytic outputs, each table also included illustrative quotes and a list of the interview transcripts they were drawn from, which supported interpretation and helped ground findings in participants’ words.

The complete findings—including all refined themes, summaries, and illustrative quotes—will be synthesized in a forthcoming brief on the Child Trends website.

Below, we highlight key lessons learned from this process, with a focus on how we balanced AI assistance with human expertise to produce rigorous, actionable findings.

Lessons Learned

Our team’s experience using AI in this project provided several key insights about how to structure prompts, validate results, and refine AI outputs to ensure meaningful and trustworthy findings. These takeaways may help other researchers looking to integrate AI into qualitative analysis.

Breaking the analysis into distinct phases helped improve the clarity and usefulness of AI-generated results. We found that separating the process into two stages—first using AI to inductively generate candidate themes, then applying those themes deductively to the data—produced more relevant and interpretable findings. This structure allowed us to make targeted refinements at each step and maintain greater control over how we developed and applied themes.
Human review and refinement were also critical to ensuring clarity and relevance. While the AI generated an initial set of themes related to positive interpersonal experiences (e.g., positive staff interaction; empathetic, validating care; proactive, empathetic care; safe, comfortable environment; positive community support), the research team identified the following issues during our review:
- Overlap: We consolidated themes that included similar or redundant ideas (e.g., “empathetic” appeared in multiple themes).
- Over-complexity: For clarity, we split themes that combined multiple concepts (e.g., “proactive, empathetic care”).
- Low-frequency or tangential content: We excluded themes like “positive community support” that appeared infrequently and were not central to clients’ experiences.
- Category clarity: We reclassified some themes (e.g., “safe, comfortable environment”) that described logistical or environmental factors, rather than interpersonal ones.
- Vagueness: Some themes, such as “positive staff interaction,” were overly broad and more reflective of the overarching category than discrete experiences. We refined these into more specific, analyzable themes.

This hands-on review process helped ensure that themes were analytically meaningful and closely aligned with our research questions.
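
Two of the checks above, low-frequency themes and overlapping theme names, lend themselves to simple tallies that can flag candidates for human review. The sketch below is hypothetical: the function names, the `min_count` threshold, and the stopword list are assumptions, and the brief does not describe using such a script. The flags only point reviewers at themes to examine; deciding whether to merge, split, or drop a theme remains a researcher's judgment.

```python
import re
from collections import Counter


def theme_counts(coded: dict[str, list[str]]) -> Counter:
    """Count how many responses were coded under each theme."""
    counts: Counter = Counter()
    for themes in coded.values():
        counts.update(set(themes))  # count each theme once per response
    return counts


def low_frequency(counts: Counter, min_count: int = 2) -> list[str]:
    """Themes coded on fewer than min_count responses (hypothetical cutoff)."""
    return sorted(t for t, n in counts.items() if n < min_count)


def shared_words(themes: list[str],
                 stopwords: frozenset = frozenset({"and", "or", "of", "the"})
                 ) -> dict[str, list[str]]:
    """Map each word that appears in more than one theme name to those themes."""
    seen: dict[str, list[str]] = {}
    for theme in themes:
        for word in set(re.findall(r"[a-z]+", theme.lower())) - stopwords:
            seen.setdefault(word, []).append(theme)
    return {w: ts for w, ts in seen.items() if len(ts) > 1}
```

Applied to the initial themes listed above, `shared_words` would flag "empathetic" (along with "positive" and "care") as appearing in more than one theme name.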

Supplying study context in AI prompts improved the relevance and alignment of results. By including brief, tailored context—such as a description of the study’s purpose and specific research questions—we helped the AI generate themes that better aligned with our analytic goals. This context-setting exercise helped the model stay focused on client experiences, rather than producing vague or generic themes.
Making smaller, more focused AI requests reduced hallucinations—instances where the model generated incorrect or misleading information—and improved accuracy. We observed fewer factual errors when we asked the AI to analyze short, specific inputs—particularly single quotes—instead of longer segments like full transcripts. Small, concrete prompts helped the model stay grounded in participants’ words and reduced the risk of misinterpretation.

AI-assisted qualitative analysis represents a promising means to accelerate research timelines, but meaningful human involvement remains essential throughout the process. In this study, subject matter experts and qualitative researchers played critical roles in shaping relevant categories, ensuring interpretive rigor, and grounding AI-generated findings with context and nuance.

Acknowledgements

We thank Jennifer Manlove, Kate Welti, and Samantha Holquist for their review of this brief; Brent Franklin for his editorial support; Catherine Nichols for her design work; and Ria Shelton for their research assistance. Lastly, we would like to thank our project officer, Callie Koesters, for her leadership across all aspects of this project.

This publication was supported by the Office of Population Affairs (OPA) of the U.S. Department of Health and Human Services (HHS) as part of a financial assistance award totaling $1,548,353 with 100 percent funded by OPA/OASH/HHS. The contents are those of the author(s) and do not necessarily represent the official views of, nor an endorsement, by OPA/OASH/HHS, or the U.S. Government. For more information, please visit https://opa.hhs.gov/.

Suggested citation

Kelley, C., Logan, D., & Solomon, B. (2025). A case study on using AI to analyze qualitative interview transcripts. Child Trends. DOI:10.56417/2526z8326s
