A flexible framework for qualitative text analysis and coding using large language models (LLMs).
This repository contains a reusable framework for applying deductive coding to text data through LLM-based classifiers. The project implements a modular pipeline architecture that supports both single-stage and multi-stage classification approaches, with integration for multiple LLM backends.
The framework is designed for systematic qualitative analysis of text data where:
- You have predefined coding schemes or categories
- You need to process large volumes of text data
- You want to leverage LLMs for consistent application of coding rules
- You need robust checkpointing for long-running processes
The system is designed as a modular pipeline with the following key components:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│                 │     │                 │     │                 │
│  Data Loading   │────▶│ Classification  │────▶│  Analysis and   │
│  & Processing   │     │     Engine      │     │    Reporting    │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
The pipeline is designed to be flexible and to support most classification strategies. Two example classification approaches have been implemented:
- Simple Classifier: Single-stage approach that directly classifies text based on your coding scheme
- Two-Stage Classifier: Two-pass approach in which:
  - The first stage identifies evidence related to your coding categories
  - The second stage scrutinises that evidence for validity

Beyond the classifiers themselves, key features include:
- Checkpoint System: Resume interrupted processing from the last saved state
- Batch Processing: Efficiently process large datasets in manageable chunks
- Flexible Pipeline: Configure multi-stage processing with different models
- Robust Error Handling: Graceful recovery from API failures and timeouts
- Comprehensive Logging: Track model inputs, outputs, and parameters
The core library (`src/`) provides a flexible and reusable system for LLM-based text classification:
- `src/classifiers/`: Classification implementations
  - `simple/`: Single-stage classifier that directly outputs labels
  - `two_step/`: Two-stage classifier with evidence gathering and scrutiny phases
- `src/core/`: Core infrastructure
  - `pipeline/`: Pipeline architecture for managing classification workflows
  - `checkpointing/`: Checkpoint system for resuming interrupted processes
- `src/data/`: Data handling
  - Data types for narratives and text segments
  - Loaders for processing input data
- `src/llm_endpoints/`: LLM integrations
  - Support for local llama.cpp
  - OpenAI API integration
  - Together AI integration
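As a quick orientation to the data handling layer, here is a minimal sketch of the input shape the pipeline consumes. The column names `narrative_column` and `id_column` are illustrative placeholders taken from the `run()` call in the quick start below, and the sample narratives are invented:

```python
import pandas as pd

# The pipeline operates on a DataFrame with one row per narrative:
# a unique identifier column plus a free-text column, both passed
# by name to pipeline.run() (see the quick start below).
df = pd.DataFrame({
    "id_column": ["fio-001", "fio-002"],
    "narrative_column": [
        "Subject stated he had been sleeping in the park for two weeks.",
        "Officer observed no signs of impairment during the stop.",
    ],
})
```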
This repository includes the complete codebase for our recent research paper on vulnerability classification in police reports. Our research examines how LLMs can be used to identify vulnerable populations in police records, with careful consideration of potential demographic biases. We applied the framework to classify four key vulnerability factors in police incident narratives:
- Mental health issues
- Drug abuse
- Alcoholism
- Homelessness
The associated paper compares LLM classifications with those of human labellers, and explores counterfactual narratives in which only demographic characteristics are varied to test for bias in classification outcomes. This real-world application demonstrates how the framework can be applied to sensitive text analysis tasks that require careful scrutiny of evidence and consideration of potential biases.
Key research components include:
- Comparison of different LLM architectures (Meta-Llama-3.1-8B, Meta-Llama-3.1-70B, GPT-4o)
- Evaluation of different prompt engineering strategies (custom vs. codebook)
- Analysis of classification error patterns and demographic biases
Our preliminary analysis has shown:
- Significant variations in classification performance across different vulnerability types
- Evidence of demographic biases in some classification contexts
- Improvement in classification accuracy when using a two-stage approach
- Variations in performance between different LLM sizes and architectures
For detailed findings, please refer to the paper replication code and our manuscript.
The repository includes code to replicate our research findings:
- `boston_fio_paper/`: Scripts related to our analysis of Boston Field Interrogation and Observation (FIO) data
  - `analyse_counterfactuals.Rmd`: R analysis of counterfactual narratives
  - `classify_narratives.ipynb`: Classification pipeline execution
  - `download_and_preprocess_fio_data.py`: Data preparation
  - `generate_counterfactuals.ipynb`: Generation of counterfactual narratives
- `experiments/`: Experimental notebooks
  - `early_testing.ipynb`: Initial classifier testing
  - `simple_classifier.ipynb`: Simple classifier implementation
  - `two_stage_classifier.ipynb`: Two-stage classifier implementation

For detailed information about replicating our paper results, please see the README in the `boston_fio_paper/` directory.
If you use this code in your research, please cite our paper:
```bibtex
@article{author2023llm,
  title={Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives},
  author={Relins, S. and Birks, D. and Lloyd, C.},
  journal={arXiv preprint},
  year={2023},
  note={Currently under review for the Journal of Quantitative Criminology}
}
```
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/llm-deductive-coding.git
  cd llm-deductive-coding
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
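The quick start below uses the local llama.cpp endpoint, which assumes a llama.cpp server is already running on your machine. The exact invocation depends on your llama.cpp build, and the model path below is a placeholder (neither is part of this repository); a typical command looks like:

```bash
# Hypothetical: start a local llama.cpp HTTP server with a GGUF model
# (flag names as in recent llama.cpp builds; adjust paths to your setup).
./llama-server -m models/Meta-Llama-3.1-8B-Instruct.gguf --host 127.0.0.1 --port 8080
```

With a backend available, a minimal single-stage pipeline looks like this: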
```python
from src.core.pipeline.pipeline import ClassificationPipeline
from src.core.pipeline.types import PipelineConfig, PipelineStep
from src.classifiers.simple.classifier import get_classifications
from src.llm_endpoints.llama_cpp import llama_cpp_endpoint
import pandas as pd

# Load your data
df = pd.read_csv("your_data.csv")

# Define your classification step
classification_step = PipelineStep(
    name="classification",
    processor_fn=get_classifications,
    fn_args={
        'system_prompt': "Your prompt here...",
        'endpoint': llama_cpp_endpoint,
        'n': 3  # Number of classifications per segment
    },
)

# Create and run pipeline
pipeline = ClassificationPipeline(
    steps=[classification_step],
    config=PipelineConfig(
        batch_size=10,
        checkpoint_dir="classification_checkpoints"
    )
)

# Run classification
results = pipeline.run(df, "narrative_column", "id_column")
```
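Because each batch is checkpointed as it completes, an interrupted run does not have to start over. The sketch below assumes, based on the checkpoint system described above rather than a documented guarantee, that rebuilding the same pipeline with the same `checkpoint_dir` and calling `run()` again resumes from the last saved state:

```python
# A hedged sketch: reusing the same checkpoint_dir should let the
# checkpoint system skip batches that were already processed.
pipeline = ClassificationPipeline(
    steps=[classification_step],  # identical steps to the interrupted run
    config=PipelineConfig(
        batch_size=10,
        checkpoint_dir="classification_checkpoints"  # same directory as before
    )
)
results = pipeline.run(df, "narrative_column", "id_column")  # resumes rather than restarts
```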
To use the two-stage classifier, chain an evidence-gathering step and a scrutiny step in a single pipeline:

```python
from src.classifiers.two_step.classifier import get_evidence, scrutinise_evidence

# Define evidence gathering step
evidence_step = PipelineStep(
    name="evidence",
    processor_fn=get_evidence,
    fn_args={
        'system_prompt': "Your evidence gathering prompt...",
        'endpoint': llama_cpp_endpoint,
        'n': 10
    },
)

# Define scrutiny step
scrutiny_step = PipelineStep(
    name="scrutiny",
    processor_fn=scrutinise_evidence,
    fn_args={
        'system_prompt': "Your evidence scrutiny prompt...",
        'endpoint': llama_cpp_endpoint,
        'n': 1
    },
)

# Create and run two-stage pipeline
pipeline = ClassificationPipeline(
    steps=[evidence_step, scrutiny_step],
    config=PipelineConfig(
        batch_size=10,
        checkpoint_dir="two_stage_checkpoints"
    )
)

results = pipeline.run(df, "narrative_column", "id_column")
```
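Note the asymmetry in the `n` values: following the role of `n` in the simple classifier example (the number of generations per segment), the evidence step is sampled ten times per narrative while the scrutiny step issues a single verdict over the gathered evidence. Presumably this trades extra evidence-gathering passes for a more stable final judgement; both values can be tuned for your task and budget.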
The framework supports multiple LLM backends:

- Local Llama.cpp Server: For self-hosted models

  ```python
  from src.llm_endpoints.llama_cpp import llama_cpp_endpoint
  ```

- OpenAI API: For accessing GPT models

  ```python
  from src.llm_endpoints.openai import openai_endpoint
  ```

- Together AI: For a range of open models

  ```python
  from src.llm_endpoints.together import together_endpoint
  ```
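Because each step receives its endpoint through `fn_args`, switching backends is a matter of passing a different endpoint function. A minimal sketch, assuming the three endpoints expose the same call interface and that any required API keys are configured in your environment:

```python
from src.llm_endpoints.openai import openai_endpoint

# Same step as the quick start, pointed at the OpenAI backend instead of
# the local llama.cpp server (assumes the endpoints are interchangeable).
classification_step = PipelineStep(
    name="classification",
    processor_fn=get_classifications,
    fn_args={
        'system_prompt': "Your prompt here...",
        'endpoint': openai_endpoint,  # was: llama_cpp_endpoint
        'n': 3
    },
)
```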
This project is licensed under the MIT License.
Feel free to fork this repository and adapt it for your own needs. While we may not actively maintain this as a community project, we encourage researchers to build on our work for further investigations into LLM-based classification of sensitive text. If you use or modify this codebase for your research, please cite our paper as referenced in the Citation section above.
For questions about the code or research, please open an issue or contact the authors.