Pydantic Output Parsing for LLMs
Using PydanticOutputParser to enforce typed, validated JSON from LLM responses — turning probabilistic text generation into reliable structured data at pipeline boundaries.
The Problem
LLMs return text. Pipelines need data. The gap between those two is where things break.
Ask a model to return JSON and it will — most of the time. Sometimes it wraps it in markdown code fences. Sometimes it adds a preamble (“Here is the JSON you requested:”). Sometimes it renames a field. Sometimes it returns valid JSON that doesn’t match your schema. None of these are predictable, and all of them crash a pipeline that was expecting a clean dict.
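To see why this hurts, consider the most common failure: the model wraps its JSON in markdown code fences with a preamble. A naive `json.loads` on the raw reply fails even though the payload inside is perfectly valid (a minimal stdlib sketch):

```python
import json

# A typical "almost JSON" LLM reply: preamble plus markdown code fences
reply = 'Here is the JSON you requested:\n```json\n{"marks": 5}\n```'

def try_parse(text: str):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

assert try_parse(reply) is None                    # naive parse fails outright
assert try_parse('{"marks": 5}') == {"marks": 5}   # the embedded JSON is fine
```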
PydanticOutputParser turns the schema definition into part of the prompt, and then validates the response against it before your code ever sees it.
The Setup
Define the schema as a Pydantic model:
from pydantic import BaseModel, model_validator
from typing import List, Optional

class Question(BaseModel):
    question_number: Optional[str] = None
    question_letter: Optional[str] = None
    question_text: str
    marks: int

    @model_validator(mode="after")
    def check_question_identifier(self):
        number = self.question_number
        letter = self.question_letter
        if (number is None and letter is None) or (number is not None and letter is not None):
            raise ValueError(
                "A question must have either question_number or question_letter, but not both."
            )
        return self
class ExtractionOutput(BaseModel):
    semester: Optional[str] = None
    course: Optional[str] = None
    subject: Optional[str] = None
    paper_code: Optional[str] = None
    scheme: Optional[str] = None
    total_marks: Optional[int] = None
    sections: List[Section]  # Section groups Questions; defined alongside
Wire up the parser:
from langchain.output_parsers import PydanticOutputParser
extraction_output_parser = PydanticOutputParser(pydantic_object=ExtractionOutput)
The parser exposes format instructions via get_format_instructions(), which get embedded in the prompt template. The chain:
chain = extract_exam_data_prompt | gpt4_turbo_llm | extraction_output_parser
result: ExtractionOutput = chain.invoke({"paper_text": state["paper_text"]})
result comes out as a typed ExtractionOutput instance, not a raw dict. Downstream code accesses result.sections, result.semester — IDE autocomplete, type checking, the works.
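Under the hood, the format instructions are essentially the model's JSON schema rendered into the prompt. A rough standalone sketch of the idea (LangChain's actual wording differs, and NoteTopic here is an illustrative stand-in, not a model from this pipeline):

```python
import json
from pydantic import BaseModel

class NoteTopic(BaseModel):  # illustrative mini-schema
    topic: str
    key_concepts: list[str]

# Roughly what get_format_instructions() contributes: the model's
# JSON schema, embedded as formatting guidance in the prompt
schema = json.dumps(NoteTopic.model_json_schema(), indent=2)
prompt = (
    "Answer with JSON conforming to this schema:\n"
    f"{schema}\n\n"
    "Text to analyze: ..."
)

assert '"topic"' in prompt and '"key_concepts"' in prompt
```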
The model_validator Pattern
The Question model’s validator enforces a domain rule that a schema type alone can’t express: each question is identified by either a number (for regular questions) or a letter (for case study sub-questions) — never both, never neither.
@model_validator(mode="after")
def check_question_identifier(self):
    number = self.question_number
    letter = self.question_letter
    if (number is None and letter is None) or (number is not None and letter is not None):
        raise ValueError("...")
    return self
Without this, the model would happily return a question with both fields filled in, or neither — and the downstream note generator would have to handle those cases defensively everywhere. Pushing the constraint into the Pydantic model centralizes it. If the LLM violates it, the parser raises before the bad data propagates.
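The validator in action (redefining a trimmed Question here so the snippet runs standalone; the condition is compressed to an equality check but enforces the same rule):

```python
from typing import Optional
from pydantic import BaseModel, ValidationError, model_validator

class Question(BaseModel):  # trimmed copy of the model above
    question_number: Optional[str] = None
    question_letter: Optional[str] = None
    question_text: str
    marks: int

    @model_validator(mode="after")
    def check_question_identifier(self):
        # Both None or both set -> same truthiness -> reject
        if (self.question_number is None) == (self.question_letter is None):
            raise ValueError("need exactly one of question_number / question_letter")
        return self

Question(question_number="2", question_text="Define osmosis.", marks=5)  # ok

def rejected(**kwargs) -> bool:
    try:
        Question(**kwargs)
        return False
    except ValidationError:
        return True

assert rejected(question_text="Define osmosis.", marks=5)     # neither set
assert rejected(question_number="2", question_letter="a",
                question_text="Define osmosis.", marks=5)     # both set
```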
One Parser Per Stage
Each pipeline stage has its own output model:
class ContextFlowOutput(BaseModel):
    topic: str
    key_concepts: List[str]
    note_help: str

class FirstPassNoteFlowOutput(BaseModel):
    first_pass_note: str

class TeacherEvaluateFlowOutput(BaseModel):
    final_answer: str
Even for single-field outputs like FirstPassNoteFlowOutput, wrapping in a model is worth it. The parser validates that the field exists and isn’t null. Without it, a model that returns {"first_pass_note": null} because it had nothing to say would pass silently and produce an empty note.
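A quick demonstration of that guardrail: because first_pass_note is typed str (not Optional), Pydantic rejects a null value at the parse boundary:

```python
from pydantic import BaseModel, ValidationError

class FirstPassNoteFlowOutput(BaseModel):
    first_pass_note: str

def is_valid(payload: dict) -> bool:
    try:
        FirstPassNoteFlowOutput.model_validate(payload)
        return True
    except ValidationError:
        return False

assert is_valid({"first_pass_note": "Photosynthesis converts light to energy."})
assert not is_valid({"first_pass_note": None})  # null note caught here, not downstream
```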
What to Watch
Retries on parse failure. The parser raises OutputParserException when it can’t parse the response. LangChain has a RetryWithErrorOutputParser that feeds the error back to the model and asks it to try again. For production pipelines, wire this up — particularly for stages where the prompt is complex and the model occasionally drifts.
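The retry idea is simple enough to sketch without LangChain. This hand-rolled equivalent feeds the validation error back into the prompt, the same loop RetryWithErrorOutputParser automates; call_llm here is a hypothetical stand-in that fails once, then corrects itself:

```python
from pydantic import BaseModel, ValidationError

class ContextFlowOutput(BaseModel):
    topic: str

# Hypothetical LLM stand-in: a mis-shaped reply first, then a corrected one
_responses = iter(['{"subject": "algebra"}', '{"topic": "algebra"}'])

def call_llm(prompt: str) -> str:  # placeholder, not a real model call
    return next(_responses)

def parse_with_retry(prompt: str, max_attempts: int = 2) -> ContextFlowOutput:
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            return ContextFlowOutput.model_validate_json(raw)
        except ValidationError as err:
            # Feed the error back, as RetryWithErrorOutputParser does
            prompt = f"{prompt}\n\nYour previous reply failed validation:\n{err}\nTry again."
    raise RuntimeError("no valid output after retries")

result = parse_with_retry("Extract the topic as JSON.")
assert result.topic == "algebra"
```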
Format instructions bloat. The injected format instructions add tokens to every request. For a simple one-field output, the format instruction overhead can exceed the actual content. For those cases, consider JsonOutputParser (less strict) or structured output via the model’s native function/tool-calling API instead.
Pydantic v1 vs v2. LangChain’s PydanticOutputParser historically expected Pydantic v1 models. If you’re on Pydantic v2, use mode="after" in validators (as above) and check which version LangChain is configured for. Mixing v1 validators with v2 runtime causes subtle failures.
What’s Next
- Switch extraction to OpenAI's native structured output / function-calling API. It is more reliable than asking the model to produce JSON via prompt instructions, since decoding is constrained to schema-valid JSON rather than the model merely being asked for it
- Add OutputFixingParser as a fallback for the generation stages, which produce longer outputs that are more prone to formatting drift