Children's Hospital Los Angeles — Office of Enterprise Data
March 2026

Clinical LLM Workflow Patterns

6 Architecture Patterns — From Simple Inference to Full Agentic Pipelines
Pattern Comparison
Pattern | Complexity | Cost | Best For
1. Simple Inference | Low | $ | One-off Q&A, summarization
2. RAG — Fabric Data Agents | Medium | $ | Structured data retrieval
3. RAG — MCP | Medium | $$ | SharePoint, policy docs, guidelines
4. Router Workflow | Medium-High | $ | Multi-intent clinical chat
5. Agentic (LangGraph) | Highest | $$ | NICCU birth abstraction, batch processing
6. Fine-Tuning | High | $$$ | High-accuracy extraction at scale

1. Simple Inference

Direct prompt-to-response. Send a clinical question or note to an LLM endpoint and receive a structured answer. No retrieval, no fine-tuning — just a well-crafted system prompt and the model's general knowledge. Ideal for ad-hoc summarization, drafting, or quick classification.
Stage | Description | Tools / Services | Compute
Prompt | User submits a question or clinical note with a system prompt | Azure OpenAI GPT-4o; Foundry Qwen3-32B endpoint | Serverless; pay-per-token or dedicated GPU
Inference | Model generates response (JSON, narrative, or classification) | Same endpoint | ~1–5s latency
Response | Structured output returned to user or downstream system | REST API / SDK | No extra cost
Flow — Simple Inference
User Prompt → Azure OpenAI (GPT-4o) → Response
User Prompt → Foundry Endpoint (Qwen3-32B) → Response
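The round trip above can be sketched with the `openai` Python SDK. The environment-variable names, deployment name, and system prompt are illustrative assumptions, not values from this deployment; `build_messages` is a hypothetical helper added for clarity.

```python
import os

SYSTEM_PROMPT = (
    "You are a clinical documentation assistant. "
    "Return a JSON object with 'summary' and 'diagnoses' keys."
)

def build_messages(note: str) -> list[dict]:
    """Pure helper: assemble the chat payload for one inference call."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": note},
    ]

def summarize(note: str) -> str:
    # SDK imported lazily so this sketch loads without it installed.
    from openai import AzureOpenAI  # pip install openai
    # Env-var names are illustrative; 'model' must be the Azure OpenAI
    # deployment name, not the model family.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=build_messages(note),
        response_format={"type": "json_object"},
        temperature=0,
    )
    return resp.choices[0].message.content
```

Pointing `summarize` at the Foundry Qwen3-32B endpoint is the same sketch with a different base URL and deployment name, since the endpoint is OpenAI-compatible.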

2. RAG — Fabric Data Agents

Natural-language-to-SQL retrieval using Microsoft Fabric Data Agents. The agent translates user questions into Warehouse SQL queries against structured clinical tables (MRN_Event_Mapping, Dx_Main_Module, Demo_Info_NICCU). No custom embeddings or vector store needed — Fabric handles the NL2SQL translation. Results are passed to an LLM for final answer generation.
Stage | Description | Tools / Services | Compute
1. User Query | Natural-language question about patient data | Chat UI / API | No GPU
2. NL2SQL | Fabric Data Agent translates query to Warehouse SQL | Fabric Data Agent | Fabric capacity
3. SQL Execution | Query runs against Warehouse tables (MRN_Event_Mapping, Dx_Main_Module, Demo_Info_NICCU) | Fabric Warehouse SQL | Fabric capacity
4. LLM Generation | Results + original question sent to LLM for answer synthesis | Azure OpenAI GPT-4o | Pay-per-token
5. Response | Formatted answer returned to user | Chat UI / API | No extra cost
Flow — Fabric Data Agents (NL2SQL)
User Query → Fabric Data Agent (NL2SQL) → Warehouse SQL → LLM Generation → Response
Tables: MRN_Event_Mapping, Dx_Main_Module, Demo_Info_NICCU
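The Data Agent owns stages 2–3; stage 4 can be sketched as a prompt builder that grounds the LLM in the executed SQL and its result rows. The SQL text, column names, and rows below are illustrative stand-ins, not actual Warehouse output.

```python
import json

def build_synthesis_prompt(question: str, sql: str, rows: list[dict]) -> list[dict]:
    """Pack the executed SQL and its result rows into a grounding prompt
    so the LLM answers from retrieved data, not general knowledge."""
    context = json.dumps(rows, indent=2, default=str)
    return [
        {"role": "system", "content": (
            "Answer strictly from the SQL results below. "
            "If the results do not contain the answer, say so.\n\n"
            f"SQL executed:\n{sql}\n\nResults:\n{context}"
        )},
        {"role": "user", "content": question},
    ]

# Hypothetical result of a Data Agent query against Dx_Main_Module:
rows = [{"mrn": "12345", "dx_code": "P07.32", "dx_desc": "Preterm newborn, 32 weeks"}]
messages = build_synthesis_prompt(
    "What diagnoses are on record for MRN 12345?",
    "SELECT mrn, dx_code, dx_desc FROM Dx_Main_Module WHERE mrn = '12345'",
    rows,
)
```

Embedding the SQL itself in the prompt lets the model caveat its answer when the generated query did not actually match the user's intent.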

3. RAG — MCP (Model Context Protocol)

Retrieval-Augmented Generation using MCP servers as the data bridge. An MCP client routes requests to specialized servers: fabric-lakehouse-warehouse for structured data (query_warehouse, query_lakehouse) and a SharePoint Indexer for policy documents and clinical guidelines. Documents are embedded with text-embedding-3-large, indexed in Azure AI Search, and retrieved at query time for LLM grounding.
Stage | Description | Tools / Services | Compute
1. User Query | Natural-language question about policies, guidelines, or data | Chat UI / API | No GPU
2. MCP Client | Routes request to appropriate MCP server based on intent | MCP Client | CPU
3. MCP Server | Executes tool calls: query_warehouse, query_lakehouse, or search SharePoint index | MCP fabric-lakehouse-warehouse; MCP SharePoint Indexer | CPU
4. Embedding + Search | Documents chunked and embedded; vector similarity search retrieves relevant passages | Azure OpenAI text-embedding-3-large; Azure AI Search | Pay-per-token (embeddings)
5. LLM Generation | Retrieved context + user query sent to LLM for grounded response | Azure OpenAI GPT-4o | Pay-per-token
6. Response | Answer with source citations returned to user | Chat UI / API | No extra cost
Flow — MCP-based RAG
User Query → MCP Client → MCP Server → Data Source → Embeddings → LLM → Response
MCP Tools: query_warehouse, query_lakehouse, SharePoint Indexer
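The chunking half of stage 4 can be sketched as an overlapping character window ahead of the embedding call. The window and overlap sizes are illustrative defaults, not values from the production indexer, and `embed_chunks` assumes the same illustrative env-var names as elsewhere in this doc.

```python
def chunk_text(text: str, max_chars: int = 1200, overlap: int = 200) -> list[str]:
    """Split a document into overlapping character windows before embedding.
    Overlap keeps sentences that straddle a boundary retrievable from
    either neighboring chunk."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    # SDK imported lazily; endpoint/key env names are illustrative.
    import os
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-06-01",
    )
    resp = client.embeddings.create(model="text-embedding-3-large", input=chunks)
    return [d.embedding for d in resp.data]
```

The resulting vectors would then be pushed into an Azure AI Search index alongside the chunk text and source metadata for citation.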

4. Router-based Workflow

Multi-intent routing for clinical chat. A Router Node first applies keyword heuristics and falls back to an LLM call only when the heuristics are inconclusive, classifying user input into one of three paths: abstraction (chart abstraction pipeline), qa (general Q&A), or sql (database queries). Each path runs through its own node chain before converging at the finalize and supervisor nodes for quality checks.
Stage | Description | Tools / Services | Compute
1. User Input | Clinical question, abstraction request, or data query | Chat UI / API | No GPU
2. Router Node | Keyword heuristic + LLM fallback classifies intent (abstraction, qa, sql) | LangGraph Router Node; Azure OpenAI | 1 LLM call (fallback only)
3a. Abstraction Path | blob_lookup → pdf_extract → get_dx → compare | Azure Blob; Azure OpenAI | Blob reads + LLM calls
3b. QA Path | qa node — direct LLM answer with clinical context | Azure OpenAI | 1 LLM call
3c. SQL Path | sql node — NL2SQL against Warehouse | Fabric Warehouse | Fabric capacity
4. Finalize | Format and validate response from any path | LangGraph Finalize Node | CPU
5. Supervisor | Quality gate — review, approve, or request retry | LangGraph Supervisor Node | 1 LLM call
Flow — Router Workflow (3 branches)
User Input → Router Node
  Abstraction: blob_lookup → pdf_extract → get_dx → compare → finalize → supervisor
  QA: qa → finalize → supervisor
  SQL: sql → finalize → supervisor
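The cheap-first routing can be sketched as below. The keyword vocabulary is hypothetical, and `llm_fallback` stands in for a one-shot GPT-4o intent classification; only inputs that miss every heuristic pay for that LLM call.

```python
import re

# Illustrative keyword heuristics -- the real router's vocabulary may differ.
ROUTES = {
    "abstraction": re.compile(r"\b(abstract|mrn\s*\d+|birth record|chart)\b", re.I),
    "sql": re.compile(r"\b(how many|count|average|select|table|warehouse)\b", re.I),
}

def route(text: str, llm_fallback=None) -> str:
    """Keyword pass first; ambiguous inputs fall through to the LLM."""
    for intent, pattern in ROUTES.items():
        if pattern.search(text):
            return intent
    if llm_fallback is not None:
        return llm_fallback(text)   # e.g. one GPT-4o classification call
    return "qa"                     # default path
```

The three return values map directly onto the branch names in the flow above, so the router's output can be used as-is for a LangGraph conditional edge.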

5. Pro-code Agentic — LangGraph

Full agentic pipeline for NICCU birth abstraction. Built with LangGraph for deterministic, observable workflows. Supports two modes: Interactive (single-MRN, real-time via /chat endpoint) and Batch (multi-MRN fan-out with semaphore concurrency via /batch endpoint). Each MRN runs through the complete node chain: routing, blob lookup, PDF extraction, diagnosis comparison, and report generation. Batch mode produces a downloadable report via Azure Blob Storage.
Stage | Description | Tools / Services | Compute
1. Input | Interactive: single MRN via chat; Batch: CSV upload with MRN list | LangGraph FastAPI endpoints | CPU
2. Router | Classify intent and route to abstraction pipeline | LangGraph Router Node | Heuristic + LLM fallback
3. Blob Lookup | Find clinical note PDF in Azure Blob by MRN | Azure Blob hp-notes container | No GPU
4. PDF Extract | Extract text from clinical note PDF using pdfplumber | pdfplumber | CPU
5. Get Dx | LLM extracts structured diagnosis data from note text | Azure OpenAI GPT-4o | 1–2 LLM calls
6. Compare | Compare extracted data against reference (Warehouse / dx-data) | Fabric Warehouse | CPU
7. Finalize | Format output row, validate fields | LangGraph Finalize Node | CPU
8. Supervisor | Quality gate — approve or retry | LangGraph Supervisor Node | 1 LLM call
9. Report (Batch) | Fan-in results, generate report, upload to Azure Blob | Azure Blob abstraction-reports | CPU
Flow — Interactive Mode (single MRN)
User Input → Router → blob_lookup → pdf_extract → get_dx → compare → finalize → supervisor → Response
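A condensed two-node version of this chain, sketched with LangGraph's `StateGraph`. The node bodies are placeholders for the real Azure Blob, pdfplumber, and GPT-4o calls, and the state fields are illustrative, not the production schema.

```python
from typing import TypedDict

class AbstractionState(TypedDict, total=False):
    mrn: str
    note_text: str
    extracted_dx: dict

# Stub nodes -- each returns a partial state update, LangGraph-style.
def blob_lookup(state: AbstractionState) -> dict:
    return {"note_text": f"<PDF text for MRN {state['mrn']}>"}

def get_dx(state: AbstractionState) -> dict:
    return {"extracted_dx": {"dx_code": "P07.32"}}  # placeholder diagnosis

def build_graph():
    # Imported lazily so this sketch loads without langgraph installed.
    from langgraph.graph import StateGraph, END
    g = StateGraph(AbstractionState)
    g.add_node("blob_lookup", blob_lookup)
    g.add_node("get_dx", get_dx)
    g.set_entry_point("blob_lookup")
    g.add_edge("blob_lookup", "get_dx")
    g.add_edge("get_dx", END)
    return g.compile()  # .invoke({"mrn": "12345"}) runs the chain
```

The full pipeline adds the remaining nodes (pdf_extract, compare, finalize, supervisor) with the same add_node/add_edge pattern, plus a conditional edge out of supervisor for the retry path.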
Flow — Batch Mode (multi-MRN)
CSV Upload → Job Created → Fan-out (sem=5)
  Per-MRN Pipeline (concurrent): blob_lookup → pdf_extract → get_dx → compare → finalize → supervisor
Fan-in → Report Node → Azure Blob Upload → Download URL
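The fan-out/fan-in with sem=5 can be sketched with `asyncio.Semaphore`. Here `fake_pipeline` is a stand-in for the real per-MRN node chain; the semaphore caps concurrent pipelines without serializing the whole batch.

```python
import asyncio

async def run_batch(mrns: list[str], process, limit: int = 5) -> list:
    """Fan out one pipeline run per MRN, at most `limit` concurrent,
    then fan the results back in (gather preserves input order)."""
    sem = asyncio.Semaphore(limit)

    async def guarded(mrn: str):
        async with sem:
            return await process(mrn)

    return await asyncio.gather(*(guarded(m) for m in mrns))

# Stand-in for the real per-MRN pipeline coroutine:
async def fake_pipeline(mrn: str) -> dict:
    await asyncio.sleep(0)          # real version awaits blob/LLM I/O
    return {"mrn": mrn, "status": "approved"}

results = asyncio.run(run_batch([f"M{i}" for i in range(12)], fake_pipeline))
```

Because `gather` returns results in input order, the fan-in/report step can zip results back against the uploaded CSV rows without extra bookkeeping.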

6. Fine-Tuning Pipeline

Train a domain-specific model on labeled clinical examples. Produces a custom checkpoint that outperforms general-purpose models on structured extraction tasks. Requires labeled data, GPU compute for training, and a dedicated inference endpoint. Best when you have 100–500+ labeled examples and need high accuracy at scale.
Stage | Description | Tools / Services | Compute
1. Data Collection | Gather clinical notes + gold-standard labels from Fabric Lakehouse | Fabric Lakehouse / Warehouse | No GPU; Fabric F-SKU
2. Labeling | Subject-matter experts annotate examples (structured JSON labels) | Manual review / labeling tool | Human effort
3. JSONL Prep | Spark notebook formats labeled data into OpenAI chat-format JSONL | Fabric Spark Notebook | No GPU; CPU Spark cluster
4. Upload | JSONL uploaded to Azure ML / Foundry Data + Indexes | Foundry Data assets | No GPU
5. QLoRA Fine-tune | Supervised fine-tuning job; QLoRA on 100–500 examples per task | Foundry Fine-tuning Job | 4× A100; $32/hr, 2–4 hrs/run
6. Deploy Endpoint | Managed online endpoint with scale-to-zero, OpenAI-compatible API | Foundry Managed Endpoint | 1× A100; $3.67/hr active, $0 idle
7. Inference | Call fine-tuned model for extraction; JSON structured output | REST API (OpenAI-compat) | Same endpoint
Flow — Fine-Tuning Pipeline
Fabric Lakehouse → Labeling → Spark JSONL Prep → Upload to Foundry → QLoRA Fine-tune (4×A100) → Deploy Endpoint (1×A100) → Inference
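The JSONL prep step (stage 3) can be sketched as below. The `note`/`label` field names and the system prompt are illustrative, not the production Lakehouse schema; each output line is one training example in OpenAI chat format.

```python
import json

def to_chat_jsonl(examples: list[dict], system_prompt: str) -> str:
    """Format (note, label) pairs as chat-format JSONL for supervised
    fine-tuning: one JSON object per line, one conversation per example."""
    lines = []
    for ex in examples:
        record = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": ex["note"]},
            # Gold label serialized as the assistant turn the model learns.
            {"role": "assistant", "content": json.dumps(ex["label"])},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_chat_jsonl(
    [{"note": "Infant born at 32 weeks...", "label": {"ga_weeks": 32}}],
    "Extract structured birth data as JSON.",
)
```

In the Spark notebook this logic would run per-row over the labeled Lakehouse table, with the joined output written to a file for the Foundry upload step.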
Model Cards
Text (5 projects): Qwen3-32B
Vision (Xray): Qwen2.5-VL-32B / GPT-4o