| Pattern | Complexity | Cost | Best For |
|---|---|---|---|
| 1. Simple Inference | Low | $ | One-off Q&A, summarization |
| 2. RAG — Fabric Data Agents | Medium | $ | Structured data retrieval |
| 3. RAG — MCP | Medium | $$ | SharePoint, policy docs, guidelines |
| 4. Router Workflow | Medium-High | $ | Multi-intent clinical chat |
| 5. Agentic (LangGraph) | Highest | $$ | NICCU birth abstraction, batch processing |
| 6. Fine-Tuning | High | $$$ | High-accuracy extraction at scale |
| Stage | Description | Tools / Services | Compute |
|---|---|---|---|
| Prompt | User submits a question or clinical note with a system prompt | Azure OpenAI GPT-4o or Foundry Qwen3-32B endpoint | Serverless pay-per-token or dedicated GPU |
| Inference | Model generates response (JSON, narrative, or classification) | Same endpoint | ~1–5s latency |
| Response | Structured output returned to user or downstream system | REST API / SDK | No extra cost |
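The three stages above can be sketched with only the REST surface (no SDK), since both the Azure OpenAI GPT-4o deployment and a Foundry Qwen3-32B serverless endpoint expose OpenAI-compatible chat completions. The endpoint URL, API version, and system prompt below are placeholders, not values from this deployment:

```python
import json
import urllib.request

# Placeholder endpoint; swap in the real resource name and deployment.
ENDPOINT = (
    "https://<resource>.openai.azure.com/openai/deployments/"
    "gpt-4o/chat/completions?api-version=2024-06-01"
)

def build_payload(system_prompt: str, user_input: str) -> dict:
    """Stage 1 (Prompt): system prompt plus the user's question or clinical note."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
        # Ask for structured output (JSON) as described in the Inference stage.
        "response_format": {"type": "json_object"},
    }

def run_inference(payload: dict, api_key: str) -> str:
    """Stages 2-3 (Inference/Response): POST the payload, return the model text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same payload shape works against either endpoint; only the URL and auth header change.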
| Stage | Description | Tools / Services | Compute |
|---|---|---|---|
| 1. User Query | Natural-language question about patient data | Chat UI / API | No GPU |
| 2. NL2SQL | Fabric Data Agent translates query to Warehouse SQL | Fabric Data Agent | Fabric capacity |
| 3. SQL Execution | Query runs against Warehouse tables (MRN_Event_Mapping, Dx_Main_Module, Demo_Info_NICCU) | Fabric Warehouse SQL | Fabric capacity |
| 4. LLM Generation | Results + original question sent to LLM for answer synthesis | Azure OpenAI GPT-4o | Pay-per-token |
| 5. Response | Formatted answer returned to user | Chat UI / API | No extra cost |
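Stage 4 (LLM Generation) amounts to rendering the Warehouse result set plus the original question into a grounding prompt. A minimal sketch, assuming results arrive as a list of dicts (column names below are illustrative, not the actual Warehouse schema):

```python
def synthesis_prompt(question: str, rows: list[dict]) -> str:
    """Render SQL results + the user's question into a grounding prompt
    for answer synthesis (stage 4)."""
    header = " | ".join(rows[0].keys()) if rows else "(no rows)"
    lines = [" | ".join(str(v) for v in r.values()) for r in rows]
    return (
        "Answer using only the SQL results below. "
        "If the results do not answer the question, say so.\n"
        f"Question: {question}\n"
        f"Results:\n{header}\n" + "\n".join(lines)
    )
```

The "answer only from the results" instruction is what keeps the response grounded in Warehouse data rather than model priors.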
This pattern pairs two retrieval backends behind MCP: fabric-lakehouse-warehouse for structured data (query_warehouse, query_lakehouse) and a SharePoint Indexer for policy documents and clinical guidelines. Documents are embedded with text-embedding-3-large, indexed in Azure AI Search, and retrieved at query time to ground the LLM.
| Stage | Description | Tools / Services | Compute |
|---|---|---|---|
| 1. User Query | Natural-language question about policies, guidelines, or data | Chat UI / API | No GPU |
| 2. MCP Client | Routes request to appropriate MCP server based on intent | MCP Client | CPU |
| 3. MCP Server | Executes tool calls: query_warehouse, query_lakehouse, or SharePoint index search | MCP fabric-lakehouse-warehouse, MCP SharePoint Indexer | CPU |
| 4. Embedding + Search | Documents chunked and embedded; vector similarity search retrieves relevant passages | Azure OpenAI text-embedding-3-large, Azure AI Search | Pay-per-token (embeddings) |
| 5. LLM Generation | Retrieved context + user query sent to LLM for grounded response | Azure OpenAI GPT-4o | Pay-per-token |
| 6. Response | Answer with source citations returned to user | Chat UI / API | No extra cost |
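Stage 4's chunking step can be as simple as fixed-size windows with overlap before each chunk is sent to text-embedding-3-large. A minimal sketch (chunk size and overlap are illustrative defaults, not this system's settings):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into fixed-size character chunks with overlap,
    so passage boundaries don't cut a relevant sentence in half.
    Each chunk is then embedded and indexed in Azure AI Search."""
    step = size - overlap
    # max(..., 1) ensures a document shorter than one chunk still yields one chunk.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Production pipelines often chunk on token or sentence boundaries instead; character windows keep the sketch dependency-free.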
| Stage | Description | Tools / Services | Compute |
|---|---|---|---|
| 1. User Input | Clinical question, abstraction request, or data query | Chat UI / API | No GPU |
| 2. Router Node | Keyword heuristic with LLM fallback classifies intent (abstraction, qa, sql) | LangGraph Router Node, Azure OpenAI | 1 LLM call (fallback only) |
| 3a. Abstraction Path | blob_lookup → pdf_extract → get_dx → compare | Azure Blob Azure OpenAI | Blob reads + LLM calls |
| 3b. QA Path | qa node — direct LLM answer with clinical context | Azure OpenAI | 1 LLM call |
| 3c. SQL Path | sql node — NL2SQL against Warehouse | Fabric Warehouse | Fabric capacity |
| 4. Finalize | Format and validate response from any path | LangGraph Finalize Node | CPU |
| 5. Supervisor | Quality gate — review, approve, or request retry | LangGraph Supervisor Node | 1 LLM call |
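The Router Node's "keyword heuristic + LLM fallback" design (stage 2) keeps most turns free of an extra LLM call: only inputs that match no keyword pay for classification. A sketch under assumed keyword lists (the production lists are not shown in this document):

```python
# Illustrative keyword sets; the real router's lists will differ.
ABSTRACTION_KEYWORDS = {"abstract", "mrn", "birth record"}
SQL_KEYWORDS = {"count", "how many", "average"}

def route(user_input: str, llm_classify=None) -> str:
    """Classify intent as abstraction, sql, or qa.
    Cheap keyword checks run first; llm_classify (e.g. an Azure OpenAI call
    with a three-label prompt) is invoked only when nothing matches."""
    text = user_input.lower()
    if any(k in text for k in ABSTRACTION_KEYWORDS):
        return "abstraction"
    if any(k in text for k in SQL_KEYWORDS):
        return "sql"
    if llm_classify is not None:
        return llm_classify(user_input)  # the single fallback LLM call
    return "qa"  # default path when no classifier is wired in
```

Passing the classifier as a callable makes the heuristic testable without any network dependency.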
| Stage | Description | Tools / Services | Compute |
|---|---|---|---|
| 1. Input | Interactive: single MRN via chat; Batch: CSV upload with MRN list | LangGraph FastAPI endpoints | CPU |
| 2. Router | Classify intent and route to abstraction pipeline | LangGraph Router Node | Heuristic + LLM fallback |
| 3. Blob Lookup | Find clinical note PDF in Azure Blob by MRN | Azure Blob hp-notes container | No GPU |
| 4. PDF Extract | Extract text from clinical note PDF using pdfplumber | pdfplumber | CPU |
| 5. Get Dx | LLM extracts structured diagnosis data from note text | Azure OpenAI GPT-4o | 1–2 LLM calls |
| 6. Compare | Compare extracted data against reference (Warehouse / dx-data) | Fabric Warehouse | CPU |
| 7. Finalize | Format output row, validate fields | LangGraph Finalize Node | CPU |
| 8. Supervisor | Quality gate — approve or retry | LangGraph Supervisor Node | 1 LLM call |
| 9. Report (Batch) | Fan-in results, generate report, upload to Azure Blob | Azure Blob abstraction-reports | CPU |
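Stage 9's fan-in can be sketched as merging per-MRN result rows into one CSV string ready for upload to the abstraction-reports container. The field names below are illustrative, not the actual report schema:

```python
import csv
import io

def build_report(rows: list[dict]) -> str:
    """Fan-in (stage 9): merge per-MRN results from the batch run into a
    single CSV report. Missing fields are left blank rather than raising,
    since a retried or failed MRN may have a partial row."""
    fields = ["mrn", "status", "dx_match"]  # hypothetical report columns
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        writer.writerow({k: row.get(k, "") for k in fields})
    return buf.getvalue()
```

The returned string would then be uploaded as a blob; keeping report assembly pure makes it easy to unit-test apart from Azure Blob I/O.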
| Stage | Description | Tools / Services | Compute |
|---|---|---|---|
| 1. Data Collection | Gather clinical notes + gold-standard labels from Fabric Lakehouse | Fabric Lakehouse / Warehouse | No GPU (Fabric F-SKU) |
| 2. Labeling | Subject-matter experts annotate examples (structured JSON labels) | Manual review / labeling tool | Human effort |
| 3. JSONL Prep | Spark notebook formats labeled data into OpenAI chat-format JSONL | Fabric Spark Notebook | CPU Spark cluster (no GPU) |
| 4. Upload | JSONL uploaded to Azure ML / Foundry Data + Indexes | Foundry Data assets | No GPU |
| 5. QLoRA Fine-tune | Supervised fine-tuning (QLoRA) on 100–500 examples per task | Foundry Fine-tuning Job | 4× A100 (~$32/hr), 2–4 hrs/run |
| 6. Deploy Endpoint | Managed online endpoint with scale-to-zero, OpenAI-compatible API | Foundry Managed Endpoint | 1× A100 (~$3.67/hr active, $0 idle) |
| 7. Inference | Call fine-tuned model for extraction (JSON structured output) | REST API (OpenAI-compatible) | Same endpoint |
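Stage 3's JSONL prep produces one chat-format record per labeled example, with the gold-standard extraction as the assistant turn. A minimal sketch of the formatting step (the system prompt and example pair are placeholders; in the real pipeline this runs inside the Fabric Spark notebook):

```python
import json

def to_chat_jsonl(examples: list[tuple[str, str]], system_prompt: str) -> str:
    """Format (note_text, gold_label_json) pairs into OpenAI chat-format
    JSONL for the fine-tuning upload (stages 3-4). One JSON object per line,
    each with system/user/assistant messages."""
    lines = []
    for note, label in examples:
        record = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": note},
            {"role": "assistant", "content": label},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

The same system prompt used at training time should be reused at inference time against the deployed endpoint, or extraction quality degrades.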