Children's Hospital Los Angeles — Office of Enterprise Data
March 2026

Clinical LLM Platform — Architecture & Cost

Fabric + Azure AI Foundry/ML · Qwen3-32B Fine-tuning · 6 Clinical Extraction Use Cases
Model Selection
Text-only projects (5 of 6)
Qwen3-32B
  • Central Venous Catheters (Retrospective)
  • NICCU CVC Dashboard (QI)
  • Agitation Prediction (Retrospective)
  • eBikes / eScooters (Retrospective)
  • Dietitian / Nutrition (Retrospective)
Multimodal project (1 of 6)
Qwen2.5-VL-32B or GPT-4o
  • X-ray Object Interpretation (image + notes)
  • Start with GPT-4o (no GPU needed, zero-shot)
  • Fine-tune Qwen2.5-VL if accuracy insufficient
  • LLaVA-Med as fallback (bring-your-own-weights)
Architecture — Data Flow & Compute
Each stage below lists: Platform (Microsoft Fabric and/or Azure ML / Foundry) · Component · Details · Compute / Infra.
1. Data Storage — Clinical notes & structured data
   Platform: Fabric · Lakehouse / ADLS Gen2
   Details: Notes as files or Delta table rows. Structured data (MRNs, codes) in Warehouse.
   Compute: No GPU. Fabric F-SKU capacity (existing). West US 3.
2. Cohort Query — Filter patients by coded criteria
   Platform: Fabric · Warehouse SQL endpoint
   Details: Filter by dept, date range, procedure codes. Returns MRN list.
   Compute: No GPU. CPU-only Fabric compute.
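As a sketch of stage 2, the query below shows the shape of a cohort pull against the Warehouse SQL endpoint. The table and column names (`clinical_encounters`, `dept_code`, `proc_code`) are hypothetical placeholders, not the actual CHLA schema, and real code should use parameterized queries rather than string interpolation:

```python
# Build a cohort query returning an MRN list from coded criteria.
# Schema names are illustrative assumptions, not the real Warehouse schema.

def build_cohort_query(dept: str, start: str, end: str, proc_codes: list[str]) -> str:
    """Return SQL selecting distinct MRNs matching dept, date range, and procedure codes."""
    codes = ", ".join(f"'{c}'" for c in proc_codes)
    return (
        "SELECT DISTINCT mrn FROM clinical_encounters "
        f"WHERE dept_code = '{dept}' "
        f"AND encounter_date BETWEEN '{start}' AND '{end}' "
        f"AND proc_code IN ({codes})"
    )

sql = build_cohort_query("NICCU", "2024-01-01", "2024-12-31", ["36555", "36556"])
# In a Fabric notebook this would run via the Warehouse endpoint, e.g.:
# mrn_df = spark.sql(sql)
```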
3. Training Data Prep — Format labeled examples for fine-tuning
   Platform: Fabric · Spark Notebook — extract notes + gold-standard labels, format as JSONL (OpenAI chat format), export to Blob.
   Platform: Foundry · Upload JSONL to Data + Indexes.
   Compute: No GPU. Fabric Spark cluster (CPU). Egress ~$0.08/GB West US 3 → East US 2.
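A minimal sketch of the JSONL formatting step: each labeled note becomes one OpenAI chat-format record. The system prompt and label fields here are illustrative assumptions, not the project's actual prompts:

```python
import json

# Format one labeled note as an OpenAI chat-format JSONL line (stage 3).
# Prompt text and label schema are illustrative placeholders.

def to_chat_example(note_text: str, gold_label: dict) -> str:
    record = {
        "messages": [
            {"role": "system", "content": "Extract CVC details from the clinical note as JSON."},
            {"role": "user", "content": note_text},
            {"role": "assistant", "content": json.dumps(gold_label)},
        ]
    }
    return json.dumps(record)

line = to_chat_example(
    "PICC placed in right basilic vein...",
    {"line_type": "PICC", "site": "right basilic"},
)
# Each note becomes one line of the JSONL file exported to Blob:
# with open("train.jsonl", "w") as f: f.write(line + "\n")
```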
4. Fine-tuning Job — Supervised fine-tuning on clinical data
   Platform: Foundry · Azure ML fine-tuning job
   Details: QLoRA SFT on 100–500 labeled examples per task. One-time or quarterly refresh. Training type: Regional (PHI).
   Compute: GPU required. ND A100 v4-series (4× A100 80GB), $32/hr; ~2–4 hrs/run. Spot option: ~$6/hr.
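The job parameters stage 4 implies can be sketched as below. The field names are illustrative, not the exact Azure ML schema; actual submission would go through the azure-ai-ml SDK or the Foundry fine-tuning UI:

```python
# Illustrative fine-tuning job spec (field names are assumptions, not the
# azure-ai-ml schema). Hyperparameter values are placeholders.

job_spec = {
    "base_model": "Qwen3-32B",
    "method": "qlora_sft",
    "training_data": "azureml://datastores/workspaceblobstore/paths/train.jsonl",
    "compute": {"sku": "ND A100 v4 (4x A100 80GB)", "spot": True},
    "hyperparameters": {"epochs": 3, "lora_rank": 16, "learning_rate": 2e-4},
    "training_type": "Regional",  # keeps PHI in-region, per stage 4
}

# Per-run cost check from the table's figures (~3 hrs/run):
run_cost = 3 * 32   # on-demand: $96/run
spot_cost = 3 * 6   # spot: ~$18/run
```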
5. Model Storage — Fine-tuned checkpoint saved
   Platform: Foundry · Azure Blob Storage (East US 2)
   Details: ~20 GB adapter weights + base model reference.
   Compute: Storage only. ~$0.40/month (Azure Blob LRS, East US 2).
6. Inference Endpoint — Deploy fine-tuned model as REST API
   Platform: Foundry · Managed Online Endpoint
   Details: Scale-to-zero when idle. OpenAI-compatible API. Private VNet for PHI. Regional deployment (East US 2).
   Compute: GPU required. Standard_NC24ads_A100_v4 (1× A100 80GB), $3.67/hr active, $0/hr idle (scale-to-zero); ~2–5 min cold start.
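A sketch of a client call against the endpoint's OpenAI-compatible chat API. The endpoint URL, model name, and prompt are placeholders; the request body follows the standard chat-completions shape:

```python
import json
import urllib.request

# Placeholder endpoint URL; the real value comes from the Foundry deployment.
ENDPOINT = "https://<endpoint-name>.eastus2.inference.ml.azure.com/v1/chat/completions"

def build_request(note_text: str, model: str = "qwen3-32b-ft") -> dict:
    """Chat-completions payload for one extraction call (prompt text is illustrative)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Extract the requested fields as JSON."},
            {"role": "user", "content": note_text},
        ],
        "temperature": 0.0,  # deterministic extraction
    }

def call_endpoint(note_text: str, api_key: str) -> dict:
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(note_text)).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    # First call after idle may wait out the 2-5 min scale-from-zero cold start.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```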
7. Batch Inference — Process notes at scale
   Platform: Fabric · Spark Notebook — loop MRN list, retrieve notes, call endpoint, collect structured JSON responses.
   Platform: Foundry · Fine-tuned Qwen3-32B API.
   Compute: No extra cost. Fabric Spark (CPU); network egress negligible (<$1/run at ~500 MB).
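Stage 7's loop can be sketched with note retrieval and the endpoint call passed in as callables, so the control flow is shown independently of Fabric and Azure specifics:

```python
import json

# Batch loop over the cohort MRN list (stage 7). get_notes and call_endpoint
# are injected; real versions would read the Lakehouse and POST to the endpoint.

def batch_extract(mrns, get_notes, call_endpoint):
    """Run every note for each MRN through the model; collect parsed JSON rows."""
    results = []
    for mrn in mrns:
        for note in get_notes(mrn):
            raw = call_endpoint(note)  # fine-tuned Qwen3-32B, OpenAI-compatible API
            row = json.loads(raw)      # model is fine-tuned to emit structured JSON
            row["mrn"] = mrn
            results.append(row)
    return results

# Stub example with hypothetical data:
rows = batch_extract(
    ["001"],
    get_notes=lambda mrn: ["PICC note text"],
    call_endpoint=lambda note: '{"line_type": "PICC"}',
)
# rows → [{"line_type": "PICC", "mrn": "001"}]
```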
8. Results & Reporting — Store outputs, serve dashboards
   Platform: Fabric · Lakehouse → Power BI
   Details: Write extraction results to Delta table. QI dashboard (NICCU CVC). Research exports for retrospective projects.
   Compute: No GPU. Power BI Premium / Fabric capacity (existing licensing).
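Before the Delta write, the per-note JSON results are shaped into flat rows. Column names below are illustrative assumptions; in a Fabric notebook the final write would be a standard Spark Delta save:

```python
# Flatten extraction results into tabular rows for the Delta table (stage 8).
# Column names are placeholders. In a Fabric notebook the write would be:
#   spark.createDataFrame(rows).write.format("delta").mode("append") \
#        .saveAsTable("cvc_extractions")

def to_table_rows(extractions: list[dict], run_date: str) -> list[dict]:
    rows = []
    for ex in extractions:
        rows.append({
            "mrn": ex.get("mrn"),
            "line_type": ex.get("line_type"),
            "site": ex.get("site"),
            "run_date": run_date,
        })
    return rows

rows = to_table_rows(
    [{"mrn": "001", "line_type": "PICC", "site": "right basilic"}],
    "2026-03-01",
)
```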
Cost Estimate (Monthly)
Component | Resource | When Active | Unit Cost | Monthly Estimate
Fabric platform | Existing F-SKU capacity | Always | Existing license | $0 incremental
Inference endpoint | NC24ads_A100_v4 (1× A100 80GB) | Business hours only (~150 active hrs/month) | $3.67/hr | ~$550
Fine-tuning job | ND A100 v4 (4× A100 80GB) | Once/quarter (~3 hrs/run) | $32/hr (or ~$6/hr spot) | ~$96/quarter ≈ $32/mo (spot: ~$18/quarter ≈ $6/mo)
Model storage | Azure Blob LRS, East US 2 | Always | $0.018/GB/month | ~$0.40
Training data storage | Azure Blob (JSONL files, ~1 GB) | Always | $0.018/GB/month | ~$0.02
Cross-region data transfer | West US 3 → East US 2 egress | Per batch run (~500 MB) | $0.08/GB | <$1
GPT-4o (X-ray project) | Serverless, pay per token | On demand | ~$5/1M tokens | ~$10–50 (volume dependent)
Total estimated monthly cost | | | | ~$560–660 / month
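A back-of-envelope check of the total, using the table's unit costs. The ~150 active endpoint hours/month is an assumption that reconciles the $3.67/hr rate with the ~$550 line item; fine-tuning is amortized across the quarter:

```python
# Monthly cost check from the table's unit costs and assumed utilization.
endpoint = 150 * 3.67            # ~$550: inference endpoint, ~150 active hrs/month
fine_tune = (3 * 32) / 3         # $96/quarter amortized to ~$32/month (on-demand)
storage = 0.40 + 0.02            # model + training-data blobs
egress = 1.00                    # cross-region transfer, upper bound per month
gpt4o_low, gpt4o_high = 10, 50   # X-ray project, volume dependent

low = endpoint + fine_tune + storage + egress + gpt4o_low
high = endpoint + fine_tune + storage + egress + gpt4o_high
# low ≈ $594, high ≈ $634 — inside the ~$560–660/month band
```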