In Part 1, we showed that healthcare data quality averages just 36/100 (an F grade). We quantified the costs: 10x higher AI expenses, clinical risks from missing diagnoses, and regulatory exposure. Here’s how we systematically fixed it.
The Solution: Systematic, Measured Improvement
After applying our 13-step data refinement process across 3 main layers, we re-measured the same patient records.
Average post-refinement score: 80/100 (B grade – Good)
Improvement: 44 points = 122% increase
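For readers who want to check the arithmetic, here is a minimal sketch of how the point gain and the percentage increase relate. The function name `relative_improvement` is ours, chosen for illustration; it is not part of any scoring library.

```python
def relative_improvement(before: float, after: float) -> float:
    """Return the percentage increase going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100


before_score, after_score = 36, 80
points_gained = after_score - before_score
pct = relative_improvement(before_score, after_score)
print(f"{points_gained} points = {pct:.0f}% increase")  # 44 points = 122% increase
```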
What Changed – Layer by Layer
Layer 1: Raw Data Collection
- Connected to ALL major networks (Patient Access API, TEFCA, CMS)
- Integrated ALL source types (EMR, payer, pharmacy, lab, imaging)
- Result: Multi-source completeness improved from 25/100 → 75/100
Layer 2: Derived Data Standardization
- Mapped custom codes to standards (LOINC, RxNorm, CVX, SNOMED)
- Applied clinical logic (active vs. completed medications)
- Deduplicated across sources using medical terminologies
- Result: Conformity improved from 38/100 → 82/100
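To make the code-mapping step concrete, here is a minimal sketch of a local-to-standard crosswalk. It is an illustration under stated assumptions, not our production pipeline: the local system names (`epic-local`, `cerner-local`), the local codes, and the hand-written table are all hypothetical. In practice a crosswalk like this would come from a terminology service.

```python
from dataclasses import dataclass
from typing import Optional

STANDARD_SYSTEMS = {"LOINC", "RxNorm", "CVX", "SNOMED"}

# Hypothetical crosswalk from (local system, local code) to a standard coding.
LOCAL_TO_STANDARD = {
    ("epic-local", "GLU_SER"): ("LOINC", "2345-7"),   # glucose, serum/plasma
    ("cerner-local", "HGBA1C"): ("LOINC", "4548-4"),  # hemoglobin A1c
}


@dataclass(frozen=True)
class Coding:
    system: str
    code: str


def standardize(coding: Coding) -> Optional[Coding]:
    """Return a standard coding, or None when no mapping is known."""
    if coding.system in STANDARD_SYSTEMS:
        return coding  # already standard, pass through unchanged
    mapped = LOCAL_TO_STANDARD.get((coding.system, coding.code))
    return Coding(*mapped) if mapped else None


print(standardize(Coding("epic-local", "GLU_SER")))
print(standardize(Coding("epic-local", "UNKNOWN")))
```

Returning `None` for unmapped codes, rather than guessing, keeps unmapped records visible so they can be reviewed instead of silently mislabeled.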
Layer 3: Composition for AI Optimization
- Grouped related information (brand/generic drugs)
- Created timeline views optimized for AI processing
- Added clinical context (prescriber info, encounter types)
- Result: Availability improved from 45/100 → 85/100
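A timeline view can be as simple as grouping events by date and listing the newest first. The sketch below shows one possible shape. The event tuples and the `build_timeline` helper are hypothetical and only meant to illustrate the idea.

```python
from collections import defaultdict
from datetime import date


def build_timeline(events):
    """Group (date, kind, description) tuples by date, newest date first."""
    by_day = defaultdict(list)
    for day, kind, text in events:
        by_day[day].append(f"{kind}: {text}")
    return [(day, sorted(by_day[day])) for day in sorted(by_day, reverse=True)]


# Hypothetical events, for illustration only.
events = [
    (date(2024, 1, 15), "immunization", "COVID-19 vaccine"),
    (date(2023, 11, 2), "lab", "Hemoglobin A1c"),
    (date(2024, 1, 15), "encounter", "Pharmacy visit"),
]
timeline = build_timeline(events)
for day, items in timeline:
    print(day.isoformat(), "|", "; ".join(items))
```

One compact line per day gives an AI model the sequence of care without repeating per-record metadata.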
Real Patient Example: Before and After
Patient: 43-year-old with chronic conditions
Before (36/100 – F grade):
The Data:
- 700+ medication records (dispenses, prescriptions, patient statements)
- 21 medications marked “active” (including medications from 2021, five years ago)
- 4,462 lab and vital readings
- Inconsistent coding (LOINC, custom Epic codes, or no codes at all)
- Same labs with different names and units (9000 mg vs. 9 g)
- Medications missing RxNorm codes and coded with custom Cerner codes instead
- Brand and generic versions of same drug listed separately
- Labs miscategorized as vitals and vice versa
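The unit mismatch above (9000 mg vs. 9 g) is the kind of issue a small normalization step can catch. Here is a minimal sketch that converts mass values to milligrams before comparing them. The `TO_MG` table and `to_milligrams` helper are hypothetical, and a real pipeline would need a fuller unit vocabulary such as UCUM.

```python
from decimal import Decimal

# Conversion factors to milligrams (illustrative subset of mass units).
TO_MG = {
    "mcg": Decimal("0.001"),
    "mg": Decimal("1"),
    "g": Decimal("1000"),
}


def to_milligrams(value: str, unit: str) -> Decimal:
    """Convert a mass value to milligrams; raises KeyError for unknown units."""
    factor = TO_MG[unit.strip().lower()]
    return Decimal(value) * factor


same = to_milligrams("9000", "mg") == to_milligrams("9", "g")
print(same)  # True
```

Using `Decimal` rather than floats avoids rounding surprises when values from different sources are compared for equality.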
The Scores:
- Availability: 45/100 (F) – Missing critical resources
- Accuracy: 52/100 (F) – Invalid values and formats
- Conformity: 38/100 (F) – Non-standard codes
- Plausibility: 48/100 (F) – Implausible timelines
- Completeness: 25/100 (F) – Only 19% longitudinal coverage, single-source
Critical Issue: Crohn’s disease and Lupus diagnoses exist in the Patient Access API network but are completely absent from TEFCA.
After (80/100 – B grade):
The Data:
- 29 clinically relevant medications
- 3 actually active medications (correctly identified using clinical logic)
- Brand and generic drugs properly grouped (Jakafi + Ruxolitinib)
- All codes standardized to RxNorm
- Prescriber information filled in from national provider directory
- 24 key lab findings + 6 vitals (older data accessible if needed)
- All standardized to LOINC codes
- Results and reference ranges properly extracted
- Organized in timeline view optimized for AI processing
The Scores:
- Availability: 85/100 (B) – All critical resources present
- Accuracy: 88/100 (B) – Clinical validation applied
- Conformity: 82/100 (B) – Standardized terminologies
- Plausibility: 78/100 (C) – Temporal and clinical validation
- Completeness: 72/100 (C) – Connected all networks, multi-source validation
Critical Fix: The Crohn’s disease diagnosis is now visible (retrieved from the Patient Access API network and connected across networks).
The Clinical Impact: What Each Dimension Prevents
Scenario 1: The Missing Diagnosis (Completeness Issue)
Without Refinement:
- AI sees: Abdominal pain complaint
- AI recommends: Ibuprofen (Advil) for pain relief
- Hidden risk: Patient has Crohn’s disease (only in disconnected network)
- Outcome: NSAIDs can cause serious complications for Crohn’s patients
With b.well Refinement:
- AI sees: Abdominal pain + Crohn’s disease diagnosis
- AI recommends: Acetaminophen (Tylenol) instead, avoids NSAIDs
- Outcome: Safe, appropriate recommendation
PIQI Dimension Improved: Multi-Source Completeness (25/100 → 75/100)
Scenario 2: The Medication Time Bomb (Availability + Accuracy Issue)
Without Refinement:
- AI sees: 21 “active” medications (including 5-year-old prescriptions)
- AI checks: Drug interactions across all 21
- Risk: False positive interactions, alert fatigue
- Worse risk: Recommends both brand (Jakafi) and generic (Ruxolitinib) versions = double dosing
With b.well Refinement:
- AI sees: 3 actually active medications (validated with clinical logic)
- AI checks: Interactions only for current medications
- Outcome: Accurate interaction checking, brand/generic properly grouped
PIQI Dimensions Improved: Availability (45/100 → 85/100) + Accuracy (52/100 → 88/100)
Scenario 3: The Token Cost Explosion (Temporal Density + Currency Issue)
Without Refinement:
- AI processes: 4,462 lab results in context window
- Token cost: ~$50 per query (at GPT-4 pricing)
- Response time: 30+ seconds
- Risk: Context window exhaustion, important findings buried in noise
With b.well Refinement:
- AI processes: 24 key lab findings (older data accessible if needed)
- Token cost: ~$5 per query
- Response time: 3 seconds
- Outcome: 10x cost reduction, 10x speed improvement, better focus
PIQI Dimension Improved: Temporal Density + Currency (25/100 → 72/100)
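One simple way to shrink thousands of lab results into a short list is to keep only the most recent result per lab code. That selection rule is an assumption for this sketch, since the criteria for “key” findings are not spelled out here. The observation dictionaries and their values are hypothetical.

```python
from datetime import date


def latest_per_code(observations):
    """Keep only the most recent observation for each lab code."""
    latest = {}
    for obs in observations:
        code = obs["code"]
        if code not in latest or obs["date"] > latest[code]["date"]:
            latest[code] = obs
    return sorted(latest.values(), key=lambda o: o["code"])


# Hypothetical observations keyed by LOINC code, for illustration only.
observations = [
    {"code": "2345-7", "date": date(2023, 5, 1), "value": 101},
    {"code": "2345-7", "date": date(2024, 2, 1), "value": 95},
    {"code": "4548-4", "date": date(2023, 9, 9), "value": 5.6},
]
key_findings = latest_per_code(observations)
print(len(observations), "->", len(key_findings))  # 3 -> 2
```

Older results are filtered out of the prompt, not deleted, so they remain available if a question calls for a trend.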
The Differentiator: De-Duplication Across Sources
Here’s what most companies miss: The same clinical event appears differently across every data source.
Example: COVID-19 Vaccination
From Walgreens (Pharmacy):
Code: CVX 208 (COVID-19 vaccine)
Date: 2024-01-15
Lot: ABC123
From Cigna (Payer):
Code: CPT 91300 (COVID immunization admin)
Date: 2024-01-15
Claim: Walgreens Pharmacy
From One Medical (EMR):
Code: Custom Epic code “COVID_VAX_2024”
Date: 2024-01-15
Note: “Patient received Pfizer booster”
Without Proper De-Duplication:
- AI sees: 3 different COVID vaccinations on the same day
- AI recommends: “You may be due for your COVID booster.”
- Patient confusion: “But I just got one!”
With b.well’s Medical Terminology-Based De-Duplication:
- AI sees: 1 COVID vaccination (cross-validated across 3 sources)
- AI recommends: “Your COVID vaccination is current.”
- Added benefit: Lot number from pharmacy + clinical note from EMR = complete record
PIQI Dimension Improved: Multi-Source Completeness + Conformity
This is why conformity to medical terminologies (LOINC, RxNorm, CVX, SNOMED) is critical. It’s not just about standards compliance; it’s the foundation for intelligent de-duplication across sources.
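The vaccination example can be sketched in a few lines: map every source code to one canonical code, group by canonical code and date, then merge the details each source contributes. The crosswalk table below is hypothetical, including its treatment of the payer and EMR codes as equivalent to CVX 208, and so is the `local` system label. A real implementation would resolve codes through a terminology service.

```python
from collections import defaultdict

# Hypothetical crosswalk from (system, code) to a canonical CVX code.
TO_CVX = {
    ("CVX", "208"): "208",
    ("CPT", "91300"): "208",
    ("local", "COVID_VAX_2024"): "208",
}


def deduplicate(records):
    """Merge records that map to the same canonical code on the same date."""
    merged = defaultdict(lambda: {"sources": []})
    for rec in records:
        cvx = TO_CVX.get((rec["system"], rec["code"]))
        if cvx is None:
            # Unmapped codes are kept separate rather than guessed at.
            key = (rec["system"], rec["code"], rec["date"])
        else:
            key = ("CVX", cvx, rec["date"])
        event = merged[key]
        event["sources"].append(rec["source"])
        for field in ("lot", "note"):
            if field in rec:
                event.setdefault(field, rec[field])
    return dict(merged)


records = [
    {"source": "Walgreens", "system": "CVX", "code": "208",
     "date": "2024-01-15", "lot": "ABC123"},
    {"source": "Cigna", "system": "CPT", "code": "91300", "date": "2024-01-15"},
    {"source": "One Medical", "system": "local", "code": "COVID_VAX_2024",
     "date": "2024-01-15", "note": "Patient received Pfizer booster"},
]
events = deduplicate(records)
print(len(events))  # 1
```

The single merged event carries the lot number from the pharmacy and the note from the EMR, along with the list of sources that confirmed it.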