Insights from Arun Raghavan, VP of Data Engineering
The Question Nobody’s Asking
Your organization just invested millions in AI infrastructure. Your team spent months building clinical decision support systems. Your models are state-of-the-art.
But here’s the question that should come first: What’s your data quality score? Not “is your data good” or “have you cleaned your data.” What’s the actual, measured, quantifiable score? If you don’t know the answer, you’re building on an unknown foundation. And in healthcare, that’s a significant risk.
The Measurement Problem
Most healthcare organizations don’t systematically measure data quality. When they do, they often use informal assessments:
- “Our data team spot-checks records.”
- “We have FHIR Profiles or data validation rules.”
- “Our vendor says the data is clean.”
These approaches don’t answer the fundamental question: How good is good enough?
Our Approach: Rigorous, Transparent, Reproducible
We conducted a systematic analysis of healthcare data quality at population scale using the PIQI (Patient Information Quality Improvement) framework – an industry-standard assessment methodology.
Our Methodology:
- Millions of patient records across multiple health systems
- Mix of chronic disease patients, healthy patients, and high utilizers
- Data from all major sources: EMR, payer, pharmacy, labs, imaging
- Multiple data networks: Patient Access API, TEFCA, CMS
Framework:
- PIQI (Patient Information Quality Improvement) – industry-standard
- Extended with “Completeness” dimension for longitudinal, multi-source data
- 13 specific dimensions across 5 categories
- Weighted scoring (0-100 scale) with letter grades (A-F)
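To make the weighted scoring concrete, here is a minimal sketch that combines the four category weights and baseline scores reported in this article. The grade cut-offs and the normalization over the listed weights are assumptions for illustration. The article does not publish its cut-offs, or the weight and score of the added Completeness category.

```python
# Category weights and baseline scores as reported in this article.
# The four listed weights sum to 0.70; the weight and score of the added
# Completeness category are not stated, so they are left out here.
CATEGORY_WEIGHTS = {
    "availability": 0.20,
    "accuracy": 0.20,
    "conformity": 0.15,
    "plausibility": 0.15,
}
BASELINE_SCORES = {
    "availability": 45,
    "accuracy": 52,
    "conformity": 38,
    "plausibility": 48,
}

# ASSUMPTION: conventional grade cut-offs; the article does not publish them.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def weighted_score(scores, weights):
    """Weighted average of category scores, normalized by the total weight used."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


def letter_grade(score):
    """Map a 0-100 score to a letter using the assumed cut-offs."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


partial = weighted_score(BASELINE_SCORES, CATEGORY_WEIGHTS)
print(f"{partial:.1f} -> {letter_grade(partial)}")
```

This partial figure covers only the four listed categories, so it is not the overall 36/100, which also includes Completeness.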
The Baseline: What We Found
Average raw data quality score: 36/100 (F grade – Critical). This wasn’t surprising. Healthcare data is inherently messy because of multiple sources with different standards, inconsistent coding practices, historical data from legacy systems, and disconnected networks with incomplete views.
Understanding the PIQI Framework
Standard PIQI evaluates data across four categories, described below. Our extended framework adds Completeness as a fifth.
1. Availability (20% of quality score) – Is usable information present?
Dimensions:
- Missing: Expected elements are absent (e.g., no Condition resources)
- Unpopulated: Attributes exist but are empty (e.g., blank patient name)
- Incomplete: Inadequate information (e.g., code without code system)
Our baseline finding: 45/100 (F grade)
- 40% of expected resources missing
- 35% of fields unpopulated
- 25% of elements incomplete
Real example: Patient with documented Crohn’s disease and Lupus in Patient Access API network – completely absent from TEFCA data. If your system only connects to TEFCA, these critical diagnoses are invisible.
Consumer impact: AI makes recommendations without critical context. Clinical decision support misses important contraindications.
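The three Availability dimensions can be sketched as simple checks over a patient record. The record shape below is a simplified, hypothetical stand-in (a dict of resource lists), not the real FHIR structure, and the function name is ours.

```python
def availability_issues(bundle, expected_types):
    """Classify availability problems in a dict of resource lists keyed by type."""
    issues = []
    for rtype in expected_types:
        resources = bundle.get(rtype, [])
        if not resources:
            # Missing: an expected element is absent altogether.
            issues.append((rtype, "missing"))
            continue
        for res in resources:
            for field, value in res.items():
                # Unpopulated: the attribute exists but holds nothing.
                if value in (None, "", [], {}):
                    issues.append((f"{rtype}.{field}", "unpopulated"))
            code = res.get("code")
            # Incomplete: a code is given without its code system.
            if isinstance(code, dict) and code.get("code") and not code.get("system"):
                issues.append((f"{rtype}.code", "incomplete"))
    return issues


# Hypothetical sample record: blank name, no Condition resources,
# and a lab code with no code system.
bundle = {
    "Patient": [{"name": "", "birthDate": "1980-04-12"}],
    "Observation": [{"code": {"code": "2345-7", "system": ""}, "value": 98}],
}
issues = availability_issues(bundle, ["Patient", "Condition", "Observation"])
for location, kind in issues:
    print(f"{location}: {kind}")
```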
2. Accuracy (20% of quality score) – Is the data inherently valid?
Dimensions:
- Invalid Format: Improperly formatted data (e.g., date “2024-13-45”)
- Invalid Value: Values outside expected ranges (e.g., heart rate 500 bpm)
- Invalid Grouping: Incompatible attribute combinations
Our baseline finding: 52/100 (F grade)
Consumer impact: Invalid data causes AI to process nonsense as if it were real, leading to unreliable outputs.
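The first two Accuracy dimensions lend themselves to automated checks. A minimal sketch, assuming ISO 8601 dates and an illustrative heart-rate range (the bounds are placeholders, not clinical reference values):

```python
from datetime import date

# ASSUMPTION: illustrative bounds only, not clinical reference ranges.
HEART_RATE_RANGE_BPM = (20, 300)


def is_valid_iso_date(text):
    """Invalid Format check: True only if text parses as a real calendar date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_valid_heart_rate(bpm):
    """Invalid Value check: True if bpm falls inside the assumed range."""
    low, high = HEART_RATE_RANGE_BPM
    return low <= bpm <= high


print(is_valid_iso_date("2024-13-45"))  # month 13 cannot be parsed
print(is_valid_heart_rate(500))         # outside the assumed range
```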
3. Conformity (15% of quality score) – Does coded information conform to standards?
Dimensions:
- Invalid Member: Code doesn’t exist in the specified system
- Incompatible: Wrong code system used (e.g., ICD-10 where SNOMED expected)
- Obsolete: Deprecated/inactive codes
Our baseline finding: 38/100 (F grade)
- 40% of codes don’t exist in the specified standard terminologies
- 35% use the wrong code system (e.g., custom Epic codes instead of LOINC)
- 25% are deprecated or obsolete codes still in use
Real example: Same lab test appears as:
- Custom Epic code “LAB_GLUCOSE_2024”
- LOINC code 2345-7 (correct)
- No code at all (just text “glucose”)
Consumer impact: Impossible to deduplicate across sources. Analytics can’t aggregate properly. Interoperability breaks down.
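A conformity check boils down to looking each code up in the terminology it claims to belong to. The sketch below uses a tiny in-memory table as a stand-in for a real terminology service. The table contents, the "urn:example:epic" system identifier, and the "EXAMPLE-OLD" code are hypothetical. Only the LOINC system URI and code 2345-7 are real.

```python
LOINC_URI = "http://loinc.org"

# ASSUMPTION: a toy lookup table standing in for a terminology service.
# "EXAMPLE-OLD" is a made-up placeholder for a deprecated code.
KNOWN_CODES = {
    LOINC_URI: {"2345-7": "active", "EXAMPLE-OLD": "deprecated"},
}


def conformity_issue(system, code, expected_system):
    """Return the conformity problem for a coded value, or None if it conforms."""
    if not code:
        return "uncoded"          # free text only; nothing to validate
    if system != expected_system:
        return "incompatible"     # wrong code system for this element
    status = KNOWN_CODES.get(system, {}).get(code)
    if status is None:
        return "invalid member"   # code does not exist in the system
    if status != "active":
        return "obsolete"         # code exists but is deprecated/inactive
    return None


# The three glucose variants from the example above.
print(conformity_issue("urn:example:epic", "LAB_GLUCOSE_2024", LOINC_URI))
print(conformity_issue(LOINC_URI, "2345-7", LOINC_URI))
print(conformity_issue(None, None, LOINC_URI))
```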
4. Plausibility (15% of quality score) – Does the data make sense?
Dimensions:
- Temporally Implausible: Timeline doesn’t make sense
- Clinically Implausible: Values outside reasonable ranges
- Situationally Implausible: Conflicting information
Our baseline finding: 48/100 (F grade)
Consumer impact: Implausible data indicates systemic issues that AI will amplify rather than correct.
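Temporal plausibility is the easiest of the three to automate. A minimal sketch, assuming each event carries a date and the patient's birth date is known (the function name and the two rules are illustrative, not an exhaustive rule set):

```python
from datetime import date


def temporal_issues(birth_date, event_date, today):
    """Return the reasons an event date is temporally implausible, if any."""
    issues = []
    if event_date < birth_date:
        issues.append("event precedes birth")
    if event_date > today:
        issues.append("event is in the future")
    return issues


born = date(1980, 4, 12)
today = date(2024, 6, 1)
print(temporal_issues(born, date(1975, 1, 1), today))
print(temporal_issues(born, date(2023, 9, 30), today))
```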
The Cost: What Low Quality Data Actually Means
Poor data quality creates significant operational, clinical, and regulatory challenges:
Operational Costs
AI Token Costs:
- Processing 4,462 lab results per patient vs. 24 key findings
- ~$50 per AI query vs. ~$5 (10x difference)
- Poor user experience from slow response times (30+ seconds vs. 3 seconds)
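One plausible way to shrink what gets sent to a model is to keep only the most recent result per test code. This is an assumption for illustration, not a description of how the 24 key findings above were selected. The record fields are hypothetical.

```python
def latest_per_code(results):
    """Keep only the most recent result for each test code."""
    latest = {}
    for r in results:
        current = latest.get(r["code"])
        # ISO 8601 date strings sort chronologically as plain strings.
        if current is None or r["date"] > current["date"]:
            latest[r["code"]] = r
    return list(latest.values())


# Hypothetical sample: two glucose results and one hemoglobin result.
results = [
    {"code": "2345-7", "date": "2023-01-10", "value": 101},
    {"code": "2345-7", "date": "2024-06-02", "value": 96},
    {"code": "718-7", "date": "2024-05-20", "value": 13.9},
]
reduced = latest_per_code(results)
print(len(results), "->", len(reduced))
```

Real selection logic would also need clinical rules, for example keeping abnormal values or trends, which this sketch does not attempt.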
Data Operations:
- Constant manual cleanup and data quality tickets
- Integration failures from non-standard codes
- Excessive storage costs from duplicate records
Clinical Risk
Missing Diagnosis Scenario:
- AI recommends NSAID for abdominal pain
- Patient has Crohn’s disease (in disconnected network)
- NSAID causes serious complications
- Significant liability exposure
Medication Error Scenario:
- 21 medications marked “active” (only 3 actually are)
- AI checks interactions across all 21
- False positive alerts = alert fatigue
- Missed real interaction = patient harm
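The scale of the problem is easy to quantify if we assume the interaction check is pairwise, which the scenario does not state explicitly:

```python
from math import comb

listed_active = 21  # medications marked "active" in the record
truly_active = 3    # medications the patient actually takes

# ASSUMPTION: interactions are checked for every pair of medications.
pairs_checked = comb(listed_active, 2)
pairs_relevant = comb(truly_active, 2)
pairs_spurious = pairs_checked - pairs_relevant

print(pairs_checked, pairs_relevant, pairs_spurious)  # 210 3 207
```

Under that assumption, 207 of the 210 pairs checked involve at least one medication the patient is no longer taking, which is where the false positive alerts come from.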
Regulatory Exposure
HIPAA Violations:
- Processing 4,462 lab results when only 24 are needed
- Conflicts with HIPAA’s minimum necessary standard (data minimization)
Audit Failures:
- Can’t explain AI decisions based on poor-quality data
- Compliance failures in CMS quality reporting
This is the measurable cost of poor data quality.
In Part 2, we’ll reveal how we transformed F-grade data into B-grade data—achieving a 122% improvement—and what that means for AI performance, costs, and patient safety.