This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
- Current Version: 0.1.5
- Package Name: hccinfhir
- Description: A Python library for calculating HCC (Hierarchical Condition Category) risk adjustment scores from healthcare claims data
- License: Apache 2.0
- Python Requirements: 3.9+
HCCInFHIR processes healthcare data to calculate Medicare risk adjustment scores used for payment calculations. It supports multiple input formats:
The library implements the official CMS-HCC risk adjustment methodology, including:
```bash
hatch shell                  # Activate virtual environment
pip install -e .             # Install package in development mode
pytest tests/*               # Run all tests
pytest tests/test_filter.py  # Run specific test file
hatch build                  # Build package
hatch publish                # Publish to PyPI (maintainers only)
```
Dependencies (from pyproject.toml):

- pydantic >= 2.10.3
- pytest

Main processor (hccinfhir.py): the HCCInFHIR class is the main entry point, with three execution methods:
```python
from hccinfhir import HCCInFHIR, Demographics

# Initialize processor
processor = HCCInFHIR(
    filter_claims=True,                                        # Apply CMS filtering rules
    model_name="CMS-HCC Model V28",                            # HCC model to use
    proc_filtering_filename="ra_eligible_cpt_hcpcs_2026.csv",  # CPT/HCPCS filtering
    dx_cc_mapping_filename="ra_dx_to_cc_2026.csv",             # Diagnosis mapping
)
```
Method 1: run() - Process FHIR EOB Resources
```python
# Input: List of FHIR ExplanationOfBenefit resources
eob_list = [{"resourceType": "ExplanationOfBenefit", ...}]
demographics = Demographics(age=67, sex="F")
result = processor.run(eob_list, demographics)
```
Method 2: run_from_service_data() - Process Service-Level Data
```python
from hccinfhir.datamodels import ServiceLevelData

# Input: Standardized ServiceLevelData objects
service_data = [ServiceLevelData(...)]
demographics = Demographics(age=67, sex="F")
result = processor.run_from_service_data(service_data, demographics)
```
Method 3: calculate_from_diagnosis() - Direct Diagnosis Processing
```python
# Input: List of diagnosis codes
diagnosis_codes = ["E11.9", "I10", "N18.3"]  # ICD-10 codes
demographics = Demographics(age=67, sex="F")
result = processor.calculate_from_diagnosis(diagnosis_codes, demographics)
```
All three methods accept a prefix_override parameter for cases where demographic auto-detection picks the wrong coefficients:
```python
# ESRD patient with incorrect orec/crec codes (common data quality issue)
demographics = Demographics(age=65, sex="F", orec="0", crec="0")
diagnosis_codes = ["N18.6", "E11.22"]

# Force ESRD dialysis coefficients despite orec/crec being wrong
result = processor.calculate_from_diagnosis(
    diagnosis_codes,
    demographics,
    prefix_override='DI_'  # ESRD Dialysis prefix
)
```
When to use prefix_override:
Common prefix values: see the "Coefficient Prefix Reference" section below for the complete list.
Data extraction (extractor.py, extractor_fhir.py, extractor_837.py):

Extract from FHIR:

```python
from hccinfhir.extractor import extract_sld, extract_sld_list

# Single EOB
eob = {"resourceType": "ExplanationOfBenefit", ...}
service_data = extract_sld(eob)

# Multiple EOBs
eob_list = [eob1, eob2, eob3]
service_data_list = extract_sld_list(eob_list)
```
Extract from X12 837:
```python
from hccinfhir.extractor_837 import extract_sld_837

# X12 837 claim text
x12_text = "ISA*00* *00* *ZZ*..."
service_data = extract_sld_837(x12_text)
```
Extract from X12 834 (Enrollment/Demographics):
```python
from hccinfhir.extractor_834 import (
    extract_enrollment_834,
    enrollment_to_demographics,
    is_losing_medicaid,
    medicaid_status_summary,
)

# X12 834 enrollment file
x12_text = "ISA*00* *00* *ZZ*..."
enrollments = extract_enrollment_834(x12_text)

for enrollment in enrollments:
    # Get member demographics for risk calculation
    demographics = enrollment_to_demographics(enrollment)

    # Check if member is losing Medicaid coverage
    if is_losing_medicaid(enrollment, within_days=90):
        print(f"Alert: {enrollment.member_id} losing Medicaid soon!")

    # Get comprehensive Medicaid status
    status = medicaid_status_summary(enrollment)
    print(f"Dual Status: {status['dual_status']}")
    print(f"Full Benefit Dual: {status['is_full_benefit_dual']}")
```
Filtering (filter.py):

```python
from hccinfhir.filter import apply_filter
from hccinfhir.utils import load_proc_filtering

# Load CPT/HCPCS codes for filtering
professional_cpt = load_proc_filtering("ra_eligible_cpt_hcpcs_2026.csv")

# Apply CMS filtering rules
filtered_data = apply_filter(service_data_list, professional_cpt)
```
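For professional claims, the core of the filtering idea is a set-membership check on the procedure code. The sketch below is a simplified, hypothetical version of that check. It is not the library's apply_filter, which applies the full CMS rules. The `Service` dataclass, its field names and the two-code eligible set are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Service:
    # Hypothetical stand-in for a service-level record; field names are assumptions
    procedure_code: str
    diagnosis_codes: List[str] = field(default_factory=list)


def filter_professional(services: Iterable[Service], eligible_codes: Set[str]) -> List[Service]:
    """Keep only services whose CPT/HCPCS code is in the eligible set."""
    return [s for s in services if s.procedure_code in eligible_codes]


eligible = {"99213", "99214"}  # illustrative subset, not the CMS eligible list
services = [
    Service("99213", ["E11.9"]),
    Service("80053", ["I10"]),  # not in the illustrative eligible set
]
kept = filter_professional(services, eligible)
diagnoses = sorted({dx for s in kept for dx in s.diagnosis_codes})
```

Diagnoses attached only to dropped services never reach HCC mapping. That is why filtering runs before diagnosis codes are collected.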
RAF calculation (model_calculate.py):

```python
from hccinfhir.model_calculate import calculate_raf

result = calculate_raf(
    diagnosis_codes=["E11.9", "I10"],
    model_name="CMS-HCC Model V28",
    age=67,
    sex="F",
    dual_elgbl_cd="00",  # Non-dual (same code used elsewhere in this guide)
    orec="0",
    # ... other demographics
)
```
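Conceptually, a raw CMS-HCC score is a sum of two kinds of coefficients, both looked up under the member's prefix:

- the demographic coefficient;
- the coefficient of each HCC that survives the hierarchies.

The toy sketch below shows only that summation. It leaves out interaction terms and normalization or coding-intensity adjustments. The coefficient values are made up, and the "prefix + variable" key format is an assumption modeled on the prefix reference.

```python
from typing import Dict, Iterable


def raw_raf(prefix: str, demographic_var: str, hccs: Iterable[str],
            coefficients: Dict[str, float]) -> float:
    """Sum the prefixed demographic and HCC coefficients; missing keys count as 0."""
    total = coefficients.get(prefix + demographic_var, 0.0)
    for hcc in hccs:
        total += coefficients.get(prefix + hcc, 0.0)
    return round(total, 3)


# Made-up values for illustration only, not real CMS coefficients
coefficients = {"CNA_F65_69": 0.3, "CNA_HCC38": 0.17, "CNA_HCC226": 0.36}
score = raw_raf("CNA_", "F65_69", ["HCC38", "HCC226"], coefficients)
```

Because every lookup is keyed by prefix, the same HCC list scores differently under CNA_ and CFA_. This is the mechanism behind the dual-status differences described later.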
Demographics (model_demographics.py):

```python
from hccinfhir.model_demographics import get_demographic_coefficients

coeffs = get_demographic_coefficients(
    age=67, sex="F", dual_elgbl_cd="00",
    model_name="CMS-HCC Model V28",
)
```
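Diagnosis mapping (model_dx_to_cc.py):

```python
from hccinfhir.model_dx_to_cc import get_dx_to_cc_list
from hccinfhir.utils import load_dx_to_cc_mapping

# Load the ICD-10 to condition category mapping table
dx_mapping_data = load_dx_to_cc_mapping("ra_dx_to_cc_2026.csv")

# Map diagnosis codes to HCCs
hcc_list = get_dx_to_cc_list(
    diagnosis_codes=["E11.9", "I10"],
    dx_to_cc_mapping=dx_mapping_data,
)
```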
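At its core, diagnosis mapping is a dictionary lookup from an ICD-10 code to one or more condition categories. The minimal sketch below assumes the mapping is keyed by undotted, upper-case codes. That key format is an assumption. The hand-written dict with placeholder category labels is not loaded from ra_dx_to_cc_2026.csv.

```python
from typing import Dict, Iterable, Set


def map_dx_to_cc(codes: Iterable[str], mapping: Dict[str, Set[str]]) -> Set[str]:
    """Normalize each ICD-10 code and union the categories it maps to."""
    ccs: Set[str] = set()
    for code in codes:
        key = code.replace(".", "").strip().upper()  # "e11.9" -> "E119"
        ccs |= mapping.get(key, set())  # unmapped codes contribute nothing
    return ccs


# Placeholder entries; one code may map to several categories
mapping = {"E119": {"CC_A"}, "I5022": {"CC_B", "CC_C"}}
ccs = map_dx_to_cc(["E11.9", "i50.22", "Z00.00"], mapping)
```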
Hierarchies (model_hierarchies.py):

```python
from hccinfhir.model_hierarchies import apply_hierarchies

# Apply HCC hierarchical rules
final_hccs = apply_hierarchies(
    hcc_list=["HCC18", "HCC85"],
    model_name="CMS-HCC Model V28",
)
```
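A hierarchy drops a less severe HCC when a more severe HCC in the same family is present. The minimal sketch below applies that rule using an illustrative parent-to-dropped-children table. The table is modeled on the V24 diabetes family, where HCC17 supersedes HCC18 and HCC19. This is not the library's apply_hierarchies, which reads its rules from ra_hierarchies_*.csv.

```python
from typing import Dict, Iterable, Set


def apply_hierarchy_rules(hccs: Iterable[str], rules: Dict[str, Set[str]]) -> Set[str]:
    """Remove every HCC that is superseded by a more severe HCC present in the input."""
    present = set(hccs)
    dropped: Set[str] = set()
    for parent, children in rules.items():
        if parent in present:
            dropped |= children
    return present - dropped


# Illustrative subset of rules: parent -> HCCs it supersedes
rules = {"HCC17": {"HCC18", "HCC19"}, "HCC18": {"HCC19"}}
final = apply_hierarchy_rules(["HCC18", "HCC19", "HCC85"], rules)
```

Rules are checked against the original input set, so a parent that is itself dropped still drops its own children.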
Sample data (samples.py):

```python
from hccinfhir import HCCInFHIR, Demographics
from hccinfhir import get_eob_sample, get_eob_sample_list, get_837_sample, get_834_sample

# Get FHIR EOB samples
eob = get_eob_sample(1)                     # Individual sample (cases 1, 2, or 3)
eob_list = get_eob_sample_list(limit=200)   # Up to 200 samples

# Get X12 837 samples
x12_text = get_837_sample(0)  # Professional claim (cases 0-12)

# Get X12 834 sample
x12_834 = get_834_sample(1)  # Enrollment data (case 1)

# Process sample data
processor = HCCInFHIR()
demographics = Demographics(age=67, sex="F")
result = processor.run([eob], demographics)  # Note: wrap single EOB in list
```
Utilities (utils.py):

```python
from hccinfhir.utils import load_proc_filtering, load_dx_to_cc_mapping

# Load filtering data
cpt_codes = load_proc_filtering("ra_eligible_cpt_hcpcs_2026.csv")

# Load diagnosis mapping
dx_mapping = load_dx_to_cc_mapping("ra_dx_to_cc_2026.csv")
```
```python
# Complete workflow from FHIR to RAF score
from hccinfhir import HCCInFHIR, Demographics, get_eob_sample_list

processor = HCCInFHIR(model_name="CMS-HCC Model V28")
eob_list = get_eob_sample_list(limit=200)
demographics = Demographics(age=67, sex="F", dual_elgbl_cd="00")
result = processor.run(eob_list, demographics)
print(f"RAF Score: {result.risk_score}")
print(f"HCCs: {result.hcc_list}")
```
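```python
# Manual control over each step
from hccinfhir import get_eob_sample_list
from hccinfhir.extractor import extract_sld_list
from hccinfhir.filter import apply_filter
from hccinfhir.model_calculate import calculate_raf
from hccinfhir.utils import load_proc_filtering

eob_list = get_eob_sample_list(limit=200)
professional_cpt = load_proc_filtering("ra_eligible_cpt_hcpcs_2026.csv")

# Step 1: Extract
service_data = extract_sld_list(eob_list)

# Step 2: Filter
filtered_data = apply_filter(service_data, professional_cpt)

# Step 3: Calculate (deduplicate diagnosis codes across all service lines)
diagnosis_codes = list({code for sld in filtered_data for code in sld.claim_diagnosis_codes})
result = calculate_raf(diagnosis_codes, "CMS-HCC Model V28", age=67, sex="F")
```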
```python
# Process multiple patients (eob_list_1, eob_list_2 are lists of EOB dicts)
from hccinfhir import HCCInFHIR, Demographics

patients = [
    {"id": "P001", "eobs": eob_list_1, "demographics": Demographics(age=65, sex="M")},
    {"id": "P002", "eobs": eob_list_2, "demographics": Demographics(age=72, sex="F")},
]

processor = HCCInFHIR()
results = []
for patient in patients:
    result = processor.run(patient["eobs"], patient["demographics"])
    results.append({
        "patient_id": patient["id"],
        "risk_score": result.risk_score,
        "hccs": result.hcc_list,
    })
```
The 834 parser (extractor_834.py) extracts enrollment and demographic data from X12 834 Benefit Enrollment transactions, with a specific focus on California DHCS Medi-Cal dual eligibility status. This is critical for risk adjustment because dual-eligible beneficiaries receive different coefficient prefixes, resulting in significant RAF score differences.
Impact Example:
```python
# 72-year-old female with diabetes (E11.9 → HCC19)

# Non-Dual (Medi-Cal only or Medicare only)
demographics = Demographics(age=72, sex='F', dual_elgbl_cd='00')
# Uses prefix: CNA_ (Community, Non-Dual, Aged)
# RAF Score: ~1.2

# Full Benefit Dual (QMB Plus, SLMB Plus)
demographics = Demographics(age=72, sex='F', dual_elgbl_cd='02')
# Uses prefix: CFA_ (Community, Full Benefit Dual, Aged)
# RAF Score: ~1.8 (50% higher!)

# Partial Benefit Dual (QMB Only, SLMB Only, QI)
demographics = Demographics(age=72, sex='F', dual_elgbl_cd='01')
# Uses prefix: CPA_ (Community, Partial Benefit Dual, Aged)
# RAF Score: ~1.4
```
```python
from hccinfhir.extractor_834 import extract_enrollment_834, enrollment_to_demographics

# Load 834 file
with open('dhcs_834_file.txt', 'r') as f:
    content = f.read()

# Parse enrollments
enrollments = extract_enrollment_834(content)

# Process each member
for enrollment in enrollments:
    print(f"Member: {enrollment.member_id}")
    print(f"MBI: {enrollment.mbi}")
    print(f"Medicaid ID: {enrollment.medicaid_id}")
    print(f"Dual Status: {enrollment.dual_elgbl_cd}")
    print(f"Full Benefit Dual: {enrollment.is_full_benefit_dual}")
    print(f"Partial Benefit Dual: {enrollment.is_partial_benefit_dual}")

    # Convert to Demographics for RAF calculation
    demographics = enrollment_to_demographics(enrollment)
```
Use Case: Detect when members will lose Medicaid coverage, causing dual-eligible status to end and RAF scores to drop.
```python
from hccinfhir.extractor_834 import (
    extract_enrollment_834,
    is_losing_medicaid,
    is_medicaid_terminated,
    medicaid_status_summary,
)

enrollments = extract_enrollment_834(content)

for enrollment in enrollments:
    # Check if losing Medicaid within 90 days
    if is_losing_medicaid(enrollment, within_days=90):
        print(f"⚠️ ALERT: {enrollment.member_id} losing Medicaid!")
        print(f"  Coverage ends: {enrollment.coverage_end_date}")
        print(f"  Current dual status: {enrollment.dual_elgbl_cd}")
        print("  Expected RAF impact: -30% to -50%")

    # Check if Medicaid is being terminated
    if is_medicaid_terminated(enrollment):
        print(f"⚠️ TERMINATION: {enrollment.member_id} Medicaid canceled")

    # Get comprehensive status summary
    status = medicaid_status_summary(enrollment)
    print(f"Status Summary: {status}")
    # Returns:
    # {
    #     'member_id': 'MBR001',
    #     'has_medicaid': True,
    #     'has_medicare': True,
    #     'dual_status': '02',
    #     'is_full_benefit_dual': True,
    #     'is_partial_benefit_dual': False,
    #     'coverage_end_date': '2025-12-31',
    #     'is_termination': False,
    #     'losing_medicaid_30d': False,
    #     'losing_medicaid_60d': False,
    #     'losing_medicaid_90d': False
    # }
```
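The 30/60/90-day flags come down to date arithmetic on the coverage end date. The hypothetical sketch below shows that comparison; it is not the library's is_losing_medicaid. It assumes the end date is an ISO YYYY-MM-DD string, as in the summary above, and that "within N days" means between today and today + N inclusive.

```python
from datetime import date
from typing import Optional


def ends_within(coverage_end: Optional[str], today: date, within_days: int) -> bool:
    """True if coverage ends between today and today + within_days (inclusive)."""
    if not coverage_end:
        return False  # no end date means open-ended coverage
    end = date.fromisoformat(coverage_end)
    return 0 <= (end - today).days <= within_days


today = date(2025, 11, 15)
windows = {days: ends_within("2025-12-31", today, days) for days in (30, 60, 90)}
```

Here the end date is 46 days away, so only the 60- and 90-day windows fire.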
The parser is optimized for California DHCS 834 files with these state-specific mappings:
```
# Medi-Cal aid codes (REF*AB) → dual_elgbl_cd
# Full Benefit Dual codes
'4N': '02'  # QMB Plus - Aged
'4P': '02'  # QMB Plus - Disabled
'5B': '04'  # SLMB Plus - Aged
'5D': '04'  # SLMB Plus - Disabled

# Partial Benefit Dual codes
'4M': '01'  # QMB Only - Aged
'4O': '01'  # QMB Only - Disabled
'5A': '03'  # SLMB Only - Aged
'5C': '03'  # SLMB Only - Disabled
'5E': '06'  # QI - Aged
'5F': '06'  # QI - Disabled

# Medicare status codes (REF*ABB) → dual_elgbl_cd
'QMB' / 'QMBONLY': '01'    # Partial Benefit
'QMBPLUS' / 'QMB+': '02'   # Full Benefit
'SLMB' / 'SLMBONLY': '03'  # Partial Benefit
'SLMBPLUS' / 'SLMB+': '04' # Full Benefit
'QI' / 'QI1': '06'         # Partial Benefit
'QDWI': '05'               # Partial Benefit
```
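Both tables are plain lookups. The sketch below shows one way they could be combined into a single dual_elgbl_cd. The precedence (the REF*AB aid code wins over the REF*ABB status code) is a hypothetical choice, not the parser's documented behavior. The "00" fallback and the function name are also assumptions.

```python
from typing import Optional

AID_CODE_TO_DUAL = {
    "4N": "02", "4P": "02", "5B": "04", "5D": "04",
    "4M": "01", "4O": "01", "5A": "03", "5C": "03", "5E": "06", "5F": "06",
}
STATUS_TO_DUAL = {
    "QMB": "01", "QMBONLY": "01", "QMBPLUS": "02", "QMB+": "02",
    "SLMB": "03", "SLMBONLY": "03", "SLMBPLUS": "04", "SLMB+": "04",
    "QI": "06", "QI1": "06", "QDWI": "05",
}


def dual_status(aid_code: Optional[str], medicare_status: Optional[str]) -> str:
    """Resolve dual_elgbl_cd from an aid code first, then a Medicare status code."""
    if aid_code and aid_code.strip().upper() in AID_CODE_TO_DUAL:
        return AID_CODE_TO_DUAL[aid_code.strip().upper()]
    if medicare_status:
        key = medicare_status.replace(" ", "").upper()  # "QMB Plus" -> "QMBPLUS"
        if key in STATUS_TO_DUAL:
            return STATUS_TO_DUAL[key]
    return "00"  # assumption: no recognized code means non-dual
```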
```
Loop 2000 - Member Level
  INS - Member Level Detail
    INS03 - Maintenance Type (001=Change, 021=Add, 024=Cancel)
  REF - Reference Identifiers
    REF*0F - Subscriber Number
    REF*6P - Medicare Beneficiary Identifier (MBI)
    REF*1D - Medicaid ID
    REF*AB - California Medi-Cal Aid Code
    REF*ABB - Medicare Status Code (QMB, SLMB, etc.)
  NM1*IL - Member Name & ID
  DMG - Demographics (DOB, Sex) ***CRITICAL***
  DTP - Date Time Periods
    DTP*348 - Coverage Begin Date
    DTP*349 - Coverage End Date ***CRITICAL for loss detection***
    DTP*338 - Medicare Part A/B Effective Date
  HD - Health Coverage ***CRITICAL for dual status***
    Detects Medicare, Medicaid, D-SNP keywords
```
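The loop layout maps onto X12's delimiter scheme: segments end with `~` and elements are separated by `*`. The deliberately minimal sketch below pulls REF*6P and DTP*349 values for each INS member loop. It assumes those default delimiters; real files declare their delimiters in the ISA header. The library's parser handles far more than this, and the sample string is fabricated test data.

```python
from typing import Dict, List


def scan_members(x12: str) -> List[Dict[str, str]]:
    """Group selected REF/DTP values under the most recent INS segment."""
    members: List[Dict[str, str]] = []
    for raw in x12.split("~"):
        seg = raw.strip()
        if not seg:
            continue
        el = seg.split("*")
        if el[0] == "INS":
            # INS03 is the maintenance type code (e.g. 021 = Add, 024 = Cancel)
            members.append({"maintenance_type": el[3] if len(el) > 3 else ""})
        elif not members:
            continue  # header segments before the first member loop
        elif el[0] == "REF" and len(el) > 2 and el[1] == "6P":
            members[-1]["mbi"] = el[2]
        elif el[0] == "DTP" and len(el) > 3 and el[1] == "349":
            members[-1]["coverage_end"] = el[3]  # D8 format: CCYYMMDD
    return members


sample = "INS*Y*18*024*XN~REF*6P*TESTMBI0001~DTP*349*D8*20251231~INS*Y*18*021*XN~"
members = scan_members(sample)
```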
The parser uses intelligent logic to determine dual eligibility:
```python
from hccinfhir import HCCInFHIR
from hccinfhir.extractor_834 import extract_enrollment_834, enrollment_to_demographics
from hccinfhir.extractor_837 import extract_sld_837

# Parse enrollment data
enrollments_834 = extract_enrollment_834(content_834)

# Parse claims data
service_data_837 = extract_sld_837(content_837)

# Match member and calculate RAF
processor = HCCInFHIR()
for enrollment in enrollments_834:
    # Get demographics from 834
    demographics = enrollment_to_demographics(enrollment)

    # Filter claims for this member
    member_claims = [sld for sld in service_data_837 if sld.patient_id == enrollment.member_id]

    # Calculate RAF score
    result = processor.run_from_service_data(member_claims, demographics)
    print(f"Member: {enrollment.member_id}")
    print(f"Dual Status: {enrollment.dual_elgbl_cd}")
    print(f"RAF Score: {result.risk_score}")
```
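Filtering service_data_837 inside the member loop rescans every claim for every member. One alternative is to index the service lines by patient_id once and then look each member up. The sketch below shows this with a hypothetical minimal record type standing in for ServiceLevelData.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Line:
    patient_id: str  # hypothetical stand-in for a ServiceLevelData record
    procedure_code: str


def group_by_patient(lines: Iterable[Line]) -> Dict[str, List[Line]]:
    """Build a patient_id -> service lines index in a single pass."""
    groups: Dict[str, List[Line]] = defaultdict(list)
    for line in lines:
        groups[line.patient_id].append(line)
    return dict(groups)


lines = [Line("MBR001", "99213"), Line("MBR002", "99214"), Line("MBR001", "99215")]
by_patient = group_by_patient(lines)
member_claims = by_patient.get("MBR003", [])  # members with no claims get an empty list
```

Inside the member loop, `member_claims = by_patient.get(enrollment.member_id, [])` would then replace the list comprehension.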
Sample 834 file available at: src/hccinfhir/sample_files/sample_834_01.txt
Includes 5 test scenarios:
```python
from hccinfhir import get_834_sample
from hccinfhir.extractor_834 import extract_enrollment_834

# Get sample 834
content_834 = get_834_sample(1)
enrollments = extract_enrollment_834(content_834)
```
The 820 sits downstream of risk adjustment — it carries the capitation payment that results from RAF scores, not the inputs that produce them:
```
837 claims / FHIR EOB ──┐
                        ├──▶ HCC mapping ──▶ RAF score ──▶ capitation ──▶ 820
834 enrollment ─────────┘                                  × benchmark
```
Its value is indirect: reconciliation, retroactive adjustment detection, and dual eligibility verification.
Verify that actual capitation (820.payment_amount) matches your calculated
RAF score × CMS benchmark rate. A variance usually means CMS used a different
RAF (prior-year sweep, mid-year correction) or the member’s demographic prefix
changed.
```python
from hccinfhir import get_820_sample
from hccinfhir.extractor_820 import extract_payment_820

calculated_rafs = {"TESTMBR000000001": 1.42, "TESTMBR000000002": 0.98}
cms_benchmark = 1200.00

payment = extract_payment_820(get_820_sample(1))[0]
for member in payment.members:
    raf = calculated_rafs.get(member.member_id)
    if raf is None:  # no calculated RAF for this member
        continue
    expected = raf * cms_benchmark
    for entry in member.remittance_entries:
        if entry.payment_amount is None:
            continue
        paid = float(entry.payment_amount)  # works whether amounts are float or Decimal
        variance = paid - expected
        if abs(variance) > 10:  # tolerance in dollars
            print(f"{member.member_id}: paid={paid:.2f} "
                  f"expected={expected:.2f} variance={variance:+.2f}")
```
Members with ADX segments had prior-period corrections applied. adjustment_reason
"53" = Prior Period Adjustment; "72" = Retroactive Rate Change. These flag
members whose risk scores may have been restated by CMS.
```python
from hccinfhir import get_820_sample
from hccinfhir.extractor_820 import extract_payment_820

payment = extract_payment_820(get_820_sample(2))[0]
for member in payment.members:
    for e in member.remittance_entries:
        if e.adjustment_amount is not None:
            print(f"{member.member_id} | period={e.coverage_period_start} "
                  f"adj={e.adjustment_amount:+.2f} reason={e.adjustment_reason}")
            if e.adjustment_reason == "53":
                print("  → review HCC submissions for this member")
```
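A member can carry several ADX-bearing entries, so it can help to total adjustments by reason code before review. The sketch below works on plain (member_id, adjustment_amount, adjustment_reason) tuples standing in for RemittanceEntry fields. The tuple layout and the "Other" label are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Reason codes described above; anything else is labeled "Other"
REASONS = {"53": "Prior Period Adjustment", "72": "Retroactive Rate Change"}

Row = Tuple[str, Optional[float], Optional[str]]  # (member_id, adjustment_amount, adjustment_reason)


def adjustments_by_reason(rows: Iterable[Row]) -> Dict[str, float]:
    """Total ADX adjustment amounts per reason, skipping entries without an adjustment."""
    totals: Dict[str, float] = defaultdict(float)
    for _member_id, amount, reason in rows:
        if amount is None:
            continue
        label = REASONS.get(reason or "", f"Other ({reason})")
        totals[label] += amount
    return {label: round(total, 2) for label, total in totals.items()}


rows = [("M1", -120.50, "53"), ("M2", 45.00, "72"), ("M1", -30.25, "53"), ("M3", None, None)]
summary = adjustments_by_reason(rows)
```

With real data, each row could be built as (member.member_id, e.adjustment_amount, e.adjustment_reason). If those amounts are Decimal rather than float, the running totals should start from Decimal("0") instead.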
The aid_code and plan_type fields in each RemittanceEntry show which
capitation rate was applied. These map directly to dual_elgbl_cd from the 834
parser. A mismatch means the member was paid under the wrong risk category.
- plan_type "1" = primary/medical; plan_type "2" = pharmacy/state-only
- aid codes "1H", "60", "10", "16", "17" → Dual; "M1", "M3" → Medi-Cal Only
```python
from hccinfhir.extractor_834 import extract_enrollment_834
from hccinfhir import get_820_sample, get_834_sample
from hccinfhir.extractor_820 import extract_payment_820

dual_map = {e.member_id: e.dual_elgbl_cd
            for e in extract_enrollment_834(get_834_sample(1))}
DUAL_AID_CODES = {"1H", "60", "10", "16", "17", "6H", "20"}

payment = extract_payment_820(get_820_sample(1))[0]
for member in payment.members:
    dual_cd = dual_map.get(member.member_id)
    if dual_cd is None:
        continue
    for entry in member.remittance_entries:
        paid_as_dual = entry.aid_code in DUAL_AID_CODES
        enrolled_as_dual = dual_cd not in ("00", "NA", None)
        if paid_as_dual != enrolled_as_dual:
            print(f"MISMATCH {member.member_id}: aid={entry.aid_code} "
                  f"but dual_elgbl_cd={dual_cd}")
```
```python
from hccinfhir import get_820_sample
from hccinfhir.extractor_820 import extract_payment_820

payments = extract_payment_820(get_820_sample(1))  # one PaymentData per ST*820
payment = payments[0]
print(payment.payer_name, "→", payment.payee_name)
print(f"Total: ${payment.total_amount:,.2f} EFT: {payment.check_number}")

for member in payment.members:
    for entry in member.remittance_entries:
        print(f" {member.member_id} {entry.coverage_period_start}..{entry.coverage_period_end} "
              f"${entry.payment_amount:,.2f} aid={entry.aid_code}/{entry.plan_type} "
              f"{entry.description or ''}")
```
- PaymentData: one per 820 transaction
  - source, report_date, total_amount, payment_date, check_number
  - payer_name, payee_name (+ address fields)
  - members → list of PaymentDetail
- PaymentDetail: one per ENT loop
  - entity_number, member_id, last_name, first_name, middle_name
  - remittance_entries → list of RemittanceEntry
- RemittanceEntry: one per RMR/DTM set
  - payment_amount: net; negative = recoupment
  - original_amount: pre-adjustment amount when present
  - rate_code: REF*18 (e.g. "957" = PACE)
  - aid_code, plan_type, description: from REF*ZZ
  - coverage_period_start, coverage_period_end: YYYY-MM-DD
  - adjustment_amount, adjustment_reason: from ADX

Five PHI-masked samples from California DHCS PACE capitation remittances:
| Sample | Members | Total | Scenario |
|---|---|---|---|
| 1 | 12 | $102,139 | State-only pharmacy, single period |
| 2 | 13 | $91,978 | State-only with ADX prior-period adjustments |
| 3 | 93 | $697,086 | Primary capitation, dual + Medi-Cal only mix |
| 4 | 10 | $80,865 | Primary capitation, single period |
| 5 | 81 | $499,188 | Primary capitation with retroactive corrections |
```python
from hccinfhir import get_820_sample
from hccinfhir.extractor_820 import extract_payment_820

payment = extract_payment_820(get_820_sample(2))[0]  # sample with ADX adjustments
```
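payment_amount is net, and negative values are recoupments. A per-member net total over all remittance entries is therefore a natural first reconciliation figure. The sketch below runs on its own by mirroring PaymentDetail and RemittanceEntry with small stand-in dataclasses that reuse the field names listed above. Like the reconciliation example, it skips entries whose payment_amount is None.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:  # stand-in for RemittanceEntry
    payment_amount: Optional[float]


@dataclass
class Member:  # stand-in for PaymentDetail
    member_id: str
    remittance_entries: List[Entry] = field(default_factory=list)


def net_by_member(members: List[Member]) -> Dict[str, float]:
    """Net paid per member; negative entries (recoupments) reduce the total."""
    out: Dict[str, float] = {}
    for m in members:
        amounts = [e.payment_amount for e in m.remittance_entries if e.payment_amount is not None]
        out[m.member_id] = round(sum(amounts), 2)
    return out


members = [
    Member("A", [Entry(1000.0), Entry(-250.0)]),
    Member("B", [Entry(None)]),
]
totals = net_by_member(members)
```

With parsed data, `net_by_member(payment.members)` should work unchanged because the attribute names match.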
This is a Python library for extracting and processing healthcare data to calculate HCC (Hierarchical Condition Category) risk adjustment scores. The architecture follows a modular pipeline approach:
- Main processor (hccinfhir.py)
  - run(): Process FHIR EOB resources
  - run_from_service_data(): Process standardized service data
  - calculate_from_diagnosis(): Direct diagnosis code processing
- Extractors
  - extractor.py: Unified interface for data extraction
  - extractor_fhir.py: Processes FHIR resources using Pydantic models
  - extractor_837.py: Parses X12 837 claim data
- Data models (datamodels.py)
- Model modules (model_*.py): Implement CMS HCC calculation logic
  - model_calculate.py: Main RAF calculation orchestrator
  - model_demographics.py: Demographics processing
  - model_dx_to_cc.py: Diagnosis to condition category mapping
  - model_hierarchies.py: HCC hierarchical rules
  - model_interactions.py: Interaction calculations
  - model_coefficients.py: Risk score coefficients
- Filtering (filter.py)

Reference data, located in src/hccinfhir/data/:

- ra_dx_to_cc_*.csv - ICD-10 to condition category mapping
- ra_hierarchies_*.csv - HCC hierarchical relationships
- ra_coefficients_*.csv - Risk score coefficients
- ra_eligible_cpt_hcpcs_*.csv - Eligible procedure codes
- hcc_is_chronic.csv - Chronic condition flags

Development pointers:

- src/hccinfhir/samples/: get_eob_sample(), get_837_sample(), etc.
- ModelName literal type in datamodels.py
- src/hccinfhir/data/
- get_eob_sample(), get_837_sample()
- Demographics model
- pytest tests/test_*.py files
- pyproject.toml configuration

Coefficient Prefix Reference:

The library uses demographic prefixes to select appropriate risk adjustment coefficients. Prefixes are automatically derived from patient demographics (age, sex, orec, crec, dual status, etc.), but can be manually overridden using the prefix_override parameter.
CMS-HCC community, institutional, and new enrollee prefixes:

- `CNA_` - Community, Non-Dual, Aged (65+)
- `CND_` - Community, Non-Dual, Disabled (<65)
- `CFA_` - Community, Full Benefit Dual, Aged (65+)
- `CFD_` - Community, Full Benefit Dual, Disabled (<65)
- `CPA_` - Community, Partial Benefit Dual, Aged (65+)
- `CPD_` - Community, Partial Benefit Dual, Disabled (<65)
- `INS_` - Long-Term Institutionalized (nursing home >90 days)
- `NE_` - New Enrollee (standard)
- `SNPNE_` - Special Needs Plan New Enrollee

ESRD model prefixes:

- `DI_` - Dialysis (standard)
- `DNE_` - Dialysis New Enrollee
- `GI_` - Graft, Institutionalized
- `GNE_` - Graft, New Enrollee
- `GFPA_` - Graft, Full Benefit Dual, Aged (65+)
- `GFPN_` - Graft, Full Benefit Dual, Non-Aged (<65)
- `GNPA_` - Graft, Non-Dual, Aged (65+)
- `GNPN_` - Graft, Non-Dual, Non-Aged (<65)
- `TRANSPLANT_KIDNEY_ONLY_1M` - 1 month post-transplant
- `TRANSPLANT_KIDNEY_ONLY_2M` - 2 months post-transplant
- `TRANSPLANT_KIDNEY_ONLY_3M` - 3 months post-transplant

RxHCC model prefixes:

- `Rx_CE_LowAged_` - Community, Low Income, Aged (65+)
- `Rx_CE_LowNoAged_` - Community, Low Income, Non-Aged (<65)
- `Rx_CE_NoLowAged_` - Community, Not Low Income, Aged (65+)
- `Rx_CE_NoLowNoAged_` - Community, Not Low Income, Non-Aged (<65)
- `Rx_CE_LTI_` - Community Enrollee, Long-Term Institutionalized
- `Rx_NE_Lo_` - New Enrollee, Low Income
- `Rx_NE_NoLo_` - New Enrollee, Not Low Income
- `Rx_NE_LTI_` - New Enrollee, Long-Term Institutionalized

```python
from hccinfhir import HCCInFHIR, Demographics

# Example 1: ESRD patient with bad orec/crec data
processor = HCCInFHIR(model_name="CMS-HCC ESRD Model V24")
demographics = Demographics(age=65, sex="F", orec="0", crec="0")  # Wrong codes
diagnosis_codes = ["N18.6", "E11.22", "I12.0"]

# Override to force ESRD dialysis coefficients
result = processor.calculate_from_diagnosis(
    diagnosis_codes,
    demographics,
    prefix_override='DI_'
)

# Example 2: Institutionalized patient not properly flagged
processor = HCCInFHIR(model_name="CMS-HCC Model V28")
demographics = Demographics(age=78, sex="M")  # Should be LTI
diagnosis_codes = ["F03.90", "I48.91", "N18.4"]

# Override to use institutionalized coefficients
result = processor.calculate_from_diagnosis(
    diagnosis_codes,
    demographics,
    prefix_override='INS_'
)
```
The library automatically derives the prefix from:

- ESRD status: orec ∈ {'2', '3', '6'} or crec ∈ {'2', '3'}
- Dual status: Full Benefit Dual (dual_elgbl_cd ∈ {'02', '04', '08'}), Partial Benefit Dual ({'01', '03', '05', '06'}), or Non-Dual

Common Data Quality Issues:

- graft_months may be missing or incorrect

When these issues occur, use prefix_override to ensure correct coefficient selection.
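The hypothetical sketch below covers only the community-segment part of the derivation, using the dual_elgbl_cd sets above. It ignores ESRD, institutional, and new-enrollee logic, and it assumes age 65+ means Aged and under 65 means Disabled. The function name is illustrative, not part of the library.

```python
FULL_DUAL = {"02", "04", "08"}
PARTIAL_DUAL = {"01", "03", "05", "06"}


def community_prefix(age: int, dual_elgbl_cd: str) -> str:
    """Return CNA_/CND_/CFA_/CFD_/CPA_/CPD_ from age and dual status."""
    if dual_elgbl_cd in FULL_DUAL:
        dual = "F"
    elif dual_elgbl_cd in PARTIAL_DUAL:
        dual = "P"
    else:
        dual = "N"  # "00", "NA", or unrecognized codes are treated as non-dual
    segment = "A" if age >= 65 else "D"  # assumption: Aged = 65+, Disabled = under 65
    return f"C{dual}{segment}_"


print(community_prefix(72, "02"))  # CFA_
```

For a 72-year-old, this yields the same prefixes as the Impact Example for dual_elgbl_cd '00', '02', and '01' (CNA_, CFA_, CPA_).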