Clinical Documentation Structuring

Production-grade pipeline for ingesting clinical documents — scanned charts, EHR exports, wound photos, and therapy notes — and structuring them into coded fields aligned with MDS 3.0, PDPM, and CMS audit requirements. Combines OCR, clinical NER, taxonomy classification, and hybrid retrieval to turn unstructured bedside documentation into queryable, auditable data.

text

image

Production

1.2K runs

Why This Matters

Nurses spend up to 40% of their time on documentation instead of patient care. Clinical data lives in free-text notes, scanned forms, and photos that billing and compliance systems cannot read. This recipe extracts structured clinical data from each of those modalities so MDS coordinators, billers, and surveyors can work from a single source of truth.

from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# 1. Create namespace for the facility
namespace = client.namespaces.create(name="facility-clinical-docs")

# 2. Build collection with clinical extractors
collection = client.collections.create(
    namespace_id=namespace.id,
    name="patient-charts",
    extractors=[
        "pdf-extraction",       # OCR for scanned charts
        "text-embedding-v2",    # Semantic embeddings
        "image-captioning",     # Wound photos, imaging
    ],
)

# 3. Upload clinical documents
client.buckets.upload(
    collection_id=collection.id,
    url="s3://facility-ehr-export/patient-charts/"
)

# 4. Create MDS-aligned retriever
retriever = client.retrievers.create(
    namespace_id=namespace.id,
    name="mds-documentation",
    stages=[
        {"type": "hybrid_search", "vector_weight": 0.5, "bm25_weight": 0.5, "top_k": 50},
        {"type": "attribute_filter", "conditions": [
            {"field": "mds_section", "operator": "in", "value": ["G", "J", "K"]}
        ]},
        {"type": "rerank", "model": "colbert-v2", "top_k": 10}
    ]
)

# 5. Retrieve MDS-relevant documentation
results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="functional mobility and ADL performance for Section G"
)

for r in results:
    print(f"[{r.metadata.get('mds_section')}] {r.content[:120]}")
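The hybrid_search stage in the recipe gives vector and BM25 scores equal weight (0.5 each). The recipe does not say how the two score sets are combined internally, so the following is only a minimal sketch of one common approach: min-max normalize each score set, then take a weighted sum. The function names `min_max` and `fuse_scores` are hypothetical and are not part of the Mixpeek SDK.

```python
def min_max(scores):
    """Scale a dict of doc_id -> raw score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are identical, so treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(vector_scores, bm25_scores,
                vector_weight=0.5, bm25_weight=0.5, top_k=50):
    """Weighted-sum fusion of two score sets (assumed scheme, for illustration).

    A document missing from one score set contributes 0.0 for that set.
    Returns up to top_k (doc_id, fused_score) pairs, best first.
    """
    v = min_max(vector_scores)
    b = min_max(bm25_scores)
    fused = {
        doc_id: vector_weight * v.get(doc_id, 0.0) + bm25_weight * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    # Sort by descending score, breaking ties by doc_id for stable output.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities are not, so a raw weighted sum would let one signal dominate regardless of the configured weights.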

Feature Extractors

PDF Text Extraction

Extract structured text and layout information from PDFs

645K runs

Image Captioning

Generate descriptive captions for images automatically

589K runs

Retriever Stages

attribute filter

Filter documents by metadata attribute values using boolean logic

filter
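To illustrate what filtering on metadata with boolean logic can look like, here is a small sketch that evaluates conditions shaped like the one in the recipe (`field`, `operator`, `value`). It assumes all conditions are combined with AND and that documents are plain dicts with a `metadata` key. Only the `in` operator appears in the recipe; `eq` and `ne` are added here as assumptions, and none of these helper names come from the Mixpeek SDK.

```python
# Operator table: "in" comes from the recipe; "eq" and "ne" are illustrative assumptions.
OPERATORS = {
    "in": lambda actual, expected: actual in expected,
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
}


def matches(metadata, conditions):
    """Return True only if every condition holds (conditions are ANDed)."""
    for cond in conditions:
        check = OPERATORS[cond["operator"]]
        actual = metadata.get(cond["field"])  # None when the field is missing
        if not check(actual, cond["value"]):
            return False
    return True


def attribute_filter(documents, conditions):
    """Keep only the documents whose metadata satisfies all conditions."""
    return [doc for doc in documents if matches(doc.get("metadata", {}), conditions)]


# Usage with the same condition as the recipe's retriever stage.
docs = [
    {"id": 1, "metadata": {"mds_section": "G"}},
    {"id": 2, "metadata": {"mds_section": "M"}},
    {"id": 3, "metadata": {}},
]
conditions = [{"field": "mds_section", "operator": "in", "value": ["G", "J", "K"]}]
kept = attribute_filter(docs, conditions)
```

In this sketch a document with no `mds_section` value is dropped by the `in` check, so documents that were never tagged with an MDS section would not reach the rerank stage.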

rerank

Rerank documents using cross-encoder models for accurate relevance

sort
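The rerank stage takes the candidates that survive filtering, rescores each one against the query, and keeps the best few (`top_k: 10` in the recipe). The sketch below shows only that control flow with a pluggable scoring function. The `token_overlap` scorer is a toy stand-in, not the model the recipe names, and both function names are hypothetical.

```python
def rerank(query, candidates, score_fn, top_k=10):
    """Rescore each candidate text against the query and return the top_k texts.

    score_fn(query, text) -> float stands in for a real relevance model.
    Ties keep the original candidate order.
    """
    scored = [(score_fn(query, text), idx, text) for idx, text in enumerate(candidates)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, _, text in scored[:top_k]]


def token_overlap(query, text):
    """Toy scorer: fraction of query tokens that also appear in the text."""
    q_tokens = set(query.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


# Usage: the candidates would normally be the output of the earlier stages.
candidates = [
    "therapy note on gait",
    "stage 2 wound care plan",
    "wound photo",
]
top = rerank("wound care stage", candidates, token_overlap, top_k=2)
```

Reranking a short candidate list is the usual design choice because a stronger relevance model is too slow to run over the whole collection, which is why the recipe narrows to 50 results before reranking.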

Resources Used

Taxonomy

Use Cases Using This Recipe

SNF Documentation Intelligence

Related Recipes & Resources

Image Captioning

PDF Text Extraction

Document Intelligence Search

Multimodal Hybrid Search Pipeline

Multimodal RAG Pipeline