
    Clinical Documentation Structuring

    Production-grade pipeline for ingesting clinical documents — scanned charts, EHR exports, wound photos, and therapy notes — and structuring them into coded fields aligned with MDS 3.0, PDPM, and CMS audit requirements. Combines OCR, clinical NER, taxonomy classification, and hybrid retrieval to turn unstructured bedside documentation into queryable, auditable data.

    Modalities: text, image · Status: Production · 1.2K runs

    Why This Matters

    Nurses spend up to 40% of their time on documentation instead of patient care. Clinical data lives in free-text notes, scanned forms, and photos that are invisible to billing and compliance systems. This recipe bridges that gap by extracting structured clinical data from every modality, so that MDS coordinators, billers, and surveyors can work from a single source of truth.

    from mixpeek import Mixpeek

    client = Mixpeek(api_key="YOUR_API_KEY")

    # 1. Create namespace for the facility
    namespace = client.namespaces.create(name="facility-clinical-docs")

    # 2. Build collection with clinical extractors
    collection = client.collections.create(
        namespace_id=namespace.id,
        name="patient-charts",
        extractors=[
            "pdf-extraction",     # OCR for scanned charts
            "text-embedding-v2",  # Semantic embeddings
            "image-captioning",   # Wound photos, imaging
        ],
    )

    # 3. Upload clinical documents
    client.buckets.upload(
        collection_id=collection.id,
        url="s3://facility-ehr-export/patient-charts/",
    )

    # 4. Create MDS-aligned retriever
    retriever = client.retrievers.create(
        namespace_id=namespace.id,
        name="mds-documentation",
        stages=[
            {"type": "hybrid_search", "vector_weight": 0.5, "bm25_weight": 0.5, "top_k": 50},
            {"type": "attribute_filter", "conditions": [
                {"field": "mds_section", "operator": "in", "value": ["G", "J", "K"]},
            ]},
            {"type": "rerank", "model": "colbert-v2", "top_k": 10},
        ],
    )

    # 5. Retrieve MDS-relevant documentation
    results = client.retrievers.execute(
        retriever_id=retriever.id,
        query="functional mobility and ADL performance for Section G",
    )
    for r in results:
        print(f"[{r.metadata.get('mds_section')}] {r.content[:120]}")
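The `hybrid_search` stage in step 4 gives vector similarity and BM25 equal weight (0.5 each). This page does not say how the two signals are combined internally, so the following is only a minimal sketch of one common approach: min-max normalize each score set, then take a weighted sum. The function names `min_max` and `fuse_scores` and the sample scores are hypothetical and are not part of the Mixpeek SDK.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    vector: Dict[str, float],
    bm25: Dict[str, float],
    vector_weight: float = 0.5,
    bm25_weight: float = 0.5,
    top_k: int = 50,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a document missing from one side scores 0 there."""
    v, b = min_max(vector), min_max(bm25)
    fused = {
        doc_id: vector_weight * v.get(doc_id, 0.0) + bm25_weight * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical scores for four chart notes (illustrative values only).
vector_scores = {"note-1": 0.91, "note-2": 0.40, "note-3": 0.75}
bm25_scores = {"note-2": 12.0, "note-3": 9.0, "note-4": 3.0}

ranked = fuse_scores(vector_scores, bm25_scores)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

In this sketch, a note that scores moderately well on both signals (`note-3`) outranks notes that score highly on only one. That is the usual motivation for hybrid retrieval over clinical text, where exact terms and paraphrases both matter.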

    Feature Extractors

    PDF Text Extraction

    Extract structured text and layout information from PDFs

    645K runs

    Image Captioning

    Generate descriptive captions for images automatically

    589K runs

    Retriever Stages

    attribute filter (filter stage)

    Filter documents by metadata attribute values using boolean logic
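To make the attribute filter concrete, here is a minimal sketch of how a condition such as `mds_section in ["G", "J", "K"]` could be evaluated against document metadata. It is not the platform's implementation. The helper names `matches` and `attribute_filter`, the operator table, the AND-combination of conditions, and the sample documents are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical operator table; the real stage may support more operators.
OPERATORS = {
    "eq": lambda actual, expected: actual == expected,
    "ne": lambda actual, expected: actual != expected,
    "in": lambda actual, expected: actual in expected,
}


def matches(metadata: Dict[str, Any], conditions: List[Dict[str, Any]]) -> bool:
    """Return True only if every condition holds (conditions are AND-ed here)."""
    for cond in conditions:
        field = cond["field"]
        if field not in metadata:
            return False  # a missing field never matches in this sketch
        if not OPERATORS[cond["operator"]](metadata[field], cond["value"]):
            return False
    return True


def attribute_filter(
    docs: List[Dict[str, Any]], conditions: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Keep only the documents whose metadata satisfies all conditions."""
    return [doc for doc in docs if matches(doc["metadata"], conditions)]


# Illustrative documents; the ids and section codes are made up.
docs = [
    {"id": "note-1", "metadata": {"mds_section": "G"}},
    {"id": "note-2", "metadata": {"mds_section": "M"}},
    {"id": "note-3", "metadata": {}},
    {"id": "note-4", "metadata": {"mds_section": "K"}},
]
conditions = [{"field": "mds_section", "operator": "in", "value": ["G", "J", "K"]}]

kept = attribute_filter(docs, conditions)
print([doc["id"] for doc in kept])
```

Documents with no `mds_section` value are dropped here rather than passed through. For an audit-oriented workflow that is probably the safer default, but it is a design choice of this sketch.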

    rerank (sort stage)

    Rerank documents using cross-encoder models for accurate relevance
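A rerank stage generally rescores a small candidate set against the query with a more expensive model, then keeps the best few. The sketch below shows only that control flow. The scoring function `overlap_score` is a toy token-overlap stand-in, not a cross-encoder or any model the platform uses, and all names and sample notes are hypothetical.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens that appear in the text."""
    q_tokens = set(query.lower().split())
    t_tokens = set(text.lower().split())
    return len(q_tokens & t_tokens) / len(q_tokens) if q_tokens else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: int = 10,
) -> List[Tuple[float, str]]:
    """Score every candidate against the query and keep the top_k best."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # list.sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative candidate notes, as if returned by an earlier retrieval stage.
query = "functional mobility transfers"
candidates = [
    "Resident ate 75% of lunch with supervision",
    "Requires two-person assist for transfers and bed mobility",
    "Functional mobility improved; transfers now independent",
]

top = rerank(query, candidates, top_k=2)
for score, text in top:
    print(f"{score:.2f}  {text}")
```

Swapping in a real model would mean replacing `score_fn` with a call that scores each (query, document) pair. The sort-and-truncate step stays the same.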

    Resources Used

    Taxonomy

    Use Cases Using This Recipe

    SNF Documentation Intelligence (Advanced, 8 min)

    Automate MDS assessments and clinical documentation for skilled nursing facilities

    Documentation time reduction: 40% less time on charting

    Who It's For

    SNF operators, MDS coordinators, directors of nursing, and post-acute care organizations that manage clinical documentation across skilled nursing facilities.