Lineage Traversal

Every Mixpeek document carries the full lineage chain that produced it, from the original bucket object through every transformation. This guide shows the four patterns for navigating that chain efficiently from the API.

Background on the lineage data model: see Documents → Lineage. The TL;DR is that each document has _internal.lineage with root_object_id, root_bucket_id, source_document_id, and a chain array recording every processing step.

When to use what

Pattern	Use case	Round-trips
`?expand=parent`	“Show me this scene and its source frame on one page”	1
`?expand=root_object`	“Show me this document with the original video metadata”	1
`?expand=ancestors`	“Show me the full pipeline that produced this document”	1
`?expand=children`	“Show me all the segments derived from this scene”	1
`GET /documents/{id}/ancestors`	Same as `expand=ancestors` but returns only the chain	1
`GET /documents/{id}/descendants`	Same as `expand=children` but returns only the children	1
`from_object` filter	“Search across everything derived from this video”	1 (no GET first)

The shared rule: never use a list response to grab IDs and then issue per-document GETs. The expand parameter takes a comma-separated list, so a single request fetches the document plus everything you need from its lineage tree.

$expand keywords

Lineage-aware $expand keywords resolve relative to a document’s own _internal.lineage block. They land under _expanded.<keyword> in the response, matching the existing user-field expand shape.

parent
root_object
ancestors
children

parent: the single document referenced by _internal.lineage.source_document_id.

cURL

curl "$API/v1/collections/$COL/documents/$DOC?expand=parent" \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Namespace: $NS"

Python

client.documents.get(
    collection_identifier="col_scenes",
    document_id="doc_scene_42",
    expand="parent",
)
# Response includes:
# response._expanded.parent — the upstream frame document

For a tier-0 document (created directly from a bucket object), parent is absent — there’s no upstream document. Use root_object instead.
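The fallback described above can be sketched on the client side. This is a minimal illustration, assuming the response has been parsed into a plain dict and that expansions sit under an `_expanded` key as described earlier; the helper name `upstream_of` is hypothetical and not part of the Mixpeek SDK.

```python
def upstream_of(document):
    """Return (kind, payload) for whatever sits directly upstream of a document.

    Assumes `document` is the parsed JSON response of a GET made with
    expand=parent,root_object. Prefers the parent document and falls back
    to the root bucket object when parent is absent (tier-0 documents).
    """
    expanded = document.get("_expanded") or {}
    parent = expanded.get("parent")
    if parent is not None:
        return ("parent", parent)
    root = expanded.get("root_object")
    if root is not None:
        return ("root_object", root)
    # Neither keyword was requested or resolved.
    return (None, None)
```

Requesting both keywords in one call keeps this to a single round-trip regardless of the document's tier.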

root_object: the bucket object that started the lineage tree, fetched from the bucket objects collection (not Qdrant).

curl "$API/v1/collections/$COL/documents/$DOC?expand=root_object"

Useful for “show me this document with its source video filename and upload metadata” without a separate GET /v1/buckets/.../objects/{id}.

ancestors: every prior step in the chain, in order from the root through the immediate parent. The document itself is excluded from the list.

curl "$API/v1/collections/$COL/documents/$DOC?expand=ancestors"

For a document at tier 3, ancestors returns 3 elements: the tier-0, tier-1, and tier-2 documents along the lineage path. Steps that have no document_id (e.g., the bucket-object source step at tier 0) are skipped — those don’t correspond to fetchable documents.
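To make the skipping rule concrete, here is a small local simulation of which chain steps would show up as ancestors. It is only a sketch: the shape of each chain step (a dict with an optional `document_id`, ordered root first) and the `object_id` key on the source step are assumptions for illustration, not the documented schema.

```python
def expected_ancestor_ids(chain, self_id):
    """List the document IDs an ancestors expand would be expected to return.

    Assumes `chain` is ordered from the root step to the latest step and
    that each step is a dict which may or may not carry a document_id.
    """
    ids = []
    for step in chain:
        doc_id = step.get("document_id")
        if not doc_id:
            # Steps without a document_id (e.g. the bucket-object source
            # step) do not correspond to fetchable documents.
            continue
        if doc_id == self_id:
            # The document itself is excluded from its own ancestors.
            continue
        ids.append(doc_id)
    return ids


# Hypothetical chain for a tier-3 document.
chain = [
    {"object_id": "obj_video_123"},   # source step, no document_id
    {"document_id": "doc_t0"},
    {"document_id": "doc_t1"},
    {"document_id": "doc_t2"},
    {"document_id": "doc_t3"},        # the document being inspected
]
print(expected_ancestor_ids(chain, "doc_t3"))  # ['doc_t0', 'doc_t1', 'doc_t2']
```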

children: direct downstream documents (depth=1), meaning every document whose _internal.lineage.source_document_id equals this document’s ID. Results are capped at 100 children per request. If you need deeper traversal, combine with the from_object filter (see below).

curl "$API/v1/collections/$COL/documents/$DOC?expand=children"
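Because of the cap, a client may want to detect when the children list could be truncated and switch to a filter query. The sketch below assumes the response is a parsed dict with children under `_expanded.children`; the helper name `children_plan` and its return shape are hypothetical. It uses the `from_document` alias, which the Limits & caveats section names as the fallback for wider fan-out.

```python
CHILDREN_CAP = 100  # cap stated in this guide for expand=children


def children_plan(document, document_id):
    """Decide whether the expanded children can be used as-is.

    Returns the children when the list is below the cap. At the cap the
    list may be truncated, so a filter body for a follow-up list call is
    returned instead.
    """
    children = (document.get("_expanded") or {}).get("children") or []
    if len(children) < CHILDREN_CAP:
        return {"complete": True, "children": children}
    return {
        "complete": False,
        "filter": {
            "AND": [
                {"field": "from_document", "operator": "eq", "value": document_id}
            ]
        },
    }
```

A list of exactly 100 children is not necessarily truncated, so the sketch treats it conservatively as possibly incomplete.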

You can request multiple keywords in one call by comma-separating them:

curl "$API/v1/collections/$COL/documents/$DOC?expand=parent,root_object,children"
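The same URL can be assembled programmatically. This is a minimal sketch using only the standard library; `build_expand_url` is a hypothetical helper, and the base URL in the usage line is a placeholder rather than a documented host.

```python
from urllib.parse import quote


def build_expand_url(api_base, collection, document_id, keywords):
    """Build a document GET URL with a comma-separated expand parameter."""
    # Drop duplicate keywords while preserving the caller's order.
    ordered = list(dict.fromkeys(keywords))
    path = (
        f"{api_base.rstrip('/')}/v1/collections/{quote(collection, safe='')}"
        f"/documents/{quote(document_id, safe='')}"
    )
    # Commas are left unescaped, matching the cURL example above.
    return f"{path}?expand={','.join(ordered)}"


url = build_expand_url(
    "https://api.example.com",  # placeholder base URL
    "col_scenes",
    "doc_scene_42",
    ["parent", "root_object", "children", "parent"],
)
```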

The same expand is accepted by POST /documents/list (in the request body) and by retriever response shaping — the document GET endpoint is just the simplest demonstration.

Convenience endpoints

For SDKs and UIs that want only the lineage walk without fetching the document itself, use the dedicated endpoints:

# Returns the chain root → parent (excludes the document itself)
GET /v1/collections/{collection_identifier}/documents/{document_id}/ancestors

# Returns direct depth=1 children (max 100)
GET /v1/collections/{collection_identifier}/documents/{document_id}/descendants

Both endpoints return a List<DocumentResponse>; each element has the same shape as the response of GET /documents/{id}.
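As a rough illustration of calling these endpoints without an SDK, the sketch below prepares a standard-library request object. The helper name `lineage_request` is hypothetical; the paths and the Authorization and X-Namespace headers mirror the cURL examples earlier in this guide, and the base URL is supplied by the caller.

```python
from urllib.request import Request


def lineage_request(api_base, api_key, namespace, collection, document_id, walk):
    """Prepare a GET request for the ancestors or descendants endpoint."""
    if walk not in ("ancestors", "descendants"):
        raise ValueError(f"unsupported lineage walk: {walk!r}")
    url = (
        f"{api_base.rstrip('/')}/v1/collections/{collection}"
        f"/documents/{document_id}/{walk}"
    )
    return Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Namespace": namespace,
        },
        method="GET",
    )
```

The returned object can be passed to `urllib.request.urlopen` and the body parsed with `json.load` to obtain the list of documents.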

Filter aliases

When you want to search the lineage tree (find every document derived from one root), use the filter aliases instead of expand. They work in document list endpoints, retriever filter stages, and aggregations.

Alias	Resolves to	Use for
`from_object`	`_internal.lineage.root_object_id`	Everything derived from this bucket object
`from_bucket`	`_internal.lineage.root_bucket_id`	Everything derived from this bucket
`from_document`	`_internal.lineage.source_document_id`	Direct children of one upstream document
`from_collection`	`_internal.lineage.source_collection_id`	Documents whose immediate parent was in this collection

// "Show me all scene documents in col_scenes that came from this video"
{
  "AND": [
    { "field": "from_object", "operator": "eq", "value": "obj_video_123" }
  ]
}

// "Direct children of one specific frame document"
{
  "AND": [
    { "field": "from_document", "operator": "eq", "value": "doc_frame_42" }
  ]
}

These aliases are equivalent to the underscore-prefixed paths (_internal.lineage.*) — they exist purely so you don’t have to learn the internal schema. Mix them freely with normal user fields:

{
  "AND": [
    { "field": "from_object", "operator": "eq", "value": "obj_video_123" },
    { "field": "metadata.scene_score", "operator": "gte", "value": 0.8 }
  ]
}
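Filter bodies like the ones above are easy to generate from code. The following is a sketch, with the alias table copied from this section; `lineage_filter` and its `use_internal_path` flag are hypothetical names introduced for illustration.

```python
LINEAGE_ALIASES = {
    "from_object": "_internal.lineage.root_object_id",
    "from_bucket": "_internal.lineage.root_bucket_id",
    "from_document": "_internal.lineage.source_document_id",
    "from_collection": "_internal.lineage.source_collection_id",
}


def lineage_filter(alias, value, *extra_conditions, use_internal_path=False):
    """Build an AND filter that pins one lineage alias to a value.

    Extra conditions on user fields are appended unchanged. Setting
    use_internal_path=True emits the underscore-prefixed path instead of
    the alias; per this guide the two forms are equivalent.
    """
    if alias not in LINEAGE_ALIASES:
        raise ValueError(f"unknown lineage alias: {alias!r}")
    field = LINEAGE_ALIASES[alias] if use_internal_path else alias
    return {
        "AND": [
            {"field": field, "operator": "eq", "value": value},
            *extra_conditions,
        ]
    }


body = lineage_filter(
    "from_object",
    "obj_video_123",
    {"field": "metadata.scene_score", "operator": "gte", "value": 0.8},
)
```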

End-to-end example: decomposition tree

To render a decomposition tree for one bucket object — every document at every tier that descended from it — make one filtered list call per collection in the namespace using from_object. The result is already structured by collection, and each document’s _internal.lineage.chain tells you where to draw the edges.

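def decomposition_tree(client, namespace, root_object_id):
    """Map each collection ID to its documents derived from root_object_id."""
    tree = {}
    for collection in client.collections.list(namespace=namespace):
        # One filtered list call per collection; from_object matches every tier.
        docs = client.documents.list(
            collection.collection_id,
            filters={
                "AND": [
                    {"field": "from_object", "operator": "eq", "value": root_object_id}
                ]
            },
        )
        # Skip collections with no documents derived from this object.
        if docs:
            tree[collection.collection_id] = docs
    return tree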
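Turning the per-collection result into drawable edges might look like the sketch below. It assumes each document is available as a plain dict with a `document_id` key and the `_internal.lineage` block described earlier, which may differ from what the SDK returns. Since the layout of `chain` entries is not specified here, it uses `source_document_id` and `root_object_id` instead; `lineage_edges` is a hypothetical helper.

```python
from collections import defaultdict


def lineage_edges(docs_by_collection):
    """Group document IDs under the ID of their immediate upstream node.

    Documents without a source_document_id (tier 0) are attached directly
    to their root bucket object.
    """
    edges = defaultdict(list)
    for docs in docs_by_collection.values():
        for doc in docs:
            lineage = doc.get("_internal", {}).get("lineage", {})
            upstream = lineage.get("source_document_id") or lineage.get("root_object_id")
            edges[upstream].append(doc["document_id"])
    return dict(edges)
```

Each key is a node in the tree and each value lists the nodes to draw beneath it, starting from the root object ID.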

For a deeper materialized view (the chain edges with parent/child resolved inline), use the dedicated decomposition tree endpoint:

GET /v1/buckets/{bucket_id}/objects/{object_id}/decomposition-tree

That endpoint pre-joins everything in one call and is what the Studio namespace detail page uses to draw lineage diagrams.

Limits & caveats

User-field expands are limited to 50 unique field references per request. The limit does not apply to lineage keywords, which are bounded by chain length (ancestors) and by the children cap (children).
expand=children is capped at 100 children per parent. For deeper traversal or wider fan-out, fall back to a from_document filter.
Recursive expansion is not supported — expand=parent resolves one level. To walk further, use expand=ancestors (full chain) or call /ancestors then re-expand from there.
Lineage is immutable provenance. If an ancestor is deleted, its document_id reference in the chain remains. The ancestors expand silently skips unresolved references — never returns null slots — but client code should still be ready for shorter-than-expected chains.
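One way to prepare for shorter-than-expected chains is to compare the IDs you expected with the ancestors actually returned. This is a sketch only: it assumes each returned ancestor is a dict with a `document_id` key, and `missing_ancestor_ids` is a hypothetical helper.

```python
def missing_ancestor_ids(expected_ids, ancestors):
    """Return the expected IDs that are absent from the returned ancestors.

    A non-empty result suggests some ancestors were deleted and silently
    skipped by the expand.
    """
    returned = {ancestor.get("document_id") for ancestor in ancestors}
    return [doc_id for doc_id in expected_ids if doc_id not in returned]
```

A UI could use the result to render a placeholder node for each missing step rather than drawing a misleadingly short pipeline.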
