Concept Extraction
Extract key concepts, definitions, and relationships from educational content across video, slides, and code
Input
Text to process.
URL of the content file. Supported formats: MP4, PDF, ZIP (code), TXT. No default.
Type of content to process. No default.
Content domain for better extraction. Default: general
Extraction mode to use. Default: pattern-based
Extract prerequisite and related-concept relationships between concepts. Default: true
Extract concept definitions from context. Default: true
Extract code/text examples demonstrating concepts. Default: true
Classify concepts by difficulty level. Default: false
Link concepts across video, slides, and code. Default: true
Generate semantic embeddings for concepts. Default: true
Minimum confidence threshold (0.0-1.0). Default: 0.7
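The minimum confidence threshold can be pictured as a filter over candidate concepts. The following is a minimal Python sketch, not the service's implementation: the function name `filter_by_confidence`, the inclusive comparison, the handling of a missing score, and the `stack_frame` entry are all assumptions made for illustration.

```python
from typing import Any

DEFAULT_MIN_CONFIDENCE = 0.7  # default value stated in the parameter list


def filter_by_confidence(
    concepts: list[dict[str, Any]],
    min_confidence: float = DEFAULT_MIN_CONFIDENCE,
) -> list[dict[str, Any]]:
    """Keep concepts whose confidence is at or above the threshold."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    # Assumption: the comparison is inclusive and a missing score counts as 0.0.
    return [c for c in concepts if c.get("confidence", 0.0) >= min_confidence]


candidates = [
    {"name": "malloc", "confidence": 0.94},
    {"name": "stack_frame", "confidence": 0.55},  # hypothetical low-confidence concept
]
kept = filter_by_confidence(candidates)
print([c["name"] for c in kept])
```

Raising the threshold trades recall for precision: fewer concepts come back, but each one is better supported by its sources.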
Output
{
  "document_id": "doc_abc123",
  "collection_id": "col_xyz789",
  "source_object_id": "obj_def456",
  "concepts": [
    {
      "id": "concept_001",
      "name": "malloc",
      "category": "memory_management",
      "definition": "A C library function that allocates a specified number of bytes from the heap and returns a pointer to the allocated memory",
      "confidence": 0.94,
      "sources": [
        {"type": "transcript", "timestamp": 125.3, "text": "we use malloc to allocate memory dynamically...", "confidence": 0.91},
        {"type": "slide", "slide_number": 15, "title": "Dynamic Memory Allocation", "confidence": 0.88},
        {"type": "code", "file": "memory.c", "line": 45, "snippet": "int *ptr = malloc(sizeof(int) * 10);", "confidence": 0.96}
      ],
      "difficulty_level": "intermediate",
      "importance_score": 0.87,
      "frequency": 15,
      "examples": ["malloc(sizeof(int) * 10)", "ptr = malloc(100)"],
      "prerequisites": ["pointers", "heap_memory"],
      "related_concepts": ["free", "calloc", "realloc"],
      "parent_concepts": ["memory_management"],
      "child_concepts": [],
      "embedding": {"model": "bge-m3", "dimension": 1024, "vector": [0.023, -0.142], "normalized": true}
    }
  ],
  "graph_stats": {
    "total_concepts": 47,
    "categories": {"memory_management": 8, "data_structure": 12, "algorithm": 5, "function": 15, "principle": 7},
    "difficulty_distribution": {"beginner": 15, "intermediate": 22, "advanced": 10},
    "average_confidence": 0.84,
    "total_relationships": 156
  },
  "content": {
    "type": "video",
    "filename": "cs50_lecture4_memory.mp4",
    "domain": "programming",
    "duration": 5430.5,
    "title": "CS50 2024 - Lecture 4 - Memory"
  }
}
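One way to consume a response of this shape is to flatten each concept's relationship lists into graph edges and to check which modalities back the concept. The sketch below uses only Python's standard `json` module and an abbreviated copy of the sample output. The helper names `build_edges` and `source_types` and the edge labels `requires` and `related_to` are hypothetical, chosen for illustration.

```python
import json

# Abbreviated copy of the sample response; only the fields used here are kept.
response_text = """
{
  "concepts": [
    {
      "id": "concept_001",
      "name": "malloc",
      "confidence": 0.94,
      "sources": [
        {"type": "transcript", "timestamp": 125.3},
        {"type": "slide", "slide_number": 15},
        {"type": "code", "file": "memory.c", "line": 45}
      ],
      "prerequisites": ["pointers", "heap_memory"],
      "related_concepts": ["free", "calloc", "realloc"]
    }
  ]
}
"""


def build_edges(payload):
    """Return (source, relation, target) triples for every listed relationship."""
    edges = []
    for concept in payload.get("concepts", []):
        name = concept["name"]
        for target in concept.get("prerequisites", []):
            edges.append((name, "requires", target))  # hypothetical edge label
        for target in concept.get("related_concepts", []):
            edges.append((name, "related_to", target))  # hypothetical edge label
    return edges


def source_types(concept):
    """Return the sorted set of modalities in which a concept was found."""
    return sorted({source["type"] for source in concept.get("sources", [])})


payload = json.loads(response_text)
edges = build_edges(payload)
modalities = source_types(payload["concepts"][0])

for edge in edges:
    print(edge)
print(modalities)
```

A concept that appears in several modalities, as `malloc` does here across transcript, slide, and code, is the case the cross-modal linking option is meant to surface.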