Audio & Podcast Search Pipeline
Make audio content searchable by transcribing and embedding spoken content. Find specific moments in podcasts, calls, and recordings.
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

namespace = client.namespaces.create(name="audio-search")

collection = client.collections.create(
    namespace_id=namespace.id,
    name="podcasts",
    extractors=["audio-transcription", "text-embedding-v2"],
    chunk_strategy="speaker-turn"
)

# Upload audio files
client.buckets.upload(
    collection_id=collection.id,
    url="s3://your-bucket/podcasts/"
)

# Search across all episodes
# Note: `retriever` is not defined in this snippet; it must already exist for this collection.
results = client.retrievers.execute(
    retriever_id=retriever.id,
    query="discussion about AI regulation in Europe"
)
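To give a feel for what a "speaker-turn" chunk strategy might mean, here is a minimal, library-independent sketch. It is not Mixpeek's implementation. It assumes the transcription step yields timestamped segments labeled with a speaker, and merges consecutive segments from the same speaker into one chunk. The `Segment` type, the `chunk_by_speaker_turn` helper, and the sample data are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not a Mixpeek type."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def chunk_by_speaker_turn(segments):
    """Merge consecutive segments spoken by the same speaker into one chunk."""
    chunks = []
    for seg in segments:
        if chunks and chunks[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current chunk.
            last = chunks[-1]
            chunks[-1] = Segment(
                last.speaker, last.start, seg.end, last.text + " " + seg.text
            )
        else:
            # Speaker changed (or first segment): start a new chunk.
            chunks.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return chunks


# Made-up sample data, for illustration only.
segments = [
    Segment("host", 0.0, 4.0, "Welcome back to the show."),
    Segment("host", 4.0, 9.5, "Today we cover AI regulation in Europe."),
    Segment("guest", 9.5, 15.0, "Thanks, the EU AI Act is a good place to start."),
    Segment("host", 15.0, 18.0, "Let's dig in."),
]

chunks = chunk_by_speaker_turn(segments)
for c in chunks:
    print(f"[{c.start:.1f}s-{c.end:.1f}s] {c.speaker}: {c.text}")
```

Because each chunk keeps its start and end times, a search hit on a chunk can point back to a specific moment in the recording rather than to the whole episode.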
Feature Extractors
Audio Transcription
Transcribe audio content to text
