A data processing paradigm in which records are handled incrementally as they are generated rather than in batches. Streaming data enables real-time multimodal AI applications that respond to new content immediately upon arrival.
Streaming systems process data records continuously as they arrive from producers. Records flow through processing stages (filtering, transformation, enrichment, aggregation) with minimal latency. Unlike batch processing that operates on complete datasets, streaming processes each record or micro-batch independently, enabling real-time or near-real-time results.
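The per-record flow through filtering, transformation, and enrichment stages, with a running aggregation, can be sketched with Python generators. This is a minimal in-memory illustration; the record fields and the `source` producer are hypothetical stand-ins, not any particular framework's API.

```python
from typing import Iterator

def source() -> Iterator[dict]:
    # Stand-in for records arriving continuously from a producer
    # (in practice, a message queue or event log).
    for i in range(5):
        yield {"id": i, "value": i * 10}

def process(records: Iterator[dict]) -> Iterator[dict]:
    for record in records:            # each record handled as it arrives
        if record["value"] < 10:      # filtering: drop low-value records
            continue
        record["value"] *= 2          # transformation
        record["origin"] = "sensor-a" # enrichment: attach metadata
        yield record

# Aggregation: a running total updated incrementally, never waiting
# for the full dataset the way a batch job would.
running_total = 0
for rec in process(source()):
    running_total += rec["value"]

print(running_total)  # 200
```

Because each stage is a generator, a record passes through the whole pipeline before the next one is read, which is what keeps per-record latency low.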
Platforms include Apache Kafka (distributed log), Apache Flink (stream processing), Apache Pulsar, and AWS Kinesis. Kafka provides durable, ordered message streams, while Flink offers stateful computation over those streams. Processing guarantees range from at-most-once to exactly-once semantics. Windowing operations (tumbling, sliding, session) group records for time-based aggregations. Throughput can reach millions of events per second.
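The tumbling-window case can be shown with a small sketch: events are assigned to fixed, non-overlapping time buckets by their timestamp and counted per bucket. This is an illustrative toy, not the windowing API of Flink or any other platform; the timestamps, window size, and function name are assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # fixed window width; a tumbling window never overlaps

def tumbling_window_counts(events):
    """Count events per non-overlapping WINDOW_SECONDS-wide window.

    `events` is an iterable of (timestamp_seconds, payload) pairs.
    """
    counts = defaultdict(int)
    for ts, _payload in events:
        # Align each event to the start of its window.
        window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[window_start] += 1
    return dict(counts)

events = [(1, "a"), (4, "b"), (12, "c"), (15, "d"), (19, "e"), (23, "f")]
print(tumbling_window_counts(events))  # {0: 2, 10: 3, 20: 1}
```

A sliding window would assign each event to every window whose interval covers it, and a session window would close a bucket only after a gap of inactivity; real engines also track watermarks to decide when a window's result is final.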