Philip’s expertise in RAG, data engineering, and AI agents makes him a natural fit to write about scaling multimodal data warehousing systems.
Multimodal Monday #29: Claude Haiku 4.5 runs twice as fast at one-third the cost, Trace Anything maps videos to 3D trajectories for motion search, and VIST3A stitches text-to-3D without retraining.
Multimodal Monday #28: Fast-dLLM v2 diffuses text 2.5x faster, Omni-Embed-Nemotron hunts across modalities, and Think-Then-Embed reasons to top MMEB-V2.
Multimodal Monday #27: ModernVBERT's 250M beats 10x larger, DocPruner slashes storage 60%, and Claude Sonnet 4.5 codes 30+ hours. Scale reimagined!
Multimodal Monday #26: MetaEmbed scales retrieval on-the-fly, EmbeddingGemma beats giants with 308M params, and Veo3 develops reasoning.
AI reads intentions in video, Moondream delivers frontier performance at 2B params, and Alibaba's open-source release matches OpenAI. Understanding "why" changes everything!
RecA boosts quality 17% with 27 GPU-hours, RenderFormer replaces graphics pipelines with transformers, and Lucy-14B delivers instant video. Alignment beats retraining!
Multimodal Monday #23: REFRAG speeds RAG by 30x, WebWatcher crushes GPT-4o by 27%, and embeddings hit theoretical limits. Efficiency wins big!
Multimodal Monday #22: MLLMs fail basic rotations, Intern-S1 beats GPT on science, and MultiTrust-X exposes vulnerabilities. Trust rebuilds AI!
Multimodal Monday #21: Text crushes visuals in recommendations, GPT-5 beats doctors by 24-29%, and Spotify's AI evaluates podcasts. AI surpasses human limits!
Multimodal Monday #20: Study challenges multimodal hype, Genie 3 builds 3D from text, and TURA blends real-time data. The future demands targeted deployment!
Multimodal Monday #19: Wan 2.2 rolls out with a week of daily feature releases, HairCUP refines 3D avatars, and E-FineR boosts recognition. Open-source Chinese AI surges ahead!
Multimodal Monday #18: MoVieS rebuilds 4D in 1s, MindJourney boosts reasoning by 8%, and MOSPA predicts audio motion. Spatial intelligence takes off!
Multimodal Monday #17: MoVieS creates 4D scenes in 1s, MOSPA tracks audio motion, and ColQwen-Omni unifies search. Real-time understanding expands!
Multimodal Monday #16: Mirage creates real-time games at 16 FPS, Ainos-Solomon fuses smell+vision, and LongVILA-R1 handles 3h video. Real-time drives new possibilities.
Multimodal Monday #15: ARAG lifts Walmart recs by 42%, PubMedBERT SPLADE nails medical search, and Microsoft serves 1.8B fans. Specialization leads the way!