Philip’s expertise in RAG, data engineering, and AI agents makes him a natural fit to write about scaling multimodal data warehousing systems.
Qwen3-VL-Embedding unifies text, image, and video search in 30+ languages, HY-Video-PRFL trains 1.4x faster using video models as reward signals, PointWorld-1B simulates interactive 3D environments from single images, and Music Flamingo reasons about chord progressions and harmony.
Week of December 29 - January 4, 2026: Research reveals why MLLMs like Qwen2-VL fall down, MegaRAG builds multimodal knowledge graphs without manual construction, HiStream generates 1080p video 107x faster, and Google DeepMind discovers geometric memory in sequence models.
Week of December 16 - December 22, 2025: TurboDiffusion achieves 100-205x video generation speedup, WorldPlay generates interactive 3D worlds with geometric consistency, Step-GUI reaches SOTA on AndroidWorld & OSWorld benchmarks, and LongVie 2 produces 5-minute continuous videos.
Week of Dec 9-14, 2025: Apple proves one attention layer beats dozens in diffusion models, MokA shows low-rank adaptation outperforms full fine-tuning, Adobe's relsim captures analogical relationships between images, and X-VLA controls different robot types with one transformer.
Week of Dec 1-7, 2025: Research reveals why 11 of 14 VLMs fail at factual recall, BookRAG adds hierarchical structure to document retrieval, Adobe's RELIC and Alibaba's Reward Forcing enable real-time interactive video, and Microsoft's 0.5B-parameter TTS model runs in real time.
Week of Nov 24-30, 2025: Alibaba's 6B Z-Image impresses, Tencent's 1B HunyuanOCR beats larger models and APIs, VisionRAG uses 6-9x less memory than ColPali, and RynnVLA-002 boosts real-world robot success by 50%.
Week of Nov 17-23, 2025: Nano Banana Pro creates coherent visualizations, SAM 3 segments by concept, not pixels, HunyuanVideo 1.5 leads open-source video, and Step-Audio-R1 matches Gemini 3 Pro on audio reasoning.
Week of November 10 - November 16, 2025: Pelican-VL gives humanoid robots spatial intelligence, DeepMind teaches AI to see like humans, Marble creates 3D worlds from single images, and Meta opens speech recognition to 1,600+ languages.
Multimodal Monday #32: AMER shows 4-21% gains on complex queries by generating multiple embeddings, Adobe MotionStream hits 29 fps with interactive motion controls, Step-Audio-EditX edits voice emotion and style through text prompts, and GEN-0 trains robots for general skills.
Multimodal Monday #31: Google Latent Sketchpad lets models sketch thoughts before acting, Amazon Nova MME unifies search across modalities, Emu3.5 matches Google's Nano Banana locally, and BEAR reveals why AI fails at physical tasks.
Multimodal Monday #30: WALT and UltraCUA make websites API-smart, Seed3D 1.0 builds 3D assets from one image, DeepSeek-OCR compresses docs 10x with 97% accuracy via optical mapping, and AGILE lifts VLM accuracy from 9.5% to 82.8% with interactive puzzles.
Multimodal Monday #29: Claude Haiku 4.5 runs twice as fast at one-third cost, Trace Anything maps videos to 3D trajectories for motion search, and VIST3A stitches text-to-3D without retraining.
Multimodal Monday #28: Fast-dLLM v2 diffuses text 2.5x faster, Omni-Embed-Nemotron hunts across modalities, and Think-Then-Embed reasons its way to the top of MMEB-V2.
Multimodal Monday #27: ModernVBERT's 250M-parameter model beats models 10x larger, DocPruner slashes storage 60%, and Claude Sonnet 4.5 codes for 30+ hours. Scale reimagined!
Multimodal Monday #26: MetaEmbed scales retrieval on the fly, EmbeddingGemma beats giants with 308M params, and Veo3 develops reasoning.