Philip’s expertise in RAG, data engineering, and AI agents makes him a natural fit to write about scaling multimodal data warehousing systems.
A weekly pulse on everything multimodal—models, data, tools & community.
🎯 Quick Take (TL;DR)
* OpenAI GPT-Image 1 arrives in API: The model powering ChatGPT's viral image generation is now available to developers, enabling high-quality, professional-grade image creation directly in third-party apps. [Details]
* Baidu's Ernie 4.5 Turbo goes multimodal: The new LLM interprets pictures and videos while creating documents at 40% of the cost of DeepSeek V3 and 25% of the cost of DeepSeek R1, accelerating mu
Visual CoT, video gen, and color benchmarks highlight this week's multimodal AI leaps—plus tools, papers, and real-world use cases.
Apple’s new scaling law research redefines how multimodal models are built, while Moonshot and OpenGVLab drop powerful open-source VLMs with reasoning and tool-use.