Article • How I contributed a new model to the Transformers library using Codex • nielsr • Mar 30 • 52
Paper • How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks • 2507.01955 • Published Jul 2, 2025 • 36
Article • Open-source DeepResearch – Freeing our search agents • m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k
Article • SmolVLM2: Bringing Video Understanding to Every Device • orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 340
Article • SmolVLM - small yet mighty Vision Language Model • andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 418
Article • ColPali: Efficient Document Retrieval with Vision Language Models 👀 • manu • Jul 5, 2024 • 319
Article • DeepSearch Using Visual RAG in Agentic Frameworks 🔎 • paultltc • Mar 21, 2025 • 38
Article • Reinforcement Learning for Large Language Models: Beyond the Agent Paradigm • royswastik • Mar 19, 2025 • 8