Dhaval Patel

DhavalPatel

dhaval-patel-2b287033

AI & ML interests

None yet

Recent Activity

upvoted a paper 3 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

submitted a paper 3 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

upvoted a paper 7 days ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

upvoted a paper 3 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Paper • 2606.19704 • Published 4 days ago • 30

upvoted a paper 7 days ago

Evoflux: Inference-Time Evolution of Executable Tool Workflows for Compact Agents

Paper • 2606.12674 • Published 12 days ago • 5

upvoted an article 21 days ago

Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Article • ibm-research • 21 days ago • 87

upvoted an article 26 days ago

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks, by Artificial Analysis and IBM

Article • ibm-research • 26 days ago • 17

upvoted a paper 26 days ago

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Paper • 2605.24219 • Published 27 days ago • 9

upvoted 6 papers about 1 month ago

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

Paper • 2605.20630 • Published May 20 • 12

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

Paper • 2605.08614 • Published May 9 • 7

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

Paper • 2605.14051 • Published May 13 • 1

Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge

Paper • 2605.08518 • Published May 8 • 11

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

Paper • 2605.09131 • Published May 9 • 59

When to Trust Imagination: Adaptive Action Execution for World Action Models

Paper • 2605.06222 • Published May 7 • 44

upvoted a paper about 2 months ago

IndustryAssetEQA: A Neurosymbolic Operational Intelligence System for Embodied Question Answering in Industrial Asset Maintenance

Paper • 2604.23446 • Published Apr 25 • 4

upvoted a paper 3 months ago

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Paper • 2603.22386 • Published Mar 23 • 57

upvoted a paper 4 months ago

SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

Paper • 2512.23167 • Published Dec 29, 2025 • 1

upvoted 2 articles 4 months ago

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

Article • christian-washington, ajasuja, santosh-iima, lewtun, burtenshaw • Feb 12 • 35

Community Evals: Because we're done trusting black-box leaderboards over the community

Article • burtenshaw, SaylorTwift, kramp, merve, davanstrien, nielsr, julien-c • Feb 4 • 90

upvoted a collection 5 months ago

Enterprise Agents and Benchmarks

Collection

Enterprise agent ecosystem featuring AssetOpsBench (industrial) and ITBench (SRE, FinOps, CISO), CUGA to accelerate AI Automation • 19 items • Updated 25 days ago • 17

upvoted an article 5 months ago

AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality

Article • ibm-research • Jan 21 • 33

upvoted a collection 8 months ago

AI-Agent-4-Industry-4.0

Collection

This category highlights the collective efforts of the AI Automation team in advancing Industry 4.0 applications and exploring innovations beyond it. • 6 items • Updated Oct 8, 2025 • 8

upvoted a collection 9 months ago

Granite Docling

Collection

Models for parsing complex PDFs and structured documents, designed to complement Docling. • 4 items • Updated Apr 29 • 64
