ORAgentBench: Can LLM Agents Solve Challenging Operations Research Tasks End to End?
The paper "ORAgentBench" evaluates whether large language model (LLM) agents can carry out complex operations research tasks end to end without human intervention. Published on June 19, 2026, the study highlights the current strengths and limitations of LLM agents in fully autonomous operations research workflows.
