LLM evaluation platform Arena launches Agent Mode to benchmark GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on multi-step tasks · Digg