SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

(surgehq.ai)

19 points | by landonxi 5 hours ago

1 comment

egillie 4 hours ago
Is this because GPT-5 hallucinates less in general?