SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations
(surgehq.ai)
19 points | by landonxi | 5 hours ago
1 comment
egillie | 4 hours ago
Is this because GPT-5 hallucinates less in general?