OpenAI o4-proto-4 achieves 82% autonomous resolution on full SWE-Bench Verified + Multi-File


[AI-NEWS]

Date: 2026-02-25

Content: o4-proto-4 set a new internal record, resolving 82% of SWE-Bench Verified multi-file tasks fully autonomously (design → code → test → debug → commit) over multi-hour sessions, with no human edits or guidance.

Keywords: autonomous software engineering, SWE-Bench Verified, multi-file resolution, o4-proto-4, end-to-end coding agent

