SWE-bench will hit 90% this year

(fabraix.com)

6 points | by asfsf23423 4 hours ago

1 comments

  • upmind 1 hour ago
    Maybe an unpopular opinion, but I think SWE-bench has done its part at this point. We need a new benchmark, because Gemini scoring at or near the same level as Claude is obviously wrong.
    • amazingamazing 1 hour ago
      I use both and think they’re comparable. AMA.
    • lern_too_spel 44 minutes ago
      Gemini at the same level as Claude is believable. Gemini CLI is not at the same level as Claude Code.