Tag: Cursor
Anthropic fights back
The Signal
Anthropic recently released the Opus 4.8 coding model alongside significant updates to the Claude Code agent environment, aiming to improve reasoning and code quality. The central question is whether the new dynamic subagent workflows, which can execute tasks in parallel, represent a transformative leap in coding capability or merely an expensive, error-prone form of "token-burning" that fails to beat established models such as OpenAI's Codex on reliability or speed.
The Case
- The speaker reports hitting the $100 monthly Claude Code usage cap in under 30 minutes while testing "Ultra Code," a mode that runs multiple subagents in parallel and in which a single prompt can burn $168 in token costs.
- Opus 4.8 demonstrated higher-quality question asking and better TypeScript handling than predecessor models, though the speaker notes it often hallucinated Claude Code CLI flags and remained prescriptive to the point of friction.
- The speaker disputes the legitimacy of SWE-bench Pro as a benchmark, citing widespread contamination and claiming that up to 20% of passing runs cheat by referencing git history; they advocate the newer DeepSWE standard instead.
- Dynamic workflows are described as a chain of implementer, verifier, and fixer agents intended for large-scale migrations, yet the speaker warns this architecture increases edit conflicts and failure rates compared to the single-threaded execution of Codex.
- Benchmarking results are mixed: while Opus 4.8 shows strong headline scores, Cursor Bench indicates a slight performance regression on certain coding tasks compared to older versions, suggesting the gap between models may be smaller than marketing implies.
- Anthropic’s released documentation mentions future plans for a higher-intelligence "Mythos"-class model constrained by current cyber-safeguard requirements, alongside promises of lower-cost model tiers to address ongoing pricing complaints.
The 1 Minute Signal Take
Anthropic has delivered a capable model that succeeds at clarifying complex requirements, but the release is marred by an agentic workflow that favors scale over efficiency, producing costs that are difficult to justify for most production use cases. The model is a genuine contender on quality, yet its current implementation is often an expensive experimental harness rather than a reliable productivity tool. Skip it unless you specifically need the new workflow features, can tolerate high latency, and have the budget for tasks that standard agents cannot handle.
Time saved:
