by The Opulent OS Team
We rebuilt significant parts of Opulent OS around the thesis that this will be the decade of agents, not the year. The result is a memory-first architecture, deterministic orchestration, and streaming that never strands the user. This refactor is not a model change—it’s a system change. Like Devin’s rebuild for Sonnet 4.5, we found behaviors in modern models that broke old assumptions about how agents should be architected.
Because Opulent is an agent that plans, executes, and verifies—not just a code autocomplete—we get a high-fidelity view into model behavior under real work. Small improvements compound across planning, retrieval, and verification loops. In production, these changes have raised planning reliability, reduced context overflows by ~95%, increased stream completion rates to >99%, and cut “silent” runs via heartbeats and finish-reason normalization.
To get here, we tuned memory before tuning models; enforced budgets before prompting ambition; made streams a product; and bound execution inside a deterministic orchestrator with typed tools and verifiable outcomes. Below are observations—and the systems they forced us to build.
The platform is memory-first
Opulent OS adopted a memory-augmented decision process inspired by Memento: solved episodes become cases; the planner retrieves K≈4 precedents; execution writes back successes and failures with artifacts; optional parametric scoring learns utility beyond similarity. The effect: fewer cold starts, faster convergence, less thrash.
Models externalize state when memory exists. Given structured handoffs and case examples, the planner stops re-deriving obvious steps and adapts proven scaffolds. Without this, agents write florid summaries and still miss constraints. With it, they reference plans, diffs, tests, and deploy notes—the artifacts that make the next decision obvious. Memory doesn’t replace thinking; it aims it.
- Keep K small. Utility > coverage. K≈4 beat K=16.
- Reward what closes loops: tests passing, deploys green, smoke checks verified.
- Decay and prune. Memory is a working set, not a museum.
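The retrieve/write-back loop above can be sketched in a few dozen lines. This is a hypothetical, minimal illustration, not our production code: the `Case` fields, the Jaccard token overlap standing in for an embedding model, and the utility weights are all illustrative assumptions.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Case:
    task: str
    plan: str
    succeeded: bool
    artifacts: list = field(default_factory=list)
    created: float = field(default_factory=time.time)
    uses: int = 0

class CaseMemory:
    def __init__(self, k: int = 4, max_cases: int = 500, half_life_days: float = 30.0):
        self.k = k                                  # keep K small: utility > coverage
        self.max_cases = max_cases                  # memory is a working set, not a museum
        self.half_life = half_life_days * 86400.0
        self.cases: list[Case] = []

    def _similarity(self, a: str, b: str) -> float:
        # Stand-in for an embedding model: token-overlap Jaccard similarity.
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def _utility(self, case: Case, task: str, now: float) -> float:
        # Similarity, discounted by age; failed precedents still surface, but rank lower.
        decay = 0.5 ** ((now - case.created) / self.half_life)
        outcome = 1.0 if case.succeeded else 0.5
        return self._similarity(task, case.task) * decay * outcome

    def retrieve(self, task: str) -> list[Case]:
        now = time.time()
        ranked = sorted(self.cases, key=lambda c: self._utility(c, task, now), reverse=True)
        hits = ranked[: self.k]
        for c in hits:
            c.uses += 1
        return hits

    def write_back(self, case: Case) -> None:
        # Execution writes back successes *and* failures, with artifacts attached.
        self.cases.append(case)
        if len(self.cases) > self.max_cases:
            # Prune the least-used, oldest case first.
            self.cases.sort(key=lambda c: (c.uses, c.created))
            self.cases.pop(0)
```

The planner would call `retrieve(task)` before planning and `write_back(...)` after verification; a parametric scorer would replace `_utility` once enough outcomes accumulate.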
The planner works under strict budgets
Modern models change behavior as they approach their context limits: they become abruptly decisive, summarize mid-task, or cut corners. Rather than fight these quirks ad hoc, we made budgets explicit and enforced them uniformly.
- Context-aware budgeting caps completions and shrinks prompts deterministically, preserving system messages and structured artifacts while compressing prose.
- Structured handoffs compress 400K-token working sets into ≤32K JSON with primary request, key topics, resources, decisions, current task, and next step.
- Auto-continue bridges length, tool, and finish-reason boundaries with bounded iterations (default 25), preventing infinite loops while preserving momentum.
The result: fewer overflows, fewer premature wrap-ups, more clean finishes.
The stream is engineered, not incidental
- Fast-ack starts: acknowledge immediately; warm work in background.
- Heartbeats every ~2s to prevent perceived freezes.
- Deduplication of repeated deltas and duplicate step/tool events.
- Split content: clean prose to users; raw metadata to developer panels.
- Finish-reason normalization; remove arbitrary tool-call caps.
- Fallback to polling when listeners drop without leaking errors.
These changes produced a step-function improvement in perceived intelligence. The model didn’t change; our stream did.
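Two of these behaviors, heartbeats during gaps and deduplication of repeated deltas, fit in one small wrapper. A minimal async sketch, assuming the raw stream is any async iterator of text chunks; event shapes and the 2-second default are illustrative:

```python
import asyncio

async def resilient_stream(deltas, heartbeat_every: float = 2.0):
    """Wrap a raw delta stream: emit heartbeats during gaps and drop
    consecutive duplicate deltas, so the user never sees a freeze."""
    it = deltas.__aiter__()
    pending = None
    last = None
    while True:
        if pending is None:
            pending = asyncio.ensure_future(it.__anext__())
        # Wait with a timeout but never cancel the in-flight read:
        # on timeout we emit a heartbeat and keep waiting on the same task.
        done, _ = await asyncio.wait({pending}, timeout=heartbeat_every)
        if not done:
            yield {"type": "heartbeat"}
            continue
        task, pending = pending, None
        try:
            chunk = task.result()
        except StopAsyncIteration:
            break
        if chunk == last:
            continue  # drop repeated deltas
        last = chunk
        yield {"type": "delta", "text": chunk}
```

The key design choice is waiting on the pending read rather than cancelling it on timeout; cancelling an async generator's `__anext__` would kill the underlying stream.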
The orchestrator delegates and verifies
- Orchestrator–Worker: plan first. Materialize a task list or workflow with IDs, statuses, dependencies. Workers execute typed tools; the coordinator auto-completes parents when children finish.
- Verification as culture: a GEPA-style rubric scores plans on grounding, adherence, and clarity, then proposes minimal fixes. Steps and tools emit spans; every run is a trace, not a story.
- Temporal migration: workflows deterministically orchestrate, activities isolate side effects, signals carry human input without blocking. Replay replaces speculation.
Effect: multi-hour sessions remain legible, resumable, and safe.
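The coordinator's materialized task list with auto-completing parents can be sketched as follows. This is a simplified illustration, assuming string IDs, a three-state status field, and leaf-only execution; the real workflow model carries more state.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    id: str
    status: str = "pending"               # pending | running | done
    parent: Optional[str] = None
    deps: list = field(default_factory=list)

class Coordinator:
    def __init__(self, tasks: list):
        self.tasks = {t.id: t for t in tasks}

    def ready(self) -> list:
        # A task is runnable when it is a leaf (no children), still
        # pending, and all of its dependencies have completed.
        parents = {t.parent for t in self.tasks.values() if t.parent}
        return [t for t in self.tasks.values()
                if t.id not in parents
                and t.status == "pending"
                and all(self.tasks[d].status == "done" for d in t.deps)]

    def complete(self, task_id: str) -> None:
        self.tasks[task_id].status = "done"
        parent = self.tasks[task_id].parent
        if parent:
            # Auto-complete the parent once every child has finished.
            children = [t for t in self.tasks.values() if t.parent == parent]
            if all(t.status == "done" for t in children):
                self.complete(parent)
```

Workers only ever touch `ready()` leaves; parents close themselves, which is what keeps long sessions legible without a human babysitting the tree.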
What we’re exploring next
- Utility-aware memory at scale: parametric case scoring with uncertainty and time decay; cross-thread knowledge under consent.
- Meta-agent prompting: models reason about workflow health—budgets, dependencies—before acting.
- Context management models: learned shrinkers that preserve code/plan structure; adaptive budgets by task complexity.
- Temporal-native everything: planning, verification, and high-value tools as deterministic workflows.
The memory-first, budgeted, streamed, and orchestrated Opulent is available now. It doesn’t promise magic; it delivers momentum.