OpenAI has released a significant update to its Agents SDK, adding native sandbox execution and what the company calls a "model-native harness." The changes address one of the fundamental challenges in building useful AI agents: giving them a safe, persistent environment where they can actually execute code and work with files.
The Agents SDK update introduces two core capabilities that were previously difficult to implement reliably. First, developers can now run agent code inside secure sandboxes, meaning the AI can execute shell commands and manipulate files without risking the host system. Second, the new model-native harness provides a standardised way to connect agents to tools while maintaining state across interactions.
Think of it like giving an AI assistant its own isolated computer. Previously, if you wanted an agent to write and run a Python script, you had to build elaborate safety mechanisms yourself. Now that infrastructure comes built in.
Alongside the SDK update, OpenAI detailed how it built an agent runtime using the Responses API. The system combines three components: the Responses API for handling model interactions, a shell tool for executing commands, and hosted containers that provide isolated environments for each agent session.
The containers are the key innovation here. Each agent gets its own sandboxed environment with a file system, the ability to install packages, and persistent state. When an agent needs to analyse a dataset, for instance, it can write a script, execute it, examine the output, and iterate, all within a secure boundary that prevents any impact on other systems.
The gap between AI models that can discuss tasks and AI agents that can complete them has always been execution. A language model might know exactly how to process a spreadsheet, but actually doing it requires running code, handling errors, and managing files. These updates collapse that gap considerably.
For developers, the practical benefit is speed. Building a secure, stateful agent environment from scratch typically requires weeks of infrastructure work. With native sandbox execution, that foundation comes ready to use.
The direction here points toward agents that handle increasingly complex, multi-step workflows. With secure execution environments and persistent state, agents can take on tasks that span hours rather than seconds. Processing large codebases, running iterative experiments, or managing file-heavy workflows all become more tractable.
The challenge now shifts to reliability and cost. Long-running agents consume more compute resources and have more opportunities to fail or drift off course. OpenAI's infrastructure handles the security layer, but developers still need to build robust error handling and human oversight into their agent designs.