X-IA #34 - Show Me Your Agent

AI · Ecole polytechnique · May 18, 2026 at 09:53 AM · 1:32:26

TL;DR

OpenAI unveils Codex, a code-generation agent capable of autonomously creating entire applications, illustrating a new era in AI-assisted software development.

KEY POINTS

Codex, OpenAI’s autonomous coding agent

Codex can build a complete game without manual intervention, relying on “skills” to automate tasks like testing and validation via Playwright Interactive. This autonomy marks phase 3 in the evolution of coding agents, going far beyond simple assistance to fully delegated missions.

Codex architecture and operation

The agent is built on a large language model (LLM) orchestrated by a “harness” with three responsibilities: running the agent loop that drives reasoning, executing actions through tools (terminal commands, file editing, web search), and managing the conversation. All clients (terminal, IDE, application) communicate through this same central architecture.
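
The loop described above can be sketched in a few lines. This is a minimal illustration, not Codex’s actual harness: the names Step, run_agent, scripted_model and fake_shell are hypothetical, and the “model” is a scripted stand-in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One model decision: call a tool, or give a final answer."""
    tool: Optional[str] = None
    argument: str = ""
    answer: Optional[str] = None


# Hypothetical interfaces: the model maps the history to the next step,
# and a tool maps a string argument to a string result.
Model = Callable[[List[str]], Step]
Tool = Callable[[str], str]


def run_agent(model: Model, tools: Dict[str, Tool], task: str,
              max_turns: int = 10) -> str:
    """Alternate between model decisions and tool calls until an answer."""
    history = [f"user: {task}"]
    for _ in range(max_turns):
        step = model(history)
        if step.answer is not None:
            history.append(f"assistant: {step.answer}")
            return step.answer
        if step.tool in tools:
            result = tools[step.tool](step.argument)
        else:
            result = f"error: unknown tool {step.tool!r}"
        # The tool result goes back into the history for the next turn.
        history.append(f"tool[{step.tool}]: {result}")
    raise RuntimeError("turn budget exhausted without a final answer")


def scripted_model(history: List[str]) -> Step:
    """Stand-in for an LLM: call the shell once, then answer."""
    if not any(line.startswith("tool[") for line in history):
        return Step(tool="shell", argument="ls")
    return Step(answer="done: " + history[-1])


def fake_shell(command: str) -> str:
    """Stand-in for real terminal execution."""
    return f"ran {command}"


final_answer = run_agent(scripted_model, {"shell": fake_shell}, "list files")
print(final_answer)
```

Because every client would call the same run_agent entry point, the terminal, IDE and application front ends can stay thin, which is the point of the shared architecture described above.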

Major innovation in terminal session management

Moving from one-shot interaction to a persistent interactive session preserves continuity between terminal commands: the session no longer restarts on every model request, which streamlines the workflow and makes the agent more efficient.
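
To make the difference concrete, here is a toy comparison. It assumes a simulated shell (ToySession, a hypothetical class that understands only cd, export, pwd and echo) rather than a real terminal, so it shows the state-retention idea only and says nothing about how Codex manages its sessions.

```python
import posixpath
from typing import Dict, List


class ToySession:
    """Simulated shell that remembers its working directory and variables."""

    def __init__(self) -> None:
        self.cwd = "/"
        self.env: Dict[str, str] = {}

    def run(self, command: str) -> str:
        parts = command.split()
        if not parts:
            return ""
        if parts[0] == "cd" and len(parts) > 1:
            self.cwd = posixpath.normpath(posixpath.join(self.cwd, parts[1]))
            return ""
        if parts[0] == "export" and len(parts) > 1:
            name, _, value = parts[1].partition("=")
            self.env[name] = value
            return ""
        if parts[0] == "pwd":
            return self.cwd
        if parts[0] == "echo" and len(parts) > 1:
            return self.env.get(parts[1].lstrip("$"), "")
        return f"unknown command: {parts[0]}"


def run_one_shot(commands: List[str]) -> List[str]:
    """Fresh session per command: earlier cd/export calls are lost."""
    return [ToySession().run(command) for command in commands]


def run_persistent(commands: List[str]) -> List[str]:
    """One session for all commands: state carries over."""
    session = ToySession()
    return [session.run(command) for command in commands]


commands = ["cd src", "export MODE=test", "pwd", "echo $MODE"]
print(run_one_shot(commands))    # ['', '', '/', '']
print(run_persistent(commands))  # ['', '', '/src', 'test']
```

In the one-shot case the agent would have to repeat its setup commands on every request; the persistent session avoids that repetition.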

Managing context limits and costs

As the context (the conversation history) grows, two mechanisms become critical: autocompaction, which intelligently summarizes the history to avoid “context anxiety,” where the model loses relevance, and prompt caching, which avoids recomputing identical sequences. Both required several adjustments to work reliably, in particular to account for configuration changes during a conversation.
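
The two mechanisms interact, which the sketch below tries to show. It rests on stated assumptions: tokens are approximated by whitespace-separated words, the summarizer is a stub, and caching is modeled as reuse of the longest unchanged message prefix. The names compact, shared_prefix_length and naive_summary are illustrative and do not come from Codex.

```python
from typing import Callable, List


def count_tokens(message: str) -> int:
    """Crude proxy for a tokenizer: count whitespace-separated words."""
    return len(message.split())


def compact(history: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace older messages with one summary once the budget is exceeded."""
    total = sum(count_tokens(message) for message in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


def shared_prefix_length(previous: List[str], current: List[str]) -> int:
    """Number of leading messages that are identical in both requests.

    Under the prefix-caching assumption, only this part can be reused.
    """
    length = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        length += 1
    return length


def naive_summary(messages: List[str]) -> str:
    """Stub summarizer; a real system would ask a model to summarize."""
    return f"summary: {len(messages)} earlier messages"


history = ["system: be concise", "user: a b c", "tool: d e f g", "user: h i"]
compacted = compact(history, budget=10, keep_recent=2, summarize=naive_summary)
print(compacted)

# Appending a message keeps the whole earlier prefix reusable...
print(shared_prefix_length(history, history + ["user: next"]))
# ...while compaction rewrites the start of the prompt, so nothing is reused.
print(shared_prefix_length(history, compacted))
```

The same reasoning would apply to a configuration change that alters the start of the prompt mid-conversation: the reusable prefix shrinks, which is one plausible reason the two mechanisms needed tuning together.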

Feedback from internal Codex usage

Two OpenAI projects demonstrate Codex’s effectiveness:

Porting the Sora application to Android in 28 days,
Creating an internal product fully generated by Codex in 3.5 months, reportedly a 10× time gain compared to traditional development.

These experiences highlight two lessons: start with simple tasks and iterate, and structure the code repository well so that the model can explore it intelligently.

Importance of review and continuous feedback

Initial human review quickly proved insufficient. An automated improvement loop, called the “Ralph loop,” allows the model to iterate until a satisfactory result is reached, complemented by its ability to self-evaluate. This extends the duration and complexity of tasks achievable without constant intervention.
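
The general shape of such a loop (attempt, evaluate, feed the result back, repeat) can be sketched as follows. The summary gives no implementation details, so this is an assumption-laden illustration: iterate_until_accepted, make_toy_agent and evaluate are hypothetical names, and the “agent” is a toy that simply adds whatever the evaluator reports as missing.

```python
from typing import Callable, List, Tuple

REQUIRED_ITEMS = ["tests", "docs"]  # toy acceptance criteria


def iterate_until_accepted(
    attempt: Callable[[str], List[str]],
    evaluate: Callable[[List[str]], Tuple[bool, str]],
    max_iterations: int = 5,
) -> Tuple[List[str], int]:
    """Retry with evaluator feedback until accepted or out of budget."""
    feedback = ""
    for iteration in range(1, max_iterations + 1):
        candidate = attempt(feedback)
        accepted, feedback = evaluate(candidate)
        if accepted:
            return candidate, iteration
    raise RuntimeError(f"not accepted after {max_iterations} iterations")


def make_toy_agent() -> Callable[[str], List[str]]:
    """Toy agent: each attempt adds the item named in the last feedback."""
    completed: List[str] = []

    def attempt(feedback: str) -> List[str]:
        if feedback:
            completed.append(feedback)
        return list(completed)

    return attempt


def evaluate(candidate: List[str]) -> Tuple[bool, str]:
    """Toy evaluator: report the first missing required item, if any."""
    for item in REQUIRED_ITEMS:
        if item not in candidate:
            return False, item
    return True, ""


result, iterations = iterate_until_accepted(make_toy_agent(), evaluate)
print(result, iterations)  # ['tests', 'docs'] 3
```

The iteration cap matters in practice: without it, an evaluator that can never be satisfied would keep the loop running indefinitely.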

Integration of observable feedback to improve results

Codex now receives inputs from development tools like Chrome DevTools, along with logs and metrics to “sense” the quality of its output. This significantly increases its autonomy on long projects, reaching up to six hours without human supervision.

Codex as a “senior engineer” on the team

The recommended mindset is to treat the agent as a capable but new team member, excellent at multitasking and writing tests, but still requiring guidance on internal standards and architectural decisions.

Innovation with Piplex’s “executable AI methods”

Alongside Codex, the startup Piplex proposes an open-source standard called MTHDS (“methods” without vowels) to structure complex business processes beyond simple skills. These methods combine code-like rigor with the flexibility of natural language skills, enabling deterministic and reproducible orchestration for all types of tasks, not just development.

Challenges of agents in non-technical fields

Unlike software development, which benefits from mature tools to translate business needs into precise code, other sectors lack robust tools to structure and repeat complex tasks. The MTHDS solution aims to fill this gap by offering a standard where steps are systematically executed and audited, facilitating compliance and verification.
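
The idea of steps that are always executed in order and leave an audit trail can be illustrated generically. The sketch below is not MTHDS syntax, which this summary does not show; AuditRecord and run_method are hypothetical names, and the two string-processing steps stand in for real business steps.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class AuditRecord:
    """What one step received and produced, kept for later verification."""
    step: str
    received: Any
    produced: Any


def run_method(steps: List[Tuple[str, Callable[[Any], Any]]],
               initial: Any) -> Tuple[Any, List[AuditRecord]]:
    """Run every step in the declared order and record each one."""
    value = initial
    trail: List[AuditRecord] = []
    for name, function in steps:
        result = function(value)
        trail.append(AuditRecord(step=name, received=value, produced=result))
        value = result
    return value, trail


# Illustrative steps only; real methods would wrap business logic.
steps = [("normalize", str.strip), ("uppercase", str.upper)]
output, trail = run_method(steps, "  invoice 42 ")
print(output)
for record in trail:
    print(record)
```

Because the step order is fixed by the list rather than chosen by a model at run time, two runs on the same input produce the same trail, which is what makes the process reproducible and auditable.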

Disruptive potential of autonomous agents on cognitive work

The recent evolution toward agents capable of handling multiple threads in parallel, greatly amplifying human productivity, could trigger an explosion similar to the one that followed the introduction of spreadsheets, exponentially increasing the amount of intellectual work performed by machines in companies and beyond.

Constraints and limitations of agents

Despite their power, agents exhibit behavioral “rigidity”: they excel in some areas but can become inefficient on very similar tasks. Moreover, increasing the number of agents does not always improve performance and can even reverse gains, highlighting the difficulty of finding an optimal coordination balance.

Perspective on the human role with the rise of AI agents

Humans remain essential for steering, defining direction, and especially qualitatively evaluating results. The real challenge will be maintaining this “client in the loop” while ensuring understanding and control of agents without losing efficiency.

CONCLUSION

Software development and business process management are entering a new era thanks to autonomous agents like Codex and standards like MTHDS. This shift will drive major gains in cognitive productivity while raising challenges around human oversight and direction.
