
OpenAI’s Codex now operates full computer interfaces, enabling users to automate tasks across apps with speed, multitasking, and app-level permission controls.
Codex, originally built as a coding assistant, has evolved into a general-purpose digital operator capable of interacting with any application on a user’s computer. It can click, type, and navigate graphical interfaces just as a human would, extending its usefulness beyond code into everyday workflows across local software environments.
The system operates directly on graphical user interfaces, allowing it to use virtually any application without needing special integrations. This includes tools like UTM, Spotify, messaging apps, and productivity software. By visually interpreting interfaces and executing actions, Codex removes the need for APIs or manual scripting.
A key breakthrough is Codex’s ability to run multiple tasks simultaneously across different applications. It can, for example, create a virtual machine, play music, and set reminders at the same time. Unlike earlier systems that monopolized the device, Codex works in the background without interrupting the user’s own activity.
Codex uses its own on-screen cursor, distinct from the user’s, allowing both to operate concurrently. This enables seamless multitasking where the user retains control of their device while the AI performs automated actions independently.
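The idea of several tasks progressing at once can be illustrated with a small sketch. It uses Python's standard `asyncio` cooperative scheduling purely as an analogy; `run_task` and the task names are hypothetical, and nothing here reflects how Codex is actually implemented.

```python
import asyncio


async def run_task(name: str, steps: int) -> str:
    """Hypothetical agent task: each step yields control so other tasks can progress."""
    for _ in range(steps):
        # Stand-in for one UI action (a click or keystroke); yields to the event loop.
        await asyncio.sleep(0)
    return f"{name}: done after {steps} steps"


async def main() -> list:
    # gather() runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        run_task("create VM", 3),
        run_task("play music", 1),
        run_task("set reminder", 2),
    )


results = asyncio.run(main())
for line in results:
    print(line)
```

No task blocks the others while it waits, which is the property the article describes: the agent's work proceeds in parallel with whatever else is happening on the machine.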
Faster models such as Codex Spark significantly boost execution speed. Tasks such as composing and sending messages can now be completed almost instantly, with performance described as exceeding human speed in some cases.
Instead of relying solely on visual screenshots, Codex leverages system accessibility frameworks to extract structured information about interface elements. This allows it to understand context, identify off-screen components, and interact with apps more precisely, improving reliability and efficiency.
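A rough sketch of why structured accessibility data helps, assuming a toy element tree. `UIElement`, `walk`, `find_element`, and every field name are hypothetical illustrations, not the macOS accessibility API or anything Codex exposes.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class UIElement:
    """Hypothetical node in an accessibility tree (not a real macOS or Codex type)."""
    role: str                      # e.g. "window", "button", "text_field"
    label: str = ""                # human-readable name exposed by the app
    on_screen: bool = True         # False if the element is scrolled out of view
    children: List["UIElement"] = field(default_factory=list)


def walk(node: UIElement) -> Iterator[UIElement]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_element(root: UIElement, role: str, label: str) -> Optional[UIElement]:
    """Return the first element matching role and label, whether visible or not."""
    for node in walk(root):
        if node.role == role and node.label == label:
            return node
    return None


# Toy tree: the "Send" button is scrolled out of view, so a screenshot would miss it.
window = UIElement("window", "Messages", children=[
    UIElement("text_field", "Message"),
    UIElement("button", "Attach"),
    UIElement("button", "Send", on_screen=False),
])

send = find_element(window, "button", "Send")
print(send.label, send.on_screen)  # prints: Send False
```

Because the lookup works on roles and labels rather than pixels, the off-screen button is found just as easily as a visible one.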
Initial setup is designed to be simple and requires minimal user interaction. Permissions are granted through a guided interface, so users can enable computer control quickly while staying aware of the system changes being made.
Codex operates under a strict permission system where each application must be explicitly approved before access. It cannot view or interact with other apps unless authorized, ensuring sensitive data remains isolated and enhancing user trust.
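A per-application permission model of this kind can be sketched as a deny-by-default allowlist. The class and function names below (`AppPermissions`, `PermissionDenied`, `perform_action`) are hypothetical and chosen only for illustration; the article does not describe how Codex actually enforces its permissions.

```python
class PermissionDenied(Exception):
    """Raised when the agent targets an app the user has not approved."""


class AppPermissions:
    """Hypothetical per-application allowlist; every app is denied by default."""

    def __init__(self) -> None:
        self._approved = set()  # names of apps the user has explicitly approved

    def approve(self, app: str) -> None:
        self._approved.add(app)

    def revoke(self, app: str) -> None:
        self._approved.discard(app)

    def check(self, app: str) -> None:
        if app not in self._approved:
            raise PermissionDenied(f"{app} has not been approved")


def perform_action(perms: AppPermissions, app: str, action: str) -> str:
    """Run an action only after the permission check passes."""
    perms.check(app)
    return f"{action} in {app}"


perms = AppPermissions()
perms.approve("Spotify")

print(perform_action(perms, "Spotify", "play music"))  # prints: play music in Spotify

try:
    perform_action(perms, "Mail", "read inbox")
except PermissionDenied as err:
    print(f"blocked: {err}")  # prints: blocked: Mail has not been approved
```

Putting the check inside `perform_action`, rather than leaving it to each caller, means an unapproved app cannot be reached by accident.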
Capabilities once limited to specialized systems have been incorporated into mainline GPT models, making advanced computer interaction features more widely accessible through standard APIs. This unification simplifies development and expands potential use cases.
Early use cases include automating repetitive tasks such as managing spreadsheets, configuring development environments, and handling multi-step workflows across several apps. The system is positioned as a time-saving tool that reduces manual effort in complex digital tasks.
Development is focused on achieving speeds 2 to 10 times faster than human interaction, with the goal of making AI-driven computer use indispensable for both professional and personal computing tasks.
The feature is currently available on macOS, with plans to expand support to Windows systems in the near future, signaling broader adoption across platforms.
Codex’s ability to operate full computer interfaces marks a shift toward AI systems that actively execute tasks, positioning it as a central tool for automating complex, multi-application workflows.