Making agentic workflows trustworthy and verifiable with a custom DSL

7/10

Anthropic • Claude • May 22, 2026 at 05:13 PM • 29:35

TL;DR

A custom domain-specific language can make AI workflows more trustworthy by exposing, verifying, and enforcing how results are produced, not just the outputs themselves.

KEY POINTS

Mechanism over output

Identical outputs from two AI systems do not imply equal reliability. Systems using advanced models, tool use, and iterative critique differ fundamentally from simpler pipelines, even if results match. Trust depends on how conclusions are generated, not just the final answer.

Trade-off between speed and rigor

AI system design involves balancing fast responses with thorough, defensible analysis. High-rigor workflows require more computation and time but deliver stronger guarantees of correctness and provenance, especially in high-stakes domains like scientific research.

Three requirements for trustworthy agents

Effective agent workflows must be legible, allowing humans and other systems to inspect each step. Iteration must retain fidelity so that refinements do not drift from the original goal. Finally, execution must faithfully follow the defined process to ensure consistency and reliability.

Introduction of AshPL DSL

The system uses AshPL, a domain-specific language tailored for research workflows. It is a restricted, typed subset of Python, designed to be simple and predictable. The language is purely functional, with no loops or mutation, enabling easier verification and reproducibility.
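One way to picture a "restricted subset of Python" is a static check that rejects disallowed syntax before anything runs. The sketch below uses the standard `ast` module. The list of forbidden constructs and the function name `check_restricted` are assumptions made for illustration; the source does not spell out the actual AshPL rules, and this sketch ignores function scopes.

```python
import ast

# Assumption: an illustrative rule set, not the real AshPL grammar.
FORBIDDEN = (ast.For, ast.While, ast.AsyncFor, ast.AugAssign,
             ast.Global, ast.Nonlocal, ast.Delete)


def check_restricted(source: str) -> list:
    """Return a list of violation messages; an empty list means the program passes."""
    tree = ast.parse(source)
    violations = []
    bound = set()  # names already assigned once
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN):
            # Loops and in-place updates are rejected outright.
            violations.append(
                f"line {node.lineno}: {type(node).__name__} is not allowed")
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if not isinstance(target, ast.Name):
                    # e.g. xs[0] = 1 or obj.attr = 1 would mutate a value.
                    violations.append(
                        f"line {node.lineno}: only plain names may be bound")
                elif target.id in bound:
                    # Rebinding a name is treated as mutation in this sketch.
                    violations.append(
                        f"line {node.lineno}: '{target.id}' is rebound")
                else:
                    bound.add(target.id)
    return violations


good = "papers = search('sleep')\nrecent = keep_recent(papers)\n"
bad = "total = 0\nfor p in papers:\n    total += 1\n"

print(check_restricted(good))
print(check_restricted(bad))
```

Because every name is bound exactly once and there are no loops, each value in an accepted program can be traced back to a single expression, which is what makes verification and reproducibility easier.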

Domain-specific primitives

AshPL includes built-in operations aligned with scientific research, such as retrieving academic papers, filtering studies, and joining datasets. This specialization allows workflows to directly encode domain logic rather than relying on generic prompting.
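To make the idea of domain primitives concrete, here is a minimal sketch of what retrieve, filter, and join operations could look like as pure functions over immutable values. The names `search_papers`, `filter_studies`, and `join_on_id`, the `Paper` type, and the tiny corpus are all hypothetical; the source does not give the real AshPL primitives or their signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    paper_id: str
    title: str
    year: int


# Made-up corpus, standing in for a real paper index.
CORPUS = (
    Paper("p1", "Sleep and memory consolidation", 2019),
    Paper("p2", "Caffeine and reaction time", 2021),
    Paper("p3", "Sleep deprivation and attention", 2023),
)


def search_papers(query: str) -> tuple:
    """Hypothetical retrieval primitive: case-insensitive title match."""
    return tuple(p for p in CORPUS if query.lower() in p.title.lower())


def filter_studies(papers: tuple, min_year: int) -> tuple:
    """Hypothetical filter primitive: keep papers from min_year onward."""
    return tuple(p for p in papers if p.year >= min_year)


def join_on_id(papers: tuple, extracted: dict) -> tuple:
    """Hypothetical join primitive: pair each paper with extracted data by id."""
    return tuple((p, extracted[p.paper_id])
                 for p in papers if p.paper_id in extracted)


# A workflow is then a short chain of single-assignment steps.
sleep = search_papers("sleep")
recent = filter_studies(sleep, min_year=2020)
table = join_on_id(recent, {"p2": "n=120", "p3": "n=48"})
print(table)
```

Each step names a research operation directly, so a reader can audit the workflow without reconstructing intent from a prompt.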

Executable and inspectable workflows

Workflows are not merely plans but executable programs. Every output artifact, such as a research table, is directly tied to the underlying AshPL code. Users and systems can inspect or audit the exact steps used to generate results.
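A rough sketch of tying an artifact to its code: the result is stored together with the exact program source and a hash of that source, so an auditor can confirm which program produced it. The `Artifact` record, the `result` naming convention, and both function names are assumptions for illustration. Plain `exec` is used only to keep the example short and is not a sandbox.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    rows: tuple
    program_source: str
    program_sha256: str


def run_with_provenance(source: str, env: dict) -> Artifact:
    """Run a program and return its result bundled with the code that made it."""
    scope = dict(env)  # copy so the caller's environment is not modified
    # Assumption: the program binds its output to the name 'result'.
    exec(source, {"__builtins__": {}}, scope)
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return Artifact(tuple(scope["result"]), source, digest)


def verify(artifact: Artifact) -> bool:
    """Check that the stored source still matches the recorded hash."""
    actual = hashlib.sha256(artifact.program_source.encode("utf-8")).hexdigest()
    return actual == artifact.program_sha256


def keep_even(xs):
    return [x for x in xs if x % 2 == 0]


art = run_with_provenance("result = keep_even(numbers)",
                          {"keep_even": keep_even, "numbers": [1, 2, 3, 4]})
print(art.rows, verify(art))
```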

Iterative write–execute loop

The system continuously generates, executes, and refines AshPL programs. Errors such as type mismatches are quickly detected and corrected. This loop ensures progressive improvement while maintaining structural consistency.
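The loop described above can be sketched as a generate, execute, feed-back cycle. Everything here is a stand-in: `fake_generate` plays the role of the model, `run` evaluates a single expression, and a runtime `TypeError` substitutes for whatever type checking the real system performs, which the source does not detail.

```python
from typing import Callable, Optional


def write_execute_loop(generate: Callable[[Optional[str]], str],
                       execute: Callable[[str], object],
                       max_rounds: int = 3):
    """Ask for a program, run it, and pass any type error back as feedback."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        program = generate(feedback)
        try:
            return program, execute(program), round_no
        except TypeError as exc:
            # The error text becomes the input for the next attempt.
            feedback = f"{type(exc).__name__}: {exc}"
    raise RuntimeError("no valid program within the round budget")


def fake_generate(feedback: Optional[str]) -> str:
    """Stand-in for the model: first draft has a type error, the retry fixes it."""
    if feedback is None:
        return "len(5)"          # int has no length, so this raises TypeError
    return "len([1, 2, 3])"


def run(program: str) -> object:
    """Evaluate one expression with only 'len' available (not a real sandbox)."""
    return eval(program, {"__builtins__": {}}, {"len": len})


program, result, rounds = write_execute_loop(fake_generate, run)
print(program, result, rounds)
```

The round budget keeps a persistently failing generator from looping forever, which is a design choice of this sketch rather than something stated in the source.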

Full re-execution with caching

Each iteration re-executes the entire program rather than applying partial updates, reducing logical drift. Performance is preserved through a content-addressed cache that stores prior computations, allowing reuse of previously evaluated steps.
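A minimal sketch of full re-execution backed by a content-addressed cache: each step's key is a hash of the operation name and its arguments, so rerunning an extended program reuses earlier results. The `ContentCache` class and the choice to key on an operation name (rather than, say, the operation's code) are assumptions; the source does not describe the actual cache design.

```python
import hashlib
import json


class ContentCache:
    """Cache keyed by a hash of (operation name, arguments)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(op_name: str, args: list) -> str:
        # Arguments must be JSON-serializable in this sketch.
        payload = json.dumps([op_name, args], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, op_name: str, fn, *args):
        k = self.key(op_name, list(args))
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = fn(*args)
        return self._store[k]


def search(query):
    return [query + "-a", query + "-b"]  # stand-in for an expensive lookup


def double(n):
    return n * 2


def program_v1(cache):
    papers = cache.run("search", search, "sleep")
    return cache.run("count", len, papers)


def program_v2(cache):
    # Same first two steps as v1, plus one new step at the end.
    papers = cache.run("search", search, "sleep")
    count = cache.run("count", len, papers)
    return cache.run("double", double, count)


cache = ContentCache()
first = program_v1(cache)    # both steps computed
second = program_v2(cache)   # whole program reruns; only 'double' is new work
print(first, second, cache.hits, cache.misses)
```

Since the key depends only on content, the second run does not need to know which steps changed; unchanged steps simply hash to entries that already exist.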

System architecture for reliability

The architecture includes a user interface, event log, Python execution service, and a sandboxed component that generates AshPL code. A secure gateway manages model interactions, preventing exposure of sensitive data such as API keys.

Visualization and transparency

In addition to code inspection, workflows can be visualized as structured graphs. This helps users quickly understand and validate the sequence of operations, making complex analyses more interpretable.
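Because a single-assignment program has explicit data dependencies, a graph view can be derived from the code itself. The sketch below extracts "produced by, consumed by" edges with the standard `ast` module. The example program and its function names are hypothetical, and the source does not say how the real visualization is built.

```python
import ast


def dataflow_edges(source: str) -> list:
    """Return (producer, consumer) pairs between top-level assigned names."""
    tree = ast.parse(source)
    defined = set()
    edges = []
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            target = stmt.targets[0].id
            # Every name read on the right-hand side of the assignment.
            used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            # Keep only names bound by earlier steps (function names drop out).
            for name in sorted(used & defined):
                edges.append((name, target))
            defined.add(target)
    return edges


example = """
papers = search_papers("sleep")
recent = filter_studies(papers, 2020)
sizes = extract(recent, "sample size")
table = join_on_id(recent, sizes)
"""

edges = dataflow_edges(example)
for producer, consumer in edges:
    print(f"{producer} -> {consumer}")
```

The edge list can be handed to any graph renderer; the point is that the picture is computed from the program rather than drawn separately, so it cannot drift from what actually ran.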

Support for layered analysis

Users can incrementally extend workflows, adding new analyses such as market strategies or regulatory relationships. The system integrates these additions into a growing program without losing prior context or coherence.

Engineering complexity beyond the DSL

Building the language itself is only part of the effort. Supporting systems such as interrupt handling, session persistence, evaluation frameworks, and model orchestration require substantial engineering investment.

Evaluation challenges

Assessing correctness is difficult because the system dynamically generates and executes programs. Dedicated evaluation processes are necessary to ensure accuracy, robustness, and consistency across diverse workflows.

CONCLUSION

Trustworthy AI systems depend as much on transparent, verifiable processes as on accurate outputs, and domain-specific languages offer a practical path to achieving that balance in complex workflows.
