
Tech • AI • Crypto
A young developer has reconstructed a secret artificial-intelligence architecture, offering a novel alternative to conventional models, while several companies push ahead on more efficient, modular designs.
Open Mythos: a testable hypothesis for a recurrent AI architecture
At just 22, Kai Gomez has built Open Mythos, an open-source PyTorch take on a mysterious architecture nicknamed "Clawed Mythos," known as much for its rumored potential as for its opacity. Unlike conventional models that stack distinct layers of neurons, Open Mythos uses a "recurrent depth transformer" (RDT): a small set of layers is reused up to 16 times, refining the hidden state at each iteration. This internal loop re-injects the original input at every step, preventing the representation from drifting. This weight reuse, combined with a mixture-of-experts (MoE) system of 384 experts of which only 8 are activated per pass, delivers reasoning depth and knowledge diversity without a parameter explosion.
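As a minimal sketch of the idea (toy numpy code with random weights; illustrative only, not the actual Open Mythos implementation), the weight-shared loop with input re-injection could look like this:

```python
import numpy as np

def rdt_forward(x, n_loops=16, seed=0):
    """Toy recurrent-depth forward pass: one weight-shared block is applied
    n_loops times, re-injecting the input each step so the hidden state
    cannot drift away from the prompt (illustrative, not Open Mythos code)."""
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    W_h = rng.standard_normal((d, d)) * 0.1  # shared: transforms the hidden state
    W_x = rng.standard_normal((d, d)) * 0.1  # shared: re-injects the original input
    h = np.zeros(d)                          # a real model would encode x in a "prelude"
    for _ in range(n_loops):
        h = np.tanh(W_h @ h + W_x @ x)       # the same weights at every iteration
    return h

x = np.ones(8)
h16 = rdt_forward(x, n_loops=16)
h32 = rdt_forward(x, n_loops=32)  # reasoning depth can be extended at inference time
```

Note how the loop count is just a runtime argument: the same two weight matrices serve any depth, which is what lets this family of models "think longer" at inference without retraining.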
RDT: depth of thought rather than raw size
Unlike conventional architectures, which gain performance by multiplying layers and parameters, this approach bets on extended reasoning in latent space, without generating visible intermediate steps, which is fundamentally different from today's "chain of thought" methods. A 770-million-parameter RDT model matched the performance of a standard 1.3-billion-parameter transformer, challenging the prevailing dogmas of AI scaling. Latent reasoning also generalizes better to combinations of data never seen in training, and lets the model dynamically extend its reasoning depth at inference time, beyond the limits it encountered during training.
Fixing the classic problems of recurrence: stability and adaptivity
Recurrent architectures risk instability, with hidden states that blow up after too many iterations. Open Mythos implements a so-called "linear time-invariant" injection that guarantees stability. To avoid overthinking, each token carries a learned "adaptive computation time" signal that stops the loop once enough reflection has been done. "Depthwise LoRA adapters" let each iteration slightly modulate the transformations, adding flexibility despite the shared weights. On top of that, low-rank compressed attention cuts memory usage by up to 10 to 20 times, further improving efficiency.
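The low-rank attention saving is easy to make concrete. In this toy sketch (made-up dimensions; d_model=512 and rank=64 are chosen only so the saving lands in the 10–20× range quoted above), the cache stores one small latent per token instead of full keys and values:

```python
import numpy as np

# Toy low-rank KV compression in the spirit of latent attention
# (illustrative shapes, not the Open Mythos configuration).
d_model, rank, seq_len = 512, 64, 100
rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, rank)) * 0.05  # shared compressor
W_up_k = rng.standard_normal((rank, d_model)) * 0.05  # expands latent to keys
W_up_v = rng.standard_normal((rank, d_model)) * 0.05  # expands latent to values

x = rng.standard_normal((seq_len, d_model))
latent = x @ W_down                       # the cache stores only this: (100, 64)
k, v = latent @ W_up_k, latent @ W_up_v   # K and V are rebuilt on the fly

full_cache = 2 * seq_len * d_model   # floats needed to cache K and V directly
small_cache = seq_len * rank         # floats needed for the shared latent
ratio = full_cache / small_cache     # 16x less attention memory at these sizes
```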
Moonshot AI and Kimmy K 2.6: another large-scale vision
In the same vein, Moonshot AI has unveiled Kimmy K 2.6, a giant 1-trillion-parameter model that also uses mixture of experts with 384 experts, activating only a small subset per input. The model gains multimodal capability through a 400-million-parameter vision encoder. The system can call on up to 300 parallel agents to decompose and execute complex tasks, and a "claw groups" mechanism pairs humans with AI agents. On the HLE Full benchmark, which comprises 2,500 high-level questions, Kimmy K 2.6 slightly edges out GPT-5.4 and Claude Opus 4.6, scoring 54 against 52.1 and 53 respectively.
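The 8-of-384 activation pattern is standard top-k routing. A minimal sketch (toy numpy router with random weights; in a real model the router is learned jointly with the experts):

```python
import numpy as np

# Toy top-k expert router (illustrative, not Moonshot's implementation):
# 384 experts exist, but only the 8 with the highest router score run.
n_experts, top_k, d = 384, 8, 16
rng = np.random.default_rng(0)
router = rng.standard_normal((d, n_experts)) * 0.1  # learned in a real model

def route(token):
    logits = token @ router
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-8 experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over just the chosen 8
    return chosen, weights

chosen, weights = route(rng.standard_normal(d))
active_share = top_k / n_experts           # only ~2% of experts fire per token
```

This is why total parameter count and per-token compute decouple: the model "knows" 384 experts' worth, but pays for 8.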
XAI: advances in speech recognition and synthesis
XAI has launched Speech-to-Text and Text-to-Speech APIs under its Gro ecosystem, already integrated into Tesla, Starlink, and mobile applications. The STT service covers 25 languages, with real-time recognition, speaker diarization, and support for many audio formats. The TTS offers five expressive voices across 20 languages. Pricing is competitive, notably $0.10/hour for batch transcription. On phone-call entity-recognition benchmarks, Gro STT posts a 5% error rate, well below 11 Labs (12%) or Deepgram (13.5%). Even though the results are self-reported, large-scale deployment in real-world settings gives XAI a distinctive edge.
Toward a paradigm shift in AI scaling
This research highlights a possible turning point: instead of growing models' raw size, the future may rest on architectures that can extend the duration and quality of their reasoning at inference time through recurrent, modular designs. The approach promises gains in efficiency and flexibility, with models that "think" longer rather than bigger. In parallel, distributed, multimodal orchestration systems like Kimmy K 2.6 point toward more adaptive and collaborative AI.
In short, AI is now exploring architectures that combine reasoning depth, expert specialization, and modularity, offering an alternative to simply growing parameter counts. This radical shift could upend current standards and accelerate models' cognitive capabilities.
All right. So, something pretty interesting is happening right now in the AI space. And it's not coming from a big lab release. Not from OpenAI, not from Anthropic, but from a 22-year-old who basically looked at one of the most secretive architectures in the industry and tried to rebuild it from scratch. And the wild part is the idea actually makes sense. There's been a lot of talk around Clawed Mythos, this supposed architecture that people have been hinting at as something different, something potentially more powerful, maybe even too dangerous, depending on who you listen to. No official paper, no full breakdown from Anthropic, just fragments, speculation, and a lot of curiosity. So what this guy Kai Gomez did is take all the public research, all the hints, all the patterns we're seeing across newer models, and he built something called Open Mythos, fully open-source, implemented in PyTorch, and not as a copy of Mythos, but as a hypothesis, a testable one. And at the center of it is this idea called a recurrent depth transformer, or RDT. Now, this is where things start shifting. Most models you're familiar with, GPT-style models, Llama, Mistral, all of them follow the same basic structure. You stack layers. Each layer has its own weights. You want more capability, you add more layers, more parameters, and suddenly you're at billions or even trillions of parameters. RDT flips that. Instead of stacking more layers, you take a smaller set of layers and you run them multiple times. In Open Mythos, that loop runs up to 16 times. Same weights reused again and again, refining the internal state each time. So instead of a deeper model, you get deeper thinking during inference. The way it's structured is actually pretty clean. There's a prelude at the start, which encodes the input once. Then you have the recurrent block, which is the core, looping multiple times, and finally a coda at the end, which produces the output. Inside that loop, something interesting happens.
Each iteration updates the hidden state using a mix of three components: the previous state, the original input signal, and the transformer computation itself. That reinjection of the input every loop is important, because otherwise the model would drift too far away from what it was supposed to process. And mathematically, it's expressed as an internal update rule where the hidden state evolves step by step, controlled by learned matrices that decide how much past state and input to keep. Now, here's where it gets even more efficient. Instead of a standard feed-forward layer, Open Mythos uses a mixture-of-experts setup: around 384 experts in total, each one specialized for different kinds of tasks. But at any given time, only a small subset is active. In the case of Kimmy K 2.6, for example, only eight experts are selected per input. So you get this combination of breadth and depth. The MoE gives you access to a wide range of specialized knowledge, while the looping gives you deeper reasoning. And crucially, each loop can activate different experts, so it's not just repeating the same computation over and over. That answers one of the biggest criticisms people have when they first hear this idea. Running the same thing 16 times sounds inefficient. It sounds like wasted compute. But if each pass routes through different experts, then each pass is actually adding new information. So instead of stacking hundreds of layers with different weights, you reuse a smaller set of weights in a smarter way. And the results are kind of crazy. And honestly, that same idea of getting way more output from a smarter system instead of just brute forcing everything is exactly why tools like Higsfield start getting interesting, too. Higsfield is sponsoring today's video, and their marketing studio takes that same kind of compression and applies it to ad creation. Normally, making ads is still messy.
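One plausible reading of that update rule, as a toy sketch (the exact published formula isn't given here; the matrices A, B and the block W are random stand-ins with made-up sizes):

```python
import numpy as np

# Toy version of the three-component update: past state + re-injected
# input + fresh transformer computation (illustrative, not the real rule).
d = 8
rng = np.random.default_rng(1)
A = rng.standard_normal((d, d)) * 0.1  # learned: how much past state to keep
B = rng.standard_normal((d, d)) * 0.1  # learned: how much raw input to re-inject
W = rng.standard_normal((d, d)) * 0.1  # stand-in for the transformer block

def step(h, x):
    return A @ h + B @ x + np.tanh(W @ h)  # previous state + input + new work

x = np.ones(d)
h = np.zeros(d)
for _ in range(16):   # up to 16 loops, as described above
    h = step(h, x)
```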
You need a product angle, a script, footage, edits, maybe a creator, maybe a voiceover, and then you still need multiple versions to test. It turns into this whole chain of steps that eats time fast. Marketing Studio collapses that down. You can paste in a product link or upload an image, and it turns that into multiple finished ad formats inside one workflow. So instead of getting one generic output, you can generate UGC-style videos, tutorials, unboxings, product reviews, faster-cut promo ads, even more polished TV-style creatives, all built around the same product. And it runs on Cedence 2, which is what handles the motion, visual consistency, and overall quality. >> Honestly, I stopped reading AI news. I just watch AI Revolution: robots, models, the entire frontier covered daily. Go subscribe. You'll thank me. >> You can even use your own face or generate an avatar inside the platform, then keep that same identity across the different videos. So the whole thing feels less like patching together random tools and more like actually having an ad engine. If you want to try Higsfield Marketing Studio, the link is in the description. All right, now back to the video. There's research showing that a 770-million-parameter RDT can match the performance of a 1.3-billion-parameter standard transformer trained on the same data. That's almost half the parameters with similar output quality. That alone already challenges one of the core assumptions in AI scaling. But there's more. All of this reasoning happens entirely in latent space. There are no intermediate tokens generated during the process. The model doesn't think step by step in text like chain-of-thought prompting. It doesn't write out its reasoning and then read it back. It just thinks internally: 16 iterations, all happening inside the hidden state vectors, and then at the end you get a single output. This is fundamentally different from how most people understand AI reasoning today.
Chain of thought is visible reasoning. RDT is hidden reasoning. And there's a big advantage here. Because it's operating in continuous space, it can represent multiple possible reasoning paths at once, something closer to a breadth-first search, all happening inside one forward pass. There are also experiments backing this up. One of them looks at systematic generalization. Basically, can the model handle combinations of knowledge it never saw during training? Standard transformers struggle with this. They tend to fail when the exact combination isn't in the data set. The recurrent transformer handled it. Another test looked at depth extrapolation. The model was trained on reasoning chains up to 20 steps, then tested on 30-step problems. Standard transformers collapsed. The recurrent model just added more loops and kept going. So instead of being limited by what it saw during training, it can extend its reasoning dynamically at inference time. That's a big deal, because it suggests that the bottleneck in current models isn't knowledge. It's the ability to combine that knowledge effectively, and looping seems to unlock that. Now, of course, this kind of architecture comes with its own problems. One of the biggest is stability. If you keep looping, the hidden state can explode. Values grow uncontrollably and the model breaks. This is something that's been a known issue with recurrent architectures for a long time. Open Mythos addresses this using something called linear time-invariant injection, based on the parquet paper. Basically, it constrains the system so that the hidden state remains stable no matter how many loops you run. There's also the opposite problem. Too many loops can lead to overthinking. The model goes past the correct answer and starts drifting into noise. To solve that, they use adaptive computation time. Each token gets a learned signal that decides when to stop looping. Harder parts of the input get more compute. Easier parts stop early.
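A minimal sketch of that stop-early mechanism, in the spirit of adaptive computation time (illustrative only: here the halting weights are random, whereas a real model learns them):

```python
import numpy as np

# Toy adaptive-computation-time loop: each step emits a halting probability,
# and the token stops looping once the running total crosses a threshold.
def act_loop(x, max_loops=16, threshold=0.99, seed=2):
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, d)) * 0.1
    w_halt = rng.standard_normal(d) * 0.1  # learned in a real model
    h, halt_sum, used = np.zeros(d), 0.0, 0
    for _ in range(max_loops):
        h = np.tanh(W @ h + x)
        used += 1
        halt_sum += 1.0 / (1.0 + np.exp(-(w_halt @ h)))  # sigmoid halting signal
        if halt_sum >= threshold:
            break   # this token has had enough compute
    return h, used

h, used = act_loop(np.ones(8))                            # halts early
h_hard, used_hard = act_loop(np.ones(8), threshold=7.5)   # "harder": loops longer
```

Raising the threshold stands in for a harder token: the loop runs more iterations before the accumulated halting signal lets it stop.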
So now you have dynamic reasoning depth per token. On top of that, there are depthwise LoRA adapters. These are small parameter additions that slightly modify behavior at each loop step. So even though the base weights are shared, each iteration isn't identical. And then there's attention. Instead of standard attention, Open Mythos uses something similar to multi-head latent attention from DeepSeek. It compresses key-value pairs into a lower-rank representation, reducing memory usage by up to 10 to 20 times. So you're getting efficiency across multiple layers of the system. Fewer parameters, less memory, more flexible reasoning. And all of this comes together into a pretty clear idea: scaling might shift. Instead of training bigger models, the focus might move toward letting models think longer during inference. That's a completely different direction. Now, while all of this is happening on the research side, you also have companies pushing in parallel directions with actual deployed models. Moonshot AI just released Kimmy K 2.6, and this thing is massive: 1 trillion parameters. But even here you see similar ideas showing up. It uses mixture of experts with 384 experts, again only activating a small subset per input. It uses multi-head latent attention, similar to what we just talked about, to compress attention data and reduce hardware requirements. The activation function is SwiGLU, which is more efficient than older approaches and already used in models like Llama. And then you have multimodal capability through a 400-million-parameter vision encoder, allowing it to process both text and images. But what stands out more is how it handles tasks. It can spawn up to 300 agents for complex workflows. These agents break tasks into substeps and execute them in parallel. So instead of one model doing everything sequentially, you get this distributed execution system.
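The depthwise-LoRA trick can be sketched in a few lines (toy shapes and random weights, not the actual Open Mythos configuration): one shared matrix plus a tiny low-rank delta per loop step.

```python
import numpy as np

# Toy depthwise LoRA: iterations share W_shared but each depth t adds its
# own cheap low-rank correction A[t] @ B[t], so loops aren't identical.
d, rank, n_loops = 16, 2, 4
rng = np.random.default_rng(3)
W_shared = rng.standard_normal((d, d)) * 0.1        # reused at every depth
A = rng.standard_normal((n_loops, d, rank)) * 0.1   # per-depth LoRA "down"
B = rng.standard_normal((n_loops, rank, d)) * 0.1   # per-depth LoRA "up"

h = np.ones(d)
for t in range(n_loops):
    W_t = W_shared + A[t] @ B[t]   # shared weights + depth-specific tweak
    h = np.tanh(W_t @ h)

full_per_depth = d * d          # 256 params for a full per-depth matrix
lora_per_depth = 2 * d * rank   # only 64 params for the low-rank delta
```

The point is the parameter count: each extra depth costs 2·d·rank parameters instead of d², which is why the adapters stay cheap even with many loop steps.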
There's also something called claw groups, which lets the model bring humans into the loop, splitting tasks between AI agents and real people. And performance-wise, Moonshot claims it outperforms GPT-5.4 and Claude Opus 4.6 on multiple benchmarks. On HLE Full, which is one of the hardest benchmarks out there, with around 2,500 doctorate-level questions across more than 100 fields, Kimmy K 2.6 scored 54. Opus got 53, GPT-5.4 got 52.1. So very close, but slightly ahead. Now, of course, these are company-reported benchmarks, so you always take them with a bit of caution. Every company tends to highlight their strongest results. But the trend is clear. Efficiency, modularity, and parallelism are becoming more important than just raw parameter count. And then on another front, you've got XAI pushing into voice. They just released new speech-to-text and text-to-speech APIs under the Gro ecosystem. And these are already being used in Tesla vehicles, Starlink support systems, and mobile apps. So this isn't new tech. It's production-tested tech now being exposed to developers. The STT side supports 25 languages, real-time and batch transcription, speaker diarization, word-level timestamps, and 12 audio formats. The TTS side has five voices, Ara, Eve, Leo, Rex, and SA, across 20 languages, and can even include expressive tags like laughter or sighs. Pricing is aggressive: $0.10 per hour for batch transcription, $0.20 per hour for streaming, and $4.20 per 1 million characters for text-to-speech. That's cheaper than most competitors right now. And performance-wise, at least according to XAI, it's strong. On phone-call entity recognition, Gro STT has a 5% error rate. 11 Labs is at 12%, Deepgram at 13.5%, AssemblyAI at 21.3%. That's a significant gap, especially for industries like healthcare, law, finance, where accuracy really matters. But again, these are self-reported numbers.
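To put those prices in perspective, here's a back-of-envelope cost sketch using the batch-transcription and TTS list prices quoted above (as stated in the video; worth re-checking against the official pricing page before budgeting on it):

```python
# Quoted list prices (as stated above; verify before relying on them).
BATCH_STT_PER_HOUR = 0.10       # $ per audio hour, batch transcription
TTS_PER_MILLION_CHARS = 4.20    # $ per 1M characters of synthesized speech

def stt_cost(audio_hours):
    return audio_hours * BATCH_STT_PER_HOUR

def tts_cost(characters):
    return characters / 1_000_000 * TTS_PER_MILLION_CHARS

# Example: transcribe a 500-hour call archive, then synthesize 2M characters.
total = stt_cost(500) + tts_cost(2_000_000)   # about $58.40
```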
11 Labs has years of optimization in voice quality and nuance, which might not show up in benchmark tests, so real-world performance still needs to be judged by actual usage. Still, XAI has one big advantage: scale. Millions of interactions are already running through Tesla and Starlink systems. So even if it's not perfect, it's already proven in real environments. Anyway, that's it for this one. Let me know what you think about this direction, and I'll catch you in the next one.