
Introducing ChatGPT Images 2.0

AI • OpenAI • 21 April 2026 • 21:48

Summary

INTRO

OpenAI is launching Image Gen 2.0, a step change in intelligent image generation, capable of producing complex, accurate, multilingual visuals at an unprecedented 2K quality, with advanced analysis and web search features.

Key points

  • A major technological leap in image generation

    Image Gen 2.0 is presented as a step comparable to the Renaissance relative to the primitive beginnings of image generation. The model no longer just creates images: it reasons, searches the web, and produces visuals with structured text and very few errors.

  • Ability to generate several coherent images at once

    For the first time, multiple images can be generated within a single project, making it possible to create entire magazines, detailed renovation plans, or manga with recurring characters and a continuous storyline. Images reach resolutions up to 2K with micro-detailed rendering.

  • Two modes of use: instant and thinking

    Instant mode is available to everyone and quickly produces high-quality images with strong visual understanding. Thinking mode, reserved for paid users, takes time to work out the request, query the web, and generate complex, coherent visual work, which makes it ideal for demanding projects.

  • Near-perfect in-image text quality

    The model now excels at inserting error-free text, even across full paragraphs or complex layouts, a major challenge for previous generations. Spelling, grammatical consistency, and typographic layout are remarkable.

  • Visual intelligence that goes beyond simply creating images

    A striking example: from a portrait, the model can concretely help choose outfits suited to a person, generating relevant suggestions and detailed views from several angles, reproducing an experience close to virtual try-on.

  • Advanced multilingual support and complex text rendering

    Generation works reliably with languages that use very large character sets, notably Asian languages such as Chinese, Japanese, and Korean, as well as several Indian languages. Multilingual typographic posters, including all language-specific characters, are generated with impressive realism.

  • Photorealism and flexible image formats

    The model delivers photographic quality with a "professional" finish, able to emulate varied shooting styles (iPhone, disposable camera) with details such as grain or natural imperfections. It also supports panoramic, very wide, or very tall formats, up to 3:1 proportions.

  • Unlimited concrete and creative applications

    The technology can be used to create magazines, manga, educational illustrations, complex infographics, advertising images, custom logos, and even images with working QR codes embedded in the composition.

  • Immediately available

    Image Gen 2.0 is available today in ChatGPT and through the OpenAI API, with one version accessible to everyone and another in thinking mode for paid subscribers. The interface also includes style presets to make creation easier (a hedged API sketch follows these key points).

  • An enthusiastic team behind the advance

    OpenAI researchers describe the progress as a leap comparable to going from GPT-3 to GPT-5 for text. Much of the effort went into fine-grained visual understanding and aesthetic quality, producing an immersive, interactive, and innovative user experience.
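The availability note above mentions both ChatGPT and the OpenAI API. As a rough, hedged sketch of what a programmatic call could look like with the OpenAI Python SDK: the model identifier "gpt-image-2" and the 2K size string are illustrative assumptions drawn from the announcement, not confirmed API values.

```python
# Minimal sketch: generating an image through the OpenAI Images API.
# Assumptions (not confirmed by the announcement): the model name
# "gpt-image-2" and the "2048x2048" size option.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-2",   # hypothetical identifier for Images 2.0
    prompt="Magazine cover with structured typography and a photorealistic group portrait",
    size="2048x2048",      # assumed 2K option; actual size strings may differ
    n=1,
)

image = result.data[0]
# Depending on the model, the API returns either base64 data or a URL.
if getattr(image, "b64_json", None):
    with open("cover.png", "wb") as f:
        f.write(base64.b64decode(image.b64_json))
else:
    print("Image available at:", image.url)
```

In ChatGPT itself, the instant and thinking variants are chosen with a toggle in the interface rather than an API parameter, according to the transcript below.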

Image Gen 2.0 redefines AI image generation: it is no longer a simple visual creation tool but a companion able to "think" an image, work through complex problems, and help both to invent and to communicate. This new era opens an unprecedented field for creativity and for the everyday use of high-quality generated images.

Full transcript

Today we are launching Image Gen 2.0. If we think of DALL-E as cave drawings and Image Gen 1 as ancient art, then Image Gen 2.0 is the Renaissance. Image Gen 2.0 is the smartest image generation model ever built, with the ability to generate complex, polished, and production-ready visuals with accurate text and structured design. You see, this model isn't just generating images, it's thinking. >> That's right. Image Gen 2.0 is thinking and researching. And it can even search the web to generate images with the most accurate information available. And with that information, the model is able to generate infographics that explain complex systems and images that solve math problems with proofs. And with new multilingual capabilities, you can create visuals with multiple languages for the entire world. >> And now, for the first time in image generation, you can create multiple distinct images at once. So you can generate entire magazines with structured typography and photorealistic photos, full renovation plans for every room in your house, or manga comics with recurring characters and evolving story lines. And you can now generate images with 2K resolution across multiple aspect ratios with extraordinary micro details. You see, we are no longer generating images to marvel at. With Images 2.0, we are generating images to discover and navigate, to invent and build, to dream and explore the world and bring ideas to life. A little over a year ago, we launched images in ChatGPT. People loved it, and it was amazing to see the creativity it unleashed. But today, we're going to blow way past that with Images 2.0. Images 2.0 is a huge step forward. This is like going from GPT-3 to GPT-5 all at once. The ability to create incredible new images, express creativity, and really make beautiful and complex things is quite remarkable. This is an easier thing to just show you than talk about, so I'd like to jump right in. The team really cooked on this one, and we can't wait to see what you'll do with it. It's available right now in ChatGPT and in the API. And here's Gabe to tell you more about it. >> So, hey everyone. I'm Gabe. I'm on the research team for ChatGPT images. >> Hi, I'm Kiwan. >> I'm Kenji. >> I'm Alex. And we are researchers on the image generation team. >> So I am very excited about this model, and I think this model is producing images of a certain quality. It's very hard to explain, but one way I would put it is that they just look normal. They just look like normal images. And one experience I had looking at these images is that after you look at them enough, you go back and look at previous images and you see all the mistakes that the previous model made that you didn't even notice before. I mean, they looked great at the time, but I think these images look so much better. Anyway, I'm going to kick off a prompt. Here's a picture of the four of us we took yesterday, and we're going to try and create a magazine cover from this image. One thing about this model is that it has a lot of breadth and a lot of depth to it, so I think it'll be a while before everyone discovers all the little nooks and crannies of this model. But one thing we noticed is that this model is really good at design. It seems to be really deliberate about where it puts the text in the image and... Oh, I think >> It is a live stream. This is fine. >> Yeah, I think >> maybe I need to retry this. Okay.
>> Yes. >> No, I think it's fine. >> Oh, yeah. Great. I think it's okay. Everything is good. Okay, so let's take a look. So it's really deliberate about where it puts the text, and the design looks really nice. I remember a time when image generation could barely generate a single word without making typos. And now typos are very rare. In fact, it's very hard to even find a single typo. >> That's been one of the things that surprised me about the new model: things that I just never thought were possible, like cohesion, not making a typo, complex text, and a ton of detail in one image. >> It's rare to find a mistake. >> Yes, it's very rare. You can do a whole paragraph or a full page of text without making a mistake, or the full layout of the magazine. >> Yeah, full layout of the magazine. >> All the small text seems to be very well done, and I think the design is really nice. >> You guys look like a very cool boy band. >> Yeah. Okay. So we'll be releasing two versions of the model. There is the instant version of the model, which is what you're seeing here, and there is a thinking version of the model. The thinking version is something you can toggle using the thinking mode, and it will be available to paid users. What it does is deliberate a little bit before it actually generates an image. So it comes up with a really good prompt, and it can search the web. It can do a lot. So I'm going to try out this prompt. Last year we did a version of this prompt where we turned a selfie into a different style. This is much more powerful: we can actually generate an entire manga from a single prompt. So we can generate, like, three pages of a manga from a single prompt. I'm just going to kick off this generation, and then Kenji will talk a bit more about it. >> So you selected thinking mode. We're only doing this for paid users for now, and it can do a much more complex image. >> That is correct. Yeah. You have to select the thinking mode to get to this mode, and you can generate multiple images at once, and a lot of other very interesting things Kenji will talk about. So I'm going to kick off another prompt here. I'm not going to spoil it, but it has something to do with the words "duct tape." So, yes. Okay, I'll hand it over to Kiwan to talk about instant mode. >> Okay. Thanks, Gabe. Instant mode is a version available to everyone starting from today, which we think has much better visual intelligence compared to our previous models. I especially want to highlight that this is the first image model that is actually useful in our daily lives. As an example, I'm going back to the laptop right now, and I'm asking this model for some help to buy new clothes for my upcoming summer vacation after this launch. In this prompt I'm giving it a portrait image of me and asking it to suggest eight different nice summer outfits. In this task the model needs two different kinds of visual intelligence. One is visual understanding, where it actually looks at my image, understands what I look like, and comes up with some plans for nice outfits for me. The other axis is visual generation, where it actually turns that planned layout into a coherent and organized image. We think we made a lot of progress on both visual understanding and visual generation, and as a result the model is able to handle this kind of task very well.
And we now have an output for this, where you can see eight different really cool outfits for me. >> Kiwan, which one do you like the best? >> I like the first look, because I prefer something minimal. I think it's somehow pretty similar to what I'm wearing right now. >> Maybe inverted colors. >> So I'm going to follow up. >> That's fine. >> Yeah, I like the first look. >> I like the first look. >> Mhm. "Can you zoom into it and make a same-style fashion shoot of me: one hero shot, a few alternative views, and detailed close-ups." So here's the prompt I'm going to follow up with to the model. I'm basically asking it to zoom into it and show me how I would look when I'm actually wearing this outfit. While waiting for it, I'm going to revisit the first image a little bit more. So we're back at the laptop. One really cool thing about this image, I think, is that all of these clothing pieces are labeled with corresponding text. It shows all of these sneakers and the fitted tee, things like that, and all of these look really good. This basically shows that our model is much more capable of interleaving visual figures together with a lot of text, which comes from much improved visual intelligence, essentially. So now we have the detailed view of myself, where you can see me in this outfit and also from many different angles. It's really like the experience of going to a store and actually trying this on. >> Through this demo, I just want to highlight that this new model is no longer just an AI image generator that you give a prompt and it returns an image. It's more like an AI that you interactively talk to, and it responds to you using images that are very much understandable, like this. Now I'm passing it to Kenji, who will be talking about a deeper intelligence in our model called thinking mode. >> Thanks, Kiwan. A major capability that we've introduced in this model is the ability for image generation to think before it produces its final output. This is particularly useful for very complex prompts, for things that require web searches, or that require you to output multiple images that have to maintain coherence with each other, or even for it to check its work before saying, hey, here's your final output. But let's just look over some examples of this first. Gabe actually kicked off a few of these examples at the start of the live stream. So let's go to the one on the phone, which is the one of him and Sam, the selfie of them, and they created a manga of it. And if we look at the very first image, we can see, yeah, it does look like Gabe and Sam, right? >> Yeah. Mhm. >> But I think what's even cooler about it is that if you look at the follow-up images, they still look like Gabe and Sam, and they still look like the style that was set in the first page. >> And even better is that the story stays very consistent among pages one, two, and three. >> Now, to see another one of these in action, let's look at the other example that Gabe kicked off. To give a little backstory about this, a few weeks ago we beta tested the instant version of this model on LMArena under the code name "duct tape." A few of you on the internet were really good detectives and deduced that it was us. But now we're going to announce that it was us.
And so in this prompt, we basically asked GPT Images 2 to go and find social media reactions to this duct tape model and quote people. And so we see quotes from Threads, LinkedIn, Reddit, etc. But I think an even crazier part is that we've also asked the model to put in a QR code to chatgpt.com, so that you can try out this model right now for yourselves. And can we just make sure that it works? >> Yeah, I tried. >> Oh, nice, nice, nice. So image generation with thinking allows you to do really complex things: in this case, web search, synthesize answers, and put a QR code all in one image. But we still have more, and Alex will talk to you about these new details. >> So we've also made a lot of improvements in naturalness. Let me just kick off a few prompts. Like Gabe said earlier, our outputs can now just look like natural images, and you can actually trigger this by adding something like "photorealistic," or there are other variations like "professional photography," "shot on iPhone," or "disposable camera." In this first example, I'm just pretending we are back in 2015, which was when OpenAI was founded, but somehow Images 2.0 already exists. As you can see, the model is actually able to replicate the tiny imperfections, graininess, and lighting of the lecture hall. Even all the text on the slide and the lecture plan that the model came up with are quite coherent. Beyond photorealism, I'm also very excited that the model is much more flexible now, and in particular we can make really wide and really tall images, up to one by three and three by one. So let's take a look at this. This is one of our team's favorite style prompts, and it really demonstrates the ability to make really tall images, and it makes my neck super long. I mean, this is pretty cool, but maybe it's a bit hard to use as a profile picture or to share, so you can also use this option to make it one by one. I won't show this in the interest of time. For another fun example that combines both the aspect ratio and the naturalness, I asked the model to make a 360 image of the moon landing, and I think it looks like a 360 photo panorama. >> We can also take a look in this panorama viewer that I coded earlier. >> Wow. >> As you can see, it's actually a very consistent 360 image. >> You can also see that the sun and the shadows are in the right direction. >> Oh, that's super cool. That's incredible. You said you vibe coded this part of it. >> Yeah, I just made it with Codex very quickly. Nice. >> Then there's something in the scene, but you have to look for it. >> This is incredible. The images are obviously beautiful, but the intelligence behind these images, and what a difference that is compared to any other image generation service out there, has been incredible. Huge congratulations on the progress here. Okay, next up we're going to have Nitant and Buyan join us for a little bit more. >> And while we're doing that, Gabe, I'm curious what styles you have been enjoying the most or are most surprised by. >> Yeah, I think there's a few keywords that I really like, but like Alex said, I think the word "photorealism" actually triggers something very interesting in the model. >> Definitely give that one a try. >> Yes. >> Okay, welcome. >> Hello. Thank you, Sam, and thanks, Gabe.
Hi, I'm Buyan. I'm another member of the image generation research team. >> And I'm Nitant. I'm an engineer on the ChatGPT images team. >> I'm about to introduce the improved text rendering capability of our new model. OpenAI is a San Francisco based company. We speak English and use English at work. However, we want everyone in the world to enjoy the same excitement we have when generating images. So in Image Gen 2, we made a lot of improvements to make sure that our model can generate text perfectly across all the languages and all the cultures in the world. Let's take a look. In my first example, I want to generate a poster that's a typography art piece about different languages in the world. It's going to feature many, many languages. Let's see how it appears. And while it's generating, I'm going to kick off another demo. Let's say I want to open an OpenAI bakery. It's a fictional bakery, and I want to open it in Japan, and I want to make a poster about it purely in Japanese. >> What languages have you noticed that the new model has gotten the best at? >> I think mostly the Asian languages, let's say Hindi, Chinese, Korean, and Japanese. That's because those languages traditionally have thousands of characters in the alphabet, unlike the 26 in English. So previously our model had a hard time memorizing these characters, but now you can just prompt it and generate entire pages of text in these languages without errors. >> Wow. Let's see how it goes. Oh, here is our first example, the typography art. I deliberately prompted it to be in the form of a photograph of a real magazine, so it not only looks realistic, but I can also see the correct characters. Here's "ni hao" in Chinese. There is hello as "bonjour" in French. And I hope everyone in the world can actually enjoy our model, creating your own art using your own language. Let's take a look at the second example, my OpenAI bakery. Oh, look at it. It even made our logo into this piece of bread, right? This is a Japanese poster. You can see all the kanji, all the hiragana. You can even zoom in and see the details. Look at this. Look at all the hiragana here. So I really hope everyone in the world can use this model to make your own poster, open your own shop, everything. And just to show everyone how far we can go with our image generation model: this is an image I generated with our experimental 4K API. This is just a pile of rice, but it is also not just any pile of rice. What if I tell you there's one single grain in it with the text "GPT image" on it? Can you find it? Here we go. >> Yeah, it's at the center. >> I can't see it. I see something. >> I made it easy for you guys. >> That's awesome. >> Look at this. Let's zoom in. "GPT Image 2" on one single grain of rice among a pile this big. >> This is how far we can go with our latest model. >> Amazing. >> Next, I will let Nitant take over. >> Yeah. So, Images 2.0 is available to all users to try out right now. And if you're accessing ChatGPT from your app, make sure to update it to the latest version, and you should see a welcome screen that looks like this, which means you're good to go. I'm going to start off with a simple everyday kind of prompt. I'm asking it to create a recipe in Hindi.
As Buyan said, the new model is significantly better at understanding and rendering text in lots of languages, including many Indian ones that I've tried, like Hindi, Telugu, Kannada, Tamil, Marathi, and so on, and the difference is especially obvious if there's a lot of densely packed text. So let's see what it comes back with. I'm also curious to see what Indian dish it decides to go with. Oh, there we go. It went with aloo paratha. That's a classic. >> Nice. >> And the text looks really good, too. I don't spot any errors. >> At first glance. >> Next, let's also check out some of the new preset styles that we've added into the app. I'm just going to select "create images" here, and you'll see a bunch of fun ones and some that really take advantage of the new model's capabilities. Actually, why don't we make some logos for the OpenAI bakery? Buyan? >> Sure. Why don't you just take a photo of my bakery poster and see what it makes? >> Let's do it. So it looks like this is going to come back with 16 to 20 logo ideas. But this is actually a rather simple prompt given the model's capabilities. It's really good at following very detailed instructions. So if you have very specific brand language, design, aesthetics, all of those things that really matter for creative work, you can use this to iterate and refine your ideas to get exactly what you want out of it. And we have colorful logo ideas right here. Wow. Here we go. >> Nice. >> Which one do you guys like the most? >> These are good. How about this one? >> Oh, yeah. This one combines our logo and the bread. >> I like it. All of this is also making me hungry. >> This is really amazing. I can't wait to see what people will do with this. The beauty of the images will come through right away. The intelligence is very deep, and we hope you all have fun exploring this. As we mentioned, this is live today in ChatGPT and the API. So proud of the team on what they've created here, and we hope you will have as much fun using it as we did getting to build it. Thank you very much.
