Gemma Playground: AI Edge Gallery

7/10

Google · Google for Developers · June 18, 2026 at 07:20 PM · 3:18

TL;DR

Google’s Gemma models can run fully on smartphones, enabling offline multimodal AI tasks like voice commands, image understanding, and app automation directly on-device.

KEY POINTS

On-device AI with Gemma

Google’s Gemma models are capable of running locally on consumer smartphones such as the Pixel 10 Pro, eliminating the need for constant cloud connectivity. This allows AI features to function in low-signal environments or entirely offline, marking a shift toward privacy-preserving, edge-based computing.

AI Edge Gallery app demonstration

The Google AI Edge Gallery app showcases how these models operate on mobile devices. It integrates multimodal capabilities, including voice, text, and image processing, within a single interface designed to demonstrate real-world use cases for local AI execution.

Agent skills and app interaction

A key feature is agent skills, where the model interprets user intent and selects the appropriate application to complete a task. For example, a spoken request describing mood and daily feelings is automatically routed to a mood tracking app, which logs the entry without manual navigation.
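The routing idea can be sketched in a few lines. This is only an illustration: in the demo the Gemma model itself interprets intent, whereas the sketch below substitutes simple keyword overlap as a stand-in. The skill names, keyword sets, and handler functions are all hypothetical and do not come from the AI Edge Gallery app.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Skill:
    """A hypothetical app the agent can hand a request to."""
    name: str
    keywords: frozenset
    handler: Callable[[str], str]


def log_mood(text: str) -> str:
    # Stand-in for a mood tracking app that stores the entry.
    return f"mood entry logged: {text}"


def add_todo(text: str) -> str:
    # Stand-in for a to-do app that stores the item.
    return f"to-do added: {text}"


# Assumed skill registry; a real agent would rely on the model, not keywords.
SKILLS = [
    Skill("mood_tracker", frozenset({"feel", "feeling", "mood", "happy", "sad", "tired"}), log_mood),
    Skill("todo_list", frozenset({"buy", "pick", "remember", "todo", "shopping"}), add_todo),
]


def route(request: str) -> Optional[Skill]:
    """Return the skill whose keywords overlap most with the request, if any."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    best = max(SKILLS, key=lambda s: len(words & s.keywords))
    return best if words & best.keywords else None


def handle(request: str) -> str:
    """Route the request and run the chosen skill's handler."""
    skill = route(request)
    if skill is None:
        return "no matching skill"
    return skill.handler(request)
```

Calling `handle("I feel tired but happy today")` dispatches to the mood tracker stand-in, while a request that matches no keywords falls through to "no matching skill". Swapping `route` for a model call would keep the rest of the dispatch logic unchanged.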

Voice input and contextual understanding

The system can process natural spoken input and convert it into structured actions or notes. To-do items such as picking up children, grocery shopping, or buying flowers are transcribed and organized directly on-device, a practical productivity use case.
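To make "structured" concrete, here is a minimal sketch that turns an already transcribed sentence into a list of to-do items. It assumes the speech-to-text step has happened elsewhere, and it uses a plain regular expression where the demo uses the model; the function name and splitting rule are illustrative assumptions.

```python
import re


def transcript_to_todos(transcript: str) -> list[str]:
    """Split a transcribed sentence into to-do items.

    Items are assumed to be separated by commas and/or the word "and".
    """
    cleaned = transcript.strip().rstrip(".")
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", cleaned)
    return [part.strip().capitalize() for part in parts if part.strip()]
```

A rule like this breaks on items that themselves contain "and" (for example "buy milk and eggs"), which is the kind of ambiguity a language model is better placed to resolve.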

Image recognition and structured output

Gemma supports visual understanding by analyzing photos taken on the device. It can extract structured information, such as identifying book titles and formatting them into a JSON schema, highlighting its ability to combine vision with programmable outputs.
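As a rough sketch of consuming such output, the code below parses a JSON reply and checks it against a minimal expected shape before use. The summary does not give the actual schema, so the field names (`books`, `title`) and the sample reply are assumptions made purely for illustration.

```python
import json

# Hypothetical reply text; the real schema used in the demo is not specified.
SAMPLE_REPLY = '{"books": [{"title": "Dune"}, {"title": "Neuromancer"}]}'


def parse_book_titles(reply: str) -> list[str]:
    """Parse a JSON reply and return the book titles it contains.

    Raises ValueError if the reply is not valid JSON or lacks the assumed shape.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    books = data.get("books") if isinstance(data, dict) else None
    if not isinstance(books, list):
        raise ValueError("expected a top-level 'books' list")

    titles = []
    for entry in books:
        if not isinstance(entry, dict) or not isinstance(entry.get("title"), str):
            raise ValueError("each book needs a string 'title'")
        titles.append(entry["title"].strip())
    return titles
```

Validating before use matters because a generative model can return malformed or incomplete JSON; failing loudly is usually safer than passing partial data downstream.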

Creative visual assistance

Beyond recognition, the model can generate contextual suggestions from images. For instance, after capturing a photo of a plant arrangement, it can suggest improvements, such as additional decorative elements, each accompanied by descriptive reasoning.

Offline object identification

The system can identify objects in photos without internet access. In one example, it correctly classified a photographed item as a small toy, demonstrating reliable local inference without relying on remote servers.

Multimodal processing on-device

The integration of audio, image, and text processing into a single on-device model highlights a broader trend toward compact, efficient AI systems. These models can perform transcription, translation, reasoning, and generation tasks within the constraints of mobile hardware.

CONCLUSION

On-device deployment of Gemma models signals a significant evolution in mobile AI, combining privacy, reliability, and multimodal capabilities without dependence on cloud infrastructure.
