
Google’s Gemma models can run fully on smartphones, enabling offline multimodal AI tasks like voice commands, image understanding, and app automation directly on-device.
Google’s Gemma models are capable of running locally on consumer smartphones such as the Pixel 10 Pro, eliminating the need for constant cloud connectivity. This allows AI features to function in low-signal environments or entirely offline, marking a shift toward privacy-preserving, edge-based computing.
The Google AI Edge Gallery app showcases how these models operate on mobile devices. It integrates multimodal capabilities, including voice, text, and image processing, within a single interface designed to demonstrate real-world use cases for local AI execution.
A key feature is agent skills, where the model interprets user intent and selects the appropriate application to complete a task. For example, a spoken request describing mood and daily feelings is automatically routed to a mood tracking app, which logs the entry without manual navigation.
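The routing idea can be illustrated with a minimal sketch. It is not how Gemma or the AI Edge Gallery app actually selects a skill: here a simple keyword-overlap score stands in for the model's intent step, and the `Skill` class, the skill names, and the keyword sets are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    """A hypothetical on-device skill that handles one kind of request."""
    name: str
    keywords: set[str]
    entries: list[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        # Store the request and report which entry number it became.
        self.entries.append(request)
        return f"{self.name}: logged entry #{len(self.entries)}"


def pick_skill(request: str, skills: list[Skill]) -> Skill | None:
    """Stand-in for the model's intent step: score skills by keyword overlap."""
    words = {w.strip(".,!?").lower() for w in request.split()}
    best = max(skills, key=lambda s: len(s.keywords & words), default=None)
    if best is None or not (best.keywords & words):
        return None  # nothing matched, so no skill is invoked
    return best


# Assumed skill registry; a real app would expose its own set of skills.
skills = [
    Skill("mood_tracker", {"feel", "feeling", "mood", "tired", "happy"}),
    Skill("todo_list", {"buy", "pick", "remember", "todo"}),
]

request = "I feel tired but happy after a long day."
skill = pick_skill(request, skills)
result = skill.handle(request) if skill else "no matching skill"
print(result)
```

In a real deployment the language model would replace `pick_skill`, and the chosen skill would call into the target app rather than append to an in-memory list.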
The system can process natural spoken input and convert it into structured actions or notes. A dictated to-do list (picking up children, grocery shopping, buying flowers) is transcribed and organized directly on the device, a practical productivity use case.
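To make "structured" concrete, here is a rough sketch of the step from a transcript to a list of tasks. In the demo the model does this work; the regular-expression splitting below is only a stand-in, and the `split_tasks` helper and the sample transcript are assumptions made for illustration.

```python
import re


def split_tasks(transcript: str) -> list[str]:
    """Naively split a dictated sentence into separate to-do items."""
    # Drop a leading "I need to" style preamble, if present.
    body = re.sub(
        r"^\s*(i need to|i have to|remember to)\s+",
        "",
        transcript.strip(),
        flags=re.IGNORECASE,
    )
    # Split on commas (optionally followed by "and") or on a bare "and".
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", body.rstrip("."))
    return [p.strip().capitalize() for p in parts if p.strip()]


transcript = "I need to pick up the children, go grocery shopping, and buy flowers."
tasks = split_tasks(transcript)
for number, task in enumerate(tasks, start=1):
    print(f"{number}. {task}")
```

A rule-based splitter like this breaks on anything beyond simple phrasing, which is the gap a language model fills when it handles free-form speech.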
Gemma supports visual understanding by analyzing photos taken on the device. It can extract structured information, such as identifying book titles and formatting them into a JSON schema, highlighting its ability to combine vision with programmable outputs.
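Structured output is only useful if the app can trust its shape, so a consumer would typically validate it before use. The sketch below assumes a hypothetical response format, a `books` list of objects with a `title` field; the source does not give the actual schema, and the placeholder titles are invented.

```python
import json

# Hypothetical raw text a model might return for a photo of a bookshelf.
# The field names and titles are assumptions made for illustration.
model_output = (
    '{"books": [{"title": "Example Title One"}, {"title": "Example Title Two"}]}'
)


def parse_books(raw: str) -> list[str]:
    """Parse the model's JSON and check that it has the expected shape."""
    data = json.loads(raw)
    books = data.get("books") if isinstance(data, dict) else None
    if not isinstance(books, list):
        raise ValueError("expected an object with a 'books' list")
    titles = []
    for item in books:
        if not isinstance(item, dict) or not isinstance(item.get("title"), str):
            raise ValueError(f"malformed book entry: {item!r}")
        titles.append(item["title"].strip())
    return titles


titles = parse_books(model_output)
print(titles)
```

Rejecting malformed output up front lets the app retry or fall back instead of passing bad data downstream.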
Beyond recognition, the model can generate contextual suggestions from images. For instance, after capturing a photo of a plant arrangement, it can suggest improvements, such as additional decorative elements, each accompanied by descriptive reasoning.
The system can identify objects in photos without internet access. In one example, it correctly classified a photographed item as a small toy, with inference running entirely on the device rather than on remote servers.
The integration of audio, image, and text processing into a single on-device model highlights a broader trend toward compact, efficient AI systems. These models can perform transcription, translation, reasoning, and generation tasks within the constraints of mobile hardware.
On-device deployment of Gemma models signals a significant evolution in mobile AI, combining privacy, reliability, and multimodal capabilities without dependence on cloud infrastructure.