Built from Gemini 3 research, Gemma 4 delivers frontier-level intelligence with multimodal support, agentic workflows, and 140+ languages. Run it locally on your hardware or deploy to the cloud.
Gemma 4 is Google DeepMind's fourth-generation family of open-source large language models (LLMs), released on April 2, 2026. Built on the same cutting-edge research and technology as Gemini 3, Gemma 4 is designed to be the most capable open model you can run on your own hardware, from smartphones to workstations.
Unlike previous generations, Gemma 4 introduces breakthrough capabilities that go far beyond simple chatbots:
E2B, E4B, 26B, 31B - optimized for different hardware
Text + Image + Audio input, text output
128K-256K tokens context window
Native function calling and tool use
Mixture-of-Experts (MoE) architecture for speed
Run completely offline on your device
Apache 2.0 license for commercial use
#3 on Arena AI open model leaderboard
| Feature | Gemma 3 | Gemma 4 |
|---|---|---|
| Multimodal | Text + Image | Text + Image + Audio |
| Context Window | 128K | 128K-256K |
| Function Calling | Limited | Native support |
| Thinking Mode | No | Yes (configurable) |
| License | Gemma Terms | Apache 2.0 |
| Languages | 100+ | 140+ |
| Arena AI Rank | #6 | #3 |
Gemma 4 comes in four distinct sizes, each optimized for specific deployment scenarios. Whether you're building on-device mobile apps or running powerful workstation agents, there's a Gemma 4 model for you.
The "E" stands for "effective parameters": these models use Per-Layer Embeddings (PLE) to maximize efficiency on mobile and IoT devices.
Designed for consumer GPUs and workstations, these models deliver frontier-level intelligence for local development.
├─ Yes → E2B (basic) or E4B (advanced)
└─ No → Continue
├─ Yes → 26B A4B (MoE)
└─ No → Continue
└─ Yes → 31B (Dense)
Gemma 4 achieves state-of-the-art performance across text, code, reasoning, and multimodal tasks. Here's how it compares to other leading models.
As of April 2026, Gemma 4 ranks #3 among all open-source models on the Arena AI text leaderboard, with the 31B model scoring 1452 ELO, outperforming models 20x its size.
| Benchmark | Gemma 4 31B | Gemma 4 26B | Gemma 4 E4B | Gemma 4 E2B |
|---|---|---|---|---|
| MMLU Pro (Knowledge) | 85.2% | 82.6% | 69.4% | 60.0% |
| AIME 2026 (Math) | 89.2% | 88.3% | 42.5% | 37.5% |
| LiveCodeBench v6 (Code) | 80.0% | 77.1% | 52.0% | 44.0% |
| Codeforces ELO (Competitive Coding) | 2150 | 1718 | 940 | 633 |
| GPQA Diamond (Science) | 84.3% | 82.3% | 58.6% | 43.4% |
Gemma 4 excels at understanding images, documents, and audio:
Get started with Gemma 4 in minutes using your favorite tools. All models are available for free download under the Apache 2.0 license.
The fastest way to run Gemma 4 locally:
# 1. Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh
# 2. Run Gemma 4 with one command
ollama run gemma4
# Other versions:
ollama run gemma4:e2b # smallest, fastest
ollama run gemma4:e4b # default, balanced
ollama run gemma4:26b # MoE, fast inference
ollama run gemma4:31b # dense, best quality
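Once the model is running, Ollama also exposes a local REST API on port 11434. The sketch below builds a request for its `/api/generate` endpoint using only the standard library; the model tag `gemma4` matches the commands above (swap in `gemma4:e2b` etc. for the tag you actually pulled).

```python
import json
import urllib.request

def build_request(prompt, model="gemma4"):
    """Return the JSON payload Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, host="http://localhost:11434"):
    """Send a prompt to a locally running Ollama server and return the text."""
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# generate("Explain mixture-of-experts in one sentence.")  # needs Ollama running
```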
For a user-friendly GUI experience:
LM Studio features:
Deploy Gemma 4 at scale on Google Cloud:
Enterprise features:
pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-31b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it")
pip install vllm
vllm serve google/gemma-4-31b-it
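vLLM serves an OpenAI-compatible API (by default on port 8000). As a minimal sketch, this builds the JSON body for its `/v1/chat/completions` endpoint; the model id matches the `vllm serve` command above, and the prompt is just an illustration.

```python
import json

def chat_request(user_message, model="google/gemma-4-31b-it"):
    """Build the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 256,
    }

body = json.dumps(chat_request("Write a haiku about local LLMs."))
# POST `body` to http://localhost:8000/v1/chat/completions with
# Content-Type: application/json to get a completion back.
```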
Gemma 4's native function calling enables true autonomous agents:
Example: Google AI Edge Gallery's "Agent Skills" demonstrates on-device agents that can query Wikipedia, generate visualizations, synthesize music, and build complete apps through conversation.
Run powerful AI completely offline on edge devices:
Supported Platforms:
Gemma 4 achieves 80% on LiveCodeBench v6 and 2150 Codeforces ELO:
Process text, images, and audio in a single model:
Vision Capabilities:
Audio Capabilities (E2B/E4B):
Yes! Gemma 4 is released under the Apache 2.0 license, which allows free commercial use, modification, and distribution. Unlike previous Gemma versions, there are no restrictive terms or usage limitations.
Yes, if you have at least 8GB of RAM. The E2B model (quantized to 4-bit) requires only 3.2GB of memory and can run on most modern laptops. For better performance, the E4B model needs 5-8GB of RAM.
Gemma 4 is an open-source model you can run locally, while ChatGPT is a proprietary cloud service. Gemma 4 offers privacy, offline capability, and no usage costs, but ChatGPT (GPT-4) is generally more capable for complex tasks. Gemma 4 31B performs comparably to GPT-3.5 in many benchmarks.
For mobile/edge devices: E2B or E4B
For laptops with 16GB RAM: E4B (quantized)
For desktops with RTX 3060-4070: 26B A4B (quantized)
For high-end GPUs (RTX 4090, A100): 31B
Very likely, yes. Gemma 4 is pre-trained on 140+ languages including English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Hindi, and many more. It has native multilingual support without requiring translation.
Yes! All Gemma 4 models support text and image input. The E2B and E4B models also support audio input for speech recognition and translation. The larger 26B and 31B models support text and images but not audio.
Download sizes:
• E2B: ~4.2GB (5-15 minutes on fast internet)
• E4B: ~5.9GB (7-20 minutes)
• 26B A4B: ~17GB (20-60 minutes)
• 31B: ~19GB (25-70 minutes)
Tools like Ollama and LM Studio handle downloads automatically.
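As a rough rule of thumb (an estimate, not an exact figure), quantized weight-file size is parameter count times bytes per parameter (4-bit ≈ 0.5 bytes/param), plus roughly 10% overhead for embeddings and metadata:

```python
def est_size_gb(params_billion, bits=4, overhead=1.1):
    """Back-of-the-envelope weight-file size for a quantized model, in GB."""
    return params_billion * (bits / 8) * overhead

print(f"31B at 4-bit: ~{est_size_gb(31):.1f} GB")
print(f"26B at 4-bit: ~{est_size_gb(26):.1f} GB")
```

The estimates land in the same ballpark as the listed downloads; real files vary with the exact quantization format.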
Yes! Gemma 4 supports fine-tuning using popular frameworks: Hugging Face Transformers with QLoRA, Keras with LoRA, Unsloth (fastest), and Google's Gemma library. Fine-tuning requires more VRAM than inference, typically 2-3x the base model size.
Yes, significantly. Gemma 4 improvements over Gemma 3:
• +20-30% performance across benchmarks
• Native multimodal support (audio on small models)
• Up to 2x longer context (256K vs 128K)
• Native function calling for agents
• Apache 2.0 license (more permissive)
• Configurable thinking mode
• Better multilingual support (140+ vs 100+ languages)
Absolutely! That's one of Gemma 4's biggest advantages. Once downloaded, you can run it completely offline with no internet connection. This makes it perfect for privacy-sensitive applications, air-gapped environments, remote locations without connectivity, and reducing API costs to zero.
Dense (31B): Uses all 30.7B parameters for every token
• Pros: Highest quality, best for fine-tuning
• Cons: Slower, more memory
MoE (26B A4B): Uses only 3.8B active parameters per token
• Pros: Much faster (almost as fast as a 4B model), lower latency
• Cons: Still needs all 26B parameters loaded in memory, slightly lower quality
Choose MoE for speed, Dense for maximum quality.
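Why the MoE is so much faster: per-token compute scales with *active* parameters, not loaded ones. Plugging in the figures above gives a quick sense of the gap:

```python
# Parameter counts (in billions) quoted in the comparison above.
dense_active_b = 30.7  # dense 31B: every parameter used per token
moe_active_b = 3.8     # 26B A4B MoE: active parameters per token

speedup = dense_active_b / moe_active_b
print(f"MoE does ~{speedup:.1f}x less compute per token than dense")
```

Actual throughput depends on hardware and memory bandwidth, but this is why the MoE can feel close to a 4B model in speed.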
Yes! Gemma 4 has native function calling support. You can define tools/functions in JSON schema format, and the model will: (1) Decide when to call a function, (2) Generate proper function arguments, (3) Process function results, (4) Continue the conversation. This enables agentic workflows like web search, database queries, API calls, etc.
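The four-step loop above can be sketched in plain Python. The tool definition follows the common JSON-Schema style used for function calling; the exact format Gemma 4 expects may differ by runtime, and `get_weather` is a hypothetical function for illustration.

```python
# Step 1: describe the tool the model is allowed to call.
weather_tool = {
    "name": "get_weather",  # hypothetical tool for illustration
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

def handle_tool_call(call):
    """Step 3: run the function the model asked for and return its result."""
    if call["name"] == "get_weather":
        return {"city": call["arguments"]["city"], "temp_c": 18}  # stubbed
    raise ValueError(f"unknown tool: {call['name']}")

# Step 2: the model decides to call the tool and emits arguments like this.
model_call = {"name": "get_weather", "arguments": {"city": "Zurich"}}

# Step 4: feed the result back so the model can continue the conversation.
result = handle_tool_call(model_call)
```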