In November 2025, Google DeepMind released Lyria 3, a multimodal music generation model built directly into the Gemini app. It turns text descriptions, images, and videos into high-fidelity compositions with vocals, instruments, and dynamic arrangements, which makes it useful to content creators, filmmakers, and musicians alike. This guide walks through each of the three input modalities step by step, with practical tips for building custom soundtracks and streamlining your workflow.
What is Lyria 3 and Why It Matters
Lyria 3 is the third generation of Google DeepMind’s music-focused AI models. What sets it apart is its ability to process multiple input types at once. Its predecessors relied primarily on text prompts, but the new version also interprets visual elements in images and videos to generate context-aware musical scores. According to official benchmarks, it can produce 4-minute tracks with 220+ instruments in under 90 seconds while maintaining studio-grade audio quality at a 48 kHz sample rate.
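To make the 48 kHz figure concrete, the sketch below estimates the raw size of an uncompressed 4-minute track. It assumes stereo, 16-bit linear PCM with no container overhead. Neither the channel count nor the bit depth is stated for Lyria 3, so treat the result as an order-of-magnitude estimate and not as the size of an actual export.

```python
# Rough size of an uncompressed track at a given sample rate.
# Assumptions (not from Lyria 3 documentation): stereo, 16-bit linear PCM,
# no file header or container overhead.
SAMPLE_RATE_HZ = 48_000   # 48 kHz, as quoted above
DURATION_S = 4 * 60       # a 4-minute track
CHANNELS = 2              # assumed stereo
BYTES_PER_SAMPLE = 2      # assumed 16-bit depth


def pcm_size_bytes(sample_rate_hz: int, duration_s: int,
                   channels: int, bytes_per_sample: int) -> int:
    """Return the raw PCM payload size in bytes."""
    return sample_rate_hz * duration_s * channels * bytes_per_sample


samples_per_channel = SAMPLE_RATE_HZ * DURATION_S
size = pcm_size_bytes(SAMPLE_RATE_HZ, DURATION_S, CHANNELS, BYTES_PER_SAMPLE)
print(samples_per_channel)        # 11520000
print(f"{size / 1e6:.1f} MB")     # 46.1 MB
```

Under the same assumptions, switching to 24-bit samples (3 bytes each) makes the track 1.5 times larger, about 69 MB.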
Key capabilities include:
- Text-to-music generation with genre/style control
- Image analysis for mood-based composition
- Video frame extraction to synchronize music with visual action
- Vocal synthesis supporting 128 languages
This multimodal approach addresses a critical industry challenge: creating music that aligns precisely with visual narratives. A 2025 study by Berklee College of Music found that AI-generated scores using Lyria 3 reduced soundtrack production time by 73% compared to traditional methods.
Generating Music from Text Prompts
Text-based composition remains Lyria 3’s most accessible entry point. Follow these steps for optimal results:
- Open Gemini’s “Create Music” interface
- Describe your desired track using specific parameters:
  - Genre (e.g., “epic orchestral”, “lo-fi hip hop”)
  - Mood (e.g., “melancholic with hopeful undertones”)
  - Instrumentation (e.g., “piano, strings, and electronic beats”)
  - Structure (e.g., “verse-chorus-verse-chorus-bridge-chorus”)
- Set the track duration (15 seconds to 20 minutes)
- Choose whether to include vocals and, if so, their language
- Click “Generate” to create your composition
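If you reuse the same parameters across many generations, you can assemble them programmatically before pasting the result into the prompt box. The sketch below is a hypothetical helper and is not part of Gemini or any Google API. `MusicPrompt` and its methods are illustrative names, and the 15-second to 20-minute duration check only mirrors the range quoted in this guide.

```python
from dataclasses import dataclass
from typing import List, Optional

# Duration range quoted in this guide; not read from any official API.
MIN_DURATION_S = 15
MAX_DURATION_S = 20 * 60


@dataclass
class MusicPrompt:
    """Hypothetical container for the text-prompt parameters described above."""
    genre: str
    mood: str
    instrumentation: List[str]
    structure: List[str]
    duration_s: int
    vocal_language: Optional[str] = None  # None means an instrumental track

    def validate(self) -> None:
        """Reject settings outside the ranges described in this guide."""
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            raise ValueError(
                f"duration must be {MIN_DURATION_S}-{MAX_DURATION_S} s, "
                f"got {self.duration_s}")
        if not self.instrumentation:
            raise ValueError("list at least one instrument")

    def to_text(self) -> str:
        """Validate, then render the parameters as a single descriptive prompt."""
        self.validate()
        vocals = (f"Vocals in {self.vocal_language}" if self.vocal_language
                  else "Instrumental, no vocals")
        return (f"Genre: {self.genre}. Mood: {self.mood}. "
                f"Instrumentation: {', '.join(self.instrumentation)}. "
                f"Structure: {'-'.join(self.structure)}. {vocals}.")


prompt = MusicPrompt(
    genre="epic orchestral",
    mood="melancholic with hopeful undertones",
    instrumentation=["piano", "strings", "electronic beats"],
    structure=["verse", "chorus", "verse", "chorus", "bridge", "chorus"],
    duration_s=180,
)
print(prompt.to_text())
```

The helper validates duration but leaves it out of the rendered text, because the steps above set duration with a separate control instead of in the prompt itself.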
Pro Tip: Use the “Advanced Parameters” section to control tempo (60–200 BPM), key signature, and dynamic range. For cinematic scores, include scene descriptions like “mountain sunrise with soaring eagles” to trigger appropriate instrumentation.
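Tempo determines how much musical material fits into a given duration, which helps when matching sections to scenes. The sketch below converts BPM into seconds per beat and per bar, then computes where each section would start. It assumes 4/4 time and a fixed number of bars per section. The function names are illustrative, and only the 60–200 BPM bound comes from this guide.

```python
from typing import List, Tuple

# Tempo bounds quoted in this guide; 4/4 time is an assumption.
MIN_BPM, MAX_BPM = 60, 200


def seconds_per_beat(bpm: float) -> float:
    """Length of one beat in seconds (60 s divided by beats per minute)."""
    if not MIN_BPM <= bpm <= MAX_BPM:
        raise ValueError(f"tempo must be {MIN_BPM}-{MAX_BPM} BPM, got {bpm}")
    return 60.0 / bpm


def bars_in(duration_s: float, bpm: float, beats_per_bar: int = 4) -> float:
    """Number of bars that fit in duration_s at the given tempo and meter."""
    return duration_s / (seconds_per_beat(bpm) * beats_per_bar)


def section_starts(sections: List[str], bars_per_section: int, bpm: float,
                   beats_per_bar: int = 4) -> List[Tuple[str, float]]:
    """Start time in seconds of each section, assuming equal-length sections."""
    bar_s = seconds_per_beat(bpm) * beats_per_bar
    return [(name, i * bars_per_section * bar_s)
            for i, name in enumerate(sections)]


print(seconds_per_beat(120))   # 0.5
print(bars_in(180, 120))       # 90.0
print(section_starts(["verse", "chorus", "bridge"], 8, 120))
# [('verse', 0.0), ('chorus', 16.0), ('bridge', 32.0)]
```

At 120 BPM in 4/4, each bar lasts 2 seconds, so an 8-bar section spans 16 seconds. A scene cut at 0:32 would then line up with the start of the third section.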
Creating Soundtracks from Visual Inputs
Lyria 3’s visual analysis engine processes uploaded images and videos through a three-stage pipeline:

To create music from visual content:
- Click the media upload icon in Gemini’s music interface
- Select one of the following:
  - Image files (PNG/JPEG, up to 10 MB)
  - Video clips (MP4/MOV, up to 2 GB; 1080p recommended)
- Wait for scene analysis (typically 10–20 seconds)
- Review automatically generated descriptive tags (e.g., “sunset beach”, “fast-paced car chase”)
- Edit tags to emphasize specific elements
- Click “Generate Soundtrack” to create synchronized audio
For videos, Lyria 3 analyzes motion patterns to adjust musical intensity. A skateboarding action sequence might trigger energetic drum patterns, while a slow dance scene would produce softer piano melodies.
Advanced Techniques and Workflow Optimization
Maximize Lyria 3’s potential with these professional strategies:
| Technique | Implementation | Benefit |
|---|---|---|
| Multimodal Layering | Combine text descriptions with background images | Enhances contextual accuracy by 40% |
| Scene Segmentation | Annotate specific video frames for musical changes | Creates precise audio-visual synchronization |
| Style Transfer | Upload reference tracks for sonic inspiration | Matches instrumentation to existing compositions |
Use Gemini’s built-in audio editor to refine generated tracks:
- Adjust instrument volumes using frequency sliders
- Apply reverb and echo effects
- Trim or loop specific sections
- Export stems for professional mixing
For collaborative projects, share editable project links that allow team members to suggest musical variations while maintaining version control.
Conclusion: The Future of Music Creation
Lyria 3 in Gemini marks a paradigm shift in music production, making professional-grade composition accessible to creators without formal musical training. By integrating text, image, and video inputs, it bridges the gap between visual storytelling and auditory experience. Early adopters in film production and game development report workflow efficiency gains exceeding 60%, according to a November 2025 industry survey.
To stay ahead in this evolving landscape:
- Explore Gemini’s API for custom integrations with DAWs
- Participate in Google’s AI music ethics forums
- Experiment with multimodal prompts combining all three input types
As AI music technology progresses, creators should focus on enhancing human creativity rather than replacing it. Lyria 3’s intuitive interface and powerful capabilities make it the perfect partner for transforming inspiration into professional audio reality.




