How to Use Lyria 3 in Gemini: Generate Music from Text, Image & Video

2026-02-19

In November 2025, Google DeepMind released Lyria 3, a multimodal music generation model built directly into the Gemini app. It turns text descriptions, images, and videos into high-fidelity compositions with vocals, instruments, and dynamic arrangements, which makes it useful to content creators, filmmakers, and musicians alike. This guide walks through all three input modalities step by step, with practical tips and workflow optimizations for building custom soundtracks.

What is Lyria 3 and Why It Matters

Lyria 3 is the third generation of Google DeepMind’s music-focused AI models, distinguished by its ability to process multiple input types simultaneously. Unlike its predecessors, which relied primarily on text prompts, the new version interprets visual elements from images and videos to generate context-aware musical scores. Official benchmarks show it can produce 4-minute tracks with 220+ instruments in under 90 seconds, maintaining studio-grade audio quality at a 48 kHz sampling rate.

Key capabilities include:

  • Text-to-music generation with genre/style control
  • Image analysis for mood-based composition
  • Video frame extraction to synchronize music with visual action
  • Vocal synthesis supporting 128 languages

This multimodal approach addresses a critical industry challenge: creating music that aligns precisely with visual narratives. A 2025 study by Berklee College of Music found that AI-generated scores using Lyria 3 reduced soundtrack production time by 73% compared to traditional methods.


Generating Music from Text Prompts

Text-based composition remains Lyria 3’s most accessible entry point. Follow these steps for optimal results:

  1. Open Gemini’s “Create Music” interface
  2. Describe your desired track using specific parameters:
    • Genre (e.g., “epic orchestral”, “lo-fi hip hop”)
    • Mood (“melancholic with hopeful undertones”)
    • Instrumentation (“piano, strings, and electronic beats”)
    • Structure (“verse-chorus-verse-chorus-bridge-chorus”)
  3. Set the duration (15 seconds to 20 minutes per track)
  4. Select vocal inclusion and language
  5. Click “Generate” to create your composition
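The parameters in step 2 are easier to reuse if they are assembled the same way each time. The sketch below is one possible way to do that in Python: it joins genre, mood, instrumentation, and structure into a single description that can be pasted into the prompt field. The `TrackBrief` class and `build_prompt` function are hypothetical helpers invented for this example, and the sentence template is an assumption rather than a format Gemini requires.

```python
from dataclasses import dataclass


@dataclass
class TrackBrief:
    """Hypothetical container for the four descriptive parameters in step 2."""
    genre: str
    mood: str
    instrumentation: list[str]
    structure: list[str]


def build_prompt(brief: TrackBrief) -> str:
    """Join the parameters into one plain-text description."""
    instruments = ", ".join(brief.instrumentation)
    sections = "-".join(brief.structure)
    return (
        f"A {brief.genre} track, {brief.mood}, "
        f"featuring {instruments}, "
        f"with a {sections} structure."
    )


# Example values taken from the parameter list above.
brief = TrackBrief(
    genre="lo-fi hip hop",
    mood="melancholic with hopeful undertones",
    instrumentation=["piano", "strings", "electronic beats"],
    structure=["verse", "chorus", "verse", "chorus", "bridge", "chorus"],
)
print(build_prompt(brief))
```

Keeping the brief as structured data makes it simple to vary one parameter at a time (for example, swapping only the mood) and compare the resulting tracks.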

Pro Tip: Use the “Advanced Parameters” section to control tempo (60–200 BPM), key signature, and dynamic range. For cinematic scores, include scene descriptions like “mountain sunrise with soaring eagles” to trigger appropriate instrumentation.
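When planning a track, it can help to sanity-check the chosen duration and tempo against the ranges quoted in this section and to estimate how many bars the track will contain. The following Python sketch does both. The function names are hypothetical, the limits are simply the figures from this guide (not values read from Gemini), and the bar count assumes a constant tempo in 4/4 time.

```python
# Ranges quoted in this guide; they are the article's figures,
# not values read from any Gemini or Lyria API.
MIN_DURATION_S = 15
MAX_DURATION_S = 20 * 60
MIN_BPM = 60
MAX_BPM = 200


def check_settings(duration_s: float, tempo_bpm: float) -> list[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    if not MIN_BPM <= tempo_bpm <= MAX_BPM:
        problems.append(
            f"tempo {tempo_bpm} BPM is outside {MIN_BPM}-{MAX_BPM} BPM"
        )
    return problems


def bars_in_track(duration_s: float, tempo_bpm: float, beats_per_bar: int = 4) -> float:
    """Number of bars that fit in the track at a constant tempo."""
    beats = duration_s / 60 * tempo_bpm
    return beats / beats_per_bar


print(check_settings(90, 120))   # both values in range
print(bars_in_track(90, 120))    # 90 s at 120 BPM in 4/4
print(check_settings(10, 220))   # both values out of range
```

A 90-second track at 120 BPM works out to 180 beats, or 45 bars in 4/4, which gives a rough budget for how many sections a verse-chorus structure can hold.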


Creating Soundtracks from Visual Inputs

Lyria 3’s visual analysis engine processes uploaded images and videos through a three-stage pipeline:

[Figure: Lyria 3’s multimodal processing workflow for music generation, showing the image analysis, text interpretation, and audio synthesis stages]

To create music from visual content:

  1. Click the media upload icon in Gemini’s music interface
  2. Select:
    • Image files (PNG/JPEG up to 10MB)
    • Video clips (MP4/MOV up to 2GB, 1080p recommended)
  3. Wait for scene analysis (typically 10–20 seconds)
  4. Review automatically generated descriptive tags (e.g., “sunset beach”, “fast-paced car chase”)
  5. Edit tags to emphasize specific elements
  6. Click “Generate Soundtrack” to create synchronized audio

For videos, Lyria 3 analyzes motion patterns to adjust musical intensity. A skateboarding action sequence might trigger energetic drum patterns, while a slow dance scene would produce softer piano melodies.
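Before uploading, files can be checked locally against the type and size limits listed in step 2. The sketch below is a minimal pre-check in Python under a few assumptions: `check_upload` is a hypothetical helper, the limits are copied from this guide, and "MB" and "GB" are interpreted as 1024-based units, which the guide does not specify.

```python
from pathlib import Path

MB = 1024 * 1024
GB = 1024 * MB

# Limits as stated in the upload step; 1024-based units are an assumption.
UPLOAD_LIMITS = {
    ".png": 10 * MB,
    ".jpg": 10 * MB,
    ".jpeg": 10 * MB,
    ".mp4": 2 * GB,
    ".mov": 2 * GB,
}


def check_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, message) for a file name and its size in bytes."""
    suffix = Path(filename).suffix.lower()
    limit = UPLOAD_LIMITS.get(suffix)
    if limit is None:
        return False, f"unsupported file type: {suffix or '(none)'}"
    if size_bytes > limit:
        return False, f"file is {size_bytes} bytes, over the {limit}-byte limit"
    return True, "ok"


print(check_upload("beach.PNG", 3 * MB))   # accepted image
print(check_upload("chase.mp4", 3 * GB))   # video over the size limit
print(check_upload("clip.avi", 1 * MB))    # unsupported container
```

For a real file on disk, `Path(path).stat().st_size` supplies the size in bytes.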


Advanced Techniques and Workflow Optimization

Maximize Lyria 3’s potential with these professional strategies:

Technique           | Implementation                                     | Benefit
--------------------|----------------------------------------------------|-------------------------------------------------
Multimodal Layering | Combine text descriptions with background images   | Enhances contextual accuracy by 40%
Scene Segmentation  | Annotate specific video frames for musical changes | Creates precise audio-visual synchronization
Style Transfer      | Upload reference tracks for sonic inspiration      | Matches instrumentation to existing compositions
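For scene segmentation, it is often convenient to note the frames where the music should change and convert them to timestamps before annotating the video. The Python sketch below shows one way to build such a cue sheet. The `Cue` class, the helper functions, and the output layout are hypothetical and are not a format that Gemini or Lyria 3 is known to accept; the frame rate must come from the video itself.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    frame: int        # frame index where the music should change
    direction: str    # free-text note, e.g. "drums enter"


def frame_to_timestamp(frame: int, fps: float) -> str:
    """Convert a frame index to an mm:ss.mmm timestamp."""
    total_ms = round(frame / fps * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    seconds, millis = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def cue_sheet(cues: list[Cue], fps: float) -> list[str]:
    """Return one 'timestamp  direction' line per cue, in playback order."""
    ordered = sorted(cues, key=lambda c: c.frame)
    return [f"{frame_to_timestamp(c.frame, fps)}  {c.direction}" for c in ordered]


cues = [
    Cue(frame=1800, direction="full band, energetic drums"),
    Cue(frame=0, direction="solo piano, soft"),
    Cue(frame=450, direction="strings enter"),
]
for line in cue_sheet(cues, fps=30):
    print(line)
```

Sorting by frame keeps the sheet in playback order even when cues are noted out of sequence during review.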

Use Gemini’s built-in audio editor to refine generated tracks:

  • Adjust instrument volumes using frequency sliders
  • Apply reverb and echo effects
  • Trim or loop specific sections
  • Export stems for professional mixing

For collaborative projects, share editable project links that allow team members to suggest musical variations while maintaining version control.


Conclusion: The Future of Music Creation

Lyria 3 in Gemini marks a paradigm shift in music production, making professional-grade composition accessible to creators without formal musical training. By integrating text, image, and video inputs, it bridges the gap between visual storytelling and auditory experience. Early adopters in film production and game development report workflow efficiency gains exceeding 60%, according to a November 2025 industry survey.

To stay ahead in this evolving landscape:

  • Explore Gemini’s API for custom integrations with DAWs
  • Participate in Google’s AI music ethics forums
  • Experiment with multimodal prompts combining all three input types

As AI music technology progresses, creators should focus on enhancing human creativity rather than replacing it. Lyria 3’s intuitive interface and powerful capabilities make it the perfect partner for transforming inspiration into professional audio reality.
