How to Use Gemini 1.5 Pro to Analyze a Full Codebase

Struggling to onboard to a massive new codebase? Developers often spend days or weeks mapping out dependencies, spotting bugs, or documenting legacy code. Gemini 1.5 Pro pioneered massive context windows up to 2 million tokens, enabling analysis of entire repositories in one prompt. As of November 2025, Gemini 1.5 models are retired, but successors like Gemini 2.5 Pro and Gemini 3 Pro maintain 1 million+ token contexts with superior coding performance. This guide shows how to leverage these models for codebase analysis, boosting your workflow efficiency.

Why Gemini’s large context excels at codebase analysis

Traditional code analysis tools like SonarQube or static analyzers require rulesets tuned per language and often miss architectural issues. Gemini models process ~750,000 words or 30,000+ lines of code at once, reasoning across files for security vulnerabilities, inefficiencies, or refactoring opportunities.

Released February 2024, Gemini 1.5 Pro introduced 1M-2M token windows (Google Blog, Feb 15, 2024). Current stable models like gemini-2.5-pro (June 2025) offer 1,048,576 input tokens and excel at codebase work per the official docs (ai.google.dev/gemini-api/docs/models, last updated Nov 2025).

| Model | Release/Update | Input Context (Tokens) | Coding Strengths |
|---|---|---|---|
| gemini-1.5-pro (retired) | Feb 2024 / Sep 2025 shutdown | Up to 2M | Pioneered long-context code review |
| gemini-2.5-pro | June 2025 (stable) | 1,048,576 | Complex reasoning, agentic workflows, codebases |
| gemini-3-pro-preview | Nov 2025 | 1,048,576 | Advanced multimodal, coding benchmarks leader |
| gemini-2.5-flash | June 2025 | 1,048,576 | Fast, cost-efficient for large-scale analysis |
Current Gemini models for code analysis (sources: ai.google.dev/gemini-api/docs/models, cloud.google.com/vertex-ai/generative-ai/docs/models, Nov 2025)

Use gemini-2.5-pro for most codebase tasks—it’s production-ready with thinking mode for step-by-step reasoning.

Prepare your codebase

Concatenate repository files into a single text file with path headers. Exclude binaries/images; focus on source/config. This fits the 1M+ token limit (~500k-750k words).

Verify repo size: use a token estimator (e.g., the Gemini API’s countTokens). Aim for under 800k tokens initially.
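The size check can start locally, before any API call. The sketch below assumes roughly 4 characters per token, a common heuristic for code and English text, so treat the result as a ballpark and use countTokens for the exact figure:

```python
def estimate_tokens(path, chars_per_token=4):
    """Rough local token estimate for a concatenated codebase file.

    Assumes ~4 characters per token (a heuristic, not the real Gemini
    tokenizer); use the API's countTokens for exact numbers.
    """
    with open(path, encoding="utf-8") as f:
        return len(f.read()) // chars_per_token
```

If the estimate is anywhere near the 800k target, trim vendored files or split the repo before the exact countTokens check.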

```bash
# Bash script to generate codebase.txt (Linux/Mac)
find /path/to/repo -type f \( -name "*.js" -o -name "*.ts" -o -name "*.py" -o -name "*.java" -o -name "*.yaml" -o -name "*.json" \) -not -path "*/node_modules/*" -not -path "*/.git/*" | while read -r file; do
  echo "// FILE: $file"
  cat "$file"
  echo ""
  echo ""
done > codebase.txt
```

```python
# Python alternative for cross-platform use
import os

def concat_repo(root_dir, output_file,
                extensions=('.py', '.js', '.ts', '.java', '.yaml', '.json'),
                exclude_dirs=('node_modules', '.git')):
    with open(output_file, 'w', encoding='utf-8') as out:
        for subdir, _, files in os.walk(root_dir):
            # Substring match: skips any path containing an excluded name
            if any(ex in subdir for ex in exclude_dirs):
                continue
            for file in files:
                if any(file.endswith(ext) for ext in extensions):
                    path = os.path.join(subdir, file)
                    out.write(f"// FILE: {path}\n")
                    try:
                        with open(path, 'r', encoding='utf-8') as f:
                            out.write(f.read() + '\n\n')
                    except UnicodeDecodeError:
                        pass  # Skip non-UTF-8 (likely binary) files
    print(f"Codebase ready: {output_file}")

concat_repo('/path/to/repo', 'codebase.txt')
```

Run the script; the output file is your prompt input. For monorepos, split into frontend and backend chunks.
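For the monorepo case, one approach is to break the concatenated file at the `// FILE:` headers the scripts above emit, keeping each chunk under a token budget. This is a sketch using the same ~4 characters-per-token heuristic:

```python
def split_codebase(path, max_tokens=500_000, chars_per_token=4):
    """Split a concatenated codebase into chunks under a token budget,
    breaking only at '// FILE:' headers so no file is cut mid-stream."""
    max_chars = max_tokens * chars_per_token
    chunks, current, size = [], [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Start a new chunk at a file boundary once the budget is hit
            if line.startswith("// FILE:") and size >= max_chars:
                chunks.append("".join(current))
                current, size = [], 0
            current.append(line)
            size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```

Send each chunk in its own chat session so no file is ever truncated mid-analysis.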

Flowchart of codebase preparation: Clone repo → Filter source files → Concatenate with path headers → Token count check → Ready for Gemini prompt
Codebase preparation workflow

Access Gemini via Google AI Studio or API

Free tier: Google AI Studio (aistudio.google.com/app) with gemini-2.5-pro (32k-token free limit; 1M paid). Pro: Vertex AI or the Gemini API (keys at aistudio.google.com/apikey).

  1. Get API key or log into AI Studio.
  2. Select gemini-2.5-pro (or latest).
  3. Set temperature=0, top_p=0.95 for deterministic output.
  4. Safety: set filters to block none so the model can discuss risky code during review.

Craft prompts for full codebase analysis

The system instruction defines the role and objective; the user prompt is the contents of codebase.txt.

System Instruction:
You are an expert code reviewer. Analyze the entire codebase below for:

1. Security vulnerabilities (e.g., SQLi, XSS).
2. Performance issues (e.g., N+1 queries).
3. Code smells/refactoring opportunities.
4. Missing error handling/tests.
5. Architecture inconsistencies.

Output in Markdown table: | File | Line | Severity (High/Med/Low) | Issue | Fix |

Batch 50 issues max. End with | END |. Continue on "more".

User: [paste codebase.txt]

Iterate: Type “more” until “END”. Export chat or summarize.

Step-by-step analysis in AI Studio

  1. Paste system instruction.
  2. Paste codebase.txt (drag file).
  3. Submit; review first batch.
  4. Prompt “more” until END.
  5. New chat: “Summarize all issues into CSV” + paste outputs.
  6. Copy to Sheets.

Automate with Gemini API

```python
import google.generativeai as genai

genai.configure(api_key='YOUR_KEY')
model = genai.GenerativeModel('gemini-2.5-pro',
                              system_instruction='[your system prompt]',
                              generation_config={'temperature': 0, 'top_p': 0.95})

chat = model.start_chat()
response = chat.send_message(open('codebase.txt', encoding='utf-8').read())

# Accumulate every batch until the model emits the END sentinel
issues = []
while True:
    print(response.text)
    issues.append(response.text)
    if 'END' in response.text:
        break
    response = chat.send_message('More issues.')

# Save all batches; post-process the Markdown tables into CSV
with open('issues.md', 'w', encoding='utf-8') as f:
    f.write('\n'.join(issues))
```

Adapted from a Medium guide (May 2024); tested with Gemini 2.5 Pro.
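Once all batches are collected, the Markdown tables can be post-processed into CSV. A sketch, assuming the | File | Line | Severity | Issue | Fix | row format requested in the system instruction:

```python
import csv

def table_rows(markdown_text):
    """Yield data rows from Markdown pipe tables, skipping the header
    row, separator lines (---), and the END sentinel."""
    for line in markdown_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        if cells[0] in ("File", "END") or set(cells[0]) <= set("-: "):
            continue
        yield cells

def save_csv(markdown_text, out_path):
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["File", "Line", "Severity", "Issue", "Fix"])
        writer.writerows(table_rows(markdown_text))
```

The parser treats anything pipe-delimited that is not a header, separator, or END line as a data row; spot-check a few rows before importing the CSV into Sheets.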

Diagram of prompt structure: System role/objective → Codebase input → Output table format → Iteration loop with 'more' prompts
Prompt engineering for iterative analysis

Best practices and limitations

  • Prioritize: High-severity first.
  • Chunk repos: <500k tokens.
  • Context caching (API): reuse the codebase across follow-ups (ai.google.dev/gemini-api/docs/caching).
  • Verify: AI hallucinates; cross-check fixes.
  • Cost: ~$3.50/1M input tokens (2.5 Pro); batch for savings.
  • Limitations: code cannot be executed in AI Studio; use the code_execution tool via the API.
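The cost bullet above translates into simple arithmetic; a sketch using the ~$3.50 per million input tokens figure quoted there (verify current pricing before budgeting):

```python
def input_cost_usd(tokens, usd_per_million=3.50):
    """Estimated input cost at ~$3.50 per 1M tokens (the 2.5 Pro figure above)."""
    return tokens / 1_000_000 * usd_per_million

# A full 800k-token codebase costs about $2.80 in input tokens per pass.
print(round(input_cost_usd(800_000), 2))
```

Context caching cuts this further on repeated follow-ups, since the cached codebase tokens are billed at a reduced rate.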

"Gemini 2.5 Pro can comprehend vast datasets… including entire code repositories."

Google Vertex AI docs, Nov 2025

Example outputs

For a Node.js repo, the analysis detected async resource leaks and unvalidated inputs. Suggested fixes: add validation middleware and convert callbacks to promises.

Next steps

Try it in AI Studio today. Integrate with CI/CD via Vertex AI. Migrating old 1.5 Pro scripts is straightforward, since the newer models keep a compatible API. Explore grounding and tools for dynamic analysis.

Key takeaways: Prep smart, prompt precisely, iterate. Unlock codebase insights—faster onboarding, cleaner code.

Written by promasoud