As of November 2025, GPT-5.1-Codex-Max is OpenAI’s new frontier coding model, built specifically for long-running, project-scale work in Codex. Its standout capability is a new compaction feature that lets the model operate coherently over millions of tokens in a single task, unlocking multi-hour (even 24+ hour) refactors and deep debugging sessions without “context amnesia.” This guide is an evergreen how-to for developers who want to use GPT-5.1-Codex-Max’s compaction effectively for large-scale refactoring inside real codebases.
What GPT-5.1-Codex-Max and compaction actually do
GPT-5.1-Codex-Max (released November 19, 2025) is a variant of GPT-5.1 optimized for “agentic” coding tasks inside OpenAI’s Codex environment (CLI, IDE extensions, cloud workspaces, code review). Compared to GPT-5.1-Codex, it:
- Handles long-horizon coding with compaction so it can “coherently work over millions of tokens in a single task.”
- Uses ~30% fewer thinking tokens for similar or better performance on coding tasks (per OpenAI’s SWE-bench Verified numbers).
- Supports multi-hour, even 24+ hour continuous loops that persistently iterate, run tests, and refine implementations.
Compaction is the core innovation for large refactors. When a Codex session approaches its context window limit, GPT-5.1-Codex-Max automatically:
- Summarizes and prunes older parts of the conversation and tool outputs.
- Retains key design decisions, goals, and unresolved issues.
- Re-initializes a fresh context window seeded with that compacted summary plus recent steps.
This loop repeats until the task completes, giving you a single continuous agent run that spans many internal context windows without the usual “start a new chat, lose half the story” problem.
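To make the mechanics concrete, here is a toy sketch of such a summarize-and-reseed loop. This is an illustration, not OpenAI’s implementation; the rough 4-characters-per-token estimate and the `naiveSummarize` helper are assumptions for demonstration only.

```javascript
// Toy illustration of a compaction loop: when the transcript nears a token
// budget, older messages are folded into a single summary entry and the next
// window is seeded with that summary plus the most recent steps.

// Crude token estimate (~4 characters per token) for demonstration purposes.
const estimateTokens = (messages) =>
  messages.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0);

function compact(messages, budget, keepRecent, summarize) {
  if (estimateTokens(messages) < budget) return messages; // still fits
  const old = messages.slice(0, -keepRecent);
  const recent = messages.slice(-keepRecent);
  // Seed the fresh window with the compacted summary plus recent steps.
  return [{ role: "system", text: summarize(old) }, ...recent];
}

// Hypothetical summarizer that preserves goal/invariant lines verbatim;
// in Codex, a model produces this summary instead.
const naiveSummarize = (msgs) =>
  "SUMMARY: " +
  msgs
    .filter((m) => /GOAL|INVARIANT/i.test(m.text))
    .map((m) => m.text)
    .join("; ");
```

The key property to notice: anything the summarizer fails to preserve is gone from the next window, which is why the rest of this guide stresses making goals and invariants easy to spot.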

When to use GPT-5.1-Codex-Max for large-scale refactors
OpenAI recommends GPT-5.1-Codex-Max specifically for agentic coding in Codex or Codex-like harnesses, not as a general-purpose chat model. It’s the right choice when:
- You’re doing a project-scale refactor: e.g., migrating a framework, changing a core abstraction, or applying a consistent pattern across hundreds of files.
- You need multi-hour continuity: long debugging sessions, deep test-fix cycles, or large feature implementations.
- You want the agent to work semi-autonomously: running shell commands, editing files, running tests, and iterating in a loop.
For quick edits or one-off functions, GPT-5.1 or GPT-5.1-Codex (non-Max) with lower reasoning effort may be cheaper and more responsive. Max shines when context management is the bottleneck rather than raw latency.
| Model | Best for | Key strengths |
|---|---|---|
| GPT-5.1 | General coding and agentic tasks | Fast, adaptive reasoning, wide tooling support |
| GPT-5.1-Codex | Standard agentic coding in Codex | Great coding benchmarks, good for medium tasks |
| GPT-5.1-Codex-Max | Large-scale refactors, multi-hour sessions | Compaction, multi-window context, 24+ hour loops |
How compaction works in practice inside Codex
You don’t call “compaction” directly in most workflows; Codex orchestrates it for you. The key is to work in a way that compaction can summarize accurately.
1. Use Codex surfaces that support long-running agents
- Codex CLI: Run tasks against a local or remote repo; GPT-5.1-Codex-Max can edit files, run tests, and iterate.
- IDE extension (VS Code, Cursor, Windsurf, etc.): Invoke larger refactors, but be aware that some IDEs still favor shorter loops; for truly massive work, the CLI or Codex Cloud tends to behave more predictably.
- Codex Cloud workspaces: For repository-scale automation where the agent can run many tool calls in sequence.
In all these, the Codex runtime tracks session length and triggers compaction when the token budget nears its limit.
2. Structure your instructions for compaction-aware workflows
Compaction works by distilling what matters. Help it by making your intent and constraints easy to compress:
- Start sessions with a clear mission brief: goals, non-goals, constraints (e.g., “no public API breakage”), and acceptance criteria.
- Use stable, reusable markers in your messages: sections like `PROJECT GOALS`, `INVARIANTS`, `OPEN ISSUES`, and `DONE` make it easier for the model to preserve them in summaries.
- Periodically ask the agent to update a top-level TODO / plan so compaction has an explicit task spine to carry forward.
```text
// Example: initial Codex CLI instruction for a large refactor
Refactor this repository to migrate from Redux Toolkit to Zustand.

PROJECT GOALS:
- Preserve all behavior and public APIs.
- Reduce boilerplate by leveraging Zustand's minimal store patterns.
- Keep test suite green.

INVARIANTS:
- Do not change any exported function signatures from /src/public-api.
- Maintain TypeScript strict mode with no new tsconfig relaxations.

CONSTRAINTS:
- Prefer incremental, feature-by-feature refactors.
- After each batch of changes, run the Jest test suite and fix regressions.

DELIVERABLES:
- Completed migration.
- A MIGRATION_NOTES.md summarizing key decisions and follow-up tasks.
```

These anchors give compaction something to intentionally preserve as history is compressed.
3. Let Codex compact automatically, but watch for drift
According to OpenAI’s product notes, GPT-5.1-Codex-Max will:
- Detect when the session approaches the context limit.
- Summarize conversation + tool calls, pruning low-signal content.
- Continue in a fresh window with that compacted “memory” plus the most recent operations.
As the human in the loop, your job is to periodically re-ground the agent:
- Every 30–60 minutes, ask: “Summarize what we’ve done, remaining risks, and next 5 concrete steps”.
- Ensure critical constraints and goals still appear in its own summaries. If they don’t, restate them.
- If you notice drift, explicitly correct: “We must not change the API of X. Confirm this is still enforced and adjust your plan.”
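That re-grounding check can be partially automated. A small helper (hypothetical; written in JavaScript to match the later pseudo-code) scans the agent’s latest self-summary for each critical invariant and builds a corrective prompt for anything that has drifted out:

```javascript
// Check that every critical invariant still appears in the agent's own
// summary; return a corrective prompt for any that are missing, or null
// when no drift is detected. A naive substring match, for illustration.

function checkDrift(summary, invariants) {
  const missing = invariants.filter(
    (inv) => !summary.toLowerCase().includes(inv.toLowerCase())
  );
  if (missing.length === 0) return null; // no drift detected
  return (
    "We must not lose these constraints. Confirm each is still enforced " +
    "and adjust your plan:\n" +
    missing.map((inv) => `- ${inv}`).join("\n")
  );
}
```

A substring match is obviously crude; the point is that drift detection can be a cheap, mechanical step you run at every summary, rather than something you remember to eyeball.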

Concrete workflow: project-scale refactor with compaction
Let’s walk through a realistic large-scale refactor using GPT-5.1-Codex-Max in Codex CLI.
Step 1: Prepare your repo for an AI-driven refactor
- Clean your main branch: merge or rebase outstanding PRs; avoid large concurrent changes.
- Ensure tests pass: the agent relies heavily on test feedback.
- Create a working branch, e.g. `feature/zustand-migration`.
- Add a PROJECT.md or MIGRATION_NOTES.md with:
- Architecture overview.
- Key invariants and contracts.
- High-level refactor goals.
Step 2: Kick off a focused Codex-Max session
```shell
# Pseudo-command; actual syntax depends on Codex CLI version
codex run \
  --model gpt-5.1-codex-max \
  --repo . \
  --task "Migrate from Redux Toolkit to Zustand as described in PROJECT.md.
Work incrementally, keep tests passing, and document decisions in MIGRATION_NOTES.md."
```

In the Codex interaction, reinforce compaction-friendly structure:
```text
Before you begin, restate:
- Your understanding of the project structure
- The migration plan (phased steps)
- The key invariants you must not break

As you work, maintain:
- A running high-level changelog
- An updated TODO list
- Notes on any risky or partial changes
```

Step 3: Let the agent run, but steer it at compaction boundaries
Over time, you’ll see the agent:
- Inspect files and directories.
- Modify code via patch operations or file rewrites.
- Run tests (e.g. Jest, pytest, Maven) and analyze failures.
- Propose and execute follow-up fixes.
Whenever Codex logs or implies a compaction event (e.g., “Compacting session to free up space”), respond with prompts that strengthen its distilled memory:
```text
We just compacted the session.
1. Summarize the overall migration progress in 10 bullet points.
2. List any areas you consider partially migrated or risky.
3. Restate the key invariants you're enforcing.
4. Propose the next 3 focused batches of work.
```

This ensures the new context window is seeded with a high-quality, human-validated summary, not just the model's automatic compression.
Step 4: Enforce safety and review before merge
OpenAI’s own guidance emphasizes that Codex, even with GPT-5.1-Codex-Max, should be treated as an additional reviewer, not a replacement for code review. For large refactors:
- Run your full test suite and static analysis tools (lint, type-checkers, SAST) yourself.
- Use GPT-5.1-Codex-Max again for code review on the final diff:
- “Review this PR for regressions, API changes, and performance risks.”
- Require human sign-off on high-impact packages or APIs.
Codex runs in a sandboxed environment by default (file writes limited to workspace, network disabled). Unless you have a very controlled need, leave external network access off to reduce prompt-injection risk.
Compaction-aware prompting patterns that work well
Because compaction is essentially high-fidelity summarization of your session, certain prompt patterns make it more reliable during long refactors.
Use stable “memory surfaces” inside the repo
Borrowing from Anthropic’s context engineering best practices, you can give GPT-5.1-Codex-Max explicit places to write memory that survive compaction, such as:
- `MIGRATION_NOTES.md` – high-level decisions, tradeoffs, and open questions.
- `TASKS_TODO.md` – remaining steps, prioritized.
- Module-level `README.md` files – for new architecture or patterns.
```text
As you work, keep MIGRATION_NOTES.md updated with:
- Architectural decisions
- Rationale for non-trivial changes
- Any known follow-up tasks

Treat this file as your long-term memory:
it must remain accurate even after many compaction cycles.
```

Because these files live in the repo, Codex can reload them via tools after compaction, even if earlier chat messages have been compressed away.
Prefer explicit, small batches over global rewrites
Compaction is easier and safer when the narrative is:
- “We completed batch A, then B, then C, with tests between each,” rather than “We changed everything at once.”
So ask GPT-5.1-Codex-Max to:
- Work feature by feature or module by module.
- Describe each batch before it starts:
- “Next, I will migrate the auth module: files X, Y, Z.”
- Run tests after each batch and record results in the notes file.
Regularly request compressed, structured status
Help the model practice good compaction on your behalf throughout the session with prompts like:
```text
Every 30 minutes (or after a major batch of changes), do this:
1. Update MIGRATION_NOTES.md with:
   - Completed steps
   - Key decisions
   - Any known regressions
2. Reply here with:
   - A concise bullet summary of progress
   - Current risks
   - Next 3 steps
```

These micro-summaries become the skeleton that compaction uses to maintain coherence across context windows.
Integrating GPT-5.1-Codex-Max into custom tooling
At launch, OpenAI described API access for GPT-5.1-Codex-Max as “coming soon,” but you can design your orchestration in advance, especially if you already use GPT-5.1 or GPT-5.1-Codex via the Responses API.
Design your own compaction layer (optional)
Even with built-in compaction in Codex, many teams will add a secondary, explicit compaction layer in their orchestration service:
- Track conversation + tool calls in your own store (e.g. database, vector store, or plain logs).
- Before each API call, slice down to:
- Latest N steps.
- One or more explicit summaries of history.
- Key notes loaded from repo files.
- Every M turns, call the model to refresh the summary that you send in future prompts.
```javascript
// Pseudo-code sketch of an external compaction-aware loop
while (!task_done) {
  const summary = getOrRefreshSummary(history); // compacted high-level state
  const recent = getRecentMessages(history, 8); // last few high-signal steps
  const messages = [
    systemPrompt,
    summary,
    ...recent,
    currentUserInstruction
  ];
  const response = await openai.responses.create({
    model: "gpt-5.1-codex-max",
    input: messages,
    tools: [apply_patch, shell /* , ... */],
    reasoning: { effort: "medium" } // or "xhigh" for critical steps
  });
  // Apply tool calls, run tests, etc.
  history.push(response);
}
```

This pattern combines Codex’s internal compaction with your own guardrails, giving you more predictable behavior across very long tasks.
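One way to fill in the `getOrRefreshSummary` step from the sketch above (with a slightly extended signature; `summarizeFn` is a hypothetical wrapper around a model call):

```javascript
// Cache the compacted summary and only re-summarize every M turns, so the
// expensive summarization call does not run on every loop iteration.
let cachedSummary = null;
let turnsSinceRefresh = 0;

async function getOrRefreshSummary(history, M, summarizeFn) {
  turnsSinceRefresh++;
  if (cachedSummary === null || turnsSinceRefresh >= M) {
    // summarizeFn would call the model with an instruction like:
    // "Compress this history, preserving goals, invariants, and open issues."
    cachedSummary = await summarizeFn(history);
    turnsSinceRefresh = 0;
  }
  return cachedSummary;
}
```

The refresh interval M trades freshness against cost: small M keeps the summary current but pays for frequent summarization calls; large M is cheaper but risks staler compacted state.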
Combine with GPT-5.1 tools for precise diffs
In non-Codex environments, GPT-5.1 offers an apply_patch tool and a shell tool in the API. For large refactors, design your system so that:
- GPT-5.1-Codex-Max plans and reasons about batch refactors and tests.
- The `apply_patch` tool applies small, reviewable diffs.
- The `shell` tool runs tests and diagnostics, feeding logs back to the model.
Your orchestration engine then decides when to checkpoint, summarize, or roll back based on CI results.

Limitations, gotchas, and best practices
Even with compaction, GPT-5.1-Codex-Max isn’t magic. To get reliable large-scale refactors, keep these constraints in mind:
- Context rot still exists: compaction reduces but doesn’t eliminate the risk that subtle early details get lost. Re-ground periodically and keep critical invariants in persistent files.
- Noisy logs waste attention: large, repetitive output (e.g., giant stack traces) consumes context without adding much value. Pre-filter or summarize logs before handing them to the model when possible.
- `reasoning_effort` vs. cost: `xhigh` unlocks the best performance on tough steps but is slower and more expensive. Use it only for critical migrations (e.g., core infrastructure modules) and keep `medium` as your default.
- Security: long-running agents with shell access can be a powerful attack surface. Keep Codex sandboxed, log tool calls, and monitor for suspicious behavior.
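The log pre-filtering point can be as simple as keeping the head and tail of oversized output before it reaches the model (a minimal sketch; the 60-line budget is an arbitrary assumption):

```javascript
// Truncate huge tool output before sending it to the model: keep the first
// and last maxLines/2 lines and mark how many were dropped in between.

function filterLog(log, maxLines = 60) {
  const lines = log.split("\n");
  if (lines.length <= maxLines) return log; // already small enough
  const head = lines.slice(0, maxLines / 2);
  const tail = lines.slice(-maxLines / 2);
  const omitted = lines.length - maxLines;
  return [...head, `... [${omitted} lines omitted] ...`, ...tail].join("\n");
}
```

Head-and-tail truncation works well for stack traces and test runners, where the failure cause tends to sit at one end of the output; for other log shapes, a smarter filter (e.g., grep for error lines) may be worth the effort.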
Most importantly, treat GPT-5.1-Codex-Max as a very capable collaborator, not an infallible automaton. Its compaction feature gives you continuity over huge tasks; your engineering practices give it direction, safety, and quality control.
Conclusion: making compaction your ally in multi-hour refactors
GPT-5.1-Codex-Max’s compaction feature solves one of the most painful limits of earlier AI coding models: losing context mid-project. By automatically summarizing and pruning history, it can operate coherently over millions of tokens and 24+ hour runs, making project-scale refactors, deep debugging, and complex feature work realistic for a single long-lived agent.
To get real value out of it, you should:
- Choose GPT-5.1-Codex-Max for large, long-running coding tasks where context continuity matters.
- Structure your instructions, notes, and repo files so they survive and guide compaction.
- Run multi-hour sessions in Codex with periodic summaries and explicit plans that the model can carry forward.
- Layer Codex’s built-in compaction with your own orchestration (summaries, checkpoints, CI) for maximum reliability.
If you adopt these practices, GPT-5.1-Codex-Max can evolve from “smart autocomplete” into a persistent engineering partner that stays on track from the first commit in your refactor branch to the final green build.