# How to Fit a Codebase in an LLM Context Window
Even with 1M-token windows, real codebases overflow. A 50k-line TypeScript project averages ~700K tokens including comments and dependencies. freefilestoprompt.app handles overflow with a priority + auto-fit model: you tell the tool which files matter most (priorities and pins), it packs those first, and lower-priority files are dropped greedily when the budget runs out.
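A minimal sketch of that packing model, using hypothetical names (`FileEntry`, `packFiles`) rather than the tool's actual internals: pinned files always go in, then files are taken in descending priority until the budget is exhausted.

```typescript
// Hypothetical types -- an illustration, not the tool's real API.
interface FileEntry {
  path: string;
  tokens: number;
  priority: number; // e.g. 2 = high, 1 = normal, 0 = low
  pinned: boolean;
}

// Greedy priority packing: pins first, then highest priority,
// skipping anything that would overflow the remaining budget.
function packFiles(files: FileEntry[], budget: number): FileEntry[] {
  const ordered = [...files].sort((a, b) => {
    if (a.pinned !== b.pinned) return a.pinned ? -1 : 1;
    return b.priority - a.priority;
  });
  const packed: FileEntry[] = [];
  let used = 0;
  for (const f of ordered) {
    if (used + f.tokens <= budget) {
      packed.push(f);
      used += f.tokens;
    }
    // else: dropped -- lower-priority files don't make the cut
  }
  return packed;
}
```

One consequence of the greedy pass: a single huge high-priority file can crowd out several smaller ones, which is why stripping noise first (next section) matters.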
## Practical strategies for large codebases
- Drop the obvious noise first. node_modules, dist, .next, build, lockfiles, generated code, large fixture data. These are usually 60-80% of repo size with near-zero LLM signal. Use the Drop button on each (a filter sketch follows this list).
- Pin the architecture-defining files. Top-level entry points, route definitions, schema files, type contracts, key abstractions. These are small but high-information.
- High-priority for the area you're asking about. If you're asking about auth, mark all auth-related files high. The packer keeps highs first.
- Low-priority for context-establishing fluff. README, configuration, examples — useful background but droppable when over budget.
- Use line numbers when prompts reference specific lines. Toggle the option and the output gets a line-number prefix (1, 2, 3, …) on every line, which lets the LLM cite line numbers back at you.
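As a rough illustration of the noise-dropping step in the first bullet, a path filter might look like this. The pattern list is an assumption to adapt per repo, not the tool's built-in defaults:

```typescript
// Assumed noise patterns -- tune these for your own repo.
const NOISE = [
  /(^|\/)node_modules\//,
  /(^|\/)dist\//,
  /(^|\/)\.next\//,
  /(^|\/)build\//,
  /(package-lock\.json|yarn\.lock|pnpm-lock\.yaml)$/,
  /\.(snap|min\.js|map)$/, // snapshots, minified bundles, source maps
];

const isNoise = (path: string): boolean => NOISE.some((re) => re.test(path));

// isNoise("packages/app/node_modules/react/index.js") -> true
// isNoise("src/auth/session.ts")                      -> false
```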
What "doesn't fit" actually means
The budget bar turns red when included files exceed (context window − reserved output). An LLM will reject a prompt that exceeds its context window outright, but reserved output matters too: if your prompt fills the entire window, the model has no room left to generate a response. The default reserve is 8000 tokens; raise it (16000-32000) if you expect long responses, and lower it if you're asking yes/no questions.
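In code, the red-bar condition is a one-liner. A sketch using the defaults described above:

```typescript
// Does the packed prompt fit, leaving room for the model to respond?
function fitsBudget(
  promptTokens: number,
  contextWindow: number,
  reservedOutput = 8_000, // raise to 16_000-32_000 for long answers
): boolean {
  return promptTokens <= contextWindow - reservedOutput;
}

fitsBudget(700_000, 1_000_000);        // true  -- fits with room to spare
fitsBudget(995_000, 1_000_000);        // false -- red bar: no room to generate
fitsBudget(995_000, 1_000_000, 4_000); // true  -- a smaller reserve squeaks by
```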
## Token counts at a glance
| Codebase | Approx. tokens | Models that fit it whole |
|---|---|---|
| Small library / single feature (~5k LOC) | ~70K | All models |
| Medium app (~20k LOC) | ~280K | GPT-5, Claude Sonnet 4 (1M), Gemini 2.5 Pro, Llama 4 Scout |
| Large product / monorepo subset (~50k LOC) | ~700K | Claude Sonnet 4 (1M), Gemini 2.5 Pro, Llama 4 Scout |
| Full monorepo (~200k LOC) | ~2.8M | Llama 4 Scout (10M) |
| Linux kernel-scale | ~50M+ | None — must pack a subset |
Rough estimates assuming ~14 tokens per line of code. The actual ratio varies from roughly 10 to 30 tokens per line depending on language and code density.
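A back-of-the-envelope estimator using the same blended ratio (`estimateTokens` is a hypothetical helper, not part of the tool):

```typescript
// Rough token count from lines of code. ~14 tokens/LOC is a blended
// average; dense languages run higher, sparse ones lower.
const estimateTokens = (loc: number, tokensPerLine = 14): number =>
  loc * tokensPerLine;

estimateTokens(50_000);     // 700_000 -- the "large product" row above
estimateTokens(50_000, 30); // 1_500_000 -- dense worst case
```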
## When auto-fit is not enough
For very large codebases, even aggressive priority + drop won't fit everything in the window. Two options:
- Multi-prompt workflow: Pack different subsets for different questions. "Auth module + types + tests" for one prompt, "API routes + middleware + types" for another.
- Summarize-then-include: Run an LLM pass to summarize each file or directory into a short description, then pack the summaries (much smaller) along with the actual files relevant to your question. A two-pass workflow, sketched below.
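A sketch of that two-pass workflow, assuming a generic `llm()` call and hypothetical helper names; swap in your provider's SDK:

```typescript
// Hypothetical llm() call -- replace with your provider's SDK.
declare function llm(prompt: string): Promise<string>;

interface SourceFile { path: string; content: string; }

// Pass 1: compress every file into a short summary.
async function summarize(files: SourceFile[]): Promise<Map<string, string>> {
  const summaries = new Map<string, string>();
  for (const f of files) {
    summaries.set(
      f.path,
      await llm(`Summarize this file in 2-3 sentences:\n\n${f.content}`),
    );
  }
  return summaries;
}

// Pass 2: summaries for everything, full source only for the files
// relevant to the question at hand.
function buildPrompt(
  files: SourceFile[],
  summaries: Map<string, string>,
  relevant: Set<string>,
  question: string,
): string {
  const overview = files
    .map((f) => `${f.path}: ${summaries.get(f.path)}`)
    .join("\n");
  const detail = files
    .filter((f) => relevant.has(f.path))
    .map((f) => `--- ${f.path} ---\n${f.content}`)
    .join("\n\n");
  return `Codebase overview:\n${overview}\n\nRelevant files:\n${detail}\n\nQuestion: ${question}`;
}
```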
## Try freefilestoprompt.app — Free, No Sign-Up
Drop files, set a target model, get one packed prompt. Runs entirely in your browser.
Open Files to Prompt →

## Frequently Asked Questions
### How do I know which files are highest signal?
Heuristic: small files near the top of the source tree (entry points, type contracts) usually have highest information density per token. Large files with mostly boilerplate or test data are lower density. Pin the small-and-central, drop the large-and-peripheral.
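One way to turn that heuristic into numbers; the scoring function is an assumption for illustration, not something the tool computes:

```typescript
// Crude signal score: shallow paths (near the repo root) and small
// files rank higher; deep, bulky files rank lower.
function signalScore(path: string, tokens: number): number {
  const depth = path.split("/").length;
  return 1 / (depth * Math.max(tokens, 1));
}

signalScore("src/types.ts", 800);                             // high -- pin it
signalScore("src/features/a/__fixtures__/big.json", 40_000);  // low  -- drop it
```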
### My codebase is 5M tokens — what do I do?
Use Llama 4 Scout (10M context window) directly, or pack a focused subset for Claude / Gemini (1M each). For Llama 4 Scout, freeprompttester.app routes via Groq's hosted version.
### Should I include node_modules / vendor / similar?
No. These are dependencies you don't usually need to give the LLM — the LLM already knows the public APIs. If you specifically need to debug a third-party library issue, drop just the relevant package's source files in.
### What about generated code (Prisma, GraphQL codegen)?
Drop it. The LLM can usually reason about your queries from your schema files alone, and generated code is large and noisy.
### How do I handle proprietary code?
freefilestoprompt.app reads files locally only — they never leave your browser. Whatever you drop stays on your device. The packed prompt that you copy and paste into your LLM is the only thing that goes to the LLM provider; the tool itself sees nothing.
### Can I use this for embeddings / RAG instead of direct prompts?
Yes. The output is just a text block with delimiters — useful as the input to chunkers, embedding generators, RAG pipelines, etc. Pick the plain or markdown format if your downstream tool prefers simple separators.
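For example, if you pick a format that separates files with `--- path ---` headers, splitting the packed prompt back into per-file chunks for embedding is a few lines (the delimiter regex is an assumption; match it to the format you actually chose):

```typescript
// Split a packed prompt back into per-file chunks for embedding.
// Assumes "--- path ---" delimiters; adjust the regex to your format.
function toChunks(packed: string): { path: string; text: string }[] {
  const parts = packed.split(/^--- (.+?) ---$/m);
  const chunks: { path: string; text: string }[] = [];
  for (let i = 1; i < parts.length; i += 2) {
    chunks.push({ path: parts[i], text: parts[i + 1].trim() });
  }
  return chunks;
}
```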