Make Claude Code faster and cheaper in large codebases with Ory Lumen
Ory Lumen adds local semantic code search to Claude Code via MCP. Index your codebase with local embeddings, cut API costs by up to 81%, and get better answers.


Founder & CTO
Our codebase keeps growing. Over the past couple of weeks, I noticed Claude Code was getting slower and more expensive to work with as a result. The issue is simple: Claude's default strategy of navigating a codebase with grep and find has limits. More code means more surface area, and more surface area means higher token costs and slower task completion.
When you ask Claude to find a function or understand a module, it guesses file and function names to find what is relevant. In a small codebase, this is fine. In a larger one, it becomes expensive, both in time and in API costs. This problem compounds as the codebase grows, which means it gets worse exactly when you need it to get better.
I wrote about this dynamic in more depth recently: agents struggle to build and maintain a durable mental model of a codebase. They rediscover things repeatedly through file reads instead of building on what they already know. This is a fundamental constraint of how LLMs work today, not a bug that will get patched.
Ory Lumen, semantic search for Claude Code, is a direct, practical response to that constraint.
Ory Lumen is a local semantic code search engine that runs as an MCP server alongside Claude Code. It indexes your codebase using local embedding models and exposes a semantic_search tool that Claude calls instead of reading files directly. Claude can find relevant functions, types, and modules by meaning, without opening everything to look.
How it works: Lumen chunks your codebase, embeds each chunk with a local model, and stores the vectors in an index. When Claude needs context, it calls semantic_search and gets back the relevant chunks without touching the files.

Everything runs on your machine. No API keys, no cloud, no external services. The embedding backend is Ollama or LM Studio. The index is stored at ~/.local/share/lumen/<hash>/index.db, keyed by project path and model name. Nothing is added to your repo.
Lumen builds a Merkle tree over file hashes on the first run. On subsequent sessions, only changed files get re-chunked and re-embedded. For large codebases, re-indexing after the first run takes seconds.
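The incremental re-index can be sketched as a simplified version of that idea: one content hash per file, a root hash folded over the leaves, and a diff that yields only the files needing re-embedding. This is an illustration of the technique, not Lumen's implementation; the function names are my own.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// fileHash is a leaf of the (simplified) Merkle tree: one hash per file.
func fileHash(content []byte) [32]byte { return sha256.Sum256(content) }

// rootHash folds the sorted leaf hashes into a single root; an unchanged
// root means the whole index can be reused without re-embedding anything.
func rootHash(leaves map[string][32]byte) [32]byte {
	paths := make([]string, 0, len(leaves))
	for p := range leaves {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		h.Write([]byte(p))
		leaf := leaves[p]
		h.Write(leaf[:])
	}
	var root [32]byte
	copy(root[:], h.Sum(nil))
	return root
}

// changedFiles compares previous and current leaf hashes and returns only
// the paths that need re-chunking and re-embedding.
func changedFiles(prev, curr map[string][32]byte) []string {
	var out []string
	for p, h := range curr {
		if old, ok := prev[p]; !ok || old != h {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	prev := map[string][32]byte{
		"main.go": fileHash([]byte("package main")),
		"util.go": fileHash([]byte("package util")),
	}
	curr := map[string][32]byte{
		"main.go": fileHash([]byte("package main // edited")),
		"util.go": fileHash([]byte("package util")),
	}
	fmt.Println(changedFiles(prev, curr)) // only main.go was touched
	fmt.Println(rootHash(prev) == rootHash(curr))
}
```

Because the diff is over hashes, the cost of a re-index scales with the size of the change, not the size of the repo, which is why later sessions finish in seconds.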
I benchmarked Lumen against the Prometheus codebase with reproducible tests:
| | With Lumen | Baseline |
|---|---|---|
| Task completion | 2.1-2.3x faster | baseline |
| API cost | 63-81% cheaper | baseline |
| Answer quality (blind judge) | 5/5 wins | 0/5 wins |
Sonnet 4.6 came in at 2.2x faster and 63% cheaper. Opus 4.6 showed the largest cost reduction: 2.1x faster and 81% cheaper. Go is the best-supported language right now, 3.8x faster and 90% cheaper, because Lumen chunks it with the native Go AST rather than tree-sitter. Python and TypeScript have solid numbers too and are tested across multiple embedding models.
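AST-based chunking is worth a quick illustration. A sketch of the approach, under my own assumptions rather than Lumen's actual code: parse the source with the standard go/parser and emit one chunk per top-level function, so each embedded unit is a complete, meaningful declaration instead of an arbitrary window of lines.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcChunks parses Go source and returns one chunk of source text per
// top-level function, the kind of unit an embedding model can index well.
func funcChunks(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	chunks := map[string]string{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars in this sketch
		}
		start := fset.Position(fn.Pos()).Offset
		end := fset.Position(fn.End()).Offset
		chunks[fn.Name.Name] = src[start:end]
	}
	return chunks, nil
}

func main() {
	src := `package demo

func Add(a, b int) int { return a + b }

func Sub(a, b int) int { return a - b }
`
	chunks, err := funcChunks(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(chunks))      // 2
	fmt.Println(chunks["Add"])    // the full Add declaration
}
```

Chunk boundaries that follow declarations are one plausible reason the Go numbers lead: every chunk carries a coherent semantic unit for the embedding model.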
The quality result is worth noting separately. Cheaper output that is also consistently better in blind comparisons is not a combination you see often. Full benchmark tables and reproduce instructions are in docs/BENCHMARKS.md.
We launched an Ory Claude plugin marketplace alongside Lumen today; Lumen is its first plugin. Inside Claude Code, run:
/plugin marketplace add ory/claude-plugins
/plugin install lumen@ory
Lumen downloads its binary automatically from the latest GitHub release, indexes your project on the next session start, and registers the semantic_search tool. Claude picks it up without any additional configuration.
Prerequisites: a local Ollama (or LM Studio) instance with the embedding model pulled:
$ ollama pull ordis/jina-embeddings-v2-base-code
Two skills come with the plugin: /lumen:doctor for a health check, and /lumen:reindex to force a full re-index after a large refactor.
One constraint I was not willing to give up: your code stays on your machine. Sending source code to an external embedding API is a decision engineering teams should make deliberately, not by default. Lumen runs entirely on local hardware with open-source models. The embeddings never leave your network.
This also makes it usable in air-gapped environments, which matters for the companies running Ory's self-hosted products.
Lumen is a few days old. Go support is solid and well-benchmarked. Python and TypeScript work and have numbers, but chunking strategies for those languages can improve. The remaining languages are supported via tree-sitter but less tested.
The plugin marketplace is new. We will add more tools as we build things we find genuinely useful internally. The bar is the same one Lumen had to clear: measurable improvement, reproducible results, no cloud dependencies.
The project is at github.com/ory/lumen.