Spanlens offers open-source LLM observability with in-proxy interventions
An open-source (MIT) alternative to Langfuse and LangSmith, Spanlens combines standard tracing and cost analysis with active features like PII scanning, injection blocking, and A/B testing at the…
An open-source (MIT) alternative to Langfuse and LangSmith, Spanlens combines standard tracing and cost analysis with active features like PII scanning, injection blocking, and A/B testing at the proxy.
The Answer Up Front
Spanlens is for engineering teams building LLM applications who want a self-hostable, open-source observability solution with more active, intervention-style features than a simple logger. Its combination of tracing, evaluation, and in-proxy security scanning is compelling. Teams already deeply integrated with a mature, managed platform like LangSmith or those who cannot tolerate any potential latency from a proxy architecture should pause. The bottom line: Spanlens presents a comprehensive open-source toolkit, but its value depends entirely on the performance and reliability of its proxy-based features, which remain un-benchmarked.
Methodology
This is a v0 review of Spanlens, based entirely on the project's public announcement. The analysis draws from the features and architecture described by the author, Haeseong Jeon, in a blog post on dev.to, published in July 2026. This review covers the tool's claimed capabilities, including its integration method, observability features, active intervention mechanisms, and evaluation frameworks. It does not include independent performance benchmarks. We have not measured the latency overhead of the Spanlens proxy, the accuracy of its PII and injection scanners, or the resource requirements of a self-hosted instance. All features and performance characteristics described here are based on the founder's claims. An updated review will follow once we can conduct independent, hands-on testing.
What It Does
Spanlens is presented as an open-source LLM observability platform that operates as a proxy between an application and various LLM providers.
One-line proxy integration
Integration is designed to be minimal. Developers can either change the baseURL in their existing LLM client (OpenAI, Anthropic, etc.) to point to their Spanlens proxy instance, or they can use a CLI wizard (npx @spanlens/cli init) which claims to automatically rewrite the necessary code. Once configured, Spanlens intercepts all LLM API calls. It supports a wide range of providers, including OpenAI, Anthropic, Gemini, Mistral, OpenRouter, Azure OpenAI, and local Ollama models. The system reports it can automatically reconstruct streaming responses for accurate logging.
Core observability and tracing
The platform provides a dashboard for analyzing logged requests. Key features include cost tracking, which breaks down spend by model, user, or individual request, and specifically accounts for prompt-caching to show actual savings. For complex, multi-step workflows, an agent tracing feature visualizes the call chain as a Gantt chart or a node graph, identifying the critical path to debug latency. The system also includes anomaly detection, which flags significant deviations in latency, cost, or error rates against a seven-day baseline and provides alerts via Email, Slack, or Discord.
Active interventions and evaluation
Beyond passive monitoring, Spanlens claims to offer active features at the proxy layer. It includes a regex-based scanner for PII and prompt-injection attempts, with the ability to block malicious requests before they reach the LLM. A 'savings engine' suggests when a task sent to a powerful model like GPT-4o could be handled by a cheaper alternative. For continuous improvement, the platform supports prompt versioning with A/B testing, using Welch's t-test to determine statistical significance. It also has a built-in LLM-as-judge framework for scoring response quality against a rubric, using models from OpenAI, Anthropic, or Gemini as the judge.
What's Interesting / What's Not
The most interesting aspect of Spanlens is its explicit strategy of combining passive observability with active, in-proxy interventions. While competitors like Langfuse and LangSmith have world-class tracing and logging, many of their evaluation and security features operate after the fact or require separate SDK integration. Spanlens's proxy architecture centralizes this functionality. The ability to block prompt injections or scan for PII before the data hits a third-party model is a meaningful security and compliance feature. The automated code-rewriting wizard is also a clever onboarding mechanic that reduces initial friction.
The primary unanswered question is performance. A proxy is a potential bottleneck and a single point of failure. The source material makes no claims about the latency overhead introduced by Spanlens. How much does PII scanning, logging, and JSON reconstruction add to a streaming p50 or p95 response time? Without these numbers, the operational cost is unknown. Furthermore, while regex-based scanning for PII and prompt injections is a decent first line of defense, it is notoriously brittle. Sophisticated attacks or complex data formats will likely bypass it. The value of these features depends on their real-world accuracy, which is not documented.
Pricing
Spanlens is available under an MIT open-source license. As of July 2026, the source material does not detail any paid, hosted, or enterprise offerings.
Verdict
For teams committed to an open-source, self-hosted stack, Spanlens is a strong new contender in LLM observability. It bundles the core features of established players (tracing, cost analysis) with a suite of next-generation active tools (proxy-based security, A/B testing, LLM-as-judge) in a single package. Its adoption should be considered by teams who want to move beyond just logging and start actively managing and evaluating their LLM traffic in-flight. However, this is a v0 assessment. The project's viability hinges on the proxy's performance and the true efficacy of its scanners. Without independent benchmarks, adopting Spanlens for a production workload is a bet on the project's engineering quality.
What We'd Test Next
A v2 review would require hands-on benchmarking. First, we would measure the end-to-end latency overhead of the Spanlens proxy for both streaming and non-streaming requests across multiple providers (e.g., OpenAI, Anthropic, and a local Ollama model). Second, we would evaluate the false positive and false negative rates of the PII and prompt-injection scanners using a standardized dataset like AdvBench. Third, we would measure the CPU and memory consumption of a self-hosted Spanlens instance under a sustained load of 100 requests per second to understand its operational resource footprint. Finally, we would verify the correctness of its cost-tracking calculations, especially its accounting for prompt-caching.
The investor read
The LLM observability market is crowded with well-funded players (LangSmith, Helicone, Arize), making any new entry a tough sell. Spanlens's strategy follows the classic open-source wedge: provide a comprehensive, self-hostable alternative to capture developers and teams resistant to vendor lock-in. Its key differentiator is moving up the value chain from passive observability to active, in-proxy intervention (security, routing, A/B testing). This is a higher-value, stickier position if executed well. For Spanlens to become investable, it must demonstrate significant open-source traction (stars, contributors, active community) and articulate a clear commercialization path for a managed cloud or enterprise offering. The primary risk is that incumbents can replicate proxy features, turning it into a feature race. Watch for community adoption as the leading indicator of a defensible moat.
Pull quote: “The ability to block prompt injections or scan for PII before the data hits a third-party model is a meaningful security and compliance feature.”
Every claim ties to a primary source. See our methodology.