Veltrix migrates event engine to Rust, achieves sub-50ms p99 latency
This review examines Veltrix's technical case study on migrating its high-throughput event processing engine from Scala/Kafka Streams to Rust, focusing on the architectural decisions and reported performance gains.
The Answer Up Front
For engineering teams grappling with unpredictable latency and resource contention in high-throughput event processing systems built on JVM technologies, Veltrix's migration to Rust offers a compelling blueprint. This approach is for those who prioritize predictable, low-tail latency and are willing to invest in Rust's steeper learning curve for critical hot paths. Teams not facing sub-100ms p99 latency requirements or those without the engineering bandwidth for a language shift should skip this path. The bottom line is that for extreme performance and stability in event processing, Rust delivered where a highly tuned Scala/Kafka Streams setup struggled with GC pauses and JIT compilation overheads.
Methodology
This v0 review draws exclusively on the technical case study published by "Built From Africa" on dev.to, titled "The Day We Realized Events Were the Bottleneck (And Why We Moved to Rust)". Independent benchmarks are pending. If later measurements diverge from the published claims, we will re-test and revise this review. The review covers Veltrix's internal event processing engine, specifically its architectural evolution and the reported performance metrics after the migration to Rust. The technologies central to this review are Rust (the source does not state the version; presumably a recent stable release), Tokio for the asynchronous runtime, and sled for embedded key-value storage. This review does not cover independent performance verification, long-term operational workflows beyond the reported Black Friday period, or edge cases not detailed in the source material. All performance numbers and observations are attributed to the founder's published claims.
What It Does
Veltrix operates a distributed event processing engine designed to power real-time treasure hunts in retail environments. The core business requirement was sub-50ms latency for event ingestion and 99.99% uptime, particularly during high-stakes events like Black Friday sales.
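The uptime target is easier to judge as a time budget. The arithmetic below is ours, not the source's. It assumes a 24-hour Black Friday window and a 365-day year. A 99.99% uptime target leaves 0.01% of each window for downtime:

```latex
\begin{aligned}
\text{downtime budget} &= (1 - 0.9999) \times T = 0.0001\,T \\
T = 24\,\text{h} = 86{,}400\,\text{s} &\;\Rightarrow\; 0.0001 \times 86{,}400\,\text{s} = 8.64\,\text{s} \\
T = 365\,\text{d} = 525{,}600\,\text{min} &\;\Rightarrow\; 0.0001 \times 525{,}600\,\text{min} \approx 52.6\,\text{min}
\end{aligned}
```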
The original challenge
Initially, Veltrix's system was implemented as a Kafka Streams topology in Scala, leveraging RocksDB state stores. Despite extensive tuning, including a 16 GiB JVM heap, G1GC configuration with -XX:MaxGCPauseMillis=50, and 32 vCPUs per pod, the system failed to meet its latency targets. During a load test at 500,000 events per second, the p99 latency spiked to 1.2 seconds, and the JVM experienced OutOfMemory errors twice. Attempts to scale out to six pods introduced a 300 ms tail in the repartition topic's shuffle phase. Switching to exactly-once semantics and increasing the RocksDB cache to 4 GiB led to disk I/O bottlenecks, with fsync operations pegging disks at 100% iowait. Profiling with async-profiler revealed significant time spent in JIT compilation stalls (42%) and GC pauses (28%), with GC logs indicating large memory promotions leading to crashes.
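Two back-of-the-envelope figures show why the tail mattered. Both are ours and are derived only from the numbers above. At 500,000 events per second, the slowest 1% is 5,000 events every second. The observed p99 missed the 50 ms target by a factor of 24:

```latex
\begin{aligned}
\text{events per second in the slowest 1\%} &= 0.01 \times 500{,}000 = 5{,}000 \\
\frac{\text{observed p99}}{\text{target}} &= \frac{1.2\,\text{s}}{50\,\text{ms}} = \frac{1200\,\text{ms}}{50\,\text{ms}} = 24
\end{aligned}
```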
Rust architecture
The team made a strategic decision to port the critical hot path of the event processing engine to Rust. This included the event router, windowed aggregator, and leaderboard updater, rewritten in 2,800 lines of Rust code. The choice was driven by the need for predictable latency and the elimination of hidden GC pauses. The Rust implementation utilizes Tokio for its asynchronous runtime and sled as an embedded key-value store. The sled store was configured to run primarily in-memory, with disk flushes occurring every 500 ms to mitigate the fsync-related I/O issues encountered with RocksDB. The existing Scala layer was retained for schema validation and REST endpoints, offloading the high-performance event processing to the new Rust components.
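The 500 ms flush policy is what separates this design from the fsync-heavy RocksDB path. The sketch below is a minimal, standard-library-only illustration of the idea, and it makes some assumptions. Writes land in memory, and keys changed since the last flush are persisted in one batch once the interval has elapsed. `WriteBehindStore` and `maybe_flush` are hypothetical names. This is not sled's implementation, which handles flushing internally.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical write-behind store: reads and writes are served from memory,
/// and keys changed since the last flush are persisted in one batch once
/// `flush_interval` has elapsed. `persisted` stands in for disk; a real store
/// would write and fsync at that point instead.
struct WriteBehindStore {
    mem: HashMap<String, u64>,
    dirty: HashSet<String>,
    persisted: HashMap<String, u64>,
    flush_interval: Duration,
    last_flush: Instant,
}

impl WriteBehindStore {
    fn new(flush_interval: Duration, now: Instant) -> Self {
        Self {
            mem: HashMap::new(),
            dirty: HashSet::new(),
            persisted: HashMap::new(),
            flush_interval,
            last_flush: now,
        }
    }

    /// Memory-only write; the key is queued for the next batch.
    fn put(&mut self, key: &str, value: u64) {
        self.mem.insert(key.to_string(), value);
        self.dirty.insert(key.to_string());
    }

    fn get(&self, key: &str) -> Option<u64> {
        self.mem.get(key).copied()
    }

    /// Persists every dirty key if the interval has elapsed and returns how
    /// many keys were written; returns 0 without touching "disk" otherwise.
    fn maybe_flush(&mut self, now: Instant) -> usize {
        if now.duration_since(self.last_flush) < self.flush_interval {
            return 0;
        }
        let written = self.dirty.len();
        for key in self.dirty.drain() {
            let value = self.mem[&key];
            self.persisted.insert(key, value);
        }
        self.last_flush = now;
        written
    }
}

fn main() {
    let t0 = Instant::now();
    let mut store = WriteBehindStore::new(Duration::from_millis(500), t0);
    store.put("player:42", 7);
    store.put("player:42", 9); // overwritten before the flush: still one dirty key
    assert_eq!(store.get("player:42"), Some(9));
    assert_eq!(store.maybe_flush(t0 + Duration::from_millis(100)), 0); // not yet due
    assert_eq!(store.maybe_flush(t0 + Duration::from_millis(500)), 1); // due: one batch
    println!("persisted: {:?}", store.persisted.get("player:42"));
}
```

The trade-off is explicit. Anything written after the last flush is lost on a crash. In exchange, the hot path never waits on fsync.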
Performance gains
Following the migration, Veltrix re-ran the 500,000 events per second load test. The reported p99 latency fell from 1.2 seconds to 38 ms, with p99.9 latency at 72 ms. Peak memory allocation for the sled store was 2.1 GiB. The Rust compiler's LLVM backend reportedly emitted SIMD instructions that halved CPU time on the join operations. Flamegraph profiling reportedly showed only 0.3% of time spent in GC, with the remainder attributed to network operations and sled compaction. During Black Friday, the Rust pods ran at 65% CPU utilization with zero OutOfMemory errors and zero restarts, and the treasure hunt UI stayed available throughout.
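The headline numbers are tail percentiles, so it helps to be explicit about how they are computed. The source does not say which percentile method Veltrix used. The sketch below uses the nearest-rank method. It expresses percentiles in hypothetical parts-per-10,000 units, so 9,900 means p99 and 9,990 means p99.9. This keeps the rank arithmetic exact in integers.

```rust
/// Nearest-rank percentile: the smallest sample such that at least
/// `per_10k / 10_000` of all samples are <= it. `per_10k` is the percentile
/// in parts per ten thousand (9_900 = p99, 9_990 = p99.9).
/// Returns None for empty input or an out-of-range percentile.
fn percentile(samples: &[u64], per_10k: u64) -> Option<u64> {
    if samples.is_empty() || per_10k == 0 || per_10k > 10_000 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len() as u64;
    // 1-based rank = ceil(per_10k * n / 10_000), computed without floats.
    let rank = (per_10k * n + 9_999) / 10_000;
    Some(sorted[(rank - 1) as usize])
}

fn main() {
    // Hypothetical latency samples in milliseconds: 1, 2, ..., 1000.
    let samples: Vec<u64> = (1..=1000).collect();
    let p99 = percentile(&samples, 9_900).unwrap();
    let p999 = percentile(&samples, 9_990).unwrap();
    println!("p99 = {p99} ms, p99.9 = {p999} ms"); // prints "p99 = 990 ms, p99.9 = 999 ms"
}
```

Other definitions, such as linear interpolation or HDR histograms, can give slightly different values on the same data. Comparisons across systems should therefore use the same method.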
What's Interesting / What's Not
The Veltrix case study is interesting because it clearly explains why the migration happened. The explicit goal was predictable latency, not just raw speed, a crucial distinction for real-time systems. The detailed profiling with async-profiler and flamegraph provides concrete evidence of the bottlenecks in the JVM stack: JIT compilation stalls, GC pauses, and the overhead of distributed Kafka Streams operations such as repartition-topic shuffles. This level of diagnostic detail is often missing from migration stories. The move to Rust, specifically Tokio and sled, targeted these issues directly, and the reported p99 latency fell from 1.2 seconds to 38 ms.
What's also notable is the post-migration reflection on further optimizations. The founder acknowledges sled's compaction-related latency spikes and states an intent to replace sled with a custom sharded in-memory hash table using jemalloc. That plan shows a continued pursuit of microsecond-level determinism. The founder also notes that Kubernetes cgroups introduced 3-5 ms of scheduling jitter and wants to profile with perf on bare metal, which underscores a commitment to squeezing out every remaining millisecond. The founder further cites a Rust 1.75 allocator API for swapping jemalloc for mimalloc without recompilation. We could not confirm this: in stable Rust, the global allocator is selected at compile time with #[global_allocator].
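The planned replacement for sled is described only as a custom sharded in-memory hash table. The sketch below is a minimal, standard-library-only illustration of what sharding means here, under our own assumptions: string keys, a fixed shard count, and one `Mutex` per shard. `ShardedMap` is a hypothetical name. A production version would likely use a faster hasher and a tuned allocator.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Hypothetical sharded map: each key hashes to one of N independently
/// locked shards, so writers touching different shards never share a lock.
struct ShardedMap<V> {
    shards: Vec<Mutex<HashMap<String, V>>>,
}

impl<V: Clone> ShardedMap<V> {
    fn new(n_shards: usize) -> Self {
        assert!(n_shards > 0, "need at least one shard");
        let shards = (0..n_shards).map(|_| Mutex::new(HashMap::new())).collect();
        Self { shards }
    }

    /// Picks the shard index for a key; the same key always maps to the same shard.
    fn shard_for(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }

    /// Inserts under the owning shard's lock; returns the previous value, if any.
    fn insert(&self, key: &str, value: V) -> Option<V> {
        let mut shard = self.shards[self.shard_for(key)].lock().unwrap();
        shard.insert(key.to_string(), value)
    }

    /// Returns a clone of the value so the lock is released before returning.
    fn get(&self, key: &str) -> Option<V> {
        let shard = self.shards[self.shard_for(key)].lock().unwrap();
        shard.get(key).cloned()
    }
}

fn main() {
    let map = ShardedMap::new(16);
    // Four writer threads share the map by reference; contention is per shard.
    std::thread::scope(|s| {
        for t in 0..4 {
            let map = &map;
            s.spawn(move || {
                for i in 0..1_000 {
                    map.insert(&format!("player:{t}:{i}"), i);
                }
            });
        }
    });
    assert_eq!(map.get("player:3:999"), Some(999));
    println!("ok");
}
```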
What's less interesting, or rather, what's not present, is any discussion of Veltrix as a commercial product. The blog post functions purely as a technical case study of an internal system, not a product review. The inclusion of an unrelated affiliate link at the end of the article for "non-custodial payment rails" is a distraction and provides no information relevant to the Veltrix system itself. The learning curve for Rust, particularly around lifetimes in complex components like the windowed aggregator, is acknowledged but not elaborated upon in terms of specific challenges or mitigation strategies beyond the time investment.
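The source names lifetimes in the windowed aggregator as a pain point but does not show the code. One common way to sidestep that friction is to have the aggregator own its keys, moving each event's data into the map instead of borrowing from an input buffer. The sketch below illustrates that approach under our own assumptions: tumbling windows, string player IDs, and a sum-of-points score. `Event`, `WindowedAggregator`, `ingest`, and `leaderboard` are all hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical event that owns its data, so nothing downstream borrows
/// from the input and no lifetime parameters are needed.
struct Event {
    player: String,
    ts_ms: u64,
    points: u64,
}

/// Hypothetical tumbling-window aggregator keyed by (window start, player).
struct WindowedAggregator {
    window_ms: u64,
    totals: HashMap<(u64, String), u64>,
}

impl WindowedAggregator {
    fn new(window_ms: u64) -> Self {
        assert!(window_ms > 0, "window length must be positive");
        Self { window_ms, totals: HashMap::new() }
    }

    /// Consumes the event, moving its `player` String into the map key.
    fn ingest(&mut self, ev: Event) {
        let window_start = ev.ts_ms - ev.ts_ms % self.window_ms;
        *self.totals.entry((window_start, ev.player)).or_insert(0) += ev.points;
    }

    /// Top `k` players for the window starting at `window_start`, highest
    /// total first, ties broken by name so the order is deterministic.
    fn leaderboard(&self, window_start: u64, k: usize) -> Vec<(String, u64)> {
        let mut rows: Vec<(String, u64)> = self
            .totals
            .iter()
            .filter(|((w, _), _)| *w == window_start)
            .map(|((_, player), total)| (player.clone(), *total))
            .collect();
        rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        rows.truncate(k);
        rows
    }
}

fn main() {
    let mut agg = WindowedAggregator::new(1_000);
    for (player, ts_ms, points) in [("ada", 100, 5), ("bo", 200, 7), ("ada", 900, 4), ("bo", 1_100, 1)] {
        agg.ingest(Event { player: player.to_string(), ts_ms, points });
    }
    println!("{:?}", agg.leaderboard(0, 10));
}
```

Owning the keys costs one `String` per event. In exchange, the borrow checker has nothing to track across window boundaries.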
Pricing
Veltrix is an internal event processing engine developed by the "Built From Africa" team; it is not offered as a commercial product or service. Therefore, no pricing tiers or free-tier limits apply. (Pricing snapshot date: 2026-05-27)
Verdict
For organizations building high-throughput, low-latency event processing systems where predictable performance is paramount, Veltrix's migration to Rust demonstrates a clear path forward. The reported drop in p99 latency from 1.2 seconds to 38 ms matters on its own. Combined with zero OutOfMemory errors and zero restarts during Black Friday, it makes a strong case for Rust over JVM-based solutions when GC pauses, JIT overhead, and I/O contention are the bottleneck. The learning curve for Rust is acknowledged as steep, but the reported stability and performance gains appear to justify the engineering investment for critical hot paths. The approach fits best when existing JVM systems are bottlenecked by runtime characteristics rather than fundamental algorithmic limitations.
What We'd Test Next
Our next step would be to reproduce the reported benchmarks independently. To validate the claims, we would compare the Rust-based Veltrix architecture against other high-performance event processing frameworks, including C++ and Go-based systems, under similar load. A deeper investigation into sled's compaction behavior and its effect on tail latency would be crucial. So would benchmarking the proposed custom sharded in-memory hash table with allocators such as jemalloc and mimalloc. We would also quantify the scheduling jitter introduced by Kubernetes cgroups and measure the gains from bare-metal profiling and the -C target-cpu=native compiler flag.
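Comparing jemalloc and mimalloc starts with measuring how often the hot path allocates. The sketch below is a minimal, standard-library-only illustration, assuming we wrap the system allocator in a counter installed with `#[global_allocator]`. `CountingAlloc` and `count_allocs` are hypothetical names. Because stable Rust fixes the global allocator at compile time, a real comparison would build the benchmark once per allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counting wrapper around the system allocator. Replacing
/// `System` with another allocator (for example one from a jemalloc or
/// mimalloc crate) means editing this code and recompiling.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Installed at compile time; this is the only global allocator in the binary.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns the number of allocations recorded meanwhile. The
/// counter is process-wide, so other threads allocating at the same time
/// would inflate the result.
fn count_allocs<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}

fn main() {
    // Growing from empty triggers repeated reallocations; the default
    // `GlobalAlloc::realloc` calls `alloc`, so each growth step is counted.
    let growing = count_allocs(|| {
        let mut v = Vec::new();
        for i in 0..1_000u32 {
            v.push(i);
        }
        black_box(&v);
    });
    // Reserving capacity up front needs a single allocation.
    let preallocated = count_allocs(|| {
        let mut v = Vec::with_capacity(1_000);
        for i in 0..1_000u32 {
            v.push(i);
        }
        black_box(&v);
    });
    println!("growing: {growing} allocations, preallocated: {preallocated}");
}
```

`black_box` keeps the optimizer from eliding the otherwise unused vectors, which would hide their allocations from the counter.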
The investor read
The Veltrix case study signals a continued trend of high-performance, low-latency systems moving away from JVM-based runtimes towards languages like Rust. This shift is driven by the need for predictable tail latencies and by the lower operational costs that come with a smaller memory footprint and lower CPU usage. Investors should note the growing maturity of Rust's async ecosystem (Tokio) and of embedded KV stores (sled), which makes Rust a viable choice for critical infrastructure. Companies that can demonstrate similarly dramatic performance improvements through Rust migrations, especially in data-intensive or real-time sectors, will attract significant attention. The case also points to a potential investment opportunity in specialized Rust tooling, profiling, and consulting services for enterprises undertaking such complex migrations, since the learning curve remains a barrier for many teams. The explicit focus on solving specific, measurable bottlenecks, rather than a generic rewrite, is a key indicator of a well-executed technical strategy.
Every claim ties to a primary source. See our methodology.