Tools·Jun 13, 2026

APScheduler's Leader Election Failed: A Robust Table-Based Lease Fix

This review details a critical failure of APScheduler's leader election using PostgreSQL advisory locks on a solo VM, analyzing the technical root cause and a robust, table-based lease solution.

The Answer Up Front

For developers relying on APScheduler for critical background tasks, particularly those implementing leader election with PostgreSQL advisory locks, this review highlights a significant pitfall. The founder's experience demonstrates that pg_try_advisory_lock can fail silently and permanently in specific environments, leading to complete scheduler outages. We recommend abandoning session-based advisory locks for leader election in favor of a durable, table-based lease mechanism, as detailed in the proposed solution. This approach provides explicit state management and recovery, essential for reliable distributed scheduling.

Methodology

This v0 review draws on the founder's published claims at dev.to, accessed on June 5, 2026. Independent benchmarks are pending. Update cadence: re-tested when claims diverge from observed behavior. This review covers the founder's detailed account of an APScheduler leader election failure, the identified root cause, and the proposed technical solution involving a database table-based lease. Specifically, we analyze the technical details of the pg_try_advisory_lock failure and the logic of the UPDATE scheduler_leader SQL statement provided as a fix. This review does not cover independent performance benchmarks of the proposed solution, long-term workflow implications, or edge cases beyond those described by the founder. We also do not evaluate APScheduler's broader feature set or other leader election strategies.

What It Does

Advisory lock failure

The founder, running a content engine on a single small VM, reports a complete and permanent failure of their APScheduler setup. The system used PostgreSQL's pg_try_advisory_lock for leader election so that only one worker instance ran scheduled jobs: the worker that acquired the lock became the leader and the others stood down. With a direct PostgreSQL connection and dedicated asyncpg connections, however, the lock was reportedly acquired and then immediately released. The application still recorded active=True while the pg_locks view showed no holder, which broke the singleton guarantee and, at times, caused jobs to run twice.
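The mismatch described here (application flag set, no row in pg_locks) can be checked for directly. The following is a minimal sketch, not the founder's code: it assumes the lock was taken with the single-bigint form of pg_try_advisory_lock, and the names key_parts and lock_is_held, the key 42, and the pid values are all hypothetical. The query uses only the standard pg_locks view; the caller would fetch its own backend pid with pg_backend_pid() on the same connection.

```python
from typing import Iterable, Mapping, Tuple

# Run this on the SAME connection that is believed to hold the lock.
# Advisory locks are scoped to a database, hence the database filter.
HELD_ADVISORY_LOCKS_SQL = """
SELECT pid, classid, objid, objsubid, granted
FROM pg_locks
WHERE locktype = 'advisory'
  AND database = (SELECT oid FROM pg_database WHERE datname = current_database())
"""


def key_parts(key: int) -> Tuple[int, int]:
    """Split a bigint advisory-lock key into (classid, objid) as shown in pg_locks.

    PostgreSQL displays the high-order 32 bits in classid and the
    low-order 32 bits in objid for the single-bigint key form.
    """
    unsigned = key & 0xFFFFFFFFFFFFFFFF  # treat the signed bigint as 64 raw bits
    return unsigned >> 32, unsigned & 0xFFFFFFFF


def lock_is_held(rows: Iterable[Mapping[str, object]], key: int, backend_pid: int) -> bool:
    """True only if the pg_locks rows show this backend holding lock `key`."""
    classid, objid = key_parts(key)
    return any(
        row["pid"] == backend_pid
        and row["classid"] == classid
        and row["objid"] == objid
        and row["objsubid"] == 1  # 1 marks the single-bigint key form
        and row["granted"]
        for row in rows
    )


# Hypothetical rows, shaped like the result of HELD_ADVISORY_LOCKS_SQL.
rows = [{"pid": 100, "classid": 0, "objid": 42, "objsubid": 1, "granted": True}]
held_by_us = lock_is_held(rows, key=42, backend_pid=100)
held_after_loss = lock_is_held([], key=42, backend_pid=100)
```

A worker that compared its active flag against a check like this on every cycle would at least notice the divergence, although it would still depend on session state for the lock itself.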

Session state unreliability

The core issue identified was the unreliability of PostgreSQL session state when using advisory locks in this specific setup. The founder reports that the advisory locks, being session-level, were implicitly released whenever the underlying connection handling caused the session state to be lost or reset, even without a pooler such as pgbouncer in the path. The application logic, unaware of the implicit release, kept operating on the false assumption that it still held the lock. According to the founder, this mismatch between application state and the actual lock status is what made the failure permanent, because it prevented the self-healing mechanisms from recovering.
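To make the failure mode concrete, here is a toy model of it in plain Python. It is not a PostgreSQL client and does not reproduce the founder's environment; ToyServer, Worker, the session ids, and the key 42 are all hypothetical. It only illustrates how a lock tied to a session can vanish while the application's cached flag stays set.

```python
from typing import Dict, Optional


class ToyServer:
    """Toy model of session-scoped locks; not a real PostgreSQL client."""

    def __init__(self) -> None:
        self._holder_by_key: Dict[int, str] = {}  # lock key -> session id

    def try_lock(self, session_id: str, key: int) -> bool:
        # Succeeds if the key is free or already held by this session.
        holder = self._holder_by_key.setdefault(key, session_id)
        return holder == session_id

    def end_session(self, session_id: str) -> None:
        # Session-level locks disappear with the session; nobody is notified.
        self._holder_by_key = {
            k: s for k, s in self._holder_by_key.items() if s != session_id
        }

    def holder(self, key: int) -> Optional[str]:
        return self._holder_by_key.get(key)


class Worker:
    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False  # the application's cached belief about leadership

    def elect(self, server: ToyServer, session_id: str, key: int) -> None:
        self.active = server.try_lock(session_id, key)


LOCK_KEY = 42  # hypothetical advisory-lock key
server = ToyServer()
a, b = Worker("a"), Worker("b")

a.elect(server, "session-a1", LOCK_KEY)       # a acquires the lock and sets active
server.end_session("session-a1")               # the session is lost or reset
holder_after_reset = server.holder(LOCK_KEY)   # None: no holder, as pg_locks showed
b.elect(server, "session-b1", LOCK_KEY)       # b acquires the now-free lock
```

At the end of this run both workers have active set, which corresponds to the double execution described above, and nothing in worker a's own state tells it that anything changed.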

A robust lease solution

To address the unreliability, the founder implemented a new leader election strategy based on a simple database table, scheduler_leader, with a time-based lease. This approach manages leadership state explicitly and decouples it from volatile session state. All workers start their scheduler in a paused state, and only the worker that wins the lease resumes it. An election round runs every 25 seconds as a single DML statement: UPDATE scheduler_leader SET holder=$me, heartbeat=now() WHERE id=1 AND (holder=$me OR holder IS NULL OR heartbeat < now()-75s) RETURNING holder. The WHERE clause matches in three cases: the caller already holds the lease (a renewal), nobody holds it, or the current holder's heartbeat is more than 75 seconds old. Because the database serializes competing updates to the same row, only one worker can claim leadership in a round, and a returned row tells the caller that it is the leader. The heartbeat column acts as the lease: if a leader fails, its lease expires after 75 seconds and another worker can take over.
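The claim logic can be sketched as follows. This is our illustration under stated assumptions, not the founder's implementation. CLAIM_SQL is one executable PostgreSQL rendering of the statement quoted above, with the shorthand now()-75s written as an interval and $me written as the positional parameter $1 (the style asyncpg uses); the column types are assumed. LeaseRow and try_claim are hypothetical names for an in-memory mirror of the WHERE clause, so the decision rule can be exercised without a database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LEASE_TTL = timedelta(seconds=75)  # lease length reported in the source

# Assumed rendering of the founder's statement; $1 is the worker's own id.
CLAIM_SQL = """
UPDATE scheduler_leader
SET holder = $1, heartbeat = now()
WHERE id = 1
  AND (holder = $1 OR holder IS NULL OR heartbeat < now() - interval '75 seconds')
RETURNING holder
"""


@dataclass
class LeaseRow:
    """In-memory stand-in for the single scheduler_leader row (hypothetical)."""
    holder: Optional[str]
    heartbeat: Optional[datetime]


def try_claim(row: LeaseRow, me: str, now: datetime) -> bool:
    """Mirror of the WHERE clause: renew, take a vacant lease, or take an expired one."""
    can_claim = (
        row.holder == me
        or row.holder is None
        or (row.heartbeat is not None and row.heartbeat < now - LEASE_TTL)
    )
    if can_claim:
        row.holder, row.heartbeat = me, now
    return can_claim


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
row = LeaseRow(holder=None, heartbeat=None)

first = try_claim(row, "worker-a", t0)                             # vacant lease is taken
rival = try_claim(row, "worker-b", t0 + timedelta(seconds=25))     # refused, lease is fresh
renewal = try_claim(row, "worker-a", t0 + timedelta(seconds=25))   # holder renews
# worker-a now stops heartbeating; its last heartbeat was at t0 + 25 s.
takeover = try_claim(row, "worker-b", t0 + timedelta(seconds=101))
```

In a real worker the boolean would come from whether CLAIM_SQL returned a row. With APScheduler 3.x, a worker following the source's description would presumably call start(paused=True) at boot, resume() after a successful claim, and pause() after a failed one; that wiring is our assumption and is not shown in the source.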

What's Interesting / What's Not

The most interesting aspect of this account is its detailed technical breakdown of a common distributed systems pitfall. The founder explicitly identifies session state unreliability as the root cause of the pg_try_advisory_lock failure in a single-VM context, a nuanced but critical distinction. Many developers might assume advisory locks are robust enough for simple leader election, especially on a solo instance, only to encounter subtle, hard-to-debug failures like this one. The SQL statement for the table-based lease is a concrete, actionable pattern that can be applied directly, and it demonstrates a pragmatic shift from implicit, session-dependent state to explicit, durable database state for critical coordination.

What's less interesting, or rather what's not covered, is a broader comparison of this solution against other established leader election approaches (e.g., ZooKeeper, etcd, Raft implementations) or APScheduler's own built-in distributed capabilities. The focus is tightly on one specific problem and its direct, database-centric solution.

Pricing

Not applicable; this review covers a technical pattern and solution for APScheduler, an open-source library, not a commercial tool with a pricing model. The solution involves standard PostgreSQL features.

Verdict

For any developer using APScheduler or similar job schedulers and considering PostgreSQL advisory locks for leader election, the founder's experience serves as a stark warning. Relying on session-level state for critical coordination, particularly in environments with potentially transient connections, is a fragile design choice. We strongly recommend adopting the table-based lease pattern described. This approach, with its explicit heartbeat and lease expiration, provides a significantly more robust and observable leader election mechanism, preventing the silent, permanent scheduler failures reported here. It trades the perceived simplicity of advisory locks for the verifiable reliability of durable database state.

What We'd Test Next

Our next steps would involve independently implementing and benchmarking the proposed table-based lease solution. We would test its performance under various load conditions, including high contention for leadership, and measure the failover time when a leader process is abruptly terminated. We would also explore its behavior with different PostgreSQL connection pooling strategies (e.g., pgbouncer) and evaluate its resilience to network partitions or temporary database unavailability. A comparison against other robust leader election libraries or services would also be valuable to understand the trade-offs in complexity, performance, and operational overhead.
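Before measuring anything, the expected failover window can be estimated from the two numbers the founder gives. The sketch below is our arithmetic, not a measured result. It assumes every worker, leader and follower alike, runs the election statement every 25 seconds, that query latency is negligible, and that expiry is judged by the database clock; the function names are hypothetical.

```python
import math
from typing import Tuple


def failover_window(lease_s: float, interval_s: float) -> Tuple[float, float]:
    """Approximate (fastest, slowest) seconds from leader death to takeover.

    Fastest: the leader dies just before its next heartbeat, so the lease
    already has interval_s of age, and a follower polls right at expiry.
    Slowest: the leader dies right after a heartbeat, and a follower polls
    just before expiry, so it waits one more full interval.
    """
    return lease_s - interval_s, lease_s + interval_s


def missed_heartbeats_tolerated(lease_s: float, interval_s: float) -> int:
    """Consecutive heartbeats a live leader can miss before the lease can expire."""
    return math.ceil(lease_s / interval_s) - 1


fastest, slowest = failover_window(lease_s=75, interval_s=25)
tolerated = missed_heartbeats_tolerated(lease_s=75, interval_s=25)
```

Under these assumptions a measured failover should land between roughly 50 and 100 seconds, and a live leader can miss two consecutive heartbeats before it risks losing the lease. Results outside that range in a benchmark would point to clock, latency, or polling behavior that the sketch ignores.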

The investor read

This signal points to the enduring challenge of building reliable distributed systems, even for seemingly simple tasks like leader election on a single VM. The market for robust, off-the-shelf scheduling and coordination primitives remains strong, as evidenced by the continued relevance of tools like ZooKeeper, etcd, and Consul. While this specific solution is a technical pattern, it highlights the demand for tooling that abstracts away the complexities of database-level coordination. An investable company in this space would offer a highly reliable, easy-to-integrate, and observable distributed coordination service, potentially as a managed offering or a well-packaged library with strong guarantees. The founder's detailed technical breakdown also underscores the value of deep engineering insights in a market often saturated with high-level solutions. This is a deliberate small/bootstrapped play, focusing on solving a specific, painful engineering problem with a pragmatic, open-source-friendly approach.

Sources · how we verified
  1. APScheduler's Advisory Lock Failure: My Solo VM's Scheduler Died Permanently

Every claim ties to a primary source. See our methodology.

Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.