Architecture Decision Records
An architectural decision record (ADR) documents an important architectural choice along with its context and consequences.
Immutable history
ADRs are append-only. Once an ADR is accepted, its content is never changed. A superseded decision is marked Deprecated, with a cross-reference to the ADRs that supersede it. New decisions always get the next sequential number.
Full decision text → adr.md
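The append-only rule can be illustrated with a small model. This is a minimal sketch under stated assumptions, not tooling that ships with the library: `AdrLog`, `Entry`, and `accept` are hypothetical names introduced only for this example. Accepting a decision assigns the next sequential number, and superseding an older decision only flips its status and records the cross-reference, leaving its title untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an append-only ADR log; not part of the failover library.
public class AdrLog {
    public enum Status { ACCEPTED, DEPRECATED }

    public static final class Entry {
        private final int number;
        private final String title; // fixed once the ADR is accepted
        private Status status = Status.ACCEPTED;
        private final List<Integer> supersededBy = new ArrayList<>();

        private Entry(int number, String title) {
            this.number = number;
            this.title = title;
        }

        public int number() { return number; }
        public String title() { return title; }
        public Status status() { return status; }
        public List<Integer> supersededBy() { return List.copyOf(supersededBy); }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Appends a new decision; any superseded ADRs are marked Deprecated with a cross-reference.
    public Entry accept(String title, int... supersedes) {
        // Validate first so a bad number never leaves the log half-updated.
        for (int n : supersedes) {
            if (n < 1 || n > entries.size()) {
                throw new IllegalArgumentException("Unknown ADR number: " + n);
            }
        }
        Entry created = new Entry(entries.size() + 1, title);
        for (int n : supersedes) {
            Entry old = entries.get(n - 1);
            old.status = Status.DEPRECATED;
            old.supersededBy.add(created.number);
        }
        entries.add(created);
        return created;
    }

    public Entry get(int number) {
        return entries.get(number - 1);
    }
}
```

Calling `accept("B", 1)` after `accept("A")` returns entry 2 and leaves entry 1 in the log as Deprecated, with 2 in its cross-reference list. There is deliberately no method to edit or delete an entry.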
Decision Index
| # | Decision | Date | Status |
|---|---|---|---|
| ADR 1 — Build a failover lib | Build a reusable annotation-driven failover library | 10-NOV-2021 | Accepted |
| ADR 2 — @Failover Annotations | Dedicated @Failover annotation instead of reusing @FeignClient | 10-NOV-2021 | Accepted |
| ADR 3 — Metadata for referential: As Of, Up To Date? | Referential / ReferentialAware carry upToDate and asOf | 15-NOV-2021 | Accepted |
| ADR 4 — Recovered Payload Handler | RecoveredPayloadHandler SPI for null/default handling | 15-NOV-2021 | Accepted |
| ADR 5 — Failover Store | FailoverStore abstraction with InMemory, Caffeine, JDBC impls | 16-NOV-2021 | Accepted |
| ADR 6 — Failover Execution | FailoverExecution SPI; BASIC (try/catch) and RESILIENCE variants | 17-NOV-2021 | Accepted |
| ADR 7 — Auto Cleanup | Scheduled expiry cleanup via ExpiryCleanupScheduler | 17-NOV-2021 | Accepted |
| ADR 8 — Monitoring | FailoverReporter with logger and Micrometer publishers | 17-NOV-2021 | Accepted |
| ADR 9 — Key Generator | KeyGenerator SPI; default derives key from method args | 30-DEC-2021 | Accepted |
| ADR 10 — DefaultFailoverStore — Defensive Copy for Immutability | Store clones ReferentialPayload to prevent caller mutation | 25-MAY-2026 | Accepted |
| ADR 11 — FailoverStoreBeanPostProcessor — Uniform Store Wrapping via BeanPostProcessor | BeanPostProcessor wraps stores uniformly at startup | 25-MAY-2026 | Deprecated — superseded by ADR 16, ADR 18, ADR 19 |
| ADR 12 — MethodExceptionPolicy — Pluggable Exception Handling Strategy | ExceptionPolicy enum: RETHROW, NEVER_THROW, CUSTOM | 26-MAY-2026 | Accepted |
| ADR 13 — JDBC Native Merge/Upsert — Dialect Detection and Runtime Fallback | Dialect-specific upsert with ANSI fallback | 26-MAY-2026 | Accepted |
| ADR 14 — DatabaseResolver — Strategy Interface for Database Product Detection | DatabaseResolver SPI detects DB product at runtime | 26-MAY-2026 | Accepted |
| ADR 15 — FailoverStoreQueryResolver — Single-Responsibility Co-location of All JDBC Query Concerns | All JDBC query building delegated to FailoverStoreQueryResolver | 26-MAY-2026 | Accepted |
| ADR 16 — Removal of BeanPostProcessor-based Store Wrapping (Supersedes ADR 11) | BeanPostProcessor removed; auto-config assembles store chain explicitly | 02-JUN-2026 | Accepted — supersedes ADR 11 |
| ADR 17 — TenantStoreFactory SPI — Abstracting Store Creation from Store Assembly | TenantStoreFactory decouples per-tenant store creation | 02-JUN-2026 | Accepted |
| ADR 18 — FailoverStoreAutoConfiguration — Central Assembler | Single auto-config class assembles the complete store chain | 02-JUN-2026 | Accepted |
| ADR 19 — FailoverStoreAsync — Explicit TaskExecutor Replacing @Async | AsyncFailoverStore wraps delegate with explicit executor; drops @Async | 02-JUN-2026 | Accepted |
| ADR 20 — MultiTenantFailoverStore — Outermost Per-Tenant Routing Decorator | Multi-tenant routing sits outside async decorator | 02-JUN-2026 | Accepted |
| ADR 21 — FailoverStoreMultiTenantAutoConfiguration — Multi-Tenant Auto-Configuration and TenantResolver SPI | Separate auto-config for multi-tenant; TenantResolver SPI | 02-JUN-2026 | Accepted |
| ADR 22 — FailoverKeyGenerator — UUID-Based Key Normalisation for Fixed-Width Store Keys | MD5/UUID key hash prevents VARCHAR(256) overflow | 03-JUN-2026 | Accepted |
| ADR 23 — PayloadSplitter — Scatter/Gather Storage for Composite-Key Failover | PayloadSplitter<T,R> splits collection results into per-entity store entries | 04-JUN-2026 | Accepted |
| ADR 24 — Parallel Scatter/Gather — CompletableFuture with Injected Executor | Scatter slices dispatched concurrently via injected Executor | 04-JUN-2026 | Accepted |
| ADR 25 — ContextPropagator SPI — Thread-Local Context Propagation for Parallel Scatter | ContextPropagator captures and restores thread-local context on executor threads | 04-JUN-2026 | Accepted |
| ADR 26 — Replace LocalDateTime with Instant for Timezone-Aware Expiry Timestamps | Instant eliminates timezone ambiguity in expiry across multi-node/multi-timezone deployments | 06-JUN-2026 | Accepted |
| ADR 27 — Migrate Deprecated JdbcTemplate Overloads in FailoverStoreJdbc | Varargs overloads replace deprecated Object[] + int[] forms; removes java.sql.Types usage | 06-JUN-2026 | Accepted |
| ADR 28 — domain Attribute — Shared Store Partitioning Across @Failover Annotations | domain enables scatter/gather slices and single-entity endpoints to share a store partition | 07-JUN-2026 | Accepted |
| ADR 29 — Observability Layer — Observer, Publisher SPI and MDC Logger Refactor | Rename reporter stack to observer; MDC-safe publish via ObservablePublisher SPI; composite publisher | 07-JUN-2026 | Accepted |
| ADR 30 — SpringContextFailoverScanner — Replacing Reflections-Based Classpath Scanning | Spring bean enumeration replaces Reflections; removes package-to-scan config and Guava dep | 07-JUN-2026 | Accepted |
| ADR 31 — failover-observable-micrometer — Micrometer Extension as an Optional Module | Micrometer meters and Actuator health indicator extracted to an optional opt-in module | 07-JUN-2026 | Accepted |
| ADR 32 — PayloadSplitterExecutionException — Wrapping User-Splitter Failures with Diagnostic Context | All PayloadSplitter call failures wrapped in PayloadSplitterExecutionException with splitter name and operation context | 10-JUN-2026 | Accepted |
| ADR 33 — doRecoverAll All-Slices Iteration — User-Controlled Slice Count | doRecoverAll iterates over all slices returned by splitOnRecover; slice count is user-controlled via PayloadSplitter | 10-JUN-2026 | Accepted |
| ADR 34 — ScatterGatherFailoverHandler.recoverAll() Override — Clear Error for Scatter Case | ScatterGatherFailoverHandler.recoverAll() overrides default with UnsupportedOperationException to prevent silent wrong-path execution | 10-JUN-2026 | Accepted |
| ADR 35 — Empty splitOnRecover Guard — Null Return Instead of merge([]) | Guard against empty splitOnRecover result returns null rather than merging an empty list | 10-JUN-2026 | Accepted |
| ADR 36 — splitOnRecover RecoverAll Contract — Single Placeholder for DefaultFailoverHandler | splitOnRecover must return exactly one placeholder context when delegating to DefaultFailoverHandler.recoverAll | 10-JUN-2026 | Accepted |
| ADR 37 — Payload Deserialization Allowlist — Secure-by-Default Class Loading | JsonSerializer.toClass restricted to an allowlist auto-derived from @Failover payload packages plus failover.store.jdbc.allowed-payload-classes | 14-JUN-2026 | Accepted |
| ADR 38 — Scatter/Gather Per-Slice Timeout — Bounded Parallel Join | failover.scatter.timeout bounds parallel slice joins; timed-out recover slice = not recovered, store slice surfaces | 14-JUN-2026 | Accepted |
| ADR 39 — Error Propagation — Never Recover on a Failing JVM | Error rethrown unwrapped by the aspect; recovery never runs on a dying JVM | 14-JUN-2026 | Accepted |
| ADR 40 — Multi-Tenant Strict Mode — Reject Unconfigured Tenants | failover.store.multitenant.strict rejects (or WARNs on) tenants absent from the configured map | 14-JUN-2026 | Accepted |
| ADR 41 — Async Store Failure Metric — Visibility for a Silently-Degraded Layer | FailoverStoreAsync publishes failover.store.async.failed on executor-side failures | 14-JUN-2026 | Accepted |
| ADR 42 — FailoverScanner Relocation to a Neutral Core Package | FailoverScanner SPI moved core.observable.scanner → core.scanner; shared by observability and store security | 14-JUN-2026 | Accepted |
| ADR 43 — Dialect Integration Tests via Testcontainers | Real PostgreSQL/MySQL/MariaDB merge ITs, profile-gated (dialect-its) and excluded from the default build; Oracle stays string-asserted | 15-JUN-2026 | Accepted |
| ADR 44 — Concurrency Test Coverage for Multi-Tenant Routing and the Async Store | Contention tests for computeIfAbsent one-store-per-tenant and the FailoverStoreAsync executor path | 15-JUN-2026 | Accepted |
| ADR 45 — ArchUnit Architecture Tests | Enforce no-ThreadLocal-in-async, *Store naming, and acyclic slices; split-package rule deferred to Phase 4 | 15-JUN-2026 | Accepted |
| ADR 46 — PIT Mutation Testing on Expiry and Key Logic | Profile-gated (mutation) PIT over all of failover-core; mandated 95% gate (blocking), currently 96% / 99% test strength | 15-JUN-2026 | Accepted |
| ADR 47 — JDBC insert→update Race — Bounded Retry over Silent Drop | INSERT/UPDATE fallback re-INSERTs once when a concurrent expiry delete drops the UPDATE; bounded to 2 attempts, abandons at warn | 15-JUN-2026 | Accepted |
| ADR 48 — Failover Lifecycle Logging — INFO Event, DEBUG Payload Body | Store/recover lifecycle stays at INFO (name only); full ReferentialPayload body moved to DEBUG | 15-JUN-2026 | Accepted |
| ADR 49 — ScatterGatherFailoverHandler — Extract Scatter/Gather Collaborators | Thin facade + PayloadScatter / PayloadGather / SliceDispatcher / SplitterInvoker; public API and behaviour unchanged | 15-JUN-2026 | Accepted |
| ADR 50 — Metrics Builder Helper — Cheaper Metric Construction on the Recover Path | Metrics concatenates keys (no String.format) + typed collect overloads; ~3.6× faster recover-bag build (JMH 744→204 ns/op) | 15-JUN-2026 | Accepted |
| ADR 51 — Per-Method Failover Outcome Metric + Method-Identity Threading | failover.recovery.outcome.total{name,domain,method,outcome} for failover/recovery/non-recovery rates | 15-JUN-2026 | Accepted — threading mechanism refined by ADR 52 |
| ADR 52 — FailoverHandler Method-Aware Contract + AbstractFailoverHandler | Single method-aware FailoverHandler contract (@NonNull Method); AbstractFailoverHandler bridges method-agnostic handlers; method threaded through scatter to slices. Breaking SPI | 15-JUN-2026 | Accepted |
| ADR 53 — Overall JaCoCo Coverage Gate | Cross-module jacoco:check in the failover-test-report module (unpack classes + merge all exec) fails verify below 95% line / 95% branch | 15-JUN-2026 | Accepted |
| ADR 54 — FailoverStore Assembly — Collapse Four Property-Gated Beans into One | Single failoverStore bean replaces the 4 async × multitenant `` variants; per-tenant-async made explicit; refines ADR 18 | 16-JUN-2026 | Accepted |
| ADR 55 — Embedded Failover Dashboard (failover-dashboard) | Opt-in, secure-by-default embedded UI + read-only JSON API over the existing scanner config and failover.* meters; separate starter, no new instrumentation, fail-closed access gate | 17-JUN-2026 | Accepted |
| ADR 56 — Payload-at-rest Encryption for the JDBC Store | PayloadCipher SPI + EncryptingSerializer over the JDBC store; ENC(<cipherId>:…) envelope, write-switch only, failover.store.jdbc.encryption.*; opt-in, b64 default marked non-secure | 17-JUN-2026 | Accepted |
| ADR 57 — Async Executor Back-pressure | BoundedTaskExecutor (semaphore admission guard, keeps virtual threads) makes the async-store and scatter executors optionally bounded; concurrency-limit + rejection-policy (DISCARD default); bounding is opt-in, unbounded by default (audit R-2) | 17-JUN-2026 | Accepted |
| ADR 58 — Non-Durable Store in Production — Advisory Warning over Fail-Fast | Non-durable store WARN now names jdbc as the recommended production store + store decision/topology docs; rejected profile-aware fail-fast as brittle (audit A1) | 22-JUN-2026 | Accepted |
| ADR 59 — Async Store Submit-Time Rejection — Count Saturation, Don't Drop It Silently | Submit-time executor rejection (ABORT/shutdown) now emits the existing failover.store.async.failed meter; closes the saturation blind spot, no new meter/tag (audit A2) | 22-JUN-2026 | Accepted |
| ADR 60 — Deserialization Allowlist Strict Mode — Fail-Closed on an Empty Allowlist | Opt-in failover.store.jdbc.strict-allowlist denies all deserialization on an empty allowlist (was fail-open allow-all); secure default unchanged (audit A3) | 22-JUN-2026 | Accepted |
| ADR 61 — Built-in AES-GCM Payload Cipher — Usable Encryption-at-Rest Out of the Box | Built-in AesGcmPayloadCipher (id aesgcm) auto-registered from failover.store.jdbc.encryption.key; real encryption-at-rest with no consumer crypto code; extends ADR 56 (audit A4) | 22-JUN-2026 | Accepted |
| ADR 62 — Opt-in JDBC Live-Entries Gauge — Capacity Visibility Without a COUNT(*) Tax | FailoverStoreJdbc is FailoverStoreSizeAware; failover.store.jdbc.live-entries-gauge-enabled (default off) exposes failover.live.entries via opt-in COUNT(*) per scrape for capacity monitoring (audit A7) | 25-JUN-2026 | Accepted |
| ADR 63 — Startup Warning for Un-advisable @Failover Placement | Scanner WARNs at startup when a discovered @Failover can't be advised (interface-only, non-public/static/final method, final class); self-invocation documented (audit A8) | 25-JUN-2026 | Accepted |
| ADR 64 — Startup Warning for recoverAll Without a payloadSplitter | Scanner WARNs when @Failover(recoverAll=true) has no payloadSplitter (silently falls back to single-key recover); adoption guidance documented (audit A10) | 25-JUN-2026 | Accepted |
| ADR 65 — Event-Driven Snapshot Publishing with Throttle and Backoff | Cluster snapshot pushes fire on metric events (throttled to one per interval-seconds, WARN-once backoff on failure) via SnapshotPublisher/SnapshotPushClient; no polling scheduler | 28-JUN-2026 | Accepted |
| ADR 66 — Decoupled Heartbeat Liveness Tracking | Lightweight opt-in heartbeat ({"instanceId"} POST) drives LiveStatus LIVE/DOWN/UNKNOWN, decoupled from snapshot freshness; DOWN instances keep contributing last-known metrics | 28-JUN-2026 | Accepted |
| ADR 67 — Reset-Aware Shared-Store Aggregate and Bounded Instance Retirement | SnapshotBaseline carry-forward makes the instant cluster aggregate monotonic across peer restarts (counter resets); unseen instances retire after instance-retention into a bounded tombstone aggregate — counts never drop, heap stays bounded under pod churn | 03-JUL-2026 | Accepted |
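ADR 22 in the index above describes an MD5/UUID key hash that keeps store keys within a fixed-width column. The sketch below shows one way that idea can work, using the JDK's `UUID.nameUUIDFromBytes`, which builds a name-based (MD5) UUID. It assumes UTF-8 encoding of the raw key, and `KeyNormaliser` is a hypothetical class name. It is not the library's `FailoverKeyGenerator`, whose actual implementation is recorded in adr.md.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper illustrating fixed-width key normalisation; not the library's class.
public class KeyNormaliser {

    // Maps a raw key of any length to a 36-character UUID string.
    public static String normalise(String rawKey) {
        // nameUUIDFromBytes hashes the bytes with MD5, so equal inputs yield equal keys.
        return UUID.nameUUIDFromBytes(rawKey.getBytes(StandardCharsets.UTF_8)).toString();
    }

    public static void main(String[] args) {
        String longKey = "x".repeat(1000); // far wider than a VARCHAR(256) column
        System.out.println(normalise(longKey).length()); // prints 36
    }
}
```

Because the output is always 36 characters, it fits a fixed-width key column regardless of how large the method arguments are. The trade-off is that the stored key is no longer human-readable, and distinct inputs could in principle collide under the hash.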