Snapshots

Understanding snapshots and how they optimize event replay and improve system performance.

What are Snapshots?

A snapshot is a point-in-time representation of an aggregate's state. Instead of replaying all events from the beginning of time, you can start from a snapshot and only replay events that occurred after the snapshot was taken.

Snapshots are a performance optimization technique that reduces the time and resources needed to reconstruct the current state of an entity. They're particularly valuable for aggregates with long event histories.

Example: If an Order has 1,000 events, replaying all of them every time you need the current state would be expensive. A snapshot taken at event 900 means you only need to replay the last 100 events.

Performance

Loading an aggregate from a recent snapshot, instead of replaying thousands of events, cuts load time because only the events recorded after the snapshot are replayed. The gain is largest for aggregates with long histories.

Reduced Latency

Lower read latency means faster response times for your users. Commands can be processed more quickly when less time is spent reconstructing state.

Resource Efficiency

Reduce CPU and memory usage by minimizing the number of events that need to be deserialized, processed, and applied when loading aggregate state.

State Recovery

Snapshots can speed up recovery after a failure. State is restored from the latest snapshot plus the events recorded after it, without processing the entire event history from the beginning.

How Snapshots Work

1. Event Stream Without Snapshot

Event 1 → Event 2 → Event 3 → ... → Event 998 → Event 999 → Event 1000

Must replay all 1,000 events every time.

2. Event Stream With Snapshot

Event 1 → ... → Event 900 → Snapshot → Event 901 → ... → Event 1000

Load snapshot, then replay only 100 events.
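
The difference between the two streams can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `ItemAdded` and `OrderState` shapes and the `replay` helper are assumptions made up for the example.

```typescript
// Hypothetical event and state shapes, used only for illustration.
type ItemAdded = { version: number; quantity: number };
type OrderState = { version: number; totalQuantity: number };

const initialState: OrderState = { version: 0, totalQuantity: 0 };

// Apply one event to the state and record the event's version.
function applyItemAdded(state: OrderState, event: ItemAdded): OrderState {
  return { version: event.version, totalQuantity: state.totalQuantity + event.quantity };
}

// Replay the given events on top of a starting state, counting how many were applied.
function replay(start: OrderState, events: ItemAdded[]): { state: OrderState; applied: number } {
  let state = start;
  for (const event of events) {
    state = applyItemAdded(state, event);
  }
  return { state, applied: events.length };
}

// A stream of 1,000 events with versions 1..1000.
const stream: ItemAdded[] = Array.from({ length: 1000 }, (_, i) => ({
  version: i + 1,
  quantity: 1,
}));

// Without a snapshot: all 1,000 events are replayed.
const full = replay(initialState, stream);

// Snapshot taken at version 900 (built here by replaying the first 900 events once).
const snapshotState = replay(initialState, stream.slice(0, 900)).state;

// With the snapshot: only events 901..1000 are replayed.
const tail = stream.filter((e) => e.version > snapshotState.version);
const fromSnapshot = replay(snapshotState, tail);
```

Both paths end in the same state; the snapshot path applies 100 events where the full replay applies 1,000.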

Snapshot Structure

A snapshot typically contains:

{
  "snapshotId": "snapshot-abc123",
  "aggregateId": "order-12345",
  "aggregateType": "Order",
  "version": 900,
  "timestamp": "2024-01-15T10:30:00Z",
  "state": {
    "orderId": "order-12345",
    "customerId": "customer-789",
    "status": "SHIPPED",
    "items": [
      {
        "productId": "product-456",
        "quantity": 2,
        "price": 29.99
      }
    ],
    "totalAmount": 59.98,
    "currency": "USD",
    "shippingAddress": { ... },
    "paymentStatus": "PAID"
  }
}
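
In a typed codebase, the envelope above could be modeled roughly as follows. The field names mirror the JSON example; the generic `Snapshot<TState>` interface and the `isSnapshotEnvelope` check are hypothetical helpers for illustration, and a real system may carry additional fields.

```typescript
// Generic snapshot envelope mirroring the JSON example.
interface Snapshot<TState> {
  snapshotId: string;
  aggregateId: string;
  aggregateType: string;
  version: number; // version of the last event reflected in `state`
  timestamp: string; // ISO 8601, e.g. "2024-01-15T10:30:00Z"
  state: TState;
}

// Hypothetical runtime check for data read back from storage.
// It validates the envelope only; the shape of `state` is left to the caller.
function isSnapshotEnvelope(value: unknown): value is Snapshot<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const version = v["version"];
  const timestamp = v["timestamp"];
  return (
    typeof v["snapshotId"] === "string" &&
    typeof v["aggregateId"] === "string" &&
    typeof v["aggregateType"] === "string" &&
    typeof version === "number" &&
    Number.isInteger(version) &&
    version >= 0 &&
    typeof timestamp === "string" &&
    !Number.isNaN(Date.parse(timestamp)) &&
    "state" in v
  );
}
```

Checking the envelope before trusting a stored snapshot makes it easier to detect a damaged record and fall back to replaying events.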

Snapshot Strategies

Event Count Threshold

Create a snapshot every N events (e.g., every 100 events). Simple and predictable.

if (eventCount % 100 === 0) createSnapshot()
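
The modulo check above fires only when the event count lands exactly on a multiple of 100. If a single command can append several events at once, the count may jump from 99 to 101 and skip the trigger. One possible variation, sketched here with the hypothetical helper `shouldSnapshotByCount`, compares the current version with the version of the last snapshot instead.

```typescript
// Hypothetical helper: decide whether to snapshot after appending events.
// `currentVersion` is the aggregate version after the append;
// `lastSnapshotVersion` is the version stored in the latest snapshot (0 if none).
function shouldSnapshotByCount(
  currentVersion: number,
  lastSnapshotVersion: number,
  threshold: number = 100,
): boolean {
  return currentVersion - lastSnapshotVersion >= threshold;
}
```

For example, `shouldSnapshotByCount(101, 0)` returns `true` even though `101 % 100` is not zero.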

Time-Based

Create snapshots at regular time intervals (e.g., daily). Good for predictable maintenance windows.

Take snapshot at midnight each day

On-Demand

Create snapshots when loading takes too long. Adaptive to actual performance needs.

if (loadTime > threshold) createSnapshot()
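
A minimal sketch of the on-demand check, assuming the load is wrapped in a timing helper. The name `timedLoad` and the injectable `now` clock are assumptions for illustration; the clock is a parameter only so the behavior can be exercised without real delays.

```typescript
// Hypothetical wrapper: time a load and report whether it exceeded the threshold.
function timedLoad<T>(
  load: () => T,
  thresholdMs: number,
  now: () => number = Date.now,
): { result: T; elapsedMs: number; snapshotRecommended: boolean } {
  const start = now();
  const result = load();
  const elapsedMs = now() - start;
  // The caller decides when and how to create the snapshot.
  return { result, elapsedMs, snapshotRecommended: elapsedMs > thresholdMs };
}
```

Returning a recommendation rather than creating the snapshot inside the helper keeps snapshot creation off the read path, so a slow load is not made slower.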

Hybrid

Combine multiple strategies. For example: every 100 events OR after 24 hours, whichever comes first.

if (eventCount % 100 === 0 || hoursSinceLastSnapshot >= 24) createSnapshot()
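
The hybrid rule can be written as a small policy function. This is a sketch under assumptions: `SnapshotPolicy`, `defaultPolicy`, and `shouldSnapshotHybrid` are hypothetical names, and the defaults simply mirror the "100 events OR 24 hours" example.

```typescript
// Hypothetical policy object; the defaults mirror the "100 events OR 24 hours" example.
interface SnapshotPolicy {
  maxEvents: number;
  maxAgeMs: number;
}

const defaultPolicy: SnapshotPolicy = {
  maxEvents: 100,
  maxAgeMs: 24 * 60 * 60 * 1000,
};

function shouldSnapshotHybrid(
  eventsSinceSnapshot: number,
  lastSnapshotAt: Date,
  now: Date,
  policy: SnapshotPolicy = defaultPolicy,
): boolean {
  // Nothing new to capture: a fresh snapshot would be identical to the last one.
  if (eventsSinceSnapshot <= 0) return false;

  const ageMs = now.getTime() - lastSnapshotAt.getTime();
  return eventsSinceSnapshot >= policy.maxEvents || ageMs >= policy.maxAgeMs;
}
```

The early return is a design choice of this sketch: without it, an idle aggregate would be re-snapshotted every 24 hours with no change in state.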

Loading State with Snapshots

The process of loading aggregate state with snapshots:

1. Look for the most recent snapshot

Query the snapshot store for the latest snapshot of the aggregate

2. Load the snapshot state (if one exists)

Deserialize the snapshot and use it as the starting state

3. Load events after the snapshot

Query the event store for events with version > snapshot version

4. Replay the remaining events

Apply the events to the snapshot state to get the current state
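
The four steps above can be sketched end to end with in-memory stores. Everything here is a hypothetical stand-in chosen for illustration (`InMemorySnapshotStore`, `InMemoryEventStore`, `loadOrder`, and the event shapes); a real snapshot store and event store would be backed by a database and would deserialize stored data.

```typescript
// Hypothetical, in-memory stand-ins for a real snapshot store and event store.
interface OrderEvent {
  aggregateId: string;
  version: number;
  type: "ItemAdded" | "ItemRemoved";
  quantity: number;
}

interface OrderSummary {
  itemCount: number;
}

interface OrderSnapshot {
  aggregateId: string;
  version: number; // version of the last event folded into `state`
  state: OrderSummary;
}

class InMemorySnapshotStore {
  private readonly saved: OrderSnapshot[] = [];

  save(snapshot: OrderSnapshot): void {
    this.saved.push(snapshot);
  }

  // Latest snapshot = the one with the highest version for this aggregate.
  latest(aggregateId: string): OrderSnapshot | undefined {
    return this.saved
      .filter((s) => s.aggregateId === aggregateId)
      .sort((a, b) => b.version - a.version)[0];
  }
}

class InMemoryEventStore {
  private readonly stored: OrderEvent[] = [];

  append(event: OrderEvent): void {
    this.stored.push(event);
  }

  // Events with version > afterVersion, in version order.
  eventsAfter(aggregateId: string, afterVersion: number): OrderEvent[] {
    return this.stored
      .filter((e) => e.aggregateId === aggregateId && e.version > afterVersion)
      .sort((a, b) => a.version - b.version);
  }
}

function applyOrderEvent(state: OrderSummary, event: OrderEvent): OrderSummary {
  const delta = event.type === "ItemAdded" ? event.quantity : -event.quantity;
  return { itemCount: state.itemCount + delta };
}

function loadOrder(
  aggregateId: string,
  snapshotStore: InMemorySnapshotStore,
  eventStore: InMemoryEventStore,
): { state: OrderSummary; version: number; replayed: number } {
  // 1. Look for the most recent snapshot.
  const snapshot = snapshotStore.latest(aggregateId);

  // 2. Use the snapshot as the starting state if one exists, else start empty.
  let state: OrderSummary = snapshot ? snapshot.state : { itemCount: 0 };
  let version = snapshot ? snapshot.version : 0;

  // 3. Load only the events recorded after the snapshot.
  const remaining = eventStore.eventsAfter(aggregateId, version);

  // 4. Replay them on top of the starting state.
  for (const event of remaining) {
    state = applyOrderEvent(state, event);
    version = event.version;
  }
  return { state, version, replayed: remaining.length };
}
```

When no snapshot exists, the same function degrades to a full replay from version 0, so callers do not need a separate code path.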

Best Practices

• Keep events as source of truth: Snapshots are just an optimization, not a replacement
• Store version number: Track which event version the snapshot represents
• Make snapshots deletable: You should always be able to rebuild from events
• Don't snapshot everything: Only create snapshots for aggregates that need them
• Consider snapshot size: Large snapshots may not provide performance benefits
• Handle snapshot failures gracefully: Fall back to full event replay if snapshot is corrupt
• Version your snapshots: Handle schema changes in snapshot structure
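
Two of these practices, falling back to full replay and versioning snapshots, can be combined in one loading path. The sketch below is illustrative only: `StoredSnapshot`, `CURRENT_SCHEMA_VERSION`, `tryReadSnapshot`, and `loadCart` are hypothetical names, and discarding snapshots with an outdated schema is just one possible policy (migrating them is another).

```typescript
interface CartState {
  itemCount: number;
}

interface CartEvent {
  version: number;
  delta: number; // positive when items are added, negative when removed
}

interface StoredSnapshot {
  version: number; // event version the snapshot represents
  schemaVersion: number; // version of the snapshot's own structure
  payload: string; // serialized CartState
}

// Hypothetical: the snapshot schema the current code understands.
const CURRENT_SCHEMA_VERSION = 2;

// Returns a usable snapshot, or undefined if it is missing, outdated, or corrupt.
function tryReadSnapshot(
  snapshot: StoredSnapshot | undefined,
): { version: number; state: CartState } | undefined {
  if (!snapshot || snapshot.schemaVersion !== CURRENT_SCHEMA_VERSION) return undefined;
  try {
    const parsed: unknown = JSON.parse(snapshot.payload);
    if (typeof parsed === "object" && parsed !== null) {
      const itemCount = (parsed as Record<string, unknown>)["itemCount"];
      if (typeof itemCount === "number") {
        return { version: snapshot.version, state: { itemCount } };
      }
    }
    return undefined; // parsed, but not the expected shape
  } catch {
    return undefined; // payload is not valid JSON
  }
}

function loadCart(snapshot: StoredSnapshot | undefined, events: CartEvent[]): CartState {
  // Fall back to an empty state at version 0, which means a full replay.
  const start = tryReadSnapshot(snapshot) ?? { version: 0, state: { itemCount: 0 } };
  return events
    .filter((e) => e.version > start.version)
    .reduce<CartState>((state, e) => ({ itemCount: state.itemCount + e.delta }), start.state);
}
```

Because events remain the source of truth, an unusable snapshot costs only load time, never correctness.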

When to Use Snapshots

Good Candidates

✓ Aggregates with hundreds or thousands of events
✓ Frequently accessed aggregates
✓ Long-lived entities with complex state
✓ Performance-critical operations

Skip Snapshots

✗ Aggregates with only a few events
✗ Rarely accessed aggregates
✗ Short-lived entities
✗ When replay is already fast enough

Trade-offs

Benefits:

• Faster aggregate loading
• Reduced CPU and memory usage
• Better scalability

Costs:

• Additional storage space
• Complexity in snapshot management
• Need to handle snapshot versioning
• Snapshot creation overhead