# Synchronization Architecture
This document describes the internal architecture of Eidetica's synchronization system, including design decisions, data structures, and implementation details.
## Architecture Overview
The synchronization system uses a BackgroundSync architecture with command-pattern communication:
- Single background thread handling all sync operations
- Command channel communication between frontend and background
- Merkle-CRDT synchronization for conflict-free replication
- Modular transport layer supporting HTTP and Iroh P2P protocols
- Hook-based change detection for automatic sync triggering
- Persistent state tracking in sync database using DocStore
```mermaid
graph TB
    subgraph "Application Layer"
        APP[Application Code] --> TREE[Database Operations]
    end

    subgraph "Core Database Layer"
        TREE --> ATOMICOP[Transaction]
        BASEDB[Instance] --> TREE
        BASEDB --> SYNC[Sync Module]
        ATOMICOP --> COMMIT[Commit Operation]
        COMMIT --> HOOKS[Execute Sync Hooks]
    end

    subgraph "Sync Frontend"
        SYNC[Sync Module] --> CMDTX[Command Channel]
        SYNC --> PEERMGR[PeerManager]
        SYNC --> SYNCTREE[Sync Database]
        HOOKS --> SYNCHOOK[SyncHookImpl]
        SYNCHOOK --> CMDTX
    end

    subgraph "BackgroundSync Engine"
        CMDTX --> BGSYNC[BackgroundSync Thread]
        BGSYNC --> TRANSPORT[Transport Layer]
        BGSYNC --> RETRY[Retry Queue]
        BGSYNC --> TIMERS[Periodic Timers]
        BGSYNC -.->|reads| SYNCTREE[Sync Database]
        BGSYNC -.->|reads| PEERMGR[PeerManager]
    end

    subgraph "Sync State Management"
        SYNCSTATE[SyncStateManager]
        SYNCCURSOR[SyncCursor]
        SYNCMETA[SyncMetadata]
        SYNCHISTORY[SyncHistoryEntry]
        SYNCSTATE --> SYNCCURSOR
        SYNCSTATE --> SYNCMETA
        SYNCSTATE --> SYNCHISTORY
        BGSYNC --> SYNCSTATE
    end

    subgraph "Storage Layer"
        BACKEND[(Backend Storage)]
        SYNCTREE --> BACKEND
        SYNCSTATE --> SYNCTREE
    end

    subgraph "Transport Layer"
        TRANSPORT --> HTTP[HTTP Transport]
        TRANSPORT --> IROH[Iroh P2P Transport]
        HTTP --> NETWORK1[Network/HTTP]
        IROH --> NETWORK2[Network/QUIC]
    end
```
## Core Components

### 1. Sync Module (`sync/mod.rs`)

The main `Sync` struct is a thin frontend that communicates with the background sync engine:
```rust
pub struct Sync {
    /// Communication channel to the background sync engine
    command_tx: mpsc::Sender<SyncCommand>,
    /// The backend for read operations and database management
    backend: Arc<dyn Database>,
    /// The database containing synchronization settings
    sync_tree: Database,
    /// Track if transport has been enabled
    transport_enabled: bool,
}
```
Key responsibilities:
- Provides public API methods
- Sends commands to background thread
- Manages sync database for peer/relationship storage
- Creates hooks that send commands to background
### 2. BackgroundSync Engine (`sync/background.rs`)

The `BackgroundSync` struct handles all sync operations in a single background thread and reads peer state directly from the sync database:
```rust
pub struct BackgroundSync {
    // Core components
    transport: Box<dyn SyncTransport>,
    backend: Arc<dyn Database>,

    // Reference to sync database for peer/relationship management
    sync_tree_id: ID,

    // Server state
    server_address: Option<String>,

    // Retry queue for failed sends
    retry_queue: Vec<RetryEntry>,

    // Communication
    command_rx: mpsc::Receiver<SyncCommand>,
}
```
BackgroundSync accesses peer and relationship data directly from the sync database:
- All peer data is stored persistently in the sync database via `PeerManager`
- Peer information is read on-demand when needed for sync operations
- Peer data automatically survives application restarts
- Single source of truth eliminates state synchronization issues
Command types:
```rust
pub enum SyncCommand {
    // Entry operations
    SendEntries { peer: String, entries: Vec<Entry> },
    QueueEntry { peer: String, entry_id: ID, tree_id: ID },

    // Sync control
    SyncWithPeer { peer: String },
    Shutdown,

    // Server operations (with response channels)
    StartServer { addr: String, response: oneshot::Sender<Result<()>> },
    StopServer { response: oneshot::Sender<Result<()>> },
    GetServerAddress { response: oneshot::Sender<Result<String>> },

    // Peer connection operations
    ConnectToPeer { address: Address, response: oneshot::Sender<Result<String>> },
    SendRequest { address: Address, request: SyncRequest, response: oneshot::Sender<Result<SyncResponse>> },
}
```
Event loop architecture:
The BackgroundSync engine runs a tokio select loop that handles:
- Command processing: Immediate handling of frontend commands
- Periodic sync: Every 5 minutes, sync with all registered peers
- Retry processing: Every 30 seconds, attempt to resend failed entries
- Connection checks: Every 60 seconds, verify peer connectivity
All operations are non-blocking and handled concurrently within the single background thread.
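The loop below is an illustrative sketch of that structure, not Eidetica's exact implementation; handler names such as `handle_command`, `sync_all_peers`, `process_retry_queue`, and `check_connections` are assumptions based on the description above:

```rust
// Illustrative sketch of the BackgroundSync select loop.
impl BackgroundSync {
    async fn event_loop(&mut self) {
        let mut sync_interval = tokio::time::interval(std::time::Duration::from_secs(5 * 60));
        let mut retry_interval = tokio::time::interval(std::time::Duration::from_secs(30));
        let mut connection_interval = tokio::time::interval(std::time::Duration::from_secs(60));

        loop {
            tokio::select! {
                // Frontend commands are processed as soon as they arrive.
                Some(command) = self.command_rx.recv() => {
                    if matches!(command, SyncCommand::Shutdown) {
                        break;
                    }
                    self.handle_command(command).await;
                }
                // Periodic sync with all registered peers.
                _ = sync_interval.tick() => self.sync_all_peers().await,
                // Re-attempt failed sends from the retry queue.
                _ = retry_interval.tick() => self.process_retry_queue().await,
                // Verify peer connectivity.
                _ = connection_interval.tick() => self.check_connections().await,
            }
        }
    }
}
```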
Server initialization:
When starting a server, BackgroundSync creates a `SyncHandlerImpl` with database access:
```rust
// Inside handle_start_server()
let handler = Arc::new(SyncHandlerImpl::new(
    self.backend.clone(),
    DEVICE_KEY_NAME,
));
self.transport.start_server(addr, handler).await?;
```
This enables the transport layer to process incoming sync requests and store received entries.
### 3. Command Pattern Architecture
The command pattern provides clean separation between the frontend and background sync engine:
Command categories:
- Entry operations: `SendEntries`, `QueueEntry` - handle network I/O for entry transmission
- Server management: `StartServer`, `StopServer`, `GetServerAddress` - manage transport server state
- Network operations: `ConnectToPeer`, `SendRequest` - perform async network operations
- Control: `SyncWithPeer`, `Shutdown` - coordinate background sync operations
Data access pattern:
- Peer and relationship data: Written directly to sync database by frontend, read on-demand by background
- Network operations: Handled via commands to maintain async boundaries
- Transport state: Owned and managed by background sync engine
This architecture:
- Eliminates circular dependencies: Clear ownership boundaries
- Maintains async separation: Network operations stay in background thread
- Enables direct data access: Both components access sync database directly for peer data
- Provides clean shutdown: Graceful handling in both async and sync contexts
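The server and connection commands carry `oneshot` response channels so the frontend can await a result. The following self-contained sketch shows that request-response pattern; the types are simplified stand-ins, not Eidetica's actual definitions:

```rust
// Sketch of the oneshot request-response pattern used for server commands.
use tokio::sync::{mpsc, oneshot};

type Result<T> = std::result::Result<T, String>; // placeholder error type

enum Command {
    StartServer { addr: String, response: oneshot::Sender<Result<()>> },
}

async fn start_server(command_tx: &mpsc::Sender<Command>, addr: &str) -> Result<()> {
    let (response, rx) = oneshot::channel();

    // Send the command to the background thread (fails only if it has shut down).
    command_tx
        .send(Command::StartServer { addr: addr.to_string(), response })
        .await
        .map_err(|_| "background sync thread has shut down".to_string())?;

    // Wait asynchronously until the background thread reports the outcome.
    rx.await
        .map_err(|_| "response channel dropped".to_string())?
}
```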
### 4. Change Detection Hooks (`sync/hooks.rs`)
The hook system automatically detects when entries need synchronization:
```rust
pub trait SyncHook: Send + Sync {
    fn on_entry_committed(&self, context: &SyncHookContext) -> Result<()>;
}

pub struct SyncHookContext {
    pub tree_id: ID,
    pub entry: Entry,
    pub is_root_entry: bool,
}
```
Integration flow:
- Transaction detects entry commit
- Executes registered sync hooks with entry context
- SyncHookImpl creates QueueEntry command
- Command sent to BackgroundSync via channel
- Background thread fetches and sends entry immediately
The hook implementation is per-peer, allowing targeted synchronization. Commands are fire-and-forget to avoid blocking the commit operation.
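A hook implementation along these lines might look as follows. This is a sketch built from the types above; the exact struct layout and the `Entry::id()` accessor are assumptions:

```rust
use tokio::sync::mpsc;

// Hypothetical per-peer hook that forwards committed entries to the background thread.
pub struct SyncHookImpl {
    peer_pubkey: String,
    command_tx: mpsc::Sender<SyncCommand>,
}

impl SyncHook for SyncHookImpl {
    fn on_entry_committed(&self, context: &SyncHookContext) -> Result<()> {
        let command = SyncCommand::QueueEntry {
            peer: self.peer_pubkey.clone(),
            entry_id: context.entry.id(), // assumes Entry exposes its ID
            tree_id: context.tree_id.clone(),
        };
        // Fire-and-forget: never block the commit path on a slow or full channel.
        let _ = self.command_tx.try_send(command);
        Ok(())
    }
}
```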
### 5. Peer Management (`sync/peer_manager.rs`)

The `PeerManager` handles peer registration and relationship management:
```rust
impl PeerManager {
    /// Register a new peer
    pub fn register_peer(&self, pubkey: &str, display_name: Option<&str>) -> Result<()>;

    /// Add database sync relationship
    pub fn add_tree_sync(&self, peer_pubkey: &str, tree_root_id: &str) -> Result<()>;

    /// Get peers that sync a specific database
    pub fn get_tree_peers(&self, tree_root_id: &str) -> Result<Vec<String>>;
}
```
Data storage:
- Peers stored under `peers.{pubkey}` paths in the sync database
- Database relationships in `peers.{pubkey}.sync_trees` arrays
- Addresses in `peers.{pubkey}.addresses` arrays
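A hypothetical usage of this API, with placeholder keys and IDs:

```rust
// Register a peer and a database relationship using the PeerManager API above.
fn register_example(peers: &PeerManager) -> Result<()> {
    let peer_pubkey = "ed25519:abc123..."; // placeholder public key
    let tree_root = "tree_root_id";        // placeholder database root ID

    // Stored under peers.{pubkey} in the sync database.
    peers.register_peer(peer_pubkey, Some("laptop"))?;

    // Recorded in peers.{pubkey}.sync_trees; this peer is now considered
    // during periodic sync of that database.
    peers.add_tree_sync(peer_pubkey, tree_root)?;

    // Later: find every peer that syncs the database.
    let syncing_peers = peers.get_tree_peers(tree_root)?;
    assert!(syncing_peers.contains(&peer_pubkey.to_string()));
    Ok(())
}
```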
### 6. Sync State Tracking (`sync/state.rs`)
Persistent state tracking for synchronization progress:
```rust
pub struct SyncCursor {
    pub peer_pubkey: String,
    pub tree_id: ID,
    pub last_synced_entry: Option<ID>,
    pub last_sync_time: String,
    pub total_synced_count: u64,
}

pub struct SyncMetadata {
    pub peer_pubkey: String,
    pub successful_sync_count: u64,
    pub failed_sync_count: u64,
    pub total_entries_synced: u64,
    pub average_sync_duration_ms: f64,
}
```
Storage organization:
```text
sync_state/
├── cursors/{peer_pubkey}/{tree_id} -> SyncCursor
├── metadata/{peer_pubkey} -> SyncMetadata
└── history/{sync_id} -> SyncHistoryEntry
```
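A small sketch of how a cursor might be keyed and updated after a successful sync. The key layout mirrors the organization above; the helpers themselves are hypothetical and assume `ID` implements `Display` and that timestamps are stored as RFC 3339 strings:

```rust
// Hypothetical helpers for cursor storage and updates.
fn cursor_key(peer_pubkey: &str, tree_id: &ID) -> String {
    // Mirrors sync_state/cursors/{peer_pubkey}/{tree_id}
    format!("sync_state/cursors/{peer_pubkey}/{tree_id}")
}

fn record_successful_sync(cursor: &mut SyncCursor, newest_entry: ID, entries_synced: u64) {
    cursor.last_synced_entry = Some(newest_entry);
    cursor.last_sync_time = chrono::Utc::now().to_rfc3339();
    cursor.total_synced_count += entries_synced;
}
```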
### 7. Transport Layer (`sync/transports/`)
Modular transport system supporting multiple protocols with SyncHandler architecture:
```rust
pub trait SyncTransport: Send + Sync {
    /// Start server with handler for processing requests
    async fn start_server(&mut self, addr: &str, handler: Arc<dyn SyncHandler>) -> Result<()>;

    /// Send entries to peer
    async fn send_entries(&self, address: &Address, entries: &[Entry]) -> Result<()>;

    /// Send sync request and get response
    async fn send_request(&self, address: &Address, request: &SyncRequest) -> Result<SyncResponse>;
}
```
SyncHandler Architecture:
The transport layer uses a callback-based handler pattern to enable database access:
```rust
pub trait SyncHandler: Send + Sync {
    /// Handle incoming sync requests with database access
    async fn handle_request(&self, request: &SyncRequest) -> SyncResponse;
}
```
This architecture solves the fundamental problem of received data storage by:
- Providing database backend access to transport servers
- Enabling stateful request processing (GetTips, GetEntries, SendEntries)
- Maintaining clean separation between networking and sync logic
- Supporting both HTTP and Iroh transports with identical handler interface
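For illustration, a minimal handler in the spirit of `SyncHandlerImpl` might look like this. The request/response variant shapes and the backend `put` method are assumptions, not Eidetica's actual protocol types (only `get_tips` appears elsewhere in this document), and the trait is assumed to be object-safe, e.g. via `#[async_trait]`:

```rust
use std::sync::Arc;

// Hypothetical handler with database access.
pub struct SyncHandlerImpl {
    backend: Arc<dyn Database>,
}

#[async_trait::async_trait] // assumed so the handler can live behind Arc<dyn SyncHandler>
impl SyncHandler for SyncHandlerImpl {
    async fn handle_request(&self, request: &SyncRequest) -> SyncResponse {
        match request {
            // Return our current frontier (tips) for the requested database.
            SyncRequest::GetTips { tree_id } => match self.backend.get_tips(tree_id) {
                Ok(tips) => SyncResponse::Tips(tips),
                Err(e) => SyncResponse::Error(e.to_string()),
            },
            // Store entries pushed to us by the peer.
            SyncRequest::SendEntries { entries } => {
                for entry in entries {
                    if let Err(e) = self.backend.put(entry.clone()) {
                        return SyncResponse::Error(e.to_string());
                    }
                }
                SyncResponse::Ack
            }
            // GetEntries and other variants elided in this sketch.
            _ => SyncResponse::Error("unsupported request".into()),
        }
    }
}
```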
HTTP Transport:
- REST API endpoint at `/api/v0` for sync operations
- JSON serialization for wire format
- Axum-based server with handler state injection
- Standard HTTP error codes
Iroh P2P Transport:
- QUIC-based direct peer connections with handler integration
- Built-in NAT traversal
- Efficient binary protocol with JsonHandler serialization
- Bidirectional streams for request/response pattern
## Data Flow

### 1. Entry Commit Flow
```mermaid
sequenceDiagram
    participant App as Application
    participant Database as Database
    participant Transaction as Transaction
    participant Hooks as SyncHooks
    participant Cmd as Command Channel
    participant BG as BackgroundSync

    App->>Database: new_transaction()
    Database->>Transaction: create with sync hooks
    App->>Transaction: modify data
    App->>Transaction: commit()
    Transaction->>Backend: store entry
    Transaction->>Hooks: execute_hooks(context)
    Hooks->>Cmd: send(QueueEntry)
    Cmd->>BG: deliver command
    Note over BG: Background thread
    BG->>BG: handle_command()
    BG->>BG: fetch entry from backend
    BG->>Transport: send_entries(peer, entries)
```
### 2. BackgroundSync Processing
The background thread processes commands immediately upon receipt:
- SendEntries: Transmit entries to peer, retry on failure
- QueueEntry: Fetch entry from backend and send immediately
- SyncWithPeer: Initiate bidirectional synchronization
- ConnectToPeer/SendRequest: Perform handshake and request/response network operations
- Shutdown: Stop the background thread gracefully
- Server operations: Start/stop transport server
Failed operations are automatically added to the retry queue with exponential backoff timing.
### 3. Smart Duplicate Prevention
Eidetica implements semantic duplicate prevention through Merkle-CRDT tip comparison, eliminating the need for simple "sent entry" tracking.
#### How It Works
Database Synchronization Process:
- Tip Exchange: Both peers share their current database tips (frontier entries)
- Gap Analysis: Compare local and remote tips to identify missing entries
- Smart Filtering: Only send entries the peer doesn't have (based on DAG analysis)
- Ancestor Inclusion: Automatically include necessary parent entries
```rust
// Background sync's smart duplicate prevention
async fn sync_tree_with_peer(&self, peer_pubkey: &str, tree_id: &ID, address: &Address) -> Result<()> {
    // Step 1: Get our tips for this database
    let our_tips = self.backend.get_tips(tree_id)?;

    // Step 2: Get peer's tips via network request
    let their_tips = self.get_peer_tips(tree_id, address).await?;

    // Step 3: Smart filtering - only send what they're missing
    let entries_to_send = self.find_entries_to_send(&our_tips, &their_tips)?;
    if !entries_to_send.is_empty() {
        self.transport.send_entries(address, &entries_to_send).await?;
    }

    // Step 4: Fetch what we're missing from them
    let missing_entries = self.find_missing_entries(&our_tips, &their_tips)?;
    if !missing_entries.is_empty() {
        let entries = self.fetch_entries_from_peer(address, &missing_entries).await?;
        self.store_received_entries(entries).await?;
    }

    Ok(())
}
```
Benefits over Simple Tracking:
| Approach            | Duplicate Prevention      | Correctness         | Network Efficiency             |
| ------------------- | ------------------------- | ------------------- | ------------------------------ |
| Tip-Based (Current) | ✅ Semantic understanding | ✅ Always correct   | ✅ Optimal - only sends needed |
| Simple Tracking     | ❌ Can get out of sync    | ❌ May miss updates | ❌ May send unnecessary data   |
#### Merkle-CRDT Synchronization Algorithm

##### Phase 1: Tip Discovery
```mermaid
sequenceDiagram
    participant A as Peer A
    participant B as Peer B

    A->>B: GetTips(tree_id)
    B->>A: TipsResponse([tip1, tip2, ...])
    Note over A: Compare tips to identify gaps
    A->>A: find_entries_to_send(our_tips, their_tips)
    A->>A: find_missing_entries(our_tips, their_tips)
```
##### Phase 2: Gap Analysis

The `find_entries_to_send` method performs the DAG analysis:
```rust
fn find_entries_to_send(&self, our_tips: &[ID], their_tips: &[ID]) -> Result<Vec<Entry>> {
    // Find tips that peer doesn't have
    let tips_to_send: Vec<ID> = our_tips
        .iter()
        .filter(|tip_id| !their_tips.contains(tip_id))
        .cloned()
        .collect();

    if tips_to_send.is_empty() {
        return Ok(Vec::new()); // Peer already has everything
    }

    // Use DAG traversal to collect all necessary ancestors
    self.collect_ancestors_to_send(&tips_to_send, their_tips)
}
```
##### Phase 3: Efficient Transfer
Only entries that are genuinely missing are transferred:
- No duplicates: Tips comparison guarantees no redundant sends
- Complete data: DAG traversal ensures all dependencies included
- Bidirectional: Both peers send and receive simultaneously
- Incremental: Only new changes since last sync
#### Integration with Command Pattern
The smart duplicate prevention integrates seamlessly with the command architecture:
Direct Entry Sends:
```rust
// Via SendEntries command - caller determines what to send
self.command_tx.send(SyncCommand::SendEntries {
    peer: peer_pubkey.to_string(),
    entries, // No filtering - trust caller
}).await?;
```
Database Synchronization:
```rust
// Via SyncWithPeer command - background sync determines what to send
self.command_tx.send(SyncCommand::SyncWithPeer {
    peer: peer_pubkey.to_string(),
}).await?;
// Background sync performs tip comparison and smart filtering
```
#### Performance Characteristics
Network Efficiency:
- O(tip_count) network requests for tip discovery
- O(missing_entries) data transfer (minimal)
- Zero redundancy in steady state
Computational Complexity:
- O(n log n) tip comparison where n = tip count
- O(m) DAG traversal where m = missing entries
- Constant memory per sync operation
State Requirements:
- No persistent tracking of individual sends needed
- Stateless operation - each sync is independent
- Self-correcting - any missed entries caught in next sync
### 4. Handshake Protocol
Peer connection establishment:
```mermaid
sequenceDiagram
    participant A as Peer A
    participant B as Peer B

    A->>B: HandshakeRequest { device_id, public_key, challenge }
    B->>B: verify signature
    B->>B: register peer
    B->>A: HandshakeResponse { device_id, public_key, challenge_response }
    A->>A: verify signature
    A->>A: register peer
    Note over A,B: Both peers now registered and authenticated
```
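The signature verification step can be illustrated with ed25519-dalek; Eidetica's actual handshake types and crypto wrappers may differ:

```rust
// Sketch of verifying a peer's response to our handshake challenge.
use ed25519_dalek::{Signature, Verifier, VerifyingKey};

fn verify_challenge_response(
    peer_public_key: &VerifyingKey,
    challenge: &[u8; 32],       // random nonce we sent in HandshakeRequest
    signature_bytes: &[u8; 64], // peer's signature over our challenge
) -> bool {
    let signature = Signature::from_bytes(signature_bytes);
    // A valid signature proves the peer controls the private key and binds the
    // response to this session's challenge, preventing replay attacks.
    peer_public_key.verify(challenge, &signature).is_ok()
}
```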
## Performance Characteristics

### Memory Usage
BackgroundSync state: Minimal memory footprint
- Single background thread with owned state
- Retry queue: O(n) where n = failed entries pending retry
- Peer state: ~1KB per registered peer
- Relationships: ~100 bytes per peer-database relationship
Persistent state: Stored in sync database
- Sync cursors: ~200 bytes per peer-database relationship
- Metadata: ~500 bytes per peer
- History: ~300 bytes per sync operation (with cleanup)
### Network Efficiency
Immediate processing:
- Commands processed as received (no batching delay)
- Failed sends added to retry queue with exponential backoff
- Automatic compression in transport layer
Background timers:
- Periodic sync: Every 5 minutes (configurable)
- Retry processing: Every 30 seconds
- Connection checks: Every 60 seconds
### Concurrency
Single-threaded design:
- One background thread handles all sync operations
- No lock contention or race conditions
- Commands queued via channel (non-blocking)
Async integration:
- Tokio-based event loop
- Non-blocking transport operations
- Works in both async and sync contexts
## Connection Management

### Lazy Connection Establishment
Eidetica uses a lazy connection strategy where connections are established on-demand rather than immediately when peers are registered:
Key Design Principles:
- No Persistent Connections: Connections are not maintained between sync operations
- Transport-Layer Handling: Connection establishment is delegated to the transport layer
- Automatic Discovery: Background sync periodically discovers and syncs with all registered peers
- On-Demand Establishment: Connections are created when sync operations occur
Connection Lifecycle:
```mermaid
graph LR
    subgraph "Peer Registration"
        REG[register_peer] --> STORE[Store in Sync Database]
    end

    subgraph "Discovery & Connection"
        TIMER[Periodic Timer<br/>Every 5 min] --> SCAN[Scan Active Peers<br/>from Sync Database]
        SCAN --> SYNC[sync_with_peer]
        SYNC --> CONN[Transport Establishes<br/>Connection On-Demand]
        CONN --> XFER[Transfer Data]
        XFER --> CLOSE[Connection Closed]
    end

    subgraph "Manual Connection"
        API[connect_to_peer API] --> HANDSHAKE[Perform Handshake]
        HANDSHAKE --> STORE2[Store Peer Info]
    end
```
Benefits of Lazy Connection:
- Resource Efficient: No idle connections consuming resources
- Resilient: Network issues don't affect registered peer state
- Scalable: Can handle many peers without connection overhead
- Self-Healing: Failed connections automatically retried on next sync cycle
Connection Triggers:

1. Periodic Sync (every 5 minutes):
   - BackgroundSync scans all active peers from the sync database
   - Attempts to sync with each peer's registered databases
   - Connections established as needed during sync

2. Manual Sync Commands:
   - `SyncWithPeer` command triggers an immediate connection
   - `SendEntries` command establishes a connection for data transfer

3. Explicit Connection:
   - `connect_to_peer()` API for manual connection establishment
   - Performs handshake and stores peer information
No Alert on Registration:
When `register_peer()` or `add_peer_address()` is called:
- Peer information is stored in the sync database
- No command is sent to BackgroundSync
- No immediate connection attempt is made
- Peer will be discovered in next periodic sync cycle (within 5 minutes)
This design ensures that peer registration is a lightweight operation that doesn't block or trigger network activity.
## Transport Implementations

### Iroh Transport
The Iroh transport provides peer-to-peer connectivity using QUIC with automatic NAT traversal.
Key Components:
- Relay Servers: Intermediary servers that help establish P2P connections
- Hole Punching: Direct connection establishment through NATs (~90% success rate)
- NodeAddr: Contains node ID and direct socket addresses for connectivity
- QUIC Protocol: Provides reliable, encrypted communication
Configuration via Builder Pattern:
The `IrohTransportBuilder` allows configuring:

- `RelayMode`: Controls relay server usage
  - `Default`: Uses n0's production relay servers
  - `Staging`: Uses n0's staging infrastructure
  - `Disabled`: Direct P2P only (for local testing)
  - `Custom(RelayMap)`: User-provided relay servers
- `enable_local_discovery`: mDNS for local network discovery (future feature)
Address Serialization:
When `get_server_address()` is called, Iroh returns a JSON-serialized `NodeAddrInfo` containing:

- `node_id`: The peer's cryptographic identity
- `direct_addresses`: Socket addresses where the peer can be reached
This allows peers to connect using either relay servers or direct connections, whichever succeeds first.
Connection Flow:
- Endpoint initialization with configured relay mode
- Relay servers help peers discover each other
- Attempt direct connection via hole punching
- Fall back to relay if direct connection fails
- Upgrade to direct connection when possible
### HTTP Transport
The HTTP transport provides traditional client-server connectivity using REST endpoints.
Features:
- Simple JSON API at `/api/v0`
- Axum server with Tokio runtime
- Request/response pattern
- No special NAT traversal needed
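A sketch of how the `SyncHandler` could be injected into an Axum router; the route path matches the one above, while the rest is illustrative rather than Eidetica's actual server code (it assumes `SyncRequest`/`SyncResponse` implement serde's traits and that `SyncHandler` is object-safe):

```rust
use std::sync::Arc;
use axum::{extract::State, routing::post, Json, Router};

// Delegate incoming requests to the shared handler, which has backend access.
async fn sync_endpoint(
    State(handler): State<Arc<dyn SyncHandler>>,
    Json(request): Json<SyncRequest>,
) -> Json<SyncResponse> {
    Json(handler.handle_request(&request).await)
}

// Bind the listener and serve the single sync route with handler state injection.
async fn start_http_server(addr: &str, handler: Arc<dyn SyncHandler>) -> anyhow::Result<()> {
    let app = Router::new()
        .route("/api/v0", post(sync_endpoint))
        .with_state(handler);
    let listener = tokio::net::TcpListener::bind(addr).await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```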
## Architecture Benefits

### Command Pattern Advantages
Clean separation of concerns:
- Frontend handles API and database management
- Background owns transport and sync state
- No circular dependencies
Flexible communication:
- Fire-and-forget for most operations
- Request-response with oneshot channels when needed
- Graceful degradation if channel full
### Reliability Features
Retry mechanism:
- Automatic retry queue for failed operations
- Exponential backoff prevents network flooding
- Configurable maximum retry attempts
- Per-entry failure tracking
State persistence:
- Sync state stored in the sync database via DocStore
- Tracks per-peer sync cursors, metadata, and history
- Survives restarts and crashes
- Provides a complete audit trail of sync operations
Handshake security:
- Ed25519 signature verification
- Challenge-response protocol prevents replay attacks
- Device key management integrated with backend
- Mutual authentication between peers
## Error Handling

### Retry Queue Management
The BackgroundSync engine maintains a retry queue for failed send operations:
- Exponential backoff: 2^attempts seconds delay (max 64 seconds)
- Attempt tracking: Failed sends increment attempt counter
- Maximum retries: Entries dropped after configurable max attempts
- Periodic processing: Retry timer checks queue every 30 seconds
Each retry entry tracks the peer, entries to send, attempt count, and last attempt timestamp.
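A sketch of such an entry and its backoff check, following the parameters above (2^attempts seconds, capped at 64); the field names are assumptions:

```rust
use std::time::{Duration, Instant};

// Hypothetical retry queue entry.
struct RetryEntry {
    peer: String,
    entries: Vec<Entry>,
    attempts: u32,
    last_attempt: Instant,
}

impl RetryEntry {
    /// Delay grows as 2^attempts seconds, capped at 64 seconds.
    fn backoff(&self) -> Duration {
        Duration::from_secs((1u64 << self.attempts.min(6)).min(64))
    }

    /// Ready for another attempt once the backoff window has elapsed.
    fn is_due(&self) -> bool {
        self.last_attempt.elapsed() >= self.backoff()
    }
}
```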
### Transport Error Handling
- Network failures: Added to retry queue with exponential backoff
- Protocol errors: Logged and skipped
- Peer unavailable: Entries remain in retry queue
### State Consistency
- Command channel full: Commands dropped (fire-and-forget)
- Hook failures: Don't prevent commit, logged as warnings
- Transport errors: Don't affect local data integrity
## Testing Architecture

### Current Test Coverage
The sync module maintains comprehensive test coverage across multiple test suites:
Unit Tests (6 passing):
- Hook collection execution and error handling
- Sync cursor and metadata operations
- State manager functionality
Integration Tests (78 passing):
- Basic sync operations and persistence
- HTTP and Iroh transport lifecycles
- Peer management and relationships
- DAG synchronization algorithms
- Protocol handshake and authentication
- Bidirectional sync flows
- Transport polymorphism and isolation
### Test Categories
Transport Tests:
- Server lifecycle management for both HTTP and Iroh
- Client-server communication patterns
- Error handling and recovery
- Address management and peer discovery
Protocol Tests:
- Handshake with signature verification
- Version compatibility checking
- Request/response message handling
- Entry synchronization protocols
DAG Sync Tests:
- Linear chain synchronization
- Branching structure handling
- Partial overlap resolution
- Bidirectional sync flows
## Implementation Status

### Completed Features ✅
Architecture:
- BackgroundSync engine with command pattern
- Single background thread ownership model
- Channel-based frontend/backend communication
- Automatic runtime detection (async/sync contexts)
Core Functionality:
- HTTP and Iroh transport implementations with SyncHandler architecture
- SyncHandler trait enabling database access in transport layer
- Full protocol support (GetTips, GetEntries, SendEntries)
- Ed25519 handshake protocol with signatures
- Persistent sync state via DocStore
- Per-peer sync hook creation
- Retry queue with exponential backoff
- Periodic sync timers (5 min intervals)
State Management:
- Sync relationships tracking
- Peer registration and management
- Transport address handling
- Server lifecycle control
### In Progress 🔄
High Priority:
- Complete bidirectional sync algorithm in `sync_with_peer()`
- Implement full retry queue processing logic
Medium Priority:
- Device ID management (currently hardcoded)
- Connection health monitoring
- Update remaining test modules for new architecture
### Future Enhancements 📋
Performance:
- Entry batching for large sync operations
- Compression for network transfers
- Bandwidth throttling controls
- Connection pooling
Reliability:
- Circuit breaker for problematic peers
- Advanced retry strategies
- Connection state tracking
- Automatic reconnection logic
Monitoring:
- Sync metrics collection
- Health check endpoints
- Performance dashboards
- Sync status visualization