Pyde State Sync Protocol
Version 0.1
How new nodes join the network at any point in time. At 30K+ TPS, replaying from genesis is infeasible — snapshot sync is the default.
Sync Modes
| Mode | Use Case | Time |
|---|---|---|
| Full sync (genesis replay) | Archive nodes only | Infeasible at high TPS |
| Snapshot sync (default) | Most full nodes, new committee joiners | ~30-60 min on commodity hardware |
| Light client sync | Mobile wallets, browser, dApp backends | Seconds-minutes |
Snapshot Architecture
Key separation:
- Committee signs state root (cheap, every epoch boundary)
- Volunteers generate chunks (heavier, daily-ish cadence)
This separation keeps heavy disk I/O off the committee. The manifest is small and committee-signed; chunks are large and content-verifiable.
Snapshot Manifest
    struct SnapshotManifest {
        epoch: u64,
        snapshot_state_root_blake3: Hash,
        snapshot_state_root_poseidon2: Hash,
        chunk_manifest: Vec<ChunkRef>,
        current_committee_pubkeys: Vec<FalconPubkey>, // chain-of-trust
        signatures: Vec<FalconSig>,                   // ≥85 from prior epoch's committee
    }

    struct ChunkRef {
        chunk_index: u32,
        chunk_size: u32,
        chunk_hash: Hash,   // Blake3
        chunk_path: String, // P2P routing hint
    }
Why Dual Roots
- Blake3: fast native verification
- Poseidon2: future ZK light-client compatibility
Both computed at snapshot time, both signed by committee.
Snapshot Cadence
- Committee root signing: every epoch boundary (cheap)
- Chunk publishing: every 8 epochs (~daily) by volunteer infrastructure providers
- Tail sync window: up to 24 hours of txs to catch up
Snapshot Size Projections
| Component | v1 mainnet | 5-year projection |
|---|---|---|
| Account state (~10M accounts × ~150B) | 150 MB – 1.5 GB | 5-10 GB |
| Contract storage (~5× accounts × 64B) | 500 MB – 3 GB | 20 GB |
| Contract code (~50K contracts × 50KB) | ~2.5 GB | 20 GB |
| Total | ~3-7 GB | ~50 GB |
Chunk Format and Merkle Range Proofs
Each snapshot chunk is a self-contained, independently-verifiable bundle of JMT nodes. A chunk's authenticity is proven by walking its nodes' hashes up to the committee-signed state root, using fringe siblings carried in the chunk.
    struct Chunk {
        chunk_id: u32,

        // Contiguous range of jmt_cf entries (internal nodes + leaves) covered by this chunk.
        nodes: Vec<(NodeKey, NodeContents)>,

        // The slot_hash → value pairs for leaves in this chunk's range.
        // (Used to populate state_cf at the new validator.)
        leaves: Vec<(SlotHash, ValueBytes)>,

        // Merkle range proof: the sibling hashes along the path from the chunk's
        // bottom layer up to the global state_root. Needed to verify the chunk
        // independently of other chunks.
        fringe_siblings: Vec<(NibblePath, Hash)>,
    }
Why fringe siblings
The chunk doesn't contain the entire JMT — that would be every other chunk too. It contains some contiguous portion (e.g., "all nodes whose NibblePath starts with 3a"). To prove that portion is part of the canonical state at the snapshot's version, the chunk must include the sibling hashes along the boundary.
Conceptual example:
Suppose the JMT looks like:
            ROOT
           /    \
         h_3    h_5
        /   \      \
      ...   ...    leaf at 0x5b22...
A chunk covers leaves under "3a..." prefix. It contains:
- All internal nodes under "3a"
- All leaves under "3a"
- Fringe sibling: h_5 (sibling of h_3 at root level)
- Any other siblings along the path from the "3a" subtree to root
The chunk does NOT include leaves under "5..." prefix; only their hash on the way up.
Verification per chunk
    For each chunk received:
      1. For each leaf in chunk.leaves:
           compute leaf_hash = Hash(slot_hash || value || metadata)
      2. Reconstruct internal-node hashes within the chunk's subtree using its
         internal-node entries (NodeContents include children's fingerprints).
      3. Walk up from the chunk's local root using fringe_siblings at each level:
           current_hash = chunk_local_root_hash
           for (sibling_path, sibling_hash) in fringe_siblings:
               current_hash = combine_hashes(current_hash, sibling_hash, sibling_path)
      4. The final current_hash MUST equal the trusted state_root (from the
         committee-signed manifest).
      5. If yes: the chunk is authentic. Write its (NodeKey, NodeContents) pairs into
         local jmt_cf, and its (slot_hash, value) pairs into local state_cf.
      6. If no: discard. Request the chunk from a different peer (the source was
         malicious or corrupted). The bad peer is penalized via peer scoring.
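The walk in step 3 can be sketched as a fold over the fringe siblings. This is a minimal illustration on a toy binary tree, not the protocol's 16-ary JMT: `combine`, `Side`, and `verify_chunk` are hypothetical names, and the standard library's `DefaultHasher` stands in for Blake3 purely so the example runs without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in digest type; the protocol uses Blake3 hashes.
type Hash = u64;

/// Toy two-to-one hash. NOT cryptographic; a placeholder for Blake3.
fn combine(left: Hash, right: Hash) -> Hash {
    let mut h = DefaultHasher::new();
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Side on which the sibling sits at a level. This is an assumed, simplified
/// encoding of what the NibblePath conveys in the real tree.
#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
}

/// Folds the chunk's local root with each fringe sibling, bottom to top,
/// and compares the result with the trusted (committee-signed) root.
fn verify_chunk(local_root: Hash, fringe: &[(Side, Hash)], trusted_root: Hash) -> bool {
    let mut current = local_root;
    for (side, sibling) in fringe {
        current = match side {
            Side::Left => combine(*sibling, current),
            Side::Right => combine(current, *sibling),
        };
    }
    current == trusted_root
}
```

The key detail is that `current` is reassigned at every level; a sibling placed on the wrong side or a tampered sibling hash changes the final value and the comparison fails.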
Properties
- Each chunk is independently verifiable. Lose one chunk, request from another peer; no cascading failure.
- The fringe siblings are small (a few hundred bytes per chunk), so they don't materially inflate chunk size.
- The proof is non-interactive — chunk + fringe siblings is enough; no back-and-forth needed.
- Standard cryptographic primitive — Aptos's JMT uses this; Ethereum's MPT has similar range-proof support. Not novel.
Snapshot manifest RPC handler
RPC method: pyde_getSnapshotManifest(wave_id)
→ Returns SnapshotManifest for that wave's snapshot, or NotAvailable.
Behind the scenes:
1. waves_cf.get(wave_id) → WaveCommitRecord → look up jmt version
2. snapshots_cf.get(version) → SnapshotManifest if pre-generated, else None
3. If None: optionally generate on-demand (expensive; archive only)
4. Return manifest
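As a rough sketch of the lookup path above, the handler is two keyed reads with a `NotAvailable` fallback. The `Store` type, the `jmt_version` field name, and the use of in-memory maps in place of the `waves_cf` and `snapshots_cf` column families are assumptions made for illustration; on-demand generation (step 3) is left out.

```rust
use std::collections::HashMap;

/// Trimmed to a single field for the sketch.
#[derive(Clone, Debug, PartialEq)]
struct SnapshotManifest {
    epoch: u64,
}

/// Hypothetical shape: only the JMT version is needed here.
struct WaveCommitRecord {
    jmt_version: u64,
}

#[derive(Debug, PartialEq)]
enum ManifestError {
    NotAvailable,
}

/// In-memory stand-ins for the two column families.
struct Store {
    waves_cf: HashMap<u64, WaveCommitRecord>,
    snapshots_cf: HashMap<u64, SnapshotManifest>,
}

impl Store {
    /// wave_id -> WaveCommitRecord -> JMT version -> pre-generated manifest.
    fn get_snapshot_manifest(&self, wave_id: u64) -> Result<SnapshotManifest, ManifestError> {
        let record = self
            .waves_cf
            .get(&wave_id)
            .ok_or(ManifestError::NotAvailable)?;
        self.snapshots_cf
            .get(&record.jmt_version)
            .cloned()
            .ok_or(ManifestError::NotAvailable)
    }
}
```

Both an unknown wave and a known wave without a pre-generated snapshot map to the same `NotAvailable` result, matching the RPC description.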
Snapshot generation (background, archive nodes):
- Triggered every N waves (e.g., every epoch)
- Walk jmt_cf at target version, group nodes into ~50MB chunks with key-range partitions
- Compute range proofs (fringe siblings) for each chunk
- Store chunks + manifest in snapshots_cf
- Manifest published with committee threshold sig
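One way to group nodes into size-bounded, key-contiguous chunks is a greedy pass over the key-sorted entries. The sketch below assumes that strategy; the document only specifies "~50MB chunks with key-range partitions", so `partition_chunks` and its byte accounting (key length plus value length) are illustrative choices.

```rust
use std::ops::Range;

/// Groups key-sorted entries into contiguous chunks whose payload stays at or
/// under `budget` bytes. Returns index ranges into `entries`. An entry larger
/// than the budget gets a chunk of its own.
fn partition_chunks(entries: &[(Vec<u8>, Vec<u8>)], budget: usize) -> Vec<Range<usize>> {
    let mut chunks = Vec::new();
    let mut start = 0; // first entry of the chunk being built
    let mut used = 0; // bytes accumulated in the chunk being built
    for (i, (key, value)) in entries.iter().enumerate() {
        let size = key.len() + value.len();
        // Close the current chunk if adding this entry would overflow it,
        // but never emit an empty chunk.
        if used + size > budget && i > start {
            chunks.push(start..i);
            start = i;
            used = 0;
        }
        used += size;
    }
    if start < entries.len() {
        chunks.push(start..entries.len());
    }
    chunks
}
```

Because each range is contiguous in key order, the fringe siblings for a chunk can be computed from its first and last keys.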
Verification Flow
Phase 1: Discover & Verify Manifest
1. Bootstrap from seed peers
2. Discover manifest URLs/hashes from peers
3. Download signed manifest (~5 KB)
4. Verify ≥85 FALCON sigs against trusted committee pubkeys
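Step 4 amounts to counting valid signatures from distinct trusted keys. The following sketch injects the signature check as a closure so no particular FALCON library is assumed; pairing each signature with a signer index is also an assumption, since the manifest struct only lists `signatures: Vec<FalconSig>`.

```rust
use std::collections::HashSet;

/// Opaque stand-ins for FALCON public keys and signatures.
type PubKey = Vec<u8>;
type Signature = Vec<u8>;

/// Threshold from the manifest definition ("≥85").
const SIG_THRESHOLD: usize = 85;

/// Returns true once `threshold` distinct trusted keys have produced a valid
/// signature over `message`. `verify` is the injected signature check.
fn manifest_is_trusted<F>(
    message: &[u8],
    signatures: &[(usize, Signature)], // (index into trusted_keys, signature)
    trusted_keys: &[PubKey],
    threshold: usize,
    verify: F,
) -> bool
where
    F: Fn(&PubKey, &[u8], &Signature) -> bool,
{
    let mut seen = HashSet::new();
    for (idx, sig) in signatures {
        // Ignore signer indices that do not name a trusted key.
        let Some(key) = trusted_keys.get(*idx) else { continue };
        // `insert` returns false for a repeated signer, so duplicates never count twice.
        if verify(key, message, sig) && seen.insert(*idx) && seen.len() >= threshold {
            return true;
        }
    }
    false
}
```

Deduplicating by signer matters: without it, one compromised key could meet the threshold by repeating its own signature.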
Phase 2: Download Chunks
5. Discover peers serving snapshot
6. Download chunks in parallel (4 MB each)
7. Verify each chunk_hash against manifest
8. Bad chunks → ban peer, retry from another
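Steps 6 to 8 for a single chunk can be sketched as a loop over candidate peers. The `fetch` closure, the string peer identifiers, and the `toy_hash` helper (the standard library's `DefaultHasher` standing in for Blake3) are all assumptions for illustration; a real node would download chunks in parallel.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Placeholder content hash. NOT cryptographic; the manifest uses Blake3.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Tries peers in order until one returns bytes matching `expected_hash`.
/// Returns the verified bytes (if any) and the peers that served bad data.
fn fetch_verified_chunk<F>(
    peers: &[&str],
    expected_hash: u64,
    fetch: F,
) -> (Option<Vec<u8>>, Vec<String>)
where
    F: Fn(&str) -> Option<Vec<u8>>,
{
    let mut banned = Vec::new();
    for &peer in peers {
        match fetch(peer) {
            Some(bytes) if toy_hash(&bytes) == expected_hash => return (Some(bytes), banned),
            // Hash mismatch: the peer served corrupted or malicious data.
            Some(_) => banned.push(peer.to_string()),
            // Unreachable peer: not evidence of misbehavior, so no ban.
            None => {}
        }
    }
    (None, banned)
}
```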
Phase 3: Reconstruct State
9. Apply chunks to JMT
10. Compute Blake3 state root locally
11. Compare to manifest.snapshot_state_root_blake3
12. If match: snapshot valid, accept
Phase 4: Recent Sync (Tail)
13. Download blocks from snapshot point to current
14. Replay txs against snapshot state
15. Reach current state, exit sync mode
Phase 5: Active Operation
16. Subscribe to gossip
17. Begin normal participation
Bootstrap from Genesis: Chain-of-Trust
A new node doesn't yet know which committee pubkeys to trust. This is solved by a chain of trust anchored at genesis:
Genesis block: contains committee_0.pubkeys (hardcoded by founders)
↓
Snapshot at epoch 8: signed by committee 0, contains committee_8.pubkeys
↓
Snapshot at epoch 16: signed by committee 8, contains committee_16.pubkeys
↓
... etc forward
New node verifies the chain by:
- Downloading genesis (~5 MB, includes committee_0 pubkeys)
- Downloading intermediate manifests (~5 KB each, hundreds at scale)
- Verifying chain forward: each manifest signed by prior committee
- Accepting current snapshot if chain-of-trust holds
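The forward walk can be sketched as follows. To keep the example self-contained, keys are plain integers and each manifest carries the list of keys whose signatures have already been checked (`signers`); both are simplifications, and `Manifest` and `verify_chain` are hypothetical names.

```rust
/// Minimal manifest view for the sketch.
struct Manifest {
    epoch: u64,
    /// Stand-in for current_committee_pubkeys.
    next_committee: Vec<u64>,
    /// Keys whose signatures over this manifest verified (assumed precomputed).
    signers: Vec<u64>,
}

/// Walks manifests in epoch order, starting from the genesis committee.
/// Returns the committee trusted after the last manifest, or the epoch of
/// the first manifest that lacks `threshold` distinct trusted signers.
fn verify_chain(
    genesis_committee: &[u64],
    manifests: &[Manifest],
    threshold: usize,
) -> Result<Vec<u64>, u64> {
    let mut trusted: Vec<u64> = genesis_committee.to_vec();
    for m in manifests {
        // Keep only signers from the currently trusted committee, once each.
        let mut distinct: Vec<u64> = m
            .signers
            .iter()
            .copied()
            .filter(|k| trusted.contains(k))
            .collect();
        distinct.sort_unstable();
        distinct.dedup();
        if distinct.len() < threshold {
            return Err(m.epoch);
        }
        // Trust rotates to the committee named in the accepted manifest.
        trusted = m.next_committee.clone();
    }
    Ok(trusted)
}
```

The committee returned by the walk is the set of keys used to check the current snapshot's manifest.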
Weak Subjectivity Checkpoints (Optional)
For nodes that don't want full chain-of-trust verification:
- Foundation and reputable infra providers publish "trusted recent checkpoints"
- Signed by their own keys (not committee)
- Assert: "we've verified the chain up to epoch X, root = Y"
- Distributed via known infrastructure (HTTPS, signed websites)
- Updated weekly
New node options:
- Purist: full chain-of-trust from genesis (long but trustless)
- Pragmatist: trust a recent checkpoint, sync from there (fast)
Both produce the same security guarantees from the trusted point forward.
Light Client Mode
A light client does not download full state. It targets mobile wallets, browser dApps, and embedded clients.
Storage
- Block headers only (no full blocks)
- Recent committee pubkeys
- Own account state + recent transactions
- JMT proofs for accounts user cares about
Operations
- Verify new block headers via FALCON sigs (~85 verifies, ~6.8ms)
- Query specific accounts: ask full node for {balance, JMT inclusion proof}
- Verify proof against latest signed state root
- Submit transactions: same as regular RPC
Bandwidth
~600 KB/year for typical wallet usage (8 epochs/day × 365 days × ~200 bytes per epoch boundary header).
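The figure above is a straight product of the three quantities quoted. A small helper (the name `yearly_header_bytes` is illustrative) makes the arithmetic explicit:

```rust
/// Bytes of epoch-boundary headers a light client downloads per year.
fn yearly_header_bytes(epochs_per_day: u64, header_bytes: u64) -> u64 {
    epochs_per_day * 365 * header_bytes
}
```

With 8 epochs per day and roughly 200 bytes per header this gives 584,000 bytes, which rounds to the ~600 KB quoted.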
Incremental Sync (Delta Snapshots)
For nodes with a recent snapshot:
Have: Snapshot at epoch E
Want: Snapshot at epoch E + 8
Delta snapshot:
- Changed accounts since epoch E
- Changed storage slots since E
- New contracts deployed since E
- Signed by committee at E + 8
Apply delta to existing local state → updated snapshot
Saves bandwidth: typical delta is 10-50 MB vs full 3 GB.
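Applying a delta can be sketched as a series of upserts and deletions over the local key-value state. The `Delta` encoding below (with `None` meaning deletion) is an assumption; the document does not specify the wire format.

```rust
use std::collections::BTreeMap;

/// A changed slot: `Some(value)` is an upsert, `None` is a deletion
/// (assumed encoding).
type Delta = Vec<(Vec<u8>, Option<Vec<u8>>)>;

/// Applies every change in `delta` to `state`, in order.
fn apply_delta(state: &mut BTreeMap<Vec<u8>, Vec<u8>>, delta: Delta) {
    for (key, change) in delta {
        match change {
            Some(value) => {
                state.insert(key, value);
            }
            None => {
                state.remove(&key);
            }
        }
    }
}
```

After applying the delta, a node would presumably recompute its state root and compare it with the root signed by the committee at E + 8 before treating the result as a valid snapshot.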
Storage / Pruning Policy
| Node type | State retention | Block retention |
|---|---|---|
| Archive node | All historical state | All blocks since genesis |
| Full node (default) | State for last 90 days | Blocks for last 30 days |
| Committee validator | State for last 30 days | Blocks for last 8 epochs |
| Light client | Headers + cared-about accounts | Headers only |
Tunable per-node. Archive nodes earn slightly higher RPC fees for serving historical queries.
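The table's defaults could be expressed as a per-node policy along these lines. This is a sketch under assumptions: the type and function names are hypothetical, ages are measured in epochs at 8 epochs per day, and the light client row is omitted because its retention is not age-based.

```rust
/// Epochs per day, from the "every 8 epochs (~daily)" cadence.
const EPOCHS_PER_DAY: u64 = 8;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Retention {
    Forever,
    Days(u64),
    Epochs(u64),
}

#[derive(Clone, Copy)]
enum NodeType {
    Archive,
    Full,
    CommitteeValidator,
}

struct PruningPolicy {
    state: Retention,
    blocks: Retention,
}

/// Default retention windows, as listed in the table.
fn default_policy(node: NodeType) -> PruningPolicy {
    match node {
        NodeType::Archive => PruningPolicy {
            state: Retention::Forever,
            blocks: Retention::Forever,
        },
        NodeType::Full => PruningPolicy {
            state: Retention::Days(90),
            blocks: Retention::Days(30),
        },
        NodeType::CommitteeValidator => PruningPolicy {
            state: Retention::Days(30),
            blocks: Retention::Epochs(8),
        },
    }
}

/// True when data that is `age_epochs` old falls outside the window.
fn is_prunable(retention: Retention, age_epochs: u64) -> bool {
    match retention {
        Retention::Forever => false,
        Retention::Days(d) => age_epochs > d * EPOCHS_PER_DAY,
        Retention::Epochs(e) => age_epochs > e,
    }
}
```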
Failure Modes & Recovery
| Failure | Detection | Recovery |
|---|---|---|
| All peers serve bad data | Manifest sig fails | Try more peers, ban liars |
| Snapshot corruption mid-download | Chunk hash mismatch | Ban peer, retry chunk from another |
| Manifest signed by wrong committee | Sig verify fails | Reject manifest, find another |
| Network outage during sync | Connection dropped | Resume from last verified chunk |
| Snapshot too old (> evidence window) | Sig set might be slashed | Use newer snapshot |
Time Estimates (commodity hardware, 100 Mbps)
Bootstrap from genesis (small): ~5 seconds
Manifest verification (85 FALCON): ~7 ms
Snapshot download (3 GB at 100 Mbps): ~4 minutes
JMT reconstruction: ~5 minutes
Recent tail sync (8 epochs of txs): ~30 minutes
Total: ~40 minutes
For comparison: Ethereum snap sync 4-24 hours, Cosmos statesync 1-3 hours.
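The download and total figures in the estimate follow from simple arithmetic, sketched below. The function names are illustrative; the per-phase durations are the ones listed above, with decimal units assumed (1 GB = 10^9 bytes, 100 Mbps = 10^8 bits per second).

```rust
/// Transfer time in whole seconds for `bytes` over a link of `bits_per_sec`.
fn download_secs(bytes: u64, bits_per_sec: u64) -> u64 {
    bytes * 8 / bits_per_sec
}

/// Sums the per-phase estimates, in seconds, for a given download time.
fn total_sync_secs(download: u64) -> u64 {
    let genesis_bootstrap = 5;
    let jmt_reconstruction = 5 * 60;
    let tail_sync = 30 * 60;
    // Manifest verification (~7 ms) is well under one second and is ignored.
    genesis_bootstrap + download + jmt_reconstruction + tail_sync
}
```

For 3 GB at 100 Mbps the download takes 240 seconds (4 minutes), and the phases sum to 2,345 seconds, or about 39 minutes, consistent with the ~40 minute total.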
State Growth (v2 Concern)
5-year projection of ~50 GB is optimistic. Solana shows ~80 GB after 4 years despite aggressive engineering.
Future mitigations (defer to v2):
- Account expiration (Aptos pattern): accounts not touched in N years get archived
- Storage rent (Solana pattern): accounts pay rent to stay active
- Stateless validators (Ethereum research): validators use state proofs
References
- Hash strategy: see WHITEPAPER.md §4.3
- Light client (more detail): see WHITEPAPER.md §7
- Network bandwidth: see NETWORK_PROTOCOL.md
Document version: 0.1
License: See repository root