//! # Enable elastic scaling MVP for a parachain
//!
//! <div class="warning">This guide assumes full familiarity with Asynchronous Backing and its
//! terminology, as defined in https://wiki.polkadot.network/docs/maintain-guides-async-backing.
//! Furthermore, the parachain should have already been upgraded according to the guide.</div>
//!
//! ## Quick introduction to elastic scaling
//!
//! [Elastic scaling](https://polkadot.network/blog/elastic-scaling-streamling-growth-on-polkadot)
//! is a feature that will enable parachains to seamlessly scale up/down the number of used cores.
//! This can be desirable in order to increase the compute or storage throughput of a parachain or
//! to lower the latency between a transaction being submitted and it getting built in a parachain
//! block.
//!
//! At present, with Asynchronous Backing enabled, a parachain can only include a block on the relay
//! chain every 6 seconds, regardless of how many cores the parachain acquires. Elastic scaling
//! builds further on the 10x throughput increase of Async Backing, enabling collators to submit up
//! to 3 parachain blocks per relay chain block, resulting in a further 3x throughput increase.
//!
//! ## Current limitations of the MVP
//!
//! The full implementation of elastic scaling spans across the entire relay/parachain stack and is
//! still [work in progress](https://github.com/paritytech/polkadot-sdk/issues/1829).
//! The MVP is still considered experimental software, so stability is not guaranteed.
//! If you encounter any problems,
//! [please open an issue](https://github.com/paritytech/polkadot-sdk/issues).
//! The current limitations of the MVP are described below:
//!
//! 1. **Limited core count**. Parachain block authoring is sequential, so the second block will
//!    start being built only after the previous block is imported. The current block production is
//!    capped at 2 seconds of execution. Therefore, assuming the full 2 seconds are used, a
//!    parachain can only utilise at most 3 cores in a relay chain slot of 6 seconds. If the full
//!    execution time is not being used, higher core counts can be achieved (see the sketch after
//!    this list).
//! 2. **Single collator requirement for consistently scaling beyond a core at full authorship
//!    duration of 2 seconds per block.** Using the current implementation with multiple collators
//!    adds additional latency to the block production pipeline. Assuming block execution takes
//!    about the same time as authorship, the additional overhead is equal to the duration of the
//!    authorship plus the block announcement. Each collator must first import the previous block
//!    before authoring a new one, so it is clear that the highest throughput can be achieved
//!    using a single collator. Experiments show that the peak performance using more than one
//!    collator (measured up to 10 collators) is utilising 2 cores with an authorship time of 1.3
//!    seconds per block, which leaves 400ms for networking overhead. This would allow for 2.6
//!    seconds of execution per relay chain block, compared to the 2 seconds enabled by async
//!    backing, as sketched after this list.
//!    [More experiments](https://github.com/paritytech/polkadot-sdk/issues/4696) are being
//!    conducted in this space.
//! 3. **Trusted collator set.** The collator set needs to be trusted until there’s a mitigation
//!    that would prevent or deter multiple collators from submitting the same collation to
//!    multiple backing groups. A solution is being discussed
//!    [here](https://github.com/polkadot-fellows/RFCs/issues/92).
//! 4. **Fixed scaling.** For true elasticity, the parachain must be able to seamlessly acquire or
//!    sell coretime as the user demand grows and shrinks over time, in an automated manner. This
//!    is currently lacking - a parachain can only scale up or down by “manually” acquiring
//!    coretime. This is not in the scope of the relay chain functionality. Parachains can already
//!    start implementing such autoscaling, but we aim to provide a framework/examples for
//!    developing autoscaling strategies.
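//!
//! The arithmetic behind limitations 1 and 2 can be written out as a back-of-the-envelope sketch.
//! The figures are the ones quoted above; the constant names are illustrative, not polkadot-sdk
//! identifiers.
//!
//! ```ignore
//! // Figures quoted in this section; illustrative only.
//! const RELAY_SLOT_MS: u32 = 6_000; // relay chain slot duration
//! const FULL_AUTHORSHIP_MS: u32 = 2_000; // capped block production time
//!
//! // Limitation 1: sequential authoring bounds the usable core count.
//! const MAX_CORES_AT_FULL_AUTHORSHIP: u32 = RELAY_SLOT_MS / FULL_AUTHORSHIP_MS; // = 3
//!
//! // Limitation 2: with multiple collators, the measured peak was 2 cores at
//! // 1.3 seconds of authorship per block, leaving 400ms for networking overhead.
//! const MULTI_COLLATOR_AUTHORSHIP_MS: u32 = 1_300;
//! const MULTI_COLLATOR_CORES: u32 = 2;
//! // Total execution per relay chain block: 2 * 1_300 = 2_600 ms.
//! const MULTI_COLLATOR_EXECUTION_MS: u32 = MULTI_COLLATOR_CORES * MULTI_COLLATOR_AUTHORSHIP_MS;
//! ```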
//!
//! Another hard limitation that is not envisioned to ever be lifted is that parachains which
//! create forks will generally not be able to utilise the full number of cores they acquire.
//!
//! ## Using elastic scaling MVP
//!
//! ### Prerequisites
//!
//! - Ensure Asynchronous Backing is enabled on the network and you have enabled it on the
//!   parachain using [`crate::guides::async_backing_guide`].
//! - Ensure the `AsyncBackingParams.max_candidate_depth` value is configured to a value that is
//!   at least double the maximum targeted parachain velocity. For example, if the parachain will
//!   build at most 3 candidates per relay chain block, the `max_candidate_depth` should be at
//!   least 6 (see the sketch below this list).
//! - Use a trusted single collator for maximum throughput.
//! - Ensure enough coretime is assigned to the parachain. For maximum throughput the upper bound
//!   is 3 cores.
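//!
//! The `max_candidate_depth` rule of thumb above can be expressed as a simple compile-time check.
//! This is a minimal sketch; the constant names are illustrative, not polkadot-sdk identifiers.
//!
//! ```ignore
//! // Illustrative names only, not polkadot-sdk identifiers.
//! const MAX_TARGET_VELOCITY: u32 = 3; // candidates per relay chain block
//! const MAX_CANDIDATE_DEPTH: u32 = 6; // from `AsyncBackingParams`
//!
//! // The configured depth must be at least double the targeted velocity.
//! const _: () = assert!(MAX_CANDIDATE_DEPTH >= 2 * MAX_TARGET_VELOCITY);
//! ```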
//!
//! <div class="warning">Phase 1 is not needed if using the `polkadot-parachain` binary built
//! from the latest polkadot-sdk release! Simply pass the `--experimental-use-slot-based` parameter
//! to the command line and jump to Phase 2.</div>
//!
//! The following steps assume using the cumulus parachain template.
//!
//! ### Phase 1 - (For custom parachain node) Update Parachain Node
//!
//! This assumes you are using
//! [the latest parachain template](https://github.com/paritytech/polkadot-sdk/tree/master/templates/parachain).
//!
//! This phase consists of plugging in the new slot-based collator.
//!
//! 1. In `node/src/service.rs` import the slot-based collator instead of the lookahead collator.
#![doc = docify::embed!("../../cumulus/polkadot-parachain/src/service.rs", slot_based_colator_import)]
//!
//! 2. In `start_consensus()`
//!     - Remove the `overseer_handle` param (also remove the
//!       `OverseerHandle` type import if it’s not used elsewhere).
//!     - Rename `AuraParams` to `SlotBasedParams`, remove the `overseer_handle` field and add a
//!       `slot_drift` field with a value of `Duration::from_secs(1)`.
//!     - Replace the single future returned by `aura::run` with the two futures returned by it
//!       and spawn them as separate tasks:
#![doc = docify::embed!("../../cumulus/polkadot-parachain/src/service.rs", launch_slot_based_collator)]
//!
//! 3. In `start_parachain_node()` remove the `overseer_handle` param passed to `start_consensus`.
//!
//! ### Phase 2 - Activate fixed factor scaling in the runtime
//!
//! This phase consists of a couple of changes that need to be made to the parachain’s runtime in
//! order to utilise fixed factor scaling.
//!
//! First of all, you need to decide on the upper limit of how many parachain blocks you need to
//! produce per relay chain block (in direct correlation with the number of acquired cores). This
//! should be either 1 (no scaling), 2 or 3. This is called the parachain velocity.
//!
//! If you configure a velocity which is different from the number of assigned cores, the measured
//! velocity in practice will be the minimum of these two.
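//!
//! As a minimal illustration of this relationship (the helper below is hypothetical, not part of
//! any runtime API):
//!
//! ```ignore
//! // Hypothetical helper, for illustration only.
//! fn effective_velocity(configured_velocity: u32, assigned_cores: u32) -> u32 {
//!     configured_velocity.min(assigned_cores)
//! }
//!
//! // E.g. configuring a velocity of 3 while holding only 2 cores yields
//! // effective_velocity(3, 2) == 2 blocks per relay chain block in practice.
//! ```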
//!
//! The chosen velocity will also be used to compute:
//! - The slot duration, by dividing the 6000 ms relay chain slot duration by the velocity.
//! - The unincluded segment capacity, by multiplying the velocity by 2 and adding 1 to it.
//!
//! Let’s assume a desired maximum velocity of 3 parachain blocks per relay chain block. The needed
//! changes would all be done in `runtime/src/lib.rs`:
//!
//! 1. Rename `BLOCK_PROCESSING_VELOCITY` to `MAX_BLOCK_PROCESSING_VELOCITY` and increase it to the
//!    desired value. In this example, 3.
//!
//!    ```ignore
//!    const MAX_BLOCK_PROCESSING_VELOCITY: u32 = 3;
//!    ```
//!
//! 2. Set the `MILLISECS_PER_BLOCK` to the desired value.
//!
//!    ```ignore
//!    const MILLISECS_PER_BLOCK: u32 =
//!        RELAY_CHAIN_SLOT_DURATION_MILLIS / MAX_BLOCK_PROCESSING_VELOCITY;
//!    ```
//!
//!    Note: for a parachain which measures time in terms of its own block number, changing the
//!    block time may cause complications, requiring additional changes. See here for more
//!    information: [`crate::guides::async_backing_guide#timing-by-block-number`].
//!
//! 3. Increase the `UNINCLUDED_SEGMENT_CAPACITY` to the desired value.
//!
//!    ```ignore
//!    const UNINCLUDED_SEGMENT_CAPACITY: u32 = 2 * MAX_BLOCK_PROCESSING_VELOCITY + 1;
//!    ```
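//!
//! Taken together, for the assumed velocity of 3 the constants work out to a 2-second parachain
//! block time and an unincluded segment capacity of 7. The following is a hedged sanity-check
//! sketch, assuming `RELAY_CHAIN_SLOT_DURATION_MILLIS` is 6000 as stated above:
//!
//! ```ignore
//! const RELAY_CHAIN_SLOT_DURATION_MILLIS: u32 = 6000;
//! const MAX_BLOCK_PROCESSING_VELOCITY: u32 = 3;
//! const MILLISECS_PER_BLOCK: u32 =
//!     RELAY_CHAIN_SLOT_DURATION_MILLIS / MAX_BLOCK_PROCESSING_VELOCITY;
//! const UNINCLUDED_SEGMENT_CAPACITY: u32 = 2 * MAX_BLOCK_PROCESSING_VELOCITY + 1;
//!
//! // Expected values for a velocity of 3.
//! const _: () = assert!(MILLISECS_PER_BLOCK == 2000);
//! const _: () = assert!(UNINCLUDED_SEGMENT_CAPACITY == 7);
//! ```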