Content-Defined Chunking & Rolling Hash (Leading Zeros) Visualizer

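1. A sliding window moves over the data one byte at a time. At each step, a rolling hash of the window's content is computed. The hash is "rolling" because it is updated from its previous value as one byte enters the window and one byte leaves it, instead of being recomputed over the whole window.
2. The tool converts the hash value to its 32-bit binary representation.
3. It counts the number of **leading zeros** in this binary string.
4. If the count of leading zeros is greater than or equal to the chosen Target Leading Zeros ($n$), a **chunk boundary** is placed at the current position.
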
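The steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the visualizer does not say which rolling hash or window size it uses, so the polynomial (Rabin-Karp style) hash modulo $2^{32}$, `WINDOW = 16`, `BASE = 257`, and the function names `leading_zeros32` and `chunk` are all hypothetical choices made for illustration.

```python
# Hypothetical parameters for this sketch (not taken from the visualizer).
WINDOW = 16          # sliding-window size in bytes
BASE = 257           # multiplier of the assumed polynomial rolling hash
MASK = 0xFFFFFFFF    # keep every hash value within 32 bits


def leading_zeros32(h: int) -> int:
    """Count the leading zeros in the 32-bit binary form of h (0 <= h < 2**32)."""
    return 32 - h.bit_length()


def chunk(data: bytes, target_zeros: int) -> list[bytes]:
    """Cut data wherever the window's hash has >= target_zeros leading zeros."""
    top = pow(BASE, WINDOW - 1, 1 << 32)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Roll out the byte that just left the window.
            h = (h - data[i - WINDOW] * top) & MASK
        # Roll in the new byte.
        h = (h * BASE + byte) & MASK
        # Only test positions where the window is completely filled.
        if i >= WINDOW - 1 and leading_zeros32(h) >= target_zeros:
            chunks.append(data[start:i + 1])  # boundary falls right after byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # leftover bytes form the final chunk
    return chunks
```

For example, `chunk(data, 8)` should yield chunks of roughly 256 bytes on average when the input looks random. Testing leading zeros reads the top bits of the hash; with arithmetic modulo $2^{32}$, carries only propagate upward, so the high bits mix in more of the window than the low bits do.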
This is a probabilistic method. If the hash bits are uniformly distributed, the probability that the hash has at least $n$ leading zeros, and therefore that any given position becomes a boundary, is $1/2^n$. This gives an expected chunk size of $2^n$ bytes.

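The $2^n$ figure can be derived as follows, assuming each window's hash behaves like an independent, uniformly random 32-bit value (an idealization, since hashes of overlapping windows are not truly independent) and that the window advances one byte per step. A count of at least $n$ leading zeros means the top $n$ bits are all zero, and the chunk length $L$ in bytes is then geometrically distributed:

$$
p = \Pr\big[\text{top } n \text{ bits of } h \text{ are all } 0\big] = \left(\tfrac{1}{2}\right)^{n} = 2^{-n}
$$

$$
\Pr[L = k] = (1 - p)^{k-1}\, p, \qquad \mathbb{E}[L] = \sum_{k \ge 1} k\,(1 - p)^{k-1}\, p = \frac{1}{p} = 2^{n}
$$

For example, $n = 12$ gives $p = 1/4096$ and $\mathbb{E}[L] = 4096$ bytes (4 KiB). The derivation ignores the first window-length of input, where no boundary can be tested, and the final partial chunk.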
Deduplication: When the second stream is processed, the hash of each of its chunks is compared against the hashes of the chunks from the first stream. If a match is found, the chunk is marked as a **duplicate** and does not need to be stored again, which saves space.
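The comparison step can be sketched as follows. This is a minimal sketch, assuming the chunks of both streams have already been produced and that each chunk is fingerprinted with SHA-256 from Python's `hashlib`; the visualizer does not say which chunk hash it uses, and the function names `chunk_id` and `mark_duplicates` are hypothetical.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    """Fingerprint a chunk. SHA-256 is an assumed choice, not the tool's."""
    return hashlib.sha256(chunk).hexdigest()


def mark_duplicates(first: list[bytes], second: list[bytes]) -> tuple[list[bool], int]:
    """Flag chunks of the second stream whose hash appears in the first stream.

    Returns one flag per chunk of `second` and the number of bytes that
    do not need to be stored again.
    """
    known = {chunk_id(c) for c in first}  # hashes recorded from the first stream
    flags, saved = [], 0
    for c in second:
        is_dup = chunk_id(c) in known
        flags.append(is_dup)
        if is_dup:
            saved += len(c)  # a duplicate chunk costs no new storage
    return flags, saved
```

For example, with `first = [b"alpha", b"beta", b"gamma"]` and `second = [b"alpha", b"delta", b"gamma"]`, `mark_duplicates(first, second)` returns `([True, False, True], 10)`: the two repeated 5-byte chunks are flagged as duplicates. Storing the known hashes in a set rather than a list keeps each lookup at O(1) on average.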
