Implementations of content-defined chunking (CDC) algorithms:
- RabinCDC (taken from zbox)
- Leap-based CDC
  - Matrix generation code can be found in ef_matrix.rs
- UltraCDC
- SuperCDC
- SeqCDC
Simple code to test an algorithm is provided in filetest.rs.
- Chunkers implement the std::iter::Iterator trait and yield the source data as a sequence of chunks.
- Chunk sizes can be customized on chunker creation; default size values are provided.
- Other parameters from the corresponding papers can also be set on chunker creation.
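To illustrate the iterator-based design described above, here is a minimal sketch of a chunker that implements std::iter::Iterator. It is not one of the listed algorithms and is not taken from this crate: `ToyChunker`, the `min`/`avg`/`max` field names, and the simple shift-and-add hash are all assumptions made for this example.

```rust
/// Hypothetical size parameters; the field names are assumptions for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct SizeParams {
    pub min: usize,
    pub avg: usize,
    pub max: usize,
}

/// Start offset and length of one chunk within the source buffer.
#[derive(Debug, PartialEq, Eq)]
pub struct Chunk {
    pub pos: usize,
    pub len: usize,
}

/// Toy content-defined chunker, for illustration only.
pub struct ToyChunker<'a> {
    data: &'a [u8],
    pos: usize,
    sizes: SizeParams,
}

impl<'a> ToyChunker<'a> {
    pub fn new(data: &'a [u8], sizes: SizeParams) -> Self {
        assert!(sizes.min > 0 && sizes.min <= sizes.avg && sizes.avg <= sizes.max);
        // The cut-point mask below requires a power-of-two average size.
        assert!(sizes.avg.is_power_of_two());
        Self { data, pos: 0, sizes }
    }
}

impl<'a> Iterator for ToyChunker<'a> {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        let data = self.data;
        let remaining = &data[self.pos..];
        if remaining.is_empty() {
            return None;
        }
        // Never scan past the maximum chunk size or the end of the data.
        let limit = remaining.len().min(self.sizes.max);
        let mask = (self.sizes.avg - 1) as u32;
        let mut hash: u32 = 0;
        let mut len = limit; // used when no cut point is found
        for (i, &byte) in remaining[..limit].iter().enumerate() {
            // Shift-and-add hash: each byte's influence is shifted out after 32 steps.
            hash = (hash << 1).wrapping_add(u32::from(byte).wrapping_mul(0x9E37_79B1));
            // Cut once the minimum size is reached and the low hash bits are all zero.
            if i + 1 >= self.sizes.min && (hash & mask) == 0 {
                len = i + 1;
                break;
            }
        }
        let chunk = Chunk { pos: self.pos, len };
        self.pos += len;
        Some(chunk)
    }
}
```

Because the chunker is an ordinary iterator, it can be used in a `for` loop or combined with adapters such as `collect` and `count`. In this sketch the yielded chunks are contiguous and cover the whole input, and only the final chunk may be shorter than `min`.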
To use the algorithms in your own code, access them through the corresponding modules, e.g.

    // Assumes SizeParams, ultra and leap_based are already in scope
    // (import them from this crate).
    fn main() {
        // 1 MiB of identical bytes as sample input.
        let data = vec![1; 1024 * 1024];

        // UltraCDC chunker with custom size parameters.
        let sizes = SizeParams::new(4096, 8192, 16384);
        let chunker = ultra::Chunker::new(&data, sizes);
        for chunk in chunker {
            println!("start: {}, length: {}", chunk.pos, chunk.len);
        }

        // Leap-based chunker with its default size parameters.
        let default_leap = leap_based::Chunker::new(&data, SizeParams::leap_default());
        for chunk in default_leap {
            println!("start: {}, length: {}", chunk.pos, chunk.len);
        }
    }
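The `pos` and `len` values printed above can also be used to slice the source buffer. As a hedged sketch of one common use, the function below counts how many bytes remain after identical chunks are stored only once. The `Chunk` struct here is a stand-in that mirrors only the two fields used in the example; `dedup_stats` is a hypothetical helper, not part of this crate.

```rust
use std::collections::HashSet;

/// Stand-in for the chunk type yielded by a chunker (assumed: byte offset and byte length).
pub struct Chunk {
    pub pos: usize,
    pub len: usize,
}

/// Returns `(total_bytes, unique_bytes)` over the given chunks of `data`.
/// Chunks are compared by content, so repeated chunks count once in `unique_bytes`.
pub fn dedup_stats(data: &[u8], chunks: impl IntoIterator<Item = Chunk>) -> (usize, usize) {
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut total = 0;
    let mut unique = 0;
    for chunk in chunks {
        // Slice the source buffer using the chunk's position and length.
        let bytes = &data[chunk.pos..chunk.pos + chunk.len];
        total += bytes.len();
        // `insert` returns true only the first time this content is seen.
        if seen.insert(bytes) {
            unique += bytes.len();
        }
    }
    (total, unique)
}
```

Comparing whole slices keeps the sketch short; a real deduplicating store would more likely key chunks by a cryptographic hash of their contents.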