Perl interface implementation to xxHash 64 bit algorithm

DoubleBB/digest-xxhash64

NAME

Digest::xxH64 - Perl interface implementation to xxHash 64 bit algorithm

SYNOPSIS

# Object oriented style interface

use Digest::xxH64;

my $xx = Digest::xxH64->new;

$xx->add($data1, $data2, $data3);
$xx->add($data4);
$xx->addfile($file_handle);
$xx->addfile($another_file_handle, $read_length);

my $hash64 = $xx->digest();
my $hash64_hex = $xx->hexdigest();
my $hash64_bin = $xx->bindigest();

$xx->reset($seed);
$xx->add($data5);
my $another_hash64 = $xx->digest();
my $another_hash64_hex = $xx->hexdigest();
my $another_hash64_bin = $xx->bindigest();



# Functional style interface

use Digest::xxH64 qw( xxHash64 xxHash64hex xxHash64bin xx64 xx64hex xx64bin xxH64 xxH64hex xxH64bin );

my $hash64 = xxHash64($data);
my $hash64_hex = xxHash64hex($data);
my $hash64_bin = xxHash64bin($data);

my $hash64 = xx64($data);
my $hash64_hex = xx64hex($data);
my $hash64_bin = xx64bin($data);

my $hash64 = xxHash64($data, $seed);
my $hash64_hex = xxHash64hex($data, $seed);
my $hash64_bin = xxHash64bin($data, $seed);

DESCRIPTION

xxHash is an extremely fast non-cryptographic hash algorithm, working at speeds close to RAM limits. It comes in two flavors, 32 and 64 bits, producing a 32 bit or 64 bit digest, respectively (see http://www.xxhash.com for details). This module is a Perl interface to the xxHash 64 bit hash functions, offering both an object oriented and a functional style. The implementation can handle data of arbitrary length and runs in both 32 bit and 64 bit Perl environments.

OBJECT ORIENTED STYLE INTERFACE

The object oriented interface of Digest::xxH64 is described in this section. After a Digest::xxH64 object has been created, you add data to it (once or several times) and finally ask for the digest in a suitable format. A single object can be used to calculate multiple digests. Errors are reported via croak() in all methods.

The following methods are provided:

$xx = Digest::xxH64->new( [$seed] )

The constructor returns a new Digest::xxH64 object which encapsulates the state of the xxHash 64 bit digest algorithm.

If called as an instance method (i.e. $xx->new), a new object is created and returned as well.

$seed is an optional argument. It is the seed parameter of the algorithm, an integer of at most 64 bits. The seed can be used to alter the result predictably. If omitted, zero is assumed.

$xx->reset( [$seed] )

Resets the object's internal state.

$seed is an optional argument. It is the seed parameter of the algorithm itself, an integer of at most 64 bits. The seed can be used to alter the digest result predictably. If omitted, zero is assumed.

$xx->clone

Create and return a copy of the $xx object, as a Digest::xxH64 object in its current state. This is useful when you do not want to destroy the digest's state but need an intermediate value of the digest, e.g. when calculating digests iteratively on a continuous data stream. Example:

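my $xx = Digest::xxH64->new;
open(my $FH, '<', 'myfile.bin') or die "Can't open 'myfile.bin': $!";
binmode($FH);
my $chunk = 0;
while (read($FH, my $buffer, 4096)) {
  $xx->add($buffer);
  # clone first, so that the running state of $xx is not finalized
  print "hash so far: ", ++$chunk, ": ", $xx->clone->hexdigest, "\n";
}
close($FH);

$xx->add($data, ...)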

The $data values provided as arguments are appended to the byte sequence for which the digest is calculated. The return value is the number of bytes added.

All these lines will have the same effect on the state of the $xx object:

$xx->add("a"); $xx->add("b"); $xx->add("c");
$xx->add("ab"); $xx->add("c");
$xx->add("a", "b", "c");
$xx->add("abc");

$xx->addfile($io_handle, [$len_to_read])

The file represented by $io_handle is read from its current position until EOF, and its content is appended to the byte sequence for which the digest is calculated. The return value is the number of bytes added. The optional $len_to_read argument specifies the maximum amount of data to read from the file, which makes it possible to calculate a digest based on only a portion of the file. If EOF is reached before $len_to_read bytes have been read, the return value is less than $len_to_read.

The addfile() method will croak() if it fails to read data for some reason other than EOF. If it croaks, the state of the $xx object is unpredictable, because addfile() might have read part of the file before it failed. It is probably wise to discard or reset the $xx object if this occurs, or to clone it beforehand so that you can revert to the original state.

In most cases you want to make sure that the $io_handle is in binmode before you pass it as argument to the addfile() method.
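
The read semantics described above (start at the handle's current position, read at most a given number of bytes, return fewer if EOF comes first) can be sketched in core Perl without the module. This is only an illustration of the behaviour, not the module's implementation; read_up_to is a hypothetical helper and the 4096-byte chunk size is an arbitrary assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Digest::xxH64: reads at most $limit bytes
# from $fh in chunks and returns the number of bytes actually read.
sub read_up_to {
    my ($fh, $limit) = @_;
    my $total = 0;
    while ($total < $limit) {
        my $want = $limit - $total;
        $want = 4096 if $want > 4096;    # arbitrary chunk size
        my $got = read($fh, my $buffer, $want);
        die "read failed: $!" unless defined $got;
        last if $got == 0;               # EOF reached before $limit bytes
        $total += $got;
    }
    return $total;
}

# 100 bytes of in-memory data stand in for a file.
my $data = 'x' x 100;
open(my $fh, '<', \$data) or die "Can't open in-memory handle: $!";
binmode($fh);
seek($fh, 60, 0) or die "seek failed: $!";

# Only 40 bytes remain after position 60, so fewer than 64 are returned.
my $count = read_up_to($fh, 64);
close($fh);
print "requested 64 bytes, got $count\n";
```

A return value smaller than the requested length is how the caller can tell that EOF was reached early.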

$xx->digest

Return the 64 bit digest for the previously added data as a 64 bit unsigned integer. This is useful when you want to store the digest directly as a number, e.g. in a database field. In a 32 bit Perl environment you have to use the Math::Int64 module to handle numbers of this type.

Note that the digest operation is effectively a destructive, read-once operation. Once it has been performed, the Digest::xxH64 object state is finalized, and you need to call the reset method before the object can be used to calculate a digest for other data. Call $xx->clone->digest if you want to calculate the digest without finalizing the digest state.

$xx->hexdigest

Same as $xx->digest, but will return the 64 bit digest in hexadecimal form. The length of the returned string will be 16 characters and it will only contain characters from this set: '0'..'9' and 'A'..'F'.

$xx->bindigest

Same as $xx->digest, but will return the 64 bit digest as an 8 byte string in big-endian order. In the output string, the first byte/character contains the most significant 8 bits and the last byte/character contains the least significant 8 bits.
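
The relationship between the hexadecimal and the big-endian binary forms can be illustrated with core Perl only. The sketch below assumes that both forms encode the same 64 bit value, as the descriptions above suggest; the value used is made up for illustration and is not a real xxHash64 output.

```perl
use strict;
use warnings;

# Hypothetical 64 bit value in the 16-character uppercase hex form;
# it is NOT an actual xxHash64 digest.
my $hex = '0123456789ABCDEF';

# Hex string to 8 byte big-endian string: the first byte holds the
# most significant 8 bits.
my $bin = pack('H16', $hex);

printf "length:     %d bytes\n", length($bin);
printf "first byte: 0x%02X\n", ord(substr($bin, 0, 1));
printf "last byte:  0x%02X\n", ord(substr($bin, -1));

# 8 byte string back to the 16-character uppercase hex form.
my $round_trip = uc(unpack('H16', $bin));
print "round trip: $round_trip\n";
```

The same pack/unpack pair can be used to convert a stored binary digest into its hex form for display.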

FUNCTIONAL STYLE INTERFACE

The following functions are provided by the Digest::xxH64 module. None of these functions are exported by default. Errors are reported via croak() in all functions.

xxHash64($data, [$seed])

Return the digest for the data argument as a 64 bit unsigned integer. In a 32 bit Perl environment you have to use the Math::Int64 module to handle numbers of this type. $seed is an optional argument. It is the seed parameter of the algorithm, an integer of at most 64 bits. The seed can be used to alter the result predictably. If omitted, zero is assumed.

xxH64($data, [$seed])

A shorter alias for the xxHash64 function.

xx64($data, [$seed])

A shorter alias for the xxHash64 function.

xxHash64hex($data, [$seed])

Same as the xxHash64() function, but will return the digest in hexadecimal form. The length of the returned string will be 16 characters and it will only contain characters from this set: '0'..'9' and 'A'..'F'.

xxH64hex($data, [$seed])

A shorter alias for the xxHash64hex function.

xx64hex($data, [$seed])

A shorter alias for the xxHash64hex function.

xxHash64bin($data, [$seed])

Same as the xxHash64() function, but will return the 64 bit digest as an 8 byte string in big-endian order. In the output string, the first byte/character contains the most significant 8 bits and the last byte/character contains the least significant 8 bits.

xxH64bin($data, [$seed])

A shorter alias for the xxHash64bin function.

xx64bin($data, [$seed])

A shorter alias for the xxHash64bin function.

EXAMPLES

To calculate the digest of 3 separate strings as one concatenated string in OO style:

use Digest::xxH64;

my $xx = Digest::xxH64->new;
$xx->add('foo', 'bar');
$xx->add('baz');
# store the result in its own variable instead of overwriting $xx
my $digest = $xx->hexdigest;

print "Digest is $digest\n";

With OO style, you can break the input data arbitrarily. This means that the whole input does not need to fit in memory, i.e. we can handle input data of any size.

This is useful when calculating checksum for files or network streams:

use Digest::xxH64;

my $filename = __FILE__;
open (my $FH, '<', $filename) or die "Can't open '$filename': $!";
binmode($FH);

my $xx = Digest::xxH64->new;
while (read($FH, my $buffer, 65536)) {
    $xx->add($buffer);
}
close($FH);
print $xx->hexdigest, '  ', $filename, "\n";

Or we can use the addfile method for more efficient reading of the file:

use Digest::xxH64;

my $filename = __FILE__;
open (my $FH, '<', $filename) or die "Can't open '$filename': $!";
binmode($FH);
my $xx = Digest::xxH64->new;
$xx->addfile($FH);
close($FH);
print $xx->hexdigest, '  ', $filename, "\n";

To calculate a digest based on a small portion of a large file, use the following strategy (in this example, the digest is calculated from 64 bytes of the file, starting at byte position 123):

use Digest::xxH64;

my $filename = __FILE__;
open (my $FH, '<', $filename) or die "Can't open '$filename': $!";
binmode($FH);
seek($FH, 123, 0) or die "Can't seek in '$filename': $!";
my $xx = Digest::xxH64->new;
$xx->addfile($FH, 64);
close($FH);
print $xx->hexdigest, '  ', $filename, "\n";

Calculating digests for two different strings:

use Digest::xxH64;

my $xx = Digest::xxH64->new;

$xx->add($string1);
print $xx->hexdigest, ' for ', $string1, "\n";

$xx->reset;

$xx->add($string2);
print $xx->hexdigest, ' for ', $string2, "\n";

The xxHash algorithm accepts a seed value to set the initial state of the digest calculation. This makes it possible to produce different digests in a predictable way. For this purpose, the new and reset methods accept an optional argument representing the seed value. The argument must be between 0 and 2^64-1.

In this example, two different digests are calculated for the same string by providing different seed values. Note: an unspecified seed value means that 0 is used as the seed.

use Digest::xxH64;

my $seed1 = 12345;
my $xx = Digest::xxH64->new($seed1);

$xx->add($string1);
print $xx->hexdigest, ' for ', $string1, "\n";

my $seed2 = 986767364;
$xx->reset($seed2);

$xx->add($string1);
print $xx->hexdigest, ' for ', $string1, "\n";

To calculate digest of a string in functional style:

This example calculates the digest and displays it in hexadecimal form:

use Digest::xxH64 qw( xxHash64hex );

my $data_string = "Hello world!";
my $hash64_hex = xxHash64hex($data_string);
print ("The digest value is ", $hash64_hex, "\n");

If you prefer to type less, use the xx64hex function name instead of xxHash64hex:

use Digest::xxH64 qw( xx64hex );

my $data_string = "Hello world!";
my $hash64_hex = xx64hex($data_string);
print ("The digest value is ", $hash64_hex, "\n");

The xxHash algorithm accepts a seed value to set the initial state of the digest calculation. This makes it possible to produce different digests in a predictable way. In the previous example we did not specify a seed value for the xxHash64hex function, so the default value 0 was used as the seed. In the following example we specify a seed value, so the calculated digest will differ from the one in the previous example:

use Digest::xxH64 qw( xxHash64hex );

my $seed = 123456789;
my $data_string = "Hello world!";
my $hash64_hex = xxHash64hex($data_string, $seed);
print ("The new digest value is ", $hash64_hex, "\n");

COPYRIGHT

The xxHash code was written by Yann Collet and is covered by the BSD license.

This wrapper library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Copyright (C) 2019 Bela Bodecs
