Pearson hashing

Fast 8-bit hash function


Summary

Pearson hashing is a non-cryptographic hash function designed for fast execution on processors with 8-bit registers. Given an input consisting of any number of bytes, it produces as output a single byte that is strongly dependent on every byte of the input. Its implementation requires only a few instructions, plus a 256-byte lookup table containing a permutation of the values 0 through 255. (Source archived at https://web.archive.org/web/20120704025921/http://cs.mwsu.edu/~griffin/courses/2133/downloads/Spring11/p677-pearson.pdf on 2012-07-04; accessed 2013-07-13.)

This hash function is a CBC-MAC that uses an 8-bit substitution cipher implemented via the substitution table. An 8-bit cipher has negligible cryptographic security, so the Pearson hash function is not cryptographically strong, but it is useful for implementing hash tables or as a data integrity check code, for which purposes it offers these benefits:

  • It is extremely simple.
  • It executes quickly on resource-limited processors.
  • There is no simple class of inputs for which collisions (identical outputs) are especially likely.
  • Given a small, privileged set of inputs (e.g., reserved words for a compiler), the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.
  • Two input strings differing by exactly one character never collide, because the table is a permutation and therefore maps distinct indices to distinct values.
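The perfect-hash benefit listed above can be illustrated with a small sketch. It assumes the simplest possible way of "adjusting" the table, namely reshuffling it at random until the privileged keys all hash to distinct bytes; the source does not prescribe a search method. The names `find_perfect_table` and `RESERVED`, the seed, and the keyword list are hypothetical and chosen only for illustration.

```python
import random


def pearson_hash(data: bytes, table) -> int:
    """Pearson hash of data: h := table[h xor c] for each byte c."""
    h = 0
    for c in data:
        h = table[h ^ c]
    return h


def find_perfect_table(keys, max_tries=10_000, seed=0):
    """Search for a table under which every key gets a distinct hash.

    Random reshuffling is an assumption of this sketch, not a method
    taken from the source. Returns None if no table is found.
    """
    rng = random.Random(seed)
    table = list(range(256))  # any shuffle of this list stays a permutation
    for _ in range(max_tries):
        rng.shuffle(table)
        hashes = {pearson_hash(key, table) for key in keys}
        if len(hashes) == len(keys):
            return list(table)
    return None


# Hypothetical privileged inputs, e.g. reserved words of a small language.
RESERVED = [b"if", b"else", b"for", b"while",
            b"return", b"break", b"continue", b"switch"]

table = find_perfect_table(RESERVED)
```

Because the output is a single byte, this can only succeed for at most 256 keys, and the search takes longer as the key set grows.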

One drawback compared with other hashing algorithms designed for 8-bit processors is the suggested 256-byte lookup table, which can be prohibitively large for a small microcontroller whose program memory is on the order of hundreds of bytes. A workaround is to compute the permutation with a simple function instead of storing a table in program memory. However, too simple a function, such as T[i] = 255-i, partly defeats its usefulness as a hash function, because anagrams then produce the same hash value; too complex a function, on the other hand, slows the hash down. Using a function rather than a table also allows the block size to be extended. Like the table it replaces, such a function must be bijective.
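The anagram problem can be seen in a short sketch. It assumes the permutation is passed in as a Python callable; `pearson_hash_fn` and `complement` are hypothetical names. Since 255 - x equals x XOR 0xFF for any byte x, the function T[i] = 255-i reduces the whole hash to an XOR of the input bytes (complemented when the length is odd), so byte order no longer matters.

```python
from itertools import permutations


def pearson_hash_fn(data: bytes, perm, init: int = 0) -> int:
    """Pearson hash that calls a permutation function instead of a table."""
    h = init
    for c in data:
        h = perm(h ^ c)
    return h


def complement(i: int) -> int:
    """The too-simple permutation T[i] = 255 - i from the text."""
    return 255 - i


word = b"listen"

# Hash every reordering of the six bytes of the word (720 anagrams).
anagram_hashes = {
    pearson_hash_fn(bytes(p), complement) for p in permutations(word)
}

# With T[i] = 255 - i, all anagrams collapse onto a single hash value.
all_anagrams_collide = len(anagram_hashes) == 1
```

A permutation that mixes bits less regularly, such as a shuffled table, does not have this structural weakness, although any two particular inputs can still collide by chance.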

The algorithm can be described by the following pseudocode, which computes the hash of message C using the permutation table T:

algorithm pearson hashing is
    h := 0

    for each c in C loop
        h := T[ h xor c ]
    end loop

    return h

The hash variable (h) may be initialized differently, e.g. to the length of the data (len(C)) modulo 256.
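The pseudocode above can be sketched in Python as follows. The table here comes from a seeded shuffle, which is an assumption made only so the example is self-contained; any permutation of 0 through 255 would do. The names `make_table`, `pearson_hash`, and the `init` parameter are hypothetical.

```python
import random


def make_table(seed: int = 0):
    """Return a permutation of 0..255 (seeded shuffle, illustrative only)."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def pearson_hash(data: bytes, table, init: int = 0) -> int:
    """Pearson hash of data: h := T[h xor c] for each byte c."""
    h = init & 0xFF  # keep the starting value within one byte
    for c in data:
        h = table[h ^ c]
    return h


T = make_table()
message = b"hello world"

# Default initialization, h := 0.
h_zero = pearson_hash(message, T)

# Alternative initialization, h := length of the data modulo 256.
h_len = pearson_hash(message, T, init=len(message) % 256)
```

Seeding the hash with the length is one way to make the empty input and inputs of different lengths start from different states; with h := 0, the empty message always hashes to 0.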

Example implementations

C#, 8-bit
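using System.Text;

public class PearsonHashing
{
    // Lookup table: must hold a permutation of the values 0 through 255 (256 entries).
    private static readonly byte[] T = { /* Permutation of 0-255 */ };

    public static byte Hash(string input)
    {
        byte hash = 0;
        byte[] bytes = Encoding.UTF8.GetBytes(input);

        // Mix each input byte into the running hash: h := T[h xor c].
        foreach (byte b in bytes)
        {
            hash = T[hash ^ b];
        }

        return hash;
    }
}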

Wikipedia Source

This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page.
