# The Hash Function Design Problem

An avalanche

## Description:

The Hash Function Design Problem is a challenging problem consisting in designing new cryptographic primitives, called hash functions, that are basic for various cryptologic tasks, ranging from message integrity checks to digital signatures, to name only a few.

Hash functions need to meet a quite demanding, and sometimes contradictory, set of properties, which are summarized in the following:

1. The input can be of any length
2. The output has a fixed length
3. For any x, H(x) is easy to compute
4. H is one-way i.e. it is hard to invert, in the sense that for any y it is computationally infeasible to find an x such as y=H(x)
5. H is collision-free i.e. for any x it is computationally infeasible to find another x’ such as H(x)=H(x’)

The hash value of message is supposed to concisely represent the longer message (or document, or file, etc.) from which it was computed. This hash value is a kind of digital fingerprint of the larger document, a relation that allows for essentially substituting the longer document for its hash value, a fact that offers great advantages in terms of efficiency, integrity and confidentiality.

But finding a hash function is quite a challenging task. The demanding properties that all hash functions should meet (see above, with conflicting interests of high efficiency and strong security, for example) makes the designing of this cryptographic primitives quite difficult. Althougth good examples of hash functions are known [Rivest92] [NIST94 ] [DobBossPren96], these are quite dated and their design is a little too obscure. We will try to provide new methods for designing cryptographic hash functions.

The Merkle-Damgaard [Dam89] model for the design of hash functions has shown its good properties over the years, and in fact many of the existing hash functions follow this model, which is schematically represented below:

Merkle-Damgaard model for a hash function

Essentially, this model simplifies the management of large inputs and the production of a fixed-lenght output by using a function F, which is usually called a compression function. Given a compression function, a hash function can be defined by repeated applications of the compression function until the entire message has been processed. In this process, a message of arbitrary length is broken into blocks whose length depends on the compression function, and padded for security reasons, so the size of the message is a multiple of the block size. The blocks are then processed sequentially, taking as input the result of the hash so far and the current message block, with the final output being the hash value for the messagewhich repeatedily operates over a message block and a hash of the previous blocks. The security of this scheme rests on the security of the F function, which has been proven to need every property that H is supposed to have.

One of the most demanding properties implied by the aforementioned is what is called the avalanche effect [Feis73], that is, that an average of one half of the output bits should change whenever a single input bit is complemented. The avalanche effect tries to mathematically abstract the much desirable property of highly nonlinearity between the input and output bits, and specifies that the hashes of messages from a close neighbourhood are dispersed over the whole space. Another desirable property is completeness [KamDav79], defined as the fact that every output bit depends on all the input bits, and not a proper subset of them.

The concepts of completeness and the avalanche effect can be combined to define a new property which is called the strict avalanche criterion [WebsTav85]. A cryptographic function satisfies the strict avalanche criterion when each output bit should change with a probability of one half whenever a single input bit is complemented.

We will use various combinations of the three aforementioned measures to distinguish between good and bad compression functions, and tus good and bad hash functions, converting its finding in an optimization problem.

## Instances and best known solutions for those instances:

In order to define an instance of this problem we need to provide the value of w (the length of each word) and the value of n (number of output bits). In this vein, a well-known instance is MD5 SHA-1 (w=32, n=160). We will concentrate in the instante with parameters w=32 and n=256. Best solutions are not known for any instance of this problem. But, fortunately, there are optimun values for the avalanche effect (n/2) and also various ways of measuring how close is a particular primitive of satisfying the strict avalanche criterion (for example, performing a X2 test with a B(1/2,n) random variable). This will be the approach followed in this project, which will convert the designing problem into an optimization problem by maximizing the amount of avalanche and strict avalanche effect for a given instance.

## Related Papers:

[Feis73] Feistel, H. 1973. Cryptography and Computer Privacy. Scientific American. 228(5): 15-23

[KamDav79] Kam, J., G. Davida. 1979 Structured Design of Substitution-Permutation Encryption Networks. IEEE Trans. on Comp. C-28(10): 747-753

[WebsTav85] Webster, A. and S. Tavares. 1985. On the Design of S-Boxes. Advances in Cryptology -- CRYPTO '85 523-534.

[Rivest92] R.L. Rivest, RFC 1321: The MD5 Message-Digest Algorithm, Internet Activities Board, 1992.

[NIST94] Nat. Institute of Standards and Technology (NIST), Announcement of Weakness in the Secure Hash Standard, 1994.

[DobBossPren96] H. Dobbertin, A. Bosselaers, and B. Preneel, RIPEMD-160: A strengthened version of RIPEMD, Proceedings of 3rd International Workshop on Fast Software Encryption, Springer-Verlag (1996), 71-82.

[Dam89] I. B. Damgaard. A design principle for hash functions Advances in Cryptology, CRYPTO89, LNCS 435, pp 416-42

Last Updated: 4/9/03                                                                               For any question or suggestion, click here to contact with us.