The Hash Function Design Problem


An avalanche 


The hash value of message is supposed to concisely represent the longer message (or document, or file, etc.) from which it was computed. This hash value is a kind of digital fingerprint of the larger document, a relation that allows for essentially substituting the longer document for its hash value, a fact that offers great advantages in terms of efficiency, integrity and confidentiality. But finding a hash function is quite a challenging task. The demanding properties that all hash functions should meet (see above, with conflicting interests of high efficiency and strong security, for example) makes the designing of this cryptographic primitives quite difficult. Althougth good examples of hash functions are known [Rivest92] [NIST94 ] [DobBossPren96], these are quite dated and their design is a little too obscure. We will try to provide new methods for designing cryptographic hash functions. The MerkleDamgaard [Dam89] model for the design of hash functions has shown its good properties over the years, and in fact many of the existing hash functions follow this model, which is schematically represented below:
MerkleDamgaard model for a hash function Essentially, this model simplifies the management of large inputs and the production of a fixedlenght output by using a function F, which is usually called a compression function. Given a compression function, a hash function can be defined by repeated applications of the compression function until the entire message has been processed. In this process, a message of arbitrary length is broken into blocks whose length depends on the compression function, and padded for security reasons, so the size of the message is a multiple of the block size. The blocks are then processed sequentially, taking as input the result of the hash so far and the current message block, with the final output being the hash value for the messagewhich repeatedily operates over a message block and a hash of the previous blocks. The security of this scheme rests on the security of the F function, which has been proven to need every property that H is supposed to have. One of the most demanding properties implied by the aforementioned is what is called the avalanche effect [Feis73], that is, that an average of one half of the output bits should change whenever a single input bit is complemented. The avalanche effect tries to mathematically abstract the much desirable property of highly nonlinearity between the input and output bits, and specifies that the hashes of messages from a close neighbourhood are dispersed over the whole space. Another desirable property is completeness [KamDav79], defined as the fact that every output bit depends on all the input bits, and not a proper subset of them. The concepts of completeness and the avalanche effect can be combined to define a new property which is called the strict avalanche criterion [WebsTav85]. A cryptographic function satisfies the strict avalanche criterion when each output bit should change with a probability of one half whenever a single input bit is complemented. We will use various combinations of the three aforementioned measures to distinguish between good and bad compression functions, and tus good and bad hash functions, converting its finding in an optimization problem. 

Instances and best known solutions for those instances:In order to define an instance of this problem we need to provide the value of w (the length of each word) and the value of n (number of output bits). In this vein, a wellknown instance is MD5 SHA1 (w=32, n=160). We will concentrate in the instante with parameters w=32 and n=256. Best solutions are not known for any instance of this problem. But, fortunately, there are optimun values for the avalanche effect (n/2) and also various ways of measuring how close is a particular primitive of satisfying the strict avalanche criterion (for example, performing a X^{2} test with a B(1/2,n) random variable). This will be the approach followed in this project, which will convert the designing problem into an optimization problem by maximizing the amount of avalanche and strict avalanche effect for a given instance. 

Related Papers:[Feis73] Feistel, H. 1973. Cryptography and Computer Privacy. Scientific American. 228(5): 1523 [KamDav79] Kam, J., G. Davida. 1979 Structured Design of SubstitutionPermutation Encryption Networks. IEEE Trans. on Comp. C28(10): 747753 [WebsTav85] Webster, A. and S. Tavares. 1985. On the Design of SBoxes. Advances in Cryptology  CRYPTO '85 523534. [Rivest92] R.L. Rivest, RFC 1321: The MD5 MessageDigest Algorithm, Internet Activities Board, 1992. [NIST94] Nat. Institute of Standards and Technology (NIST), Announcement of Weakness in the Secure Hash Standard, 1994. [DobBossPren96] H. Dobbertin, A. Bosselaers, and B. Preneel, RIPEMD160: A strengthened version of RIPEMD, Proceedings of 3rd International Workshop on Fast Software Encryption, SpringerVerlag (1996), 7182. [Dam89] I. B. Damgaard. A design principle for hash functions Advances in Cryptology, CRYPTO89, LNCS 435, pp 41642 

Last Updated: 4/9/03 For any question or suggestion, click here to contact with us. 
