From WikiChip
Difference between revisions of "mirc/identifiers/$hash"

Revision as of 18:25, 12 July 2019

The $hash identifier calculates a simple hash of the supplied text. The hash is returned as a decimal number in the range 0 to 2^N-1, where N is the requested number of bits (2-32). Except in legacy scripts, you should probably avoid this hash for the reasons explained in the Notes section. It is possible that $hash() was once the method /hadd used to place an item into one of a hash table's "buckets", but it was no longer used for that by the time hash tables were changed to use 'modified FNV-1A'.

Synopsis

$hash(<text>,<N>)

Switches

None

Parameters

text - text string or %variable to be hashed
N - Bit length for the returned hash. $hash returns $null if N is a number outside the range 2-32, and returns the original string if the N parameter isn't used. The returned hash is a decimal number in the range 0 to 2^N-1.

Properties

None

Example

//echo -a The hash is $hash(test,32)
returns: The hash is 1702094848

Notes

The 'fakehash' alias below is based on the code posted here: https://forums.mirc.com/ubbthreads.php/topics/98371/Re:_$hash_function#Post98371

I have not been able to find a string where 'fakehash' returns a different number than $hash.

$hash uses a hash function that is very weak for several reasons. The weaknesses are more easily seen by inspecting the 'fakehash' alias which mimics $hash output, and by showing $hash output as hex instead of a base-10 number. These weaknesses could be why $hash was not also extended to accept binary variables as input when $crc was given this ability. Some weaknesses in $hash include:

  • 1. Similar inputs produce similar hashes. This is most obvious for strings of 1-3 bytes:
//echo -a $base($hash(abc,32),10,16) is $+($base($asc(a),10,16,2),$base($asc(b),10,16,2),$base($asc(c),10,16,2),00)
returns: 61626300 is 61626300
 
(i.e. 0x61, 0x62 and 0x63 are the hex values of $asc() for the 3 characters a, b and c.)
  • 2. When N is greater than 24, the extra low-order bits beyond 24 are always zero, making it effectively a 24-bit hash rather than a 32-bit one. Only 2^24 of the 2^32 values in the range 0 to 2^32-1 can ever be returned. For example, the least-significant 3 bits of a 27-bit hash are always zero:
//var %a $hash($rand(1,999999999),27) | echo -a %a is the same as $and(%a,-8)
or ...
//var %i 99999 | while (%i) { var %n $rand(1,99999999) , %bits $rand(25,32) | if ( $right( $base($hash(%n,%bits),10,2,%bits) , $calc(%bits -24) )) echo -a this message will never show | dec %i }
  • 3. One of the properties of a good hash function is that changing any single input bit should change close to half of the output bits in a seemingly random pattern. Changing 1 bit of the input string changes very few bits of the $hash output, often just 1. $hash() processes the input string 1 byte at a time, and it often takes several more bytes before the bits altered by the first byte are altered again by another byte.
Here, changing 3 bits of the input changes only 1 bit of the output:
 
//echo -a $base($hash(mIRC,32),10,16) vs $base($hash(mIRD,32),10,16)
returns: 4952B000 vs 4952B100
  • 4. It's easy to create duplicate hashes from different inputs, especially when the length of the inputs is a multiple of 3:
//echo -a $base($hash(ABCDEF,32),10,16) / $base($hash(ABDDEE,32),10,16)
  • 5. One of the properties of a good hash function is that it should be reasonably fast. However, $hash is excruciatingly slow compared to $crc.
//var %string $str(a,8192), %reps 1000, %t $ticks | while (%reps) { noop $crc(%string,0) | dec %reps } | echo -a ticks: $calc($ticks - %t)
//var %string $str(a,8192), %reps 1000, %t $ticks | while (%reps) { noop $hash(%string,32) | dec %reps } | echo -a ticks: $calc($ticks - %t)

As of v7.56, on a computer where 1000 repetitions take $crc approximately 1/10th of a second, the same work using $hash takes longer than 30 seconds.
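Weaknesses 1 through 4 can be reproduced outside mIRC with a small model. The Python sketch here is only an assumption-laden illustration: it mirrors the loop of the 'fakehash' alias for the 32-bit case, it is not mIRC's actual source, and the names model_hash32 and bits_changed are hypothetical. It is limited to ASCII input.

```python
def model_hash32(text: str) -> int:
    """Model of the 'fakehash' loop for N=32 (an assumption, not mIRC source)."""
    x = 0
    for byte in text.encode("ascii"):
        top = x >> 24                              # top byte of the running value
        x = ((x + top + byte) << 8) & 0xFFFFFFFF   # add, then shift left 8 bits
    return x


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Weakness 1: a 3-byte input is simply copied into the top 3 bytes.
print(f"{model_hash32('abc'):08X}")                # 61626300

# Weakness 2: the low byte is always zero at 32 bits.
print(model_hash32("any text at all") & 0xFF)      # 0

# Weakness 3: 'C' and 'D' differ in 3 bits, the hashes differ in 1 bit.
print(bits_changed(ord("C"), ord("D")))            # 3
print(bits_changed(model_hash32("mIRC"), model_hash32("mIRD")))  # 1

# Weakness 4: two different 6-byte inputs collide.
print(f"{model_hash32('ABCDEF'):08X} {model_hash32('ABDDEE'):08X}")  # 85878900 85878900
```

The collision arises because, in this model, each new output byte is just the sum of the current character and the byte fed back from three steps earlier, so raising one character and lowering the one three positions later cancels out.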

Instead of using $hash, you would be better off using a substitute. For example, if you need a decimal number with a variable width of 1-32 bits, use $crc, convert the result to decimal, then reduce the number of bits:

alias crchash { return $calc( $base($crc($1,0),16,10) % (2^$iif($$2 isnum 1-32,$gettok($2,1,46),32)) ) }
 
//echo -a $base($crchash(mIRC,32),10,16) vs $base($crchash(mIRD,32),10,16)
returns: F01B6971 vs 6E7FFCD2
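For comparison outside mIRC, the same idea can be sketched in Python. This assumes that $crc(text,0) is the standard CRC-32 as computed by zlib.crc32, which has not been verified here. The function name crchash is reused only for illustration.

```python
import zlib


def crchash(text: str, bits: int = 32) -> int:
    """CRC-32 of the text reduced to `bits` bits (falls back to 32 if out of range)."""
    if not 1 <= bits <= 32:
        bits = 32
    return zlib.crc32(text.encode("utf-8")) % (1 << bits)


a, b = crchash("mIRC"), crchash("mIRD")
# Print both hashes in hex and the number of output bits that differ.
print(f"{a:08X} vs {b:08X}, bits changed: {bin(a ^ b).count('1')}")
```

Taking the value modulo 2^bits keeps the low-order bits of the CRC, which is the same reduction the alias performs with $calc.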

If the hash needs to be based on a crypto-level hash, or needs more than 32 bits, use up-to-52 bits from $sha1 or $sha512 instead of from $crc.

alias sha1hash {
  var %sha1 $sha1($1) , %offset $base($right(%sha1,1),16,10) + 1 , %hash13 $base($mid(%sha1,%offset,13),16,10)
  return $calc( %hash13 % (2^$iif($2 isnum 1-52,$gettok($2,1,46),32)) )
}
 
//echo -a $base($sha1hash(mIRC,52),10,16) vs $base($sha1hash(mIRD,52),10,16)
returns: 9AB6235ED89DF vs D2F9CD20CA42B
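The offset selection used by the sha1hash alias can be sketched in Python as follows. This is an illustrative port under the assumption that $sha1 returns the usual 40-digit hex digest of the text; the handling of non-ASCII text (UTF-8 here) is also an assumption.

```python
import hashlib


def sha1hash(text: str, bits: int = 32) -> int:
    """Pick 13 hex digits of the SHA-1 digest, then reduce to `bits` bits."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()   # 40 hex digits
    # The last hex digit (0-15) chooses where the 13-digit window starts.
    # mIRC's $mid is 1-based, so its "+ 1" disappears with 0-based slicing.
    start = int(digest[-1], 16)
    window = int(digest[start:start + 13], 16)                # 13 digits = 52 bits
    if not 1 <= bits <= 52:
        bits = 32
    return window % (1 << bits)


print(f"{sha1hash('mIRC', 52):X} vs {sha1hash('mIRD', 52):X}")
```

Because the start is at most 15 and the window is 13 digits long, the slice always stays inside the 40-digit digest.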

The above borrows code from $hotp which uses the final digit of the hash to determine which digits within the SHA* hash are used. This next variant is faster, because it doesn't calculate the offset, and it also uses the faster MD5. Because of the 2^53 accuracy limit for $calc, this allows the hash to be up to 52 bits instead of the 32 bits for $hash, and you can easily substitute any of the SHA* identifiers in place of $md5.

alias md5hash { return $calc( $base($right($md5($1),13),16,10) % (2^$iif($$2 isnum 1-52,$gettok($2,1,46),32)) ) }
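The 13-digit window and the 52-bit ceiling follow from simple arithmetic. The Python lines here illustrate it with IEEE-754 doubles, on the assumption that the 2^53 limit of $calc comes from the same kind of floating-point representation.

```python
LIMIT = 2 ** 53
largest_13_hex = int("F" * 13, 16)

# 13 hex digits hold exactly 52 bits, safely under the 2^53 limit.
print(largest_13_hex == 2 ** 52 - 1)                          # True

# At 2^53 a double can no longer represent every integer: adding 1 is lost.
print(float(LIMIT) + 1.0 == float(LIMIT))                     # True

# Below the limit, adding 1 still yields a distinct value.
print(float(largest_13_hex) + 1.0 == float(largest_13_hex))   # False
```

A 14th hex digit would allow values up to 2^56-1, beyond the range where every integer is represented exactly.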

When the fakehash alias is in a remotes script, you should get the same answers from $fakehash as from $hash:

//var %i 999 | while (%i) { var %input $rand(1,999999999) , %bits $rand(2,32) | if ($hash(%input,%bits) != $fakehash(%input,%bits)) echo -a this should never show: %input %bits | dec %i }
 
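alias fakehash {
  ; like $hash, return $null unless text is given and the bit length is 2-32
  if ( ($1 == $null) || ($2 !isnum 2-32) ) return $null
  var %i 1 | var %x 0 | var %bits $int($2)
  while (%i <= $len($1)) {
    ; %y is the top byte of the running 32-bit value
    var %y $int($calc( $and(%x,$base(ff000000,16,10)) / 2^24 ))
    ; add the top byte and the next character code, then shift left by 8 bits
    var %x = $calc( %x + %y + $asc($mid($1,%i,1)) )
    var %x = $calc( (%x * 256) % (2^32) )
    inc %i
  }
  ; keep the leftmost %bits bits, adding 1 when the discarded bits are not all zero
  var %y = $base(%x,10,2,32)
  var %z = $base($left(%y,%bits),2,10)
  if ($mid(%y,$calc(1+%bits))) inc %z
  return $calc( %z % (2^%bits) )
}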
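For readers who want to experiment outside mIRC, this is a Python sketch of the same steps as the fakehash alias, including the reduction to fewer than 32 bits. It is a port of the alias, not of mIRC's own code, and it assumes ASCII input and that a run of discarded zero bits counts as false in the alias's final if test.

```python
from typing import Optional


def fakehash(text: str, bits: int) -> Optional[int]:
    """Python port of the 'fakehash' alias; returns None where the alias returns $null."""
    if not text or not 2 <= bits <= 32:
        return None
    x = 0
    for byte in text.encode("ascii"):
        # Add the top byte and the character code, then shift left 8 bits.
        x = ((x + (x >> 24) + byte) << 8) & 0xFFFFFFFF
    dropped = 32 - bits
    z = x >> dropped                       # keep the leftmost `bits` bits
    if x & ((1 << dropped) - 1):           # any discarded bit set: add 1
        z += 1
    return z % (1 << bits)


print(fakehash("test", 32))                # 1702094848
print(f"{fakehash('abc', 16):04X}")        # 6163
```

The second line of output shows the rounding step: the top 16 bits of 61626300 are 6162, and because the discarded bits 6300 are not all zero, the result is 6163.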

Having good distribution of hash output is not proof that a hash is good, but having bad distribution is evidence of a bad hash. This next alias shows that $hash has a bad distribution:

;syntax: /hash_distribution BITS STRINGLEN [1stOfRandomRange] [LastOfRandomRange]
alias hash_distribution {
  var %bits $iif($1 isnum 1-32,$1,4) , %stringlen $iif($2,$2,9)
  var %numstrings 10000 , %i %numstrings , %tokens $str(0 $+ $chr(32),$calc(2^%bits))
  var %first $iif($3,$3,a) , %last $iif($4,$4,z)
  while (%i) {
    var %a $regsubex($str(x,%stringlen),/x/g,$r(%first,%last))
    var %h 1 + $hash(%a,%bits) , %tokens $puttok(%tokens,$calc(1+$gettok(%tokens,%h,32)),%h,32)
    dec %i
  }
  echo -ag bits: %bits #randoms: %numstrings stringlen: %stringlen input range $+(%first,-,%last) distribution: %tokens
}

The first parameter tells the number of bits in the output hash, which means there should be 2^N possible outputs. The 2nd number is the length of random strings to be hashed. The 3rd and 4th parameters give the option of changing the first and last characters of the random range away from being the range a-z.
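The same experiment can be sketched in Python using a model of the hash in place of $hash. The model follows the fakehash alias and is an assumption rather than mIRC's implementation. The alphabet argument is a hypothetical stand-in for the first/last range parameters.

```python
import random
import string


def model_hash(text: str, bits: int) -> int:
    """Model of $hash following the 'fakehash' alias (assumption, ASCII only)."""
    x = 0
    for byte in text.encode("ascii"):
        x = ((x + (x >> 24) + byte) << 8) & 0xFFFFFFFF
    dropped = 32 - bits
    z = (x >> dropped) + (1 if x & ((1 << dropped) - 1) else 0)
    return z % (1 << bits)


def hash_distribution(bits=4, length=9, alphabet=string.ascii_lowercase,
                      samples=10000, seed=None):
    """Count how often each of the 2**bits outputs occurs for random strings."""
    rng = random.Random(seed)
    counts = [0] * (1 << bits)
    for _ in range(samples):
        text = "".join(rng.choice(alphabet) for _ in range(length))
        counts[model_hash(text, bits)] += 1
    return counts


print(hash_distribution(bits=4, length=8))
# With 2 bits and length-8 a-z input, every sample lands in bucket 0.
print(hash_distribution(bits=2, length=8))
```

In this model, the top byte of the hash of an 8-character a-z string is the sum of the 3rd and 6th characters plus 1, which always falls in 0xC3-0xF5. That explains why only four of the sixteen 4-bit outputs can occur for that length.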

Using "/hash_distribution 4 N a z" where N ranges from 8 through 12 shows a very uneven frequency count of hash output, and the quality of the distribution depends greatly on the string length. In this example, because the output is a 4-bit hash, there are 2^4=16 possible outputs, and this alias shows most of the 16 numbers never happen for this length of a-z input, while other outputs happen too frequently:

/hash_distribution 4 8 a z
bits: 4 #randoms: 10000 stringlen: 8 input range a-z distribution: 300 0 0 0 0 0 0 0 0 0 0 0 0 1342 4954 3404
bits: 4 #randoms: 10000 stringlen: 9 input range a-z distribution: 0 0 0 232 2056 4404 2852 456 0 0 0 0 0 0 0 0
bits: 4 #randoms: 10000 stringlen: 10 input range a-z distribution: 0 0 0 212 2127 4390 2839 432 0 0 0 0 0 0 0 0
bits: 4 #randoms: 10000 stringlen: 11 input range a-z distribution: 0 0 0 213 2122 4371 2837 457 0 0 0 0 0 0 0 0
bits: 4 #randoms: 10000 stringlen: 12 input range a-z distribution: 0 0 0 0 0 0 0 0 0 23 580 2525 3869 2486 498 19

Increasing the number of bits above 4 helps smooth the distribution, and so does increasing the string length, but even when the string is as long as 100 characters the distribution for lower-case letters is uneven. Widening the random range by changing its first/last characters also helps. But even with ! and ~ as the first/last characters of the range, which includes many characters unlikely to appear in real-world item names, the distribution stays uneven until the string length increases sufficiently.

/hash_distribution 4 10 ! ~
bits: 4 #randoms: 10000 stringlen: 10 input range !-~ distribution: 1256 1167 962 737 409 284 121 52 40 133 292 476 673 1006 1197 1195

The 1024 counts for a 10-bit hash of length-100 input fit onto a length-4150 mIRC line, but only because so many of the counts are single digits:

(Warning: This is slow, and the output is too long to display here. There is a very large region where consecutive outputs occur only 0-3 times. An 11-bit hash can be used instead of 10-bit with a length-8292 line, but it is even slower.)

/hash_distribution 10 100 a z

As you shorten the hash, the distribution gets worse:

/hash_distribution 2 8 a z
bits: 2 #randoms: 10000 stringlen: 8 distribution: 10000 0 0 0
/hash_distribution 2 9 a z
bits: 2 #randoms: 10000 stringlen: 9 distribution: 0 2265 7735 0

If you edit the hash_distribution alias to use the above $crchash alias instead of $hash, the distribution is much better for all input lengths and character ranges. Repeating the a-z test with $crchash gives a much smoother distribution.

/hash_distribution 4 8 a z (using $crchash)
bits: 4 #randoms: 10000 stringlen: 8 input range a-z distribution: 624 618 617 627 610 638 624 626 602 635 612 650 632 600 627 658
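A comparable count can be produced in Python with zlib.crc32, on the assumption that it matches mIRC's $crc. The exact counts depend on the random strings drawn, so none are quoted here. The function name crc_distribution is hypothetical.

```python
import random
import string
import zlib


def crc_distribution(bits=4, length=8, samples=10000, seed=None):
    """Bucket counts of CRC-32 reduced to `bits` bits over random a-z strings."""
    rng = random.Random(seed)
    counts = [0] * (1 << bits)
    for _ in range(samples):
        text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        counts[zlib.crc32(text.encode("ascii")) % (1 << bits)] += 1
    return counts


counts = crc_distribution()
print(counts)
print("smallest bucket:", min(counts), "largest bucket:", max(counts))
```

With 10000 samples spread over 16 buckets, an even distribution would put roughly 625 in each.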

$crc is not of cryptographic quality, but at least it has a good distribution, and a hash function of this kind doesn't need to be 1-way; it just needs to be fast. Note that a good distribution is not proof of a good hash, since even a repeating pattern of 1-through-10 has an even distribution.

Compatibility

Added: mIRC v5.4
Added on: 23 Jun 1998
Note: Unless otherwise stated, this was the date of original functionality.
Further enhancements may have been made in later versions.

See also

  • $crc
  • $md5
  • $sha1
  • $sha256
  • $sha384
  • $sha512
  • /hadd