From WikiChip
Difference between revisions of "mirc/identifiers/$hash"

Revision as of 20:49, 18 January 2018

The $hash identifier calculates a simple hash of the supplied text. The hash is returned as a decimal number from 0 to 2^N-1, where N is the requested bit length (2 to 32). Except in legacy scripts, you should probably avoid this hash for the reasons explained in the Notes section.

Synopsis

$hash(<text>,<N>)

Switches

None

Parameters

text - text string or %variable to be hashed
N - Bit length for the returned hash. $hash returns $null if N is not in the range 2-32. The returned hash is a decimal number in the range 0 to 2^N-1.

Properties

None

Example

//echo -a The hash is $hash(test,32)
returns: The hash is 1702094848

Notes

The 'fakehash' alias below is based on the code posted here: https://forums.mirc.com/ubbthreads.php/topics/98371/Re:_$hash_function#Post98371

No string has yet been found for which 'fakehash' returns a different number than $hash.

$hash uses a hash function that is very weak for several reasons. The weaknesses are more easily seen by showing an alias which mimics $hash output, and showing $hash output in hex. These weaknesses could be why $hash was not extended to accept binary variables as input when $crc was extended. Some weaknesses include:

  • 1. Similar texts have similar hashes. This is most obvious for strings of 1-3 bytes, whose hash is simply the bytes of the text followed by a zero byte:
//echo -a $base($hash(abc,32),10,16) is $+($base($asc(a),10,16,2),$base($asc(b),10,16,2),$base($asc(c),10,16,2),00)
returns: 61626300 is 61626300
  • 2. When N is greater than 24, the low-order bits beyond the first 24 are always zero, making it effectively a 24-bit hash rather than a 32-bit one. Only 2^24 of the values in the range 0 to 2^32-1 can ever be returned. For example, a 27-bit hash always has its 3 least-significant bits set to zero:
//var %i 99999 | while (%i) { var %n $rand(1,99999999) , %bits $rand(25,32) | if ( $right($base($hash(%n,%bits),10,2,%bits) , $calc(%bits -24) )) echo -a this message will never show | dec %i }
  • 3. One property of a good hash function is that changing any single bit of the input should change close to half of the output bits in a seemingly random pattern. Changing 1 bit of the text changes very few bits of the $hash output, often just 1. It often takes several additional bytes before the bits set by the first byte are altered.
//echo -a $base($hash(mIRC,32),10,16) / $base($hash(mIRD,32),10,16)
returns: 4952B000 / 4952B100
  • 4. It's easy to create duplicate hashes, especially when the length of the text is a multiple of 3:
//echo -a $base($hash(ABCDEF,32),10,16) / $base($hash(ABDDEE,32),10,16)
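The weaknesses listed above can also be checked outside mIRC. The following is a minimal Python sketch, not mIRC code, and it assumes that the 32-bit mixing loop of the 'fakehash' alias on this page reproduces $hash exactly. The names `hash32` and `changed_bits` are hypothetical helpers and are not part of mIRC.

```python
def hash32(text: str) -> int:
    """Assumed 32-bit form of $hash, ported from the 'fakehash' mixing loop."""
    x = 0
    for ch in text:
        top_byte = x >> 24                      # %y in the alias
        x = ((x + top_byte + ord(ch)) << 8) & 0xFFFFFFFF
    return x


def changed_bits(a: str, b: str) -> int:
    """Number of output bits that differ between the hashes of a and b."""
    return bin(hash32(a) ^ hash32(b)).count("1")


# 1. Short strings hash to their own bytes followed by a zero byte.
print(f"{hash32('abc'):08X}")                   # 61626300
# 2. The low byte is always zero, so at most 24 bits carry information.
print(hash32("any text at all") & 0xFF)         # 0
# 3. Changing one input bit changes a single output bit here.
print(changed_bits("mIRC", "mIRD"))             # 1
# 4. Two different strings with the same hash.
print(hash32("ABCDEF") == hash32("ABDDEE"))     # True
```

A well-mixed 32-bit hash would be expected to flip roughly 16 output bits in the third check, not 1.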

Instead of using $hash, you are better off with a substitute. For example, if you need a decimal number with a variable number of bits from 1-32, take $crc, convert it to decimal, then reduce the number of bits:

alias crchash { return $calc( $base($crc($1,0),16,10) % 2^$2 ) }
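For comparison outside mIRC, the same idea can be sketched in Python. This assumes that $crc returns the standard CRC-32 that `zlib.crc32` computes, which this page does not confirm. The function name `crchash` simply mirrors the alias above.

```python
import zlib


def crchash(data: bytes, bits: int) -> int:
    """Keep the low `bits` bits of the CRC-32 of `data` as a decimal number."""
    if not 1 <= bits <= 32:
        raise ValueError("bits must be in the range 1-32")
    return zlib.crc32(data) % (1 << bits)


print(crchash(b"test", 4))    # a value from 0 to 15
print(crchash(b"test", 32))   # the full CRC-32 as a decimal number
```

The modulo keeps the low-order bits, just as `% 2^$2` does in the alias.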

If the hash needs to be cryptographically secure, take up to 52 bits from $sha1 or $sha512 instead of from $crc.
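A hedged Python sketch of the same idea follows. The page does not say which bits of the digest to keep, so taking the leading bits is an assumption made here for illustration, and `shahash` is a hypothetical name. The 52-bit ceiling is taken from the text above.

```python
import hashlib


def shahash(data: bytes, bits: int) -> int:
    """Return the leading `bits` bits of the SHA-1 digest as a decimal number."""
    if not 1 <= bits <= 52:
        raise ValueError("bits must be in the range 1-52")
    digest = hashlib.sha1(data).digest()
    leading = int.from_bytes(digest[:7], "big")   # first 56 bits of the digest
    return leading >> (56 - bits)                 # assumption: keep the top bits


print(shahash(b"test", 8))    # the first byte of the digest, 0-255
print(shahash(b"test", 52))   # a value below 2^52
```

Swapping `hashlib.sha1` for `hashlib.sha512` leaves the rest of the function unchanged.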

When the fakehash alias is in a remotes script, you should get the same answers from $fakehash as from $hash:

//var %i 999 | while (%i) { var %text $rand(1,999999999) , %bits $rand(16,24) | if ($hash(%text,%bits) != $fakehash(%text,%bits)) echo -a this should never show: %text %bits | dec %i }
 
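alias fakehash {
  ; like $hash, return $null for missing text or a bit length outside 2-32
  if ( ($1 == $null) || ($2 !isnum 2-32) ) return $null
  var %i 1 | var %x 0 | var %bits $int($2)
  while (%i <= $len($1)) {
    ; %y is the top byte of the running 32-bit value %x
    var %y $int($calc( $and(%x,$base(ff000000,16,10)) / 2^24 ))
    ; add the top byte and the next character, then shift left by one byte
    var %x = $calc( %x + %y + $asc($mid($1,%i,1)) )
    var %x = $calc( (%x * 256) % (2^32) )
    inc %i
  }
  ; keep the leftmost %bits bits of the 32-bit result
  var %y = $base(%x,10,2,32)
  var %z = $base($left(%y,%bits),2,10)
  ; add 1 if the discarded bits are not all zero, then wrap to %bits bits
  if ($mid(%y,$calc(1+%bits))) inc %z
  return $calc( %z % (2^%bits) )
}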

Having good distribution of hashes is not proof that a hash is good, but having bad distribution is evidence of a bad hash. This next alias shows that $hash has a bad distribution:

;syntax: /hash_distribution BITS STRINGLEN
alias hash_distribution {
  var %bits $iif($1 isnum 2-32,$1,4)
  var %stringlen $iif($2,$2,9)
  var %numstrings 10000
  var %i %numstrings
  var %tokens $str(0 $+ $chr(32),$calc(2^%bits))
  while (%i) {
    var %a $regsubex($str(x,%stringlen),/x/g,$r(a,z))
    var %h = 1 + $hash(%a,%bits)
    var %tokens $puttok(%tokens,$calc(1+$gettok(%tokens,%h,32)),%h,32)
    dec %i
  }
  echo -ag bits: %bits #randoms: %numstrings stringlen: %stringlen distribution: %tokens
}

The first parameter is the number of bits in the hash; the second is the length of the random strings to generate. Using "/hash_distribution 4 N" where N is 7 through 12 shows a very uneven frequency count of the hash values returned, and the values returned depend greatly on the string length:

bits: 4 #randoms: 10000 stringlen: 7 distribution: 201 0 0 0 0 0 0 0 0 0 0 0 0 1516 5128 3155
bits: 4 #randoms: 10000 stringlen: 8 distribution: 297 0 0 0 0 0 0 0 0 0 0 0 0 1283 4971 3449
bits: 4 #randoms: 10000 stringlen: 9 distribution: 0 0 0 216 2038 4459 2799 488 0 0 0 0 0 0 0 0
bits: 4 #randoms: 10000 stringlen: 10 distribution: 0 0 0 182 2097 4376 2867 478 0 0 0 0 0 0 0 0
bits: 4 #randoms: 10000 stringlen: 11 distribution: 0 0 0 196 2080 4381 2869 474 0 0 0 0 0 0 0 0
bits: 4 #randoms: 10000 stringlen: 12 distribution: 0 0 0 0 0 0 0 0 0 20 564 2585 3926 2386 501 18

Increasing the number of bits helps (10 bits is the max that fits on an mIRC line), and increasing the string length also helps, but even when the string is as long as 100 characters the distribution is uneven. As you shorten the hash, the distribution gets worse:

bits: 2 #randoms: 10000 stringlen: 8 distribution: 10000 0 0 0
bits: 2 #randoms: 10000 stringlen: 9 distribution: 0 2265 7735 0

If you edit the alias above to call the $crchash example instead of $hash, the distribution is much more even for all lengths:

bits: 4 #randoms: 10000 stringlen: 8 distribution: 650 575 621 659 583 605 626 593 624 655 638 620 666 595 642 648
bits: 4 #randoms: 10000 stringlen: 9 distribution: 620 612 637 625 622 614 650 627 628 636 596 636 617 662 634 584
bits: 4 #randoms: 10000 stringlen: 10 distribution: 591 611 640 615 595 599 645 605 599 684 620 572 641 639 668 676
bits: 4 #randoms: 10000 stringlen: 11 distribution: 630 626 587 624 654 629 581 637 591 653 660 616 627 607 612 666
bits: 4 #randoms: 10000 stringlen: 12 distribution: 645 619 604 608 628 596 602 655 622 617 591 651 591 663 682 626

$crc is not of cryptographic quality, but at least it has a good distribution. Keep in mind that a good distribution alone does not prove that a hash is good, since even a repeating sequence of 1 through 10 is evenly distributed.
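The distribution experiment can be repeated outside mIRC with a short Python sketch. It rests on three assumptions: that the 'fakehash' alias reproduces $hash, that its final step adds 1 whenever any discarded bit is set, and that $crc matches the CRC-32 computed by `zlib.crc32`. All function names here are hypothetical.

```python
import random
import zlib

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def mirc_like_hash(text: str, bits: int) -> int:
    """Python port of the 'fakehash' alias (assumed to match $hash)."""
    x = 0
    for ch in text:
        x = ((x + (x >> 24) + ord(ch)) << 8) & 0xFFFFFFFF
    kept = x >> (32 - bits)                      # leftmost `bits` bits
    if x & ((1 << (32 - bits)) - 1):             # assumption: any discarded bit set
        kept += 1
    return kept % (1 << bits)


def crc_hash(text: str, bits: int) -> int:
    """CRC-32 reduced to `bits` bits, like the crchash alias."""
    return zlib.crc32(text.encode("ascii")) % (1 << bits)


def distribution(hash_fn, bits, stringlen, samples=10000, seed=1):
    """Count how often each hash value occurs for random lowercase strings."""
    rng = random.Random(seed)
    counts = [0] * (1 << bits)
    for _ in range(samples):
        word = "".join(rng.choice(LETTERS) for _ in range(stringlen))
        counts[hash_fn(word, bits)] += 1
    return counts


print(distribution(mirc_like_hash, 4, 8))   # only buckets 0, 13, 14 and 15 are used
print(distribution(crc_hash, 4, 8))         # every bucket is used
```

With 8-character lowercase strings, the top byte of the ported hash works out to the sum of the third and sixth characters plus 1. That sum always falls between hex C3 and F5, which explains why only four of the sixteen 4-bit values ever appear.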

Compatibility

Added: mIRC v5.4
Added on: 23 Jun 1998
Note: Unless otherwise stated, this was the date of original functionality.
Further enhancements may have been made in later versions.

See also

  • $crc
  • $md5
  • $sha1
  • $sha256
  • $sha384
  • $sha512