From WikiChip
Editing mirc/identifiers/$hash

Latest revision Your text
Line 1: Line 1:
 
{{mirc title|$hash Identifier}}
 
{{mirc title|$hash Identifier}}
The '''$hash''' identifier calculates a simple hash of the supplied text. The hash is shown as an integer in the range from 0 to 2^N-1 where N is the number of bits ranging from 2-32. Except for compatibility with legacy scripts, you should probably avoid using this hash for reasons explained in the Notes section. It's possible that $hash() was once the method used by /hadd to put an item into one of the hash table's "buckets", but that was not the case at the point when it was changed to use 'modified FNV-1A'. Alternate solutions are suggested below.
+
The '''$hash''' identifier calculates a simple hash of the supplied text. The hash is shown as a decimal number in the range from 0 to 2^N-1 where N is the number of bits ranging from 2-32. Except for legacy scripts, you should probably avoid using this hash for reasons explained in the Notes section. It's possible that $hash() was once the method used by /hadd to put an item into one of the hash table's "buckets", but that was not the case at the point when it was changed to use 'modified FNV-1A'.
  
 
== Synopsis ==
 
== Synopsis ==
Line 9: Line 9:
 
{{ArgsList
 
{{ArgsList
 
| text | text string or %variable to be hashed
 
| text | text string or %variable to be hashed
| B | Bit length for the returned hash. $hash returns $null if B is a number outside the range 2-32, and returns the original string if the B parameter isn't used. Returned hash is an integer in the range 0 to 2^B -1.
+
| B | Bit length for the returned hash. $hash returns $null if B is a number outside the range 2-32, and returns the original string if the B parameter isn't used. Returned hash is a decimal number in the range 0 to 2^B -1.
 
}}
 
}}
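As a quick sanity check of the range described above, the following one-liner is a minimal sketch (the input text and the bit length 8 are arbitrary choices for illustration) that prints a hash next to the range it is expected to fall in:

<source lang="mIRC">//echo -a $hash(some text,8) should be within 0- $+ $calc(2 ^ 8 - 1)</source>

Changing both 8s to another value from 2 to 32 shows how the upper bound grows with the bit length.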
 
== Properties ==
 
== Properties ==
Line 47: Line 47:
 
returns: 4952B000 vs 4952B100</source>
 
returns: 4952B000 vs 4952B100</source>
  
* 4. It's easy to create duplicate hashes from different inputs:
+
* 4. It's easy to create duplicate hashes from different inputs, especially when the length of the inputs is a multiple of 3:
  
 
<source lang="mIRC">//echo -a $base($hash(ABCDEF,32),10,16) / $base($hash(ABDDEE,32),10,16)</source>
 
<source lang="mIRC">//echo -a $base($hash(ABCDEF,32),10,16) / $base($hash(ABDDEE,32),10,16)</source>
  
These collisions are easy enough to create that they can be scripted. Change one character of the string by shifting its $asc() codepoint by N in either direction. Then find another character in the string whose distance from the changed character is a multiple of 3, and shift that second character's $asc() codepoint by the same N in the opposite direction. So, if someone knows your script depends on $hash($nick,32) being different for everyone in the channel, they can easily change their nick to one whose hash matches someone else's. For example, take the nick 'mastadon', where you can subtract 3 from the 'm' so the nick begins with the letter 'j'. Since the characters 't' and 'o' are 3 and 6 characters distant from the 'm', you can then add 3 to either of their codepoints, and 'jaswadon' or 'jastadrn' will have the same $hash result. In fact, you can split the change across both, shifting the 't' by +2 and the 'o' by +1, and 'jasvadpn' also has the matching $hash.<br>
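The nick example can be checked from the editbox. This one-liner is only a sketch that reuses the four nicks named above; if the description holds, all four values printed should be identical:

<source lang="mIRC">//echo -a $hash(mastadon,32) / $hash(jaswadon,32) / $hash(jastadrn,32) / $hash(jasvadpn,32)</source>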
+
* 5. One of the properties of a good hash function is that it should be reasonably fast. However $hash is excruciatingly slow compared to $crc.
 
 
* 5. One of the properties of a good hash function is that it should be reasonably fast. However $hash is excruciatingly slow compared to $crc.
 
 
<source lang="mIRC">
 
<source lang="mIRC">
 
//var %string $str(a,8192), %reps 1000, %t $ticks | while (%reps) { noop $crc(%string,0) | dec %reps } | echo -a ticks: $calc($ticks - %t)
 
//var %string $str(a,8192), %reps 1000, %t $ticks | while (%reps) { noop $crc(%string,0) | dec %reps } | echo -a ticks: $calc($ticks - %t)
Line 60: Line 58:
 
As of v7.56, on a computer where the 1000 repetitions take $crc approximately 1/10th of a second, the same work using $hash takes longer than 30 seconds.
 
As of v7.56, on a computer where the 1000 repetitions take $crc approximately 1/10th of a second, the same work using $hash takes longer than 30 seconds.
  
Instead of using $hash, you would be better off using a substitute. For example, if you need an integer with a variable number of bits from 1 to 32, use $crc, then use $base to convert from base16 hex to a base10 integer, then reduce the number of bits:
+
Instead of using $hash, you would be better off using a substitute. For example, if you need a decimal number with a variable number of bits from 1 to 32, use $crc, then convert to decimal, then reduce the number of bits:
  
 
<source lang="mIRC">alias crchash { return $calc( $base($crc($1,0),16,10) % (2^$iif($$2 isnum 1-32,$gettok($2,1,46),32)) ) }
 
<source lang="mIRC">alias crchash { return $calc( $base($crc($1,0),16,10) % (2^$iif($$2 isnum 1-32,$gettok($2,1,46),32)) ) }
Line 68: Line 66:
 
</source>
 
</source>
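Assuming the crchash alias above has been loaded into a remote script, a call might look like the following sketch (the input text and the bit lengths are arbitrary illustrations):

<source lang="mIRC">//echo -a 16-bit: $crchash(some text,16) / 32-bit: $crchash(some text,32)</source>

Because the alias uses $$2, the bit length must always be supplied; calling it with only the text parameter halts instead of returning a value.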
  
If all that's needed is a fast, reasonably unique string and the hex output from $crc is ok, then you can simply replace $hash(parameter,32) with $crc(parameter,0). While $crc does not provide crypto level ability to make it difficult to create collisions, it does have the property that trivially related strings do not have identical hashes. Based on the 'birthday paradox', it should be expected that among 2^(32/2) (65536) different strings there is a 50/50 chance of a matching hash pair. However because only the 1st 24 bits of the result are ever non-zero, that reduces the number of strings needed for a collision to 2^(24/2) (4096), and since strings actually being hashed tend to be related, the odds of collisions are even higher.<br>
+
If the hash needs to be based on a crypto-level hash, or needs more than 32 bits, use up-to-52 bits from $sha1 or $sha512 instead of from $crc.
 
 
If you want to reduce the chance of collision below what the 32-bit result from $crc allows, beginning with v7.68 the $crc64 identifier can return a 64-bit CRC variant similar to the 32-bit result from $crc. By the same birthday property, there would need to be roughly 4 billion (2^32) strings in order to have a 50/50 chance of a collision.<br>
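The 2^(b/2) figures quoted here are a rule of thumb. A slightly more careful estimate, assuming hash values are uniformly distributed over b bits (an assumption that related input strings may not satisfy), is:

<math>p(n) \approx 1 - e^{-n^2 / 2^{b+1}} \quad\Rightarrow\quad n_{50\%} \approx \sqrt{2 \ln 2}\cdot 2^{b/2} \approx 1.18 \cdot 2^{b/2}</math>

Under that assumption the 50/50 point is roughly 77,000 strings for b = 32, 4,800 for b = 24, and 5.1 billion for b = 64, which is the same order of magnitude as the rounded figures.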
 
 
 
If the hash needs to be based on a crypto-level hash, or needs more than 32 bits, use up-to-52 bits from $sha1 or $sha512 instead of from $crc. Note that beginning with v7.72 the .bigfloat mode allows accuracy for integers above 2^53, but that can carry extra time cost.
 
  
 
<source lang="mIRC">alias sha1hash {
 
<source lang="mIRC">alias sha1hash {
Line 156: Line 150:
 
bits: 4 #randoms: 10000 stringlen: 8 input range a-z distribution: 624 618 617 627 610 638 624 626 602 635 612 650 632 600 627 658</source>
 
bits: 4 #randoms: 10000 stringlen: 8 input range a-z distribution: 624 618 617 627 610 638 624 626 602 635 612 650 632 600 627 658</source>
  
$crc is not of cryptographic quality, but at least it has a good distribution, and hash functions don't always need a 1-way feature, they just need to be fast. A good distribution is not proof of a good hash, since even a repeating pattern of 1-through-10 has that.
+
$crc is not of cryptographic quality, but at least it has a good distribution, and hash functions don't need a 1-way feature, they just need to be fast. A good distribution is not proof of a good hash, since even a repeating pattern of 1-through-10 has that.
 
== Compatibility ==
 
== Compatibility ==
 
{{mIRC compatibility|5.4}}
 
{{mIRC compatibility|5.4}}
 
== See also ==
 
== See also ==
 
* {{mIRC|$crc}}
 
* {{mIRC|$crc}}
* {{mIRC|$crc64}}
 
 
* {{mIRC|$md5}}
 
* {{mIRC|$md5}}
 
* {{mIRC|$sha1}}
 
* {{mIRC|$sha1}}
