From WikiChip
Editing mirc/hash tables
Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.
The edit can be undone.
Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
This page supports semantic in-text annotations (e.g. "[[Is specified as::World Heritage Site]]") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help pages.
Latest revision | Your text | ||
Line 12: | Line 12: | ||
|} | |} | ||
− | mIRC provides facilities for manipulating the table and the values in | + | mIRC provides facilities for manipulating the table and the values in variety of ways. |
== General Details == | == General Details == | ||
Line 18: | Line 18: | ||
Hash tables, unlike INI files, are stored completely in memory and are never written to disk (unless the /hsave command is used), making them much faster when it comes to storing and retrieving information. The performance gain is much more obvious with a large amount of item/data pairs. | Hash tables, unlike INI files, are stored completely in memory and are never written to disk (unless the /hsave command is used), making them much faster when it comes to storing and retrieving information. The performance gain is much more obvious with a large amount of item/data pairs. | ||
− | '''Note:''' Because hash tables are only in memory, it must be saved to a | + | '''Note:''' Because hash tables are only in memory, it must be saved to a file if mIRC has to exit for any reason. You can reload the table from a file at a later period. |
== Creating a table == | == Creating a table == | ||
Line 28: | Line 28: | ||
;hmake also have an -s switch which prints debug info</syntaxhighlight> | ;hmake also have an -s switch which prints debug info</syntaxhighlight> | ||
− | If you don't specify the number of buckets | + | If you don't specify the number of buckets, the default is used which is 100. Assuming that you are going to look up a specific item by name using {{mIRC|$hget}}, then generally speaking the number of buckets should be decided based on the following equation: |
<math> | <math> | ||
Line 34: | Line 34: | ||
</math> | </math> | ||
− | For example: a table with | + | For example: a table with 100 buckets is optimal for 78 items. For 1000 items, 1300 buckets is best. '''Note:''' The maximum valid number of buckets is 10000. |
− | Or to put this another way, the optimum number of | + | Or to put this another way, the optimum number of slots is 1.282x the number of items you are going to store in the hash table. |
− | '''Note:''' | + | '''Note:''' It is unclear where the value of 0.78 is derived from, but a likely reason is that this is the optimum number to try to ensure that each slot only has a single entry assigned to it. See below for a technical explanation of how mIRCs hash tables work and why you might want to have only a single item per slot. |
== Adding Items == | == Adding Items == | ||
Line 45: | Line 45: | ||
<syntaxhighlight lang="mirc">hadd <table_name> <item> <data> | <syntaxhighlight lang="mirc">hadd <table_name> <item> <data> | ||
− | |||
hadd -b <table_name> <item> <&bvar> | hadd -b <table_name> <item> <&bvar> | ||
An item can be added to the table with null data: | An item can be added to the table with null data: | ||
hadd <table_name> <item> | hadd <table_name> <item> | ||
− | |||
− | |||
</syntaxhighlight> | </syntaxhighlight> | ||
Let's consider a table of favorite colors: | Let's consider a table of favorite colors: | ||
− | <syntaxhighlight lang="mirc"> | + | <syntaxhighlight lang="mirc">/hadd -ms colors John Blue |
− | + | /hadd -s colors Mary Green | |
− | /hadd - | + | /hadd -s colors Gary Orange</syntaxhighlight> |
− | /hadd -s colors | ||
− | /hadd -s colors Gary Orange | ||
− | </syntaxhighlight> | ||
− | + | The code above will produce the following result: | |
− | <pre> | + | <pre>* Made hash table 'colors' (100) |
− | * Made hash table 'colors' ( | + | * Added item 'John' to hash table 'colors' |
* Added item 'Mary' to hash table 'colors' | * Added item 'Mary' to hash table 'colors' | ||
− | + | * Added item 'Gary' to hash table 'colors'</pre> | |
− | |||
− | * Added item 'Gary' to hash table 'colors' | ||
− | </pre> | ||
{| class="wikitable" | {| class="wikitable" | ||
|+ "Colors" Hash Table | |+ "Colors" Hash Table | ||
! Item !! Data | ! Item !! Data | ||
− | |||
− | |||
|- | |- | ||
|John || Blue | |John || Blue | ||
|- | |- | ||
− | | | + | |Mary || Green |
|- | |- | ||
|Gary || Orange | |Gary || Orange | ||
|} | |} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== Deleting Items == | == Deleting Items == | ||
Line 152: | Line 79: | ||
<syntaxhighlight lang="mirc">hdel <table_name> <item> | <syntaxhighlight lang="mirc">hdel <table_name> <item> | ||
hdel -w <table_name> <wild_item> | hdel -w <table_name> <wild_item> | ||
− | ;hdel has a -s switch which is the same as / | + | ;hdel has a -s switch which is the same as /hmake's</syntaxhighlight> |
+ | |||
+ | A {{mirc|wildcard}} pattern for the item can be specified to delete multiple items at once. If we go back to our example: | ||
− | + | <syntaxhighlight lang="mirc">/hdel colors Mary</syntaxhighlight> | |
− | |||
Will leave our table looking like this: | Will leave our table looking like this: | ||
Line 162: | Line 90: | ||
|+ "Colors" Hash Table | |+ "Colors" Hash Table | ||
! Item !! Data | ! Item !! Data | ||
− | |||
− | |||
|- | |- | ||
|John || Blue | |John || Blue | ||
Line 170: | Line 96: | ||
|} | |} | ||
− | + | == Value retrieval == | |
+ | To get a data value associated with a given item we will use the $hget identifier which has the following syntax: | ||
+ | |||
+ | <syntaxhighlight lang="mirc">$hget(<table_name>, <item>)</syntaxhighlight> | ||
+ | |||
+ | For example, if we were to check what is John's favorite color from our table; we will use the following piece of code: | ||
+ | |||
+ | <syntaxhighlight lang="mirc">//echo -a $hget(colors, John) | ||
+ | ;Blue</syntaxhighlight> | ||
− | + | The $hget identifier can also be used to check if a table exists using the following syntax: | |
− | |||
− | |||
− | + | <syntaxhighlight lang="mirc">$hget(<table_name>) | |
+ | ; returns $null if the table does not exist</syntaxhighlight> | ||
== Saving/Loading Hash Table to/from file == | == Saving/Loading Hash Table to/from file == | ||
Line 186: | Line 119: | ||
<syntaxhighlight lang="mirc">/hsave <table_name> <filename> | <syntaxhighlight lang="mirc">/hsave <table_name> <filename> | ||
− | ; The -s switch | + | ; The -s switch displays debug information |
; The -a switch will append to an existing file, instead of the default overwriting | ; The -a switch will append to an existing file, instead of the default overwriting | ||
; The -i switch will create an ini file | ; The -i switch will create an ini file | ||
; The -n switch saves the file containing only the data and not the item names. | ; The -n switch saves the file containing only the data and not the item names. | ||
; The -u switch avoids skipping temporary items created with /hadd's -uN switch. | ; The -u switch avoids skipping temporary items created with /hadd's -uN switch. | ||
− | ; The -b switch will treat the file as a binary file, making it possible to save things like carriage returns and line feeds. | + | ; The -b switch will treat the file as a binary file, making it possible to save things like carriage returns and line feeds.</syntaxhighlight> |
− | |||
− | </syntaxhighlight> | ||
If we wanted to save our little colors table to an INI file, we could use the following piece of code.: | If we wanted to save our little colors table to an INI file, we could use the following piece of code.: | ||
Line 200: | Line 131: | ||
;colors.ini will have: | ;colors.ini will have: | ||
; [hashtable] | ; [hashtable] | ||
− | |||
− | |||
; Gary=Orange | ; Gary=Orange | ||
− | ; John=Blue | + | ; John=Blue</syntaxhighlight> |
− | </syntaxhighlight> | ||
− | |||
− | |||
To load a hash table we use the following syntax: | To load a hash table we use the following syntax: | ||
− | <syntaxhighlight lang="mirc">; NOTE: The table must exists. I.e. you must have called /hmake first | + | <syntaxhighlight lang="mirc">; NOTE: The table must exists. I.e. you must have called /hmake first |
/hload <table_name> <filename> | /hload <table_name> <filename> | ||
− | ; The -s switch | + | ; The -s switch displays debug information |
; The -i switch will read from an ini file containing lines of item=data | ; The -i switch will read from an ini file containing lines of item=data | ||
; The -n switch interprets the file as if it were /hsave'ed with the -n switch to contain only data, assigning item names as sequential integers beginning with 1. | ; The -n switch interprets the file as if it were /hsave'ed with the -n switch to contain only data, assigning item names as sequential integers beginning with 1. | ||
; The -m[N] switch creates the hashtable if it does not already exist, optionally giving it N buckets different than the default 100 buckets. | ; The -m[N] switch creates the hashtable if it does not already exist, optionally giving it N buckets different than the default 100 buckets. | ||
− | ; The -b switch will treat the file as a binary file | + | ; The -b switch will treat the file as a binary file, making it possible to save things like carriage returns and line feeds.</syntaxhighlight> |
− | |||
− | </syntaxhighlight> | ||
To load the table we've just saved we would use the following code: | To load the table we've just saved we would use the following code: | ||
<syntaxhighlight lang="mirc">/hload -i Colors colors.ini</syntaxhighlight> | <syntaxhighlight lang="mirc">/hload -i Colors colors.ini</syntaxhighlight> | ||
− | |||
− | |||
== Deleting a Table == | == Deleting a Table == | ||
Line 231: | Line 153: | ||
<syntaxhighlight lang="mirc">/hfree <table_name> | <syntaxhighlight lang="mirc">/hfree <table_name> | ||
/hfree -w <*wild*table*> | /hfree -w <*wild*table*> | ||
− | ;hfree has a -s switch which | + | ;hfree has a -s switch which is the same as /hmake's</syntaxhighlight> |
With the -w switch you can specify a {{mirc|wildcard}} pattern. All matching tables will be freed. If you already deleted a table and try to delete it again, "hfree tablename" halts your script. You must either use $hget(tablename) to verify the table's existence, or use -w without a wildcard. "hfree -w tablename_without_wildcards" | With the -w switch you can specify a {{mirc|wildcard}} pattern. All matching tables will be freed. If you already deleted a table and try to delete it again, "hfree tablename" halts your script. You must either use $hget(tablename) to verify the table's existence, or use -w without a wildcard. "hfree -w tablename_without_wildcards" | ||
+ | |||
+ | == Iterating Over a Hash Table == | ||
+ | The $hget identifier can be used to iterate over the hash table. The syntax is: | ||
+ | |||
+ | <syntaxhighlight lang="mirc">; Total Number of items in the table: | ||
+ | $hget(<table_name>, 0).item | ||
+ | ; Get the Nth Item | ||
+ | $hget(<table_name>, <Nth>).item | ||
+ | ; Get the value associated with the Nth Item | ||
+ | $hget(<table_name>, <Nth>).data</syntaxhighlight> | ||
+ | |||
+ | '''Note:''' Iterating over a hash table like this is an inefficient way to retrieve values and items. See the explanation below for why mIRC will iterate over the hash table for every {{mIRC|$hget}} - so the time required per lookup will increase linearly with the table size and the time for the script to iterate over the entire hash table will be proportional to the square of the table size. If it is possible to do so, then it's best to get a value using its item name. | ||
+ | |||
+ | An example of looping over every value in our Colors table will look like this: | ||
+ | |||
+ | <syntaxhighlight lang="mirc">Alias print_fav_colors { | ||
+ | var %i = 1 | ||
+ | echo Colors Table: | ||
+ | ; iterate over each item | ||
+ | while ($hget(Colors, %i).item) { | ||
+ | ; print the item/value pair | ||
+ | echo -a %i $+ ) $v1 => $hget(Colors, $v1) | ||
+ | inc %i | ||
+ | } | ||
+ | }</syntaxhighlight> | ||
+ | |||
+ | The execution of the alias (/print_fav_colors) will produce the following result: | ||
+ | |||
+ | <pre>Colors Table: | ||
+ | 1) Gary => Orange | ||
+ | 2) John => Blue</pre> | ||
== Searching for a Item and value pair == | == Searching for a Item and value pair == | ||
Line 252: | Line 205: | ||
<syntaxhighlight lang="mirc">//echo -a $hfind(Colors, *ary*, 1, w)</syntaxhighlight> | <syntaxhighlight lang="mirc">//echo -a $hfind(Colors, *ary*, 1, w)</syntaxhighlight> | ||
− | Which will return | + | Which will return "Gary". |
'''Note 1:''' Using a non-hashed method for finding an item or data using {{mIRC|$hfind}} is an inefficient way to retrieve values and items. See the explanation below for why mIRC will iterate over the hash table for every {{mIRC|$hfind}} - so time required per lookup will increase linearly with the table size. If it is possible to do so, then it's best to get a value using {{mIRC|$hget}} using its item name. | '''Note 1:''' Using a non-hashed method for finding an item or data using {{mIRC|$hfind}} is an inefficient way to retrieve values and items. See the explanation below for why mIRC will iterate over the hash table for every {{mIRC|$hfind}} - so time required per lookup will increase linearly with the table size. If it is possible to do so, then it's best to get a value using {{mIRC|$hget}} using its item name. | ||
Line 262: | Line 215: | ||
# When you create a hash table, it is created with a defined number of "slots" (or "buckets"). | # When you create a hash table, it is created with a defined number of "slots" (or "buckets"). | ||
− | # When you add an item to the table using {{mIRC|/hadd}}, a hash algorithm calculates a | + | # When you add an item to the table using {{mIRC|/hadd}}, a hash algorithm calculates a slot number based on the item name. Each slot holds a linked-list of items whose names map to that slot number, with each new item being inserted at the beginning of the list. |
− | # When you look up an item by name using {{mIRC|$hget}}, then the same hash algorithm is used to locate the | + | # When you look up an item by name using {{mIRC|$hget}}, then the same hash algorithm is used to locate the slot it will be stored under and then the linked-list in that slot is searched sequentially for the item. The purpose of hashing is to support this kind of lookup on potentially large tables with fast performance. |
− | # When you get an item by position using {{mIRC|$hget|$hget(table,position)}}, or use $hfind to search data or search using {{mirc|wildcard|wildcards}} or regular expressions, then the | + | # When you get an item by position using {{mIRC|$hget|$hget(table,position)}}, or use $hfind to search data or search using {{mirc|wildcard|wildcards}} or regular expressions, then the has algorithm cannot be used to identify the correct slot and instead mIRC has to iterate across part or all of the hash table to count or to find the record you want. |
− | If your hash table has a small number of | + | If your hash table has a small number of slots compared to the number of records, then each slot will have a large number of records: |
− | * For a lookup of an existing item, on average mIRC will have to iterate over 50% of the | + | * For a lookup of an existing item, on average mIRC will have to iterate over 50% of the entries before locating the one you want |
− | * If you try to find a non-existent item, mIRC will need to iterate over the whole | + | * If you try to find a non-existent item, mIRC will need to iterate over the whole list before determining that the item doesn't exist. |
As you might imagine, mIRC iterating over a large number of hash table entries to find the item needed is CPU intensive and mIRC might start to feel unresponsive. | As you might imagine, mIRC iterating over a large number of hash table entries to find the item needed is CPU intensive and mIRC might start to feel unresponsive. | ||
− | So for look-ups by item name, the best performance will be achieved when mIRC's hashing algorithm points to a | + | So for look-ups by item name, the best performance will be achieved when mIRC's hashing algorithm points to a slot with a single table entry (or failing that a small number of entries). Worst case scenario is if your hash table has only a single slot, then all entries are stored in a single linked-list and every look-up needs to be iterated. On the other hand, if you have a large number of slots (much greater than the number of items in the hash table), then the likelihood is that every item will be stored in its own slot, so the hash function will take you to a slot with a single entry, and no iteration will be needed to find the item. |
− | All that said, even with a large number of | + | All that said, even with a large number of slots, you cannot guarantee that every item in the table will have a unique hash / slot number. As an analogy, consider a class of 30 students. What is the probability that all students have birthdays on different days of the year? This is equivalent to asking whether a hash table with 365 slots and 30 entries will have every entry using a different slot. It turns out that in a class of 30 students there is significantly more than 50% probability that at least two students will share a birthday - indeed it only takes 23 students for the probability to be more than 50%. This seems weird - but for the mathematically inclined, the probability can be calculated by determining what the probability is that M students have all different birthdays: |
The first student can have any birthday. | The first student can have any birthday. | ||
The second student can have 364 of 365 days and still be different. | The second student can have 364 of 365 days and still be different. | ||
− | The third student can have 363 | + | The third student can have 363 or 365 days and still be different. |
So the probability that M students all have different birthdays is therefore: | So the probability that M students all have different birthdays is therefore: | ||
Line 291: | Line 244: | ||
</math> | </math> | ||
− | Returning to hash tables and | + | Returning to hash tables and slots, the equivalent formula for a table with M entries and N slots is: |
<math> | <math> | ||
Line 299: | Line 252: | ||
</math> | </math> | ||
− | Using the student birthday example for simplicity and relating it to hash tables and | + | Using the student birthday example for simplicity and relating it to hash tables and slots, if we turn it on its head then we can say that if we have a hash table holding 23 entries, and we want to have a probability that each entry has its own slot > 50%, then we need to have more than 365 slots. '''I bet you weren't expecting that!!''' Fortunately the performance overhead of iterating over a relatively short linked-list is also small, and equally fortunately a slot only uses 4-bytes (which is very small indeed compared to the size of a table entry, which consists of the item name and the data and the overhead of storing these and linking them into a list). Indeed mIRC's maximum slot size is 10,000, requiring c. 40KB of memory - which in today's PCs with several GB of memory is relatively small. |
− | '''Summary:''' If you are doing any lookups by item name on a frequent basis on a large table, then you should use the largest sensible | + | '''Summary:''' If you are doing any lookups by item name on a frequent basis on a large table, then you should use the largest sensible slot size to avoid mIRC iterating over long linked-lists when doing these lookups. |
'''HOWEVER...''' | '''HOWEVER...''' | ||
− | Not all hash table look-ups are able to use the hash to calculate the correct | + | Not all hash table look-ups are able to use the hash to calculate the correct slot - only look-ups by item name. If you want to access hash table entries by position using {{mIRC|$hget|$hget(table,n)}}, or if you want to use {{mIRC|$hfind}}, then mIRC is going to have to iterate over a significant proportion (or all) of a hash table regardless of the number of slots that you define. Indeed, if you are never going to look-up by item-name, you might as well save memory and use a single slot. |
− | '''Summary:''' If you are doing only lookup by position or | + | '''Summary:''' If you are doing only lookup by position or using $hfind, then you should use a slot size of 1 to save memory and avoid the small overhead of iterating over empty slots. |
Finally, if you have a large hash table (perhaps several thousand records) that you want to search flexibly, then you might wish to consider whether something like mIRC SQLlite might suit your needs better. | Finally, if you have a large hash table (perhaps several thousand records) that you want to search flexibly, then you might wish to consider whether something like mIRC SQLlite might suit your needs better. | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
== See Also == | == See Also == | ||
More detailed information can be found at the pages for each hashtable-related command and identifier. | More detailed information can be found at the pages for each hashtable-related command and identifier. |