Latest revision as of 01:33, 5 October 2020
$isutf returns the UTF-8 status of the text: 0 = not UTF-8 (contains an invalid UTF-8 sequence), 1 = appears to be plain text, 2 = appears to contain valid UTF-8. In older 6.x versions, the result can change depending on which UTF-8 setting is active in the /font dialog.
Synopsis
$isutf(text)
Parameters
- text - The text whose status you want to check
Properties
None
Example
//echo -a $isutf(é) $isutf($utfencode(é)) $isutf(plain)
Note that $isutf reports whether the text contains the codepoints that make up a UTF-8 byte sequence, not whether the input itself is a UTF-8 string (which every %string in a Unicode-aware client should be). This is why the next command returns 0 for $chr(233) and 2 for $chr(195) $+ $chr(169):
//echo -a $isutf($chr(233)) vs $isutf($chr(195) $+ $chr(169))
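The three return values can be wrapped in a small helper that turns them into readable labels. The following is only a minimal sketch and not part of mIRC: the alias name isutf_label and the label wording are made up for illustration, and it assumes $isutf only ever returns 0, 1, or 2 as described above.

```mirc
; Hypothetical helper (not a built-in): maps the $isutf result to a label.
alias isutf_label {
  var %status = $isutf($1-)
  if (%status == 2) return valid UTF-8
  if (%status == 1) return plain text
  ; Anything else is treated as 0, an invalid UTF-8 sequence.
  return not UTF-8
}
```

Once the alias is loaded, it could be called as //echo -a $isutf_label(plain), which, going by the return values listed above, should echo "plain text".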
To test whether a &binvar contains a UTF-8 string, you can take advantage of $regsubex's ability to write its output string into a binvar. If the input is $bvar(&var1,1-).text, you can check whether the &var2 it creates is an exact replica of the original. In the example below, $isutf returns 0 for the text of both binvars. The isbinvarutf alias, on the other hand, returns 2 for &v1, which contains a UTF-8 byte sequence, but returns 0 for &v2, because the UTF-8 output cloned by $regsubex does not have the same bytes as the original. Note that this method limits how long a binvar can be tested, because $regsubex only permits the $2 string to contain approximately $maxlenl *bytes* at most, even when that string has fewer than 4000 UTF-8 *characters*.
//bset &v1 1 195 169 | bset &v2 1 233 | var -s %a1 $bvar(&v1,1-).text , %a2 $bvar(&v2,1-).text | echo -a $isutf(%a1) $isutf(%a2) vs $isbinvarutf(&v1) $isbinvarutf(&v2)

alias isbinvarutf {
  if ($bvar($1,0) == 0) return 0 | var %len1 $v1
  noop $regsubex(foo,$bvar($1,1-).text,,,&tempvar2)
  var %len2 $bvar(&tempvar2,0) | if (%len1 != %len2) return 0
  if ($calc(%len1 + %len2) < 2000) { if ($bvar($1,1-) == $bvar(&tempvar2,1-)) return 2 | else return 0 }
  else { if ($sha256($1,1) == $sha256(&tempvar2,1)) return 2 | else return 0 }
}
Compatibility
Added: mIRC v6.17
Added on: 17 Feb 2006
Note: Unless otherwise stated, this was the date of original functionality.
Further enhancements may have been made in later versions.