From WikiChip
Editing mirc/unicode
Latest revision | Your text | ||
Line 1: | Line 1: | ||
{{mirc title|Unicode}} | {{mirc title|Unicode}} | ||
− | This page does not attempt to describe what [[Unicode]] | + | This page does not attempt to describe what [[Unicode]] is, but rather how it works in mIRC. Technical terms are omitted on purpose. |
− | + | Before Unicode support (introduced in mIRC 7.x), mIRC supported code pages to handle the various languages used in the world. Code pages are 8-bit (one byte) encodings, and 8 bits can represent 256 values; mIRC also supported other encodings, for Japanese systems for example. Basically, code pages are all based on ASCII, which defines 128 characters assigned to the first 128 values representable in 8 bits, and each code page then adds the characters its language requires, such as é for French or the Greek letters for Greek. |
− | + | You can see Unicode as a new code page, but one that defines 1,114,112 characters, covering all languages. This is much better for IRC, which is used all over the world. |
− | Unicode | + | There are different ways to implement Unicode handling in an application. mIRC uses the utf16 encoding internally; before that, text was handled using US-ASCII. |
− | + | Two main reasons for this: |
− | * The most frequently used | + | * The most frequently used characters, the first 65536 ones, can be stored in 16 bits; as a result, routines dealing with Unicode are faster. |
− | * It uses less memory than | + | * It uses less memory than utf32. |
− | Drawback: | + | Drawback: each character in a script is represented as a 16-bit unit, so {{mIRC|$asc}} and {{mIRC|$chr}} cannot be used with characters above 65535, but you can form those other characters in your script by combining two 16-bit characters: |
<source lang="mIRc"> | <source lang="mIRc"> | ||
− | //var -s %a = $chr(55384), %b = $chr(56320) | echo -a %a $+ %b | + | //var -s %a = $chr(55384), %b = $chr(56320) | echo -a %a $+ %b |
− | 𦀀 | + | 𦀀 |
</source> | </source> | ||
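The two values in the example above form a utf16 surrogate pair. As a rough sketch of the arithmetic (this is the standard utf16 rule, not something specific to mIRC, and the snippet is only an illustration), the character number is 65536 + (first - 55296) * 1024 + (second - 56320), which $calc and $base can evaluate and show in hexadecimal:
<source lang="mIRc">
;illustrative only: combine the two 16-bit values used above (55384 and 56320) into one character number, printed in base 16
//echo -a $base($calc(65536 + (55384 - 55296) * 1024 + (56320 - 56320)),10,16)
26000
</source>
The result, 26000 in hexadecimal, is the number of the character 𦀀 echoed by the example above.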
− | + | You can also express those characters with their utf8 representation using {{mIRC|$utfdecode}}, which decodes utf8: |
− | + | <source lang="mIRc"> | |
− | <source lang=" | ||
;you can use utf8 to form the character 65536 for example, which is four bytes in utf8 (f0 90 80 80): | ;you can use utf8 to form the character 65536 for example, which is four bytes in utf8 (f0 90 80 80): | ||
//var -s %a $utfdecode($chr($base(f0,16,10)) $+ $chr($base(90,16,10)) $+ $chr($base(80,16,10)) $+ $chr($base(80,16,10))) | //var -s %a $utfdecode($chr($base(f0,16,10)) $+ $chr($base(90,16,10)) $+ $chr($base(80,16,10)) $+ $chr($base(80,16,10))) | ||
Line 43: | Line 33: | ||
<source lang="mIRc"> | <source lang="mIRc"> | ||
− | //var -s %a $utfencode( | + | //var -s %a $utfencode(é) |
− | + | é | |
</source> | </source> | ||
The scripting language still supports code pages to some extent: you can decode text to utf8 while the bytes in the text are interpreted in the given code page. | The scripting language still supports code pages to some extent: you can decode text to utf8 while the bytes in the text are interpreted in the given code page. | ||
Line 75: | Line 64: | ||
'''Note''': GDI charsets 1 and 255 are system dependent and are therefore expected to return different results across different machines. Values not on the table are treated as a reference to DEFAULT_CHARSET, equivalent to using C = 1. | '''Note''': GDI charsets 1 and 255 are system dependent and are therefore expected to return different results across different machines. Values not on the table are treated as a reference to DEFAULT_CHARSET, equivalent to using C = 1. | ||
− | For example, if you want to get the text (FROM GREEK TO UTF8), which used the ISO-8859-7 (GREEK) encoding for | + | For example, if you want to get the text (FROM GREEK TO UTF8), which used the ISO-8859-7 (GREEK) encoding for Greek letters, in utf8, you need to encode it to utf8, interpreting the bytes as per the GREEK code page, and then decode the result: $utfdecode($utfencode(text,161)) |
If you want to send the text in GREEK over IRC, mIRC will encode the bytes internally, so you must encode the text in utf8 and then decode it, interpreting the bytes as per the GREEK code page: /raw -n privmsg #chan $utfdecode($utfencode(text),161) | If you want to send the text in GREEK over IRC, mIRC will encode the bytes internally, so you must encode the text in utf8 and then decode it, interpreting the bytes as per the GREEK code page: /raw -n privmsg #chan $utfdecode($utfencode(text),161) | ||
Line 89: | Line 78: | ||
And this is happening pretty much everywhere. | And this is happening pretty much everywhere. | ||
− | {{mIRC|/raw|/raw -n}} can be used for IRC, it sends the data to the server without | + | {{mIRC|/raw|/raw -n}} can be used for IRC; it sends the data to the server without encoding the characters in the range 0-255 to utf8. |
{{mIRC|/sockwrite|/sockwrite -u}} can be used to the same effect: it won't encode characters in the range 0-255 to utf8. | {{mIRC|/sockwrite|/sockwrite -u}} can be used to the same effect: it won't encode characters in the range 0-255 to utf8. | ||
[[Category:mIRC|unicode]] | [[Category:mIRC|unicode]] |