Revision as of 10:07, 20 January 2014
This article assumes that you have intermediate to advanced knowledge of the mIRC Scripting Language and familiarity with on events and custom aliases.
This tutorial is the TCP Sockets continuation of the Sockets (Intro) tutorial. If you haven't read that, please do so first before moving on to this one.
Now that you have some familiarity with the different types of sockets, we can go into the scripting aspect of things. The most common task scripters want to perform is retrieving a piece of data from a website.
Throughout this tutorial we will create two complete scripts: one that will go to our very own example page, and a second one that will go to YouTube and get a video's title and view count.
Creating a Connection
Before we can do anything else we must first create a new connection to a specific address on a given port. The syntax to do this:
sockopen <handle> <address> <port>
A handle is simply a unique name by which we can refer to this exact socket.
Creating a secured Connection
I am sure you are very familiar with the padlock icon next to the URL in your browser. That icon indicates that the website uses secure HTTP (also known as HTTPS). The default port for HTTPS is 443. The /sockopen command can also create secured SSL connections using the following syntax:
sockopen -e <handle> <address> <port>
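Here is a minimal sketch that opens an SSL connection on port 443 and sends a basic request. It assumes a hypothetical handle named `secure` and the placeholder host `www.example.com`. Apart from the -e switch and the port, it is written exactly like an unencrypted socket script.

```mirc
; Hypothetical handle "secure" and placeholder host www.example.com
alias secureExample {
  sockclose secure
  ; -e requests an SSL connection; 443 is the default HTTPS port
  sockopen -e secure www.example.com 443
}

on *:sockopen:secure: {
  ; on sockopen also triggers when the connection fails
  if ($sockerr) return
  sockwrite -n $sockname GET / HTTP/1.0
  sockwrite -n $sockname Host: www.example.com
  sockwrite -n $sockname $crlf
}
```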
IPv4 vs. IPv6 Sockets
The /sockopen command is directly influenced by the type of connection you are using (i.e. /server -4 or /server -6). In standard IPv4 mode, /sockopen can only operate in IPv4 mode; it is not possible to create IPv6 sockets. In IPv6 mode, /sockopen defaults to IPv6 addresses only. To change this, open the Options dialog:
(Alt+O) -> Connect -> Options -> Ports... -> [X] Prioritize IPv6 over IPv4
Enabling that checkbox tells mIRC to fall back to IPv4 if an IPv6 connection fails, which allows you to create IPv4 connections as well.
Note: There is currently no convenient way to do this only using the /sockopen command.
Connection Example
Example 1
Since we want to socket to our silly demo page, http://www.zigwap.com/mirc/sockets_demo, our sockopen command will look something like this:
alias example1 { sockopen example1 www.zigwap.com 80 }
The above alias creates a socket named "example1". We can use that name to manipulate our socket later on. As a precaution against opening a socket that is already open, we close it first. If the socket is not open, mIRC will simply do nothing. In the advanced part of this tutorial we will explain how to handle this situation more gracefully by creating dynamic names, which let us create as many sockets as we need.
alias example1 {
  sockclose example1
  sockopen example1 www.zigwap.com 80
}
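As an optional variant, you can check whether the socket exists before closing it. The $sock() identifier returns $null when no socket with the given name is open. This sketch behaves the same as the alias above. It is only an illustration, and the rest of the tutorial does not depend on it.

```mirc
alias example1 {
  ; $sock(example1) is $null when no socket with that name exists
  if ($sock(example1)) sockclose example1
  sockopen example1 www.zigwap.com 80
}
```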
Example 2 (YouTube)
In this example we will do something different: given a YouTube link like http://www.youtube.com/watch?v=FDw0NdhK6QU, the script will return information about the video. YouTube video IDs can contain hyphens, so the pattern accepts "-" in addition to word characters.
alias YouTube {
  if ($regex($1-, /\Qyoutube.com/watch?v=\E([\w-]+)$/)) {
    sockclose YouTube
    sockopen YouTube www.youtube.com 80
    ; keep the video ID for later
    sockmark YouTube $regml(1)
  }
  else {
    echo $color(info) -aef /YouTube: invalid youtube link
    halt
  }
}
The Socket Mark
In the example above we introduced another command, /sockmark. The /sockmark command lets you store some text for a socket, which can easily be retrieved later using the $sock().mark identifier. This is a better alternative to using global variables (or any other kind of global storage method) and having to clean them up later: the socket mark goes away automatically with the socket.
sockmark <handle> <value>
; The following will clear the mark:
sockmark <handle>
The socket mark is restricted to the same line limit as the rest of mIRC (just under 4,150 bytes). A wildcard pattern can be used in the handle parameter to set the value of multiple sockets at once.
; Our socket mark value: $sock(<handle>).mark
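The round trip can be sketched as a small demo alias. It opens a socket, stores a value on it, reads the value back with the mark property and then closes the socket, which discards the mark. The handle `markdemo` and the alias name are hypothetical and exist only for this illustration.

```mirc
; Hypothetical demo alias; the socket "markdemo" exists only for this illustration
alias markDemo {
  sockclose markdemo
  sockopen markdemo www.zigwap.com 80
  ; store a value on the socket...
  sockmark markdemo FDw0NdhK6QU
  ; ...and read it back with the mark property
  echo -a Stored mark: $sock(markdemo).mark
  ; closing the socket discards the mark as well
  sockclose markdemo
}
```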
Transmitting a Request After a Successful Connection
When a successful connection to the remote end-point has been established the on sockopen event will trigger. Inside the on sockopen event we must send our initial request which would depend on what our script wants to do. A typical script that utilizes the HTTP protocol must send its headers in this event.
Note: If a connection fails, on sockopen will still trigger; the difference is that $sockerr is set. See the Error Handling section for more information.
The syntax for the on sockopen event is:
on *:sockopen:<handle>: {
  ;Your requests go here
}
As we said before, from within the sockopen event we must send our request to the remote end-point. To send data to the remote end-point through the socket we use the /sockwrite command. The sockwrite command has the following syntax:
sockwrite [-tn] <name> <text|%var|&binvar>
; You can limit the amount of data sent using the following syntax:
sockwrite -b[tn] <name> <numbytes> <text|%var|&binvar>
By default, all space-delimited tokens that begin with the & symbol are treated as binary variables. The -t switch can be used to make the /sockwrite command treat it all as plain text instead.
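This sketch shows the binary-variable form. It builds a whole request in a binary variable and sends it with one /sockwrite call. It assumes a hypothetical handle `bindemo` pointed at the demo page used in Example 1. /bset -t writes plain text into the binary variable, and $bvar(&request, 0) returns its current length, so the second call appends after the bytes already written.

```mirc
; Hypothetical handle "bindemo", requesting the Example 1 demo page
on *:sockopen:bindemo: {
  if ($sockerr) return
  ; write the request line starting at byte 1
  bset -t &request 1 GET /mirc/sockets_demo HTTP/1.0 $+ $crlf
  ; append the Host header and the terminating blank line after the last byte
  bset -t &request $calc($bvar(&request, 0) + 1) Host: www.zigwap.com $+ $crlf $+ $crlf
  ; a &-prefixed token is sent as the binary variable's contents
  sockwrite $sockname &request
}
```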
The Sockwrite -n Switch and $crlf
Because the sockwrite command can be used to send any type of data, you must be very explicit about the data you are sending. If you want to send multiple lines, you must append a $crlf to the end of each line. Alternatively, you can use the -n switch, which appends a $crlf automatically for you.
Consider the following piece of code:
sockwrite $sockname AAAAA
sockwrite $sockname BBBBB
sockwrite $sockname CCCCC
Even though we have used three distinct sockwrite calls to send the data, the exact data we sent is:
AAAAABBBBBCCCCC
On the other hand, the following code:
sockwrite -n $sockname AAAAA
sockwrite -n $sockname BBBBB
sockwrite -n $sockname CCCCC
/*
Or:
sockwrite $sockname AAAAA $+ $crlf
sockwrite $sockname BBBBB $+ $crlf
sockwrite $sockname CCCCC $+ $crlf
*/
sends the following data:
AAAAA
BBBBB
CCCCC
Understanding this concept is important to understanding how to send data correctly via protocols like HTTP.
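The -n switch is only a convenience. This sketch sends a complete HTTP request with a single sockwrite call by inserting the $crlf separators by hand. It assumes a hypothetical handle `onecall` pointed at the Example 1 demo page. The two $crlf at the end terminate the Host line and add the blank line that ends the header.

```mirc
; Hypothetical handle "onecall": one call, same bytes as three sockwrite -n calls
on *:sockopen:onecall: {
  if ($sockerr) return
  sockwrite $sockname GET /mirc/sockets_demo HTTP/1.0 $+ $crlf $+ Host: www.zigwap.com $+ $crlf $+ $crlf
}
```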
Sending Data Example
Example 1 (Continue)
Remember that the page we want to socket to is http://www.zigwap.com/mirc/sockets_demo. Our sockopen event will look something like this: (In this example I will be using version 1.0 of HTTP)
on *:sockopen:example1: {
  sockwrite -n example1 GET /mirc/sockets_demo HTTP/1.0
  sockwrite -n example1 Host: www.zigwap.com
  sockwrite -n example1 $crlf
}
Note: In HTTP, we must send a blank line at the end of our request to indicate that we are done with the header part. That is the job of the last line, 'sockwrite -n example1 $crlf'. The -n switch only appends a $crlf when the text does not already end with one, so exactly one empty line is sent.
Example 2 (YouTube, Continue)
We will now add the sockopen part of our YouTube script. Recall that we stored the video ID in the socket mark? Well, we will now retrieve that ID using the $sock identifier and its mark property.
on *:sockopen:YouTube: {
  sockwrite -n YouTube GET /watch?v= $+ $sock($sockname).mark HTTP/1.1
  sockwrite -n YouTube Host: www.youtube.com
  sockwrite -n YouTube $crlf
}
URL Encoding
Some characters have special meanings when used in the URL. You might be familiar with URLs that look like this:
http://www.example.com/foo.php?request&name=value
If we want to send something that includes characters like '=', '?' and '&', we must escape them before they can be safely used. The exact rules are specified in RFC 1738 (top of page 3).
We will use the following aliases to encode and decode URLs:
; Encodes URLs
alias urlEncode return $regsubex($1, /(\W)/g, $+(%, $base($asc(\t), 10, 16, 2)))

; Decodes encoded URLs
alias urlDecode return $regsubex($replace($1, +, $chr(32)), /%([A-F\d]{2})/gi, $chr($base(\1, 16, 10)))
Consider the following example:
//echo -a $urlEncode(Hello & Goodbye?)
//echo -a $urlDecode(Hello%20%26%20Goodbye%3F)
Will print:
Hello%20%26%20Goodbye%3F
Hello & Goodbye?
Note the escaped characters. You should almost always encode all user input:
on *:SockOpen:example: {
  sockwrite -n example GET /foo/bar.php?foo= $+ $urlEncode(%input1) HTTP/1.1
  sockwrite -n example Host: www.example.com
  sockwrite -n example $crlf
}
POST vs GET?
By now you are probably asking yourself why I used GET in our sockopen event and how you know which method to use. In HTTP, there are two common methods for sending data to the server: GET and POST. They differ in where that data goes: a GET request carries it in the URL's query string, while a POST request sends it in the request body, after the headers. When requesting a normal page you will most likely be using the GET method. When submitting a form, however, it can get a little trickier. With forms, you can tell whether a POST or a GET method is used by simply looking at the form's method attribute in the source code:
<form id="FooBar" method="post" action=""> ... </form>
The most basic GET request will follow this basic syntax:
GET /folder/file.html HTTP/1.1
Host: www.example.com
<blank line>
Let's take a look at the header a little closer:
GET /folder/file.html HTTP/1.1
This line is made up of three parts: method, path and version. "GET", which must always be written in uppercase letters, is the method. For more information about the POST method, see the advanced part of this tutorial. The next part is the path, relative to the root folder of the website. If our web page is www.example.com/pub/foo/bar.html, our path would be /pub/foo/bar.html. The final part of this line is the HTTP version. For most practical purposes you will probably be using version 1.0; sometimes you might need version 1.1 if you want features that are only available in that version.
Note: The HTTP RFC defines method names such as GET as case-sensitive, while header names such as Host are case-insensitive. Unfortunately, I have come across multiple web servers that only accepted headers in the exact casing presented here, so it is best to follow that convention as well.
Next is the Host header:
Host: www.example.com
The Host header is required in HTTP version 1.1. Once again, although it should not cause any issues, it is best to use "Host:", not "host:" or "HOST:". If you leave this line out of an HTTP/1.1 request, the server will most likely respond with a 400 (Bad Request) status code.
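A POST request differs from the GET request described above in where the form data goes. This sketch sends it in the request body instead of the URL. The handle `postdemo`, the path /foo/bar.php and the field name `name` are all hypothetical. The sketch relies on the $urlEncode alias from the URL Encoding section. Content-Length tells the server how many bytes of body follow the blank line. Because the encoded body is plain ASCII, $len gives that byte count.

```mirc
; Hypothetical handle "postdemo", path /foo/bar.php and field "name";
; relies on the $urlEncode alias from the URL Encoding section
on *:sockopen:postdemo: {
  if ($sockerr) return
  ; the form data goes in the body, not in the URL
  var %body = name= $+ $urlEncode(Hello & Goodbye?)
  sockwrite -n $sockname POST /foo/bar.php HTTP/1.0
  sockwrite -n $sockname Host: www.example.com
  sockwrite -n $sockname Content-Type: application/x-www-form-urlencoded
  ; number of body bytes that follow the blank line
  sockwrite -n $sockname Content-Length: $len(%body)
  sockwrite -n $sockname $crlf
  ; no -n here: the body is sent exactly as-is
  sockwrite $sockname %body
}
```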
Reading Incoming Data
Once the server receives your request, it will send the response back to you. This will trigger the on sockread event. The basic syntax of the on sockread event is:
on *:sockread:<handle>: {
  ;Your code goes here
}
The on sockread will most likely be the hardest and longest part of your code. When the on sockread triggers, you have to read the data and decide what to do with it. If your script just needs some information from that page you will have to find and parse the appropriate line.
When it comes to HTTP, the data you receive from the server will contain a header followed by a blank line, which is followed by the content of the page. The content of the page will look identical to the text you see when you right-click a web page and choose to view its source code.
To read a single line from the socket, we use the /sockread command:
sockread <%var>
The sockread command actually reads up to a $crlf. This is important to know because many web pages don't end with a $crlf, which means the last line won't be read. The -f switch can be used to force the sockread command to read the line even if it does not end with a $crlf.
Note: If the variable does not exist, a global variable gets created. It is therefore advised to declare a local variable beforehand.
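When several lines arrive at once, you can drain them all in a single event instead of reading one line per trigger. This sketch does that by looping while $sockbr, the number of bytes read by the last /sockread, is non-zero. It assumes a hypothetical handle `readdemo`. A blank line still counts as read bytes (its $crlf), so the header/body separator does not end the loop early.

```mirc
; Hypothetical handle "readdemo": read every complete line currently buffered
on *:sockread:readdemo: {
  if ($sockerr) return
  ; local variable, so sockread does not create a global one
  var %line
  sockread %line
  ; $sockbr is 0 once no complete line is left in the buffer
  while ($sockbr) {
    ; the "> " prefix keeps echo from failing on blank lines
    echo -a > %line
    sockread %line
  }
}
```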
When working with binary data or if the line is too long to be read into an ordinary variable, you can read the data into a binary variable using the following syntax:
sockread [numbytes] <&binvar>
There is a -n switch which can be used to read $crlf-terminated lines into the binary variable as well.
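The binary form can be sketched as follows. It reads raw chunks of up to 4096 bytes and appends each chunk to a file with /bwrite. A start position of -1 appends to the end of the file, and a count of -1 writes all of the bytes. The handle `rawdemo` and the file name rawdemo.bin are hypothetical. The file will contain everything the server sent, including the HTTP header.

```mirc
; Hypothetical handle "rawdemo" and output file rawdemo.bin
on *:sockread:rawdemo: {
  if ($sockerr) return
  ; read up to 4096 bytes into a binary variable
  sockread 4096 &data
  while ($sockbr) {
    ; append every byte just read to the end of the file
    bwrite rawdemo.bin -1 -1 &data
    sockread 4096 &data
  }
}
```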
Note: The /sockread command lets you read the data pretty much any way you want; you are not forced to read it line by line.
Debugging
Because the on sockread triggers when we get our data, it is the most interesting part of our script. Many people find it easier to script and debug when they can see the entire page source code. The script below can be used to see everything the server sent us in a custom window (@ $+ sockname):
;Print the entire server's reply to a custom window
on *:sockread:Example1: {
  window -deC @ $+ $sockname -1 -1 700 700
  var %read
  sockread %read
  aline -p @ $+ $sockname : $+ %read
}
Dealing with HTML code
One of the first things you will have to deal with when writing HTTP scripts is HTML code, and lots of it. The single most common task is to simply get rid of the unwanted HTML tags that enclose the text you want. Below is a very small, yet extremely handy alias that will strip most HTML tags away:
alias noHTML return $regsubex($1, /<[^>]+(?:>|$)|^[^<>]+>/g, $null)
Consider this simple example:
//echo -a $noHTML(<strong>Example</strong> - <p>This is an <em>example</em></p>)
Will print the following result:
Example - This is an example
Keep this alias safe. Trust me, this tiny alias will become one of your most precious possessions.
Error Handling
Errors happen! It's a fact of life. It is your responsibility to check for them and gracefully handle them! The $sockerr identifier must be checked during every on sockread event and after every /sockread command. If the value of $sockerr is greater than zero, an error has occurred and we MUST stop whatever it is we were going to do with the socket, cleanup, perhaps display an error message and leave the on sockread event.
A basic example would look like this:
on *:sockread:example: {
  if ($sockerr) {
    echo $color(info) -sef Socket Error: $sock($sockname).wsmsg
    echo $color(info) -sef Socket Error Number: $sock($sockname).wserr Socket: $sockname
  }
  else {
    ;my code goes here...
  }
}
Checking for an error gives you the opportunity to handle it in a sane way: your script can report that an error has occurred instead of simply stopping in its tracks.
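The same check belongs in on sockopen, which also triggers when a connection fails. This sketch shows the Example 1 sockopen event with the check added. It is meant to replace the earlier version rather than sit next to it, because mIRC only runs the first matching event in a script file.

```mirc
on *:sockopen:example1: {
  ; on sockopen also triggers when the connection fails
  if ($sockerr) {
    echo $color(info) -a Could not connect: $sock($sockname).wsmsg
    return
  }
  sockwrite -n $sockname GET /mirc/sockets_demo HTTP/1.0
  sockwrite -n $sockname Host: www.zigwap.com
  sockwrite -n $sockname $crlf
}
```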
Reading Data Example
Example 1 (Continue)
When I printed out everything the server sent us using the debugging script, it looked something like this:
:HTTP/1.1 200 OK
:Date: Sun, 11 Mar 2012 10:42:05 GMT
:Server: Apache
:X-Powered-By: PHP/5.2.17
:Connection: close
:Content-Type: text/html
:
:<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
: <html xmlns="http://www.w3.org/1999/xhtml">
: <head>
: <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
: <meta name="robots" content="noindex,follow" />
: <title>ZigWap - Demo Page</title>
: </head>
: <body>
: <div align="center">
: <p>This is an example page!</p>
: <p>This webpage is dedicated for the socket tutorial purpose. </p>
: </div>
: <p align="center">Your random color is: Pink</p>
: </body>
: </html>
The first part is the header, followed by a blank line, followed by the actual page data. In this example we will be trying to retrieve the random color line. A simple if statement that checks for *Your random color is* should be sufficient.
on *:sockread:example1: {
  var %read
  sockread %read
  ; check if this is the line we want
  if (*Your random color is: * iswm %read) {
    ; break down our line into words
    tokenize 32 %read
    ; get the color and remove the html tag
    echo $color(info) -a Random Color: $noHTML($6)
    ; close the socket, it's not needed
    sockclose $sockname
  }
}
Example 2 (YouTube, Continue)
If you try to print the YouTube page we requested (http://www.youtube.com/watch?v=FDw0NdhK6QU), you will quickly realize how long it is. For this reason I will not show it here. The way we parse that page is very much like the one we used in the first example:
on *:sockread:YouTube: {
  var %x
  sockread %x
  if ($regex(%x, <meta name="title" content="(.+)">)) {
    ; parse the title
    set %title. $+ $sockName $regml(1)
  }
  else if (watch-view-count isin %x) {
    ; read the next line
    sockread %x
    ; make sure it's a number
    if ($regex(%x,/^ *([\d,]+)$/)) {
      ; parse the view count
      set %view. $+ $sockname $regml(1)
    }
  }
  else if (*Uploaded by* iswm %x) {
    ; print out the info
    echo -a Title: $($+(%, title., $sockname), 2) $&
      $noHTML(%x) Views: $($+(%, view., $sockname), 2)
    ; cleanup
    unset %*. $+ $sockname
    ; close the socket, no need to read anymore
    sockclose $sockname
  }
}
Connection Terminated
It is possible for the remote end-point to terminate a connection, the same way you can /sockclose a connection early. When this happens the on sockclose event will trigger. The syntax for that event is:
on *:sockclose:<handle>: {
  ;Your code goes here
}
Note: Only the remote end-point, not you, can trigger this event; closing a socket yourself with /sockclose does not trigger it.
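For the YouTube example, this event is a natural place to clean up. If the server closes the connection before the "Uploaded by" line arrives, the %title and %view variables for that socket would otherwise be left behind. This sketch removes them and reports a connection error if $sockerr is set.

```mirc
; Clean up when YouTube closes the connection before our script does
on *:sockclose:YouTube: {
  ; remove any %title.YouTube / %view.YouTube variables left behind
  unset %*. $+ $sockname
  if ($sockerr) echo $color(info) -a YouTube connection error: $sock($sockname).wsmsg
}
```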