Revision as of 21:29, 22 September 2014
Regular expressions can be used to perform complicated pattern matching operations. This page assumes you already know how to write regular expressions; it does not teach them.
Information
mIRC uses the PCRE library to implement regex with the following options enabled:
- --enable-utf8
- --enable-unicode-properties
- --with-match-limit - around 1,000,000
- --with-match-limit-recursion - 999
mIRC has two custom modifiers:
- S - strips control codes from the input before matching (not supported by $hfind).
- g - performs global matching: after a match has been found, mIRC tries to match again from the current position
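As a minimal sketch of the S modifier, the alias below matches the same text with and without it. The alias name is hypothetical, and the expected result (0 without S, 1 with S) assumes mIRC behaves as described above.

```mirc
; Hypothetical alias name, for illustration only.
alias strip_demo {
  ; Build the word "bold" preceded by the bold control code ($chr(2)).
  var %text = $chr(2) $+ bold
  ; Without S the control code is part of the input; with S it is stripped first.
  echo -a without S: $regex(%text,/^bold$/) - with S: $regex(%text,/^bold$/S)
}
```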
mIRC remembers up to 50 regex matches. After 50 matches, the first match is overwritten.
$regex, $regsub and $regsubex take an optional name as their first parameter, which lets you reference that call later. If you do not specify a name, mIRC uses a default one.
$regex([name],<input>,<regex>)
Performs a regular expression match and returns the number of matches found. Returns a negative value to indicate an error (-8 if you reach the maximum number of matches allowed, or -21 if you reach the maximum number of recursions allowed).
mIRC remembers up to 32 captured texts (backreferences). Use $regml([name],N) to return the Nth backreference, or the total number of backreferences with N = 0.
$regml() also has a .pos property, which returns the position in the input where this was captured.
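A short sketch of the .pos property follows. The alias name and the regex name "pos" are made up for the example; the position is expected to be counted from the start of the input.

```mirc
; Hypothetical alias name, for illustration only.
alias pos_demo {
  ; "pos" is the regex name used to look up the result afterwards.
  noop $regex(pos,hello world,/(wor)/)
  ; Show the captured text and where it was found in the input.
  echo -a $regml(pos,1) was captured at position $regml(pos,1).pos
}
```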
//noop $regex(name,test,/[es]/g) | echo -a $regml(name,0) : $regml(name,1) -- $regml(name,2)
$regsub([name],<input>,<regex>,<subtext>,<%varname>)
Performs a regular expression match, like $regex(), and then performs a substitution using <subtext>.
Returns N, the number of substitutions made, and assigns the result to <%varname>.
//noop $regsub(name,test,/([es])/g,t,%result) | echo -a %result : $regml(name,0) : $regml(name,1) -- $regml(name,2)
$regsubex([name],<input>,<regex>,<subtext>)
$regsubex is a more modern version of $regsub: it performs the match, then the substitution, and returns the result of the substitution.
This time, <subtext> is evaluated during substitution and can be an identifier.
<subtext> can also contain special markers:
- \0 - returns the number of matches
- \n - returns the current match number
- \t - returns the current match text (same as $regml(\n))
- \a - returns all match items
- \A - returns a non-spaced version of \a.
- \1 \2 \N ... - returns the Nth backreference for the current match
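The markers can be tried with a small alias such as the one below (the alias name is hypothetical). If the markers behave as listed, the first line should echo hEllO and the second 123.

```mirc
; Hypothetical alias name, for illustration only.
alias markers_demo {
  ; \t is the matched text, so each vowel is passed to $upper().
  echo -a $regsubex(hello,/([aeiou])/g,$upper(\t))
  ; \n is the current match number, so each character is replaced by its index.
  echo -a $regsubex(abc,/(.)/g,\n)
}
```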
Notes on $regsubex:
The main steps when mIRC evaluates an identifier are:
- Processes [ ] (evaluating any variables/identifiers inside them once) and [[ ]] (turning them into [ ])
- Separates the identifier's parameters and evaluates each parameter (in left-to-right order).
- Passes the parameters to the identifier
$regsubex is a bit different: it has its own parsing routine, because the 'subtext' parameter must not be evaluated before the regex match is made. The steps are:
- Processes [ ] and [[ ]]
- Separates parameters, evaluates the 'input' and the 'regex' parameters
- Performs the regex match
- * Tokenizes $1- according to the number of markers used in the 'subtext' parameters
- Replaces any markers used in the subtext by their corresponding $N identifiers
- Evaluates the 'subtext' parameter (one or more times, if /g is used)
- Performs the substitutions and returns the result.
* mIRC internally uses $1- to store the values of the markers, which means you cannot use the previous tokenization of $1- in the subtext.
The way mIRC does this is pretty ugly: it checks how many markers you have and creates a list of tokens ($1-).
Each token is assigned a value and mIRC then replaces the marker with the corresponding $N value.
Let's say your subtext is "\t \t \1 \n": mIRC assigns the matched text to $1 and to $2, the first backreference in the pattern to $3, and the current match number to $4.
If you use the form \N, where N is greater than or equal to 1 (like \1), and the pattern has no such backreference, mIRC fills that value (internally, using $1-) with the value of $regml(\n + N - 1):
$regsubex(abcdefgij,/([a-z])/g,<\6>)
\6 doesn't refer to anything here: the pattern has only one capture group, so there is no sixth backreference.
- When 'a' is matched, \n is 1. Only one marker is used, so $1 is filled with $regml(1 + 6 - 1) = $regml(6), which is 'f'
- When 'b' is matched, \n is 2, so $1 is filled with $regml(2 + 6 - 1) = $regml(7), which is 'g'
- And so on until \n + N - 1 is greater than the number of backreferences; from that point the characters are replaced with $null
Because of this, you cannot use the previous $N- value in the subtext.
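One possible workaround, sketched here under the assumption that mIRC behaves as described above, is to copy the alias parameter into a local variable before calling $regsubex. Variables are not affected by the internal tokenization. The alias name and %arg are hypothetical.

```mirc
; Hypothetical alias name, for illustration only. Usage: /tokens_demo X
alias tokens_demo {
  ; Save the alias parameter first: inside the subtext, $1 would refer
  ; to the tokens $regsubex creates for its markers, not to this value.
  var %arg = $1
  ; %arg is evaluated when the subtext is evaluated and still holds the parameter.
  echo -a $regsubex(abc,/(b)/,< $+ %arg $+ >)
}
```

With /tokens_demo X, this should echo a<X>c.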
Nested $regsubex calls are possible but let's remember the main steps:
- Processes [ ] and [[ ]]
- Separates parameters, evaluates the 'input' and the 'regex' parameters
- Performs the regex match
- Tokenizes $1- according to the number of markers used in the 'subtext' parameters
- Replaces any markers used in the subtext by their corresponding $N identifiers
- Evaluates the 'subtext' parameter (one or more times, if /g is used)
- Performs the substitutions and returns the result.
When mIRC replaces the markers, it will do so on the whole subtext parameter:
$regsubex(abcdefcdab,/(cd)/g,\t : $regsubex(\t,/(.)/g,$upper(\t)) : \t)
The outer $regsubex makes the regex match, then replaces \t everywhere in its subtext, including inside the inner call:
$regsubex(\t,/(.)/g,$upper(\t))
Here every \t gets the value of the matched text of the outer $regsubex, even the one inside $upper(), so it won't work as expected: you want the \t inside $upper() to be the matched text of the inner $regsubex, not the outer one.
What we want is for mIRC to see something other than "\t" when it looks for markers inside $upper() in the subtext of the outer $regsubex.
If we used $regsubex(\t,/(.)/g,$upper( \ $+ t )), we would just end up calling $upper(\t) with the plain text \t, because that $+ is only evaluated when $upper is evaluated. We need to act after the outer $regsubex has finished replacing markers but before $upper() is called.
The solution is to use the [[ \ $+ t ]] construct:
$regsubex(abcdefcdab,/(cd)/g,\t : $regsubex(\t,/(.)/g,$upper([[ \ $+ t ]])))
As we know, $regsubex doesn't evaluate the subtext parameter up front, but the processing of [ ] and [[ ]] is done for the whole line. So mIRC first changes this line into:
$regsubex(abcdefcdab,/(cd)/g,\t : $regsubex(\t,/(.)/g,$upper( [ \ $+ t ] )))
Notice how only the [[ ]] changed: $+ was not evaluated because the subtext parameter is not evaluated at this stage, and the [ ] processing happens before.
Now the outer $regsubex gets its parameters (mIRC does not see \t there, it sees \ $+ t, which is what we wanted), makes the regex match and evaluates the subtext:
$regsubex(<value of \t in the outer $regsubex>,/(.)/g,$upper( [ \ $+ t ] ))
As usual, [ ] is processed first, and \ $+ t gives \t before this inner $regsubex starts to replace its own markers. Bingo.
Note also that you cannot use a marker in the inner $regsubex subtext itself to get the value of that marker of the outer $regsubex context:
$regsubex(abcdefcdab,/(cd)/g,@\t : $regsubex(\t,/(.)/g, <why is \ $+ t not cd : \t>) $+ @)
This is because mIRC uses the intermediate $1- value. When mIRC replaces the markers of the outer $regsubex:
- 4 markers used in the subtext of the outer $regsubex
- $1 = $2 = $3 = $4 = \t = the matched text
The code becomes:
$regsubex($1,/(.)/g, <why is \ $+ t not cd : $1 $+ >) $+ @)
mIRC adds the $+ if the marker has text surrounding it.
Now the inner $regsubex is evaluated. At this point, $1- is still what the outer $regsubex's tokenization produced, so before the markers are replaced, you have:
$regsubex(<value of $1>,/(.)/g, <why is \ $+ t not cd : $1 $+ >) $+ @)
The subtext is not evaluated yet, as we saw, so that $1 in the subtext is left alone. Then the inner $regsubex replaces its own markers:
- 0 markers used
- $1 = $null
And since $1 is $null, so is $1 in the inner $regsubex's subtext parameter.
/filter
/filter supports the -g switch to use a regular expression. If you use a custom alias as the output (-k), you cannot get the backreference values with $regml(); you need to make a $regex call on that line yourself.
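A minimal sketch of that pattern is shown below. The alias names, the file name data.txt and the key=value line format are all assumptions made for the example.

```mirc
; Hypothetical alias names and file name, for illustration only.
alias filter_demo {
  ; -f: the input is a file, -k: the output is an alias, -g: the matchtext is a regex
  filter -fkg data.txt filter_line /^(\w+)=/
}
alias filter_line {
  ; /filter passes the matching line in $1-. Run the regex again on it
  ; so that $regml() can return the captured key.
  if ($regex(fl,$1-,/^(\w+)=/)) echo -a key: $regml(fl,1)
}
```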
$hfind
$hfind can be used with regex, but it doesn't support the custom S modifier.
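As a small sketch, the alias below searches item names with a regex. The table name and items are made up; the r in the last parameter tells $hfind to treat the search text as a regular expression.

```mirc
; Hypothetical alias, table and item names, for illustration only.
alias hfind_demo {
  hmake demo 10
  hadd demo apple 1
  hadd demo banana 2
  ; r: the search text is a regex matched against item names.
  ; This should return the first item whose name starts with "ban".
  echo -a $hfind(demo,/^ban/,1,r)
  hfree demo
}
```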
/write, $read, $fline etc
There are various other places in which regex can be used.
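For instance, $read accepts an r switch that treats the match text as a regular expression. This is a sketch only: the alias name and data.txt are hypothetical.

```mirc
; Hypothetical alias and file name, for illustration only.
alias read_demo {
  ; n: do not evaluate the line that is read, r: the matchtext is a regex.
  ; Read the first line of data.txt that starts with a digit.
  var %line = $read(data.txt,nr,/^\d+/)
  ; $readn holds the number of the line that was matched.
  echo -a line $readn $+ : %line
}
```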