Regular expressions

Regular expressions are an essential part of many text-parsing operations. The XSharper regex action is a wrapper around the .NET Regex class. It is a block that executes once for every match. For each match it sets the captures as variables whose names are prefixed with the name attribute. Numeric capture names also get an underscore, so with an empty prefix the first capture becomes _1:

<set name="text" >The quick brown fox jumps over the lazy dog</set>

<!-- prints [The] and [the], because the default option is IgnoreCase -->
<regex pattern="(the)" value="${text}">
	<print>[${_1}]</print>
</regex>

By default, the search uses the IgnoreCase option, but the options can be changed:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints only [the] -->
<regex pattern="(the)" value="${text}" options='none'>
	<print>[${_1}]</print>
</regex>
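The per-match behavior and the IgnoreCase default described above correspond to iterating over the matches of a .NET Regex with RegexOptions.IgnoreCase. The following Python sketch models this with the standard re module. It is only an analogue, not XSharper itself. The helper run_regex_block is a hypothetical name. It calls body once per match and passes the captures under the names the regex body would see (_0, _1, ...).

```python
import re
from typing import Callable, Dict, List


def run_regex_block(pattern: str, value: str,
                    body: Callable[[Dict[str, str]], None],
                    flags: int = re.IGNORECASE) -> None:
    """Hypothetical analogue of <regex>: call body once per match.

    The default flags mimic XSharper's IgnoreCase default; flags=0 is
    the analogue of options='none'.
    """
    for m in re.finditer(pattern, value, flags):
        # _0 is the whole match; _1.._n are the groups in Python's order
        captures = {"_0": m.group(0)}
        for i, g in enumerate(m.groups(), start=1):
            captures[f"_{i}"] = g if g is not None else ""
        body(captures)


text = "The quick brown fox jumps over the lazy dog"
printed: List[str] = []

# Default, case-insensitive: the body runs twice
run_regex_block("(the)", text, lambda c: printed.append(f"[{c['_1']}]"))
print(printed)  # ['[The]', '[the]']

# flags=0, like options='none': the body runs once
printed.clear()
run_regex_block("(the)", text, lambda c: printed.append(f"[{c['_1']}]"), flags=0)
print(printed)  # ['[the]']
```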

A nomatch block can be added to execute code when no matches are found:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints No cats in sight! -->
<regex pattern="(cat)" value="${text}" options='none'>
	<print>[${_1}]</print>
	<nomatch>
		<print>No cats in sight!</print>
	</nomatch>
</regex>
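A nomatch block runs only when the pattern produces no matches at all. A Python sketch of the same control flow is shown below, using a hypothetical helper regex_with_nomatch. It has to count the matches explicitly. Python's for/else is not a substitute, because its else branch runs whenever the loop ends without break, not only when the loop had zero iterations.

```python
from __future__ import annotations

import re
from typing import Callable, Optional


def regex_with_nomatch(pattern: str, value: str,
                       body: Callable[[re.Match], None],
                       nomatch: Optional[Callable[[], None]] = None,
                       flags: int = 0) -> int:
    """Hypothetical analogue of <regex> with a <nomatch> child.

    Calls body for each match and calls nomatch once if there were no
    matches. Returns the number of matches. flags=0 mirrors options='none'.
    """
    count = 0
    for m in re.finditer(pattern, value, flags):
        body(m)
        count += 1
    if count == 0 and nomatch is not None:
        nomatch()
    return count


text = "The quick brown fox jumps over the lazy dog"
messages = []
n = regex_with_nomatch("(cat)", text,
                       body=lambda m: messages.append(f"[{m.group(1)}]"),
                       nomatch=lambda: messages.append("No cats in sight!"))
print(messages)  # ['No cats in sight!']
```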

Captures

Captures are available inside the regex block as variables. For unnamed captures, the _1 variable is set to the first capture, _2 to the second, and so on. The _0 variable is set to the entire matched string.

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<regex pattern="brown (?'animal'\w+) (\w+)" value="${text}">
	<print>_0=[${_0}], _1=[${_1}], animal=[${animal}]</print>
</regex>
<print>After the block animal=${animal|'undefined'}</print>

The output is shown below. It demonstrates that, by default, capture variables are not visible outside the regex body:

_0=[brown fox jumps], _1=[jumps], animal=[fox]
After the block animal=undefined
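In this output, _1 is jumps rather than fox. That comes from .NET, which numbers unnamed groups first and named groups afterwards. Python's re numbers all groups from left to right, so a Python port has to renumber them to get the same variables. The sketch below uses a hypothetical function, dotnet_style_captures, and exposes named groups by name only. It uses Python's (?P<name>...) syntax, because Python does not accept .NET's (?'name'...).

```python
from __future__ import annotations

import re


def dotnet_style_captures(m: re.Match) -> dict:
    """Hypothetical helper: map a Python match to XSharper-style variables.

    Mirrors .NET numbering: unnamed groups are numbered 1, 2, ... in order
    of appearance, skipping named groups, which are exposed by name.
    """
    named_indices = set(m.re.groupindex.values())
    captures = {"_0": m.group(0)}
    number = 0
    for index in range(1, m.re.groups + 1):
        if index in named_indices:
            continue
        number += 1
        # A group that did not participate is None in Python; use ""
        captures[f"_{number}"] = m.group(index) or ""
    for name in m.re.groupindex:
        captures[name] = m.group(name) or ""
    return captures


text = "The quick brown fox jumps over the lazy dog"
m = re.search(r"brown (?P<animal>\w+) (\w+)", text, re.IGNORECASE)
vars_in_body = dotnet_style_captures(m)
print(vars_in_body)
# {'_0': 'brown fox jumps', '_1': 'jumps', 'animal': 'fox'}
```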

To use regex only for setting capture variables, set setCaptures='true'. The example below also uses two other attributes:

- count='1' makes the loop exit after the first match.
- name='x:' adds the 'x:' prefix to all capture variables.

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<regex pattern="brown (?'animal'\w+) (\w+)" value="${text}" setCaptures='true' count='1' name='x:'/>
<print>x:0=[${x:_0}], x:1=[${x:_1}], x:animal=[${x:animal}]</print>

The output is now:

x:0=[brown fox jumps], x:1=[jumps], x:animal=[fox]
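With count='1', only the first match is used. This corresponds to a single re.search (or .NET Regex.Match) rather than a loop over all matches. Then setCaptures copies the captures into the enclosing scope under the name prefix. The Python sketch below models the scope as a plain dict. The function first_match_captures and its prefix parameter are hypothetical names. The function renumbers unnamed groups the .NET way so that the result matches the output above. Returning an empty dict when nothing matches is an assumption, not documented XSharper behavior.

```python
from __future__ import annotations

import re


def first_match_captures(pattern: str, value: str, prefix: str = "",
                         flags: int = re.IGNORECASE) -> dict:
    """Hypothetical analogue of setCaptures='true' count='1' name=prefix.

    Returns the variables the first match would set, each name prefixed.
    Unnamed groups are renumbered as in .NET (unnamed first); named groups
    keep their names. Returns {} when nothing matches (an assumption).
    """
    m = re.search(pattern, value, flags)
    if m is None:
        return {}
    named = set(m.re.groupindex.values())
    unnamed = [i for i in range(1, m.re.groups + 1) if i not in named]
    result = {f"{prefix}_0": m.group(0)}
    for number, index in enumerate(unnamed, start=1):
        result[f"{prefix}_{number}"] = m.group(index) or ""
    for name in m.re.groupindex:
        result[prefix + name] = m.group(name) or ""
    return result


scope = {}  # stands in for the script's variable scope
scope.update(first_match_captures(r"brown (?P<animal>\w+) (\w+)",
                                  "The quick brown fox jumps over the lazy dog",
                                  prefix="x:"))
print(f"x:0=[{scope['x:_0']}], x:1=[{scope['x:_1']}], x:animal=[{scope['x:animal']}]")
# x:0=[brown fox jumps], x:1=[jumps], x:animal=[fox]
```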

Replacement

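The replace attribute replaces the matched text. In this form, the input is given as the element body and the result is stored in the variable named by outTo: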

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints The quick brown cat jumps over the lazy cat -->
<regex pattern="(fox|dog)" replace="cat" options='none' outTo="text1">${text}</regex>
<print>${text1}</print>
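The replace attribute behaves like .NET's Regex.Replace, which substitutes every match. Python's re.sub does the same and serves as a comparison point below. It is an analogue, not XSharper's implementation. The first call uses flags=0 to stand in for options='none'. The second shows the analogue of the IgnoreCase default.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# Analogue of options='none': case-sensitive, every match replaced
text1 = re.sub(r"(fox|dog)", "cat", text)
print(text1)  # The quick brown cat jumps over the lazy cat

# Analogue of the IgnoreCase default; flags must be passed by keyword,
# because the fourth positional argument of re.sub is count
text2 = re.sub(r"(the)", "a", text, flags=re.IGNORECASE)
print(text2)  # a quick brown fox jumps over a lazy dog
```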

Replacing with a back-reference:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints The quick brown cat (used to be 1) jumps over the lazy cat (used to be 1) -->
<regex pattern="(fox|dog)" 
		replace="cat (used to be ${1})" options='none' outTo="text1">${text}</regex>
<print>${text1}</print>

The result is not what was intended: XSharper expands ${1} to '1' before the string is passed to the regular expression. One way to deal with this is to use a temporary variable:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints The quick brown cat (used to be fox) jumps over the lazy cat (used to be dog) -->
<set repl="cat (used to be ${1})" tr='none' />
<regex pattern="(fox|dog)" replace="${repl}" options='none' outTo="text1">${text}</regex>
<print>${text1}</print>

or by using a different escape sequence:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints The quick brown cat (used to be fox) jumps over the lazy cat (used to be dog) -->
<regex pattern="(fox|dog)" replace="cat (used to be ${1})" options='none' outTo="text1" tr='expandDual'>${{text}}</regex>
<print>${text1}</print>

or by using an internal expression:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints The quick brown cat (used to be fox) jumps over the lazy cat (used to be dog) -->
<regex pattern="(fox|dog)" replace="${='cat (used to be ${1})'}" options='none' outTo="text1">${text}</regex>
<print>${text1}</print>

or by just using C#-like syntax:

<set name="text" >The quick brown fox jumps over the lazy dog</set>
<!-- prints The quick brown cat (used to be fox) jumps over the lazy cat (used to be dog) -->
<print>${=Regex.Replace($text,'(dog|fox)','cat (used to be ${1})')}</print>
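Python writes the same back-reference replacement as \g<1> (or \1), because it does not understand .NET's ${1}. The second half of the sketch below models the pitfall described earlier using expand_vars, a deliberately simplified and hypothetical function that is not XSharper's expander. It only reproduces the order of evaluation. If ${1} is rewritten before the regex engine sees the template, the back-reference is lost, and every match is replaced by the same literal text.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# Python spells the back-reference \g<1>; .NET uses ${1} or $1
fixed = re.sub(r"(dog|fox)", r"cat (used to be \g<1>)", text)
print(fixed)
# The quick brown cat (used to be fox) jumps over the lazy cat (used to be dog)


def expand_vars(s: str) -> str:
    """Hypothetical, simplified stand-in for script-level ${...} expansion.

    Each ${expr} is replaced by the text of expr itself, so ${1} becomes 1,
    modelling what happens to the replacement before Regex sees it.
    """
    return re.sub(r"\$\{([^}]*)\}", lambda m: m.group(1), s)


template = expand_vars("cat (used to be ${1})")  # back-reference already lost
broken = re.sub(r"(dog|fox)", template, text)
print(broken)
# The quick brown cat (used to be 1) jumps over the lazy cat (used to be 1)
```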