Regex is cool! It is true! Regex allows you to find specific chunks of text in a given sentence, file, document, etc. It is really useful for parsing big log files, identifying a specific pattern, and turning unstructured data into structured data. Regex can also be used in PowerShell to validate user input, for instance.
I use regex for both parsing and validation. But before we start parsing, we first need to learn the regex syntax. So let’s start!

Using regex can seem pretty difficult because of its ‘particular’ syntax. And I agree: each time you read a regex, you risk losing the sight of an eye. (Ever wonder how Cyclops from the X-Men got his cool glasses? I bet he spent a bit too much time reading regex at night in the dark!) Some things can really throw you off. But believe me, there is actually very little to learn before you can start writing really powerful and useful regexes.
I tried to cover the fundamentals and the most common use cases in 13 points to learn regex.
The 13 points contain everything you need to know to get started with the language.
Along the way, I’ll try to point out the tricky parts, so that the learning process is as easy as possible for you.

Why should you even learn regex?

First, because I don’t like having chunks of characters in my code that I don’t understand the meaning of.
Second, because of the whole new possibilities regexes give us for log parsing and user input validation.
And lastly, because regex can be used in a lot of different languages, so learning regex today will definitely be a huge gain in the long term. The concepts you learn here will be applicable to many different programming languages (PowerShell, C#, PHP, JavaScript, etc.), so you are gaining on all levels by learning regex.

Once you are comfortable with regex, head over to this article where I cover the major points on how to use regex efficiently in PowerShell.

Regex format

Before we even start to look at the regex syntax, let’s have a look at what the basic structure of a regex is:

SelectorQuantifier

The two words above are stuck together on purpose. Using regex, you first use a selector, then apply (or not) a quantifier to the selector to multiply it. The selector selects the character you are searching for, and the quantifier says how often that character must be repeated. Together they will match (or not) a specific portion of a string.
In the end, a long regex is nothing else than a selector with a quantifier, repeated over and over and over again, as showcased in the example below:

SelectorQuantifierSelectorQuantifierSelectorQuantifier
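
To make that structure concrete, here is a minimal sketch in Python. The choice of Python and of this particular pattern is mine, for illustration only; the basic operators used here should behave the same way in PowerShell. The pattern is three selectors in a row, two of them followed by a quantifier.

```python
import re

# Three selector/quantifier pairs, written one after the other:
#   [a-z]+  -> selector [a-z] (a lowercase letter), quantifier + (one or more)
#   \s      -> selector \s (a whitespace character), no quantifier (exactly one)
#   \d+     -> selector \d (a digit), quantifier + (one or more)
pattern = re.compile(r"[a-z]+\s\d+")

match = pattern.search("our test sentence number 1.")
found = match.group(0) if match else None
print(found)  # number 1
```

Only ‘number 1’ matches, because it is the only place where letters are followed by a space and then a digit.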

It is also possible to create groupings around our selectors to address entities that belong together as a whole. I cover groupings a bit lower in this article, but it makes sense to first learn the basics of the regex syntax.

We will start by selecting individual characters, digits, or white spaces. But it can be a bit tedious to use ‘only’ those selectors. Luckily, the regex syntax also provides a way to address a whole group of characters at once, like all the digit characters or all the letters of the alphabet. These types of selectors are called ‘character class selectors’, and they are covered right after the character selectors.

Regex character selectors

The first thing we want to do is to select something using our regex selector. This could be a digit, an alphabetical character, any space character, or even any character. There is a way to select each of these, using specific syntax that I describe below:

The examples will be demonstrated using the two sentences below:

“our test sentence number 1.”

“our pest sentence number 2.”

(yeaaaaah, I know. ‘our pest sentence’ doesn’t mean much in English, but I couldn’t come up with a better idea ^^).

Please read this table through, line by line. Don’t skip it!

Step 1. Operator: .
Meaning: The dot operator selects any single character: a digit, a word character, or a white space character.
Example: ‘our .est’ would return “our test” but also “our pest”.

Step 2. Operator: \w
Meaning: Selects a single word character: a letter (a-z or A-Z), a digit, or an underscore. White spaces are not included.
Example: ‘\w’ will match the first word character of our sentences, which is “o”.

Step 3. Operator: \d
Meaning: Selects a single digit. On its own, it will match only the 1 in the number 123 (see the quantifiers below to match more).
Example: ‘\d’ selects 1 in the first sentence and 2 in the second.

Step 4. Operator: \s
Meaning: White space selector (spaces, tabs, and new lines). In “This has a space” it would match each of the three spaces.
Example: ‘\s’ will match the first space, between ‘our’ and ‘test’ or ‘pest’.

Step 5. Operator: ^
Meaning: The character following the hat must be located at the start of the sentence.
Example: ‘^o’ will match the first letter of our sentences. ‘^test’ will not match either of the two sentences, since ‘test’ doesn’t appear at the beginning of the line.

Step 6. Operator: []
Meaning: Selects one character out of the set listed between the brackets: ‘w[oa]p’ would match both of the strings ‘wop’ and ‘wap’.
Example: ‘our [tpw]est’ will match when the character is either a t, a p, or a w. In our case, it will match “our test” and “our pest”, but it could also match “our west”.

Step 7. Operator: [^]
Meaning: Using the hat between brackets means that we want any character that is not one of those listed between the brackets: ‘[^er]’ represents any character that is not e or r.
Watch out: Trap! The hat operator is also used to anchor a match at the beginning of a sentence (step 5). This can be the source of a lot of confusion.
Example: ‘our [^p]est’ will specifically not match “our pest”. It will then only match “our test”.

Step 8. Operator: |
Meaning: The pipe operator is a logical or.
Example: ‘test|pest’ will match either ‘test’ or ‘pest’.

Step 9. Operator: abcdefghijklmnopqrstuvwxyz
Meaning: You can also match characters or complete sentences directly.
Example: ‘our test sentence number 1.’ will match the corresponding sentence (white spaces and character case are important). Strictly speaking, the final ‘.’ is the dot operator from step 1; write ‘\.’ to match a literal period.
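
The selectors from the table can be tried out against the two test sentences. The sketch below uses Python’s `re` module, which is my choice for illustration and not something the article prescribes; `matches` is a hypothetical helper that returns the sentences in which a pattern is found.

```python
import re

sentences = ["our test sentence number 1.", "our pest sentence number 2."]

def matches(pattern):
    """Hypothetical helper: return the sentences in which the pattern is found."""
    return [s for s in sentences if re.search(pattern, s)]

dot = matches(r"our .est")          # dot: any character before 'est'
anchored = matches(r"^test")        # hat: 'test' must start the sentence
bracket = matches(r"our [tpw]est")  # brackets: t, p or w
negated = matches(r"our [^p]est")   # negated brackets: anything but p
either = matches(r"test|pest")      # pipe: logical or

# Single-character selectors return the first character they can match.
first_word_char = re.search(r"\w", sentences[0]).group(0)
first_digit = re.search(r"\d", sentences[1]).group(0)

print(dot == sentences, anchored, negated)
print(first_word_char, first_digit)
```

Both sentences match the dot, bracket, and pipe patterns; ‘^test’ matches neither; and the negated bracket keeps only the first sentence.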

Character class selectors

Analyzing words or sentences using only character selectors can be pretty tedious. That is why ‘character class’ selectors also exist.

Step 1. Operator: \w
Meaning: Selects any word character, including digits and underscores, excluding white spaces.
Example: ‘\w’ will match o, then u, then r; it skips the space, then matches ‘t’ or ‘p’, etc.

Step 2. Operator: \d
Meaning: Selects any digit character, from 0 to 9.
Example: ‘\d’ will match either 1 or 2, depending on the sentence it is analysing.

Step 3. Operator: \s
Meaning: Selects any white space character, including tabs, spaces, and new lines.
Example: ‘\s’ selects the first white space, between ‘our’ and ‘test’ or ‘pest’.
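
As a quick check of the three classes, here is a small Python sketch (Python is my choice for illustration). It collects every match in the first test sentence with `re.findall`. Note that in Python 3, `\w` and `\d` also accept non-ASCII letters and digits by default, which makes no difference for this sentence.

```python
import re

sentence = "our test sentence number 1."

word_chars = re.findall(r"\w", sentence)  # letters, digits and underscores
digits = re.findall(r"\d", sentence)      # digits only
spaces = re.findall(r"\s", sentence)      # white space only

print("".join(word_chars))  # ourtestsentencenumber1
print(digits)               # ['1']
print(len(spaces))          # 4
```

The spaces and the final period are missing from the `\w` result, which shows that a word character is never a white space.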

Negated character class selectors

We have normal class selectors, but also the possibility to select their opposites using negated character class selectors. This is nothing too complicated, since the negated character class selectors are simply the UPPERCASE versions of the regular character class selectors.

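Step 1. Operator: \W
Meaning: Selects any character that is not a word character.
Example: in our sentences, ‘\W’ will match the spaces and the final period.

Step 2. Operator: \D
Meaning: Selects any character that is not a digit.
Example: in our sentences, ‘\D’ will match everything except the 1 or the 2.

Step 3. Operator: \S
Meaning: Selects any character that is not a white space.
Example: in our sentences, ‘\S’ will match everything except the spaces.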
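
The uppercase classes can be checked the same way. This is a minimal Python sketch (my choice of language, for illustration), run against the first test sentence; the `+` used in two of the patterns is the ‘one or more’ quantifier.

```python
import re

sentence = "our test sentence number 1."

non_word = re.findall(r"\W", sentence)                # everything \w would skip
non_digit_run = re.search(r"\D+", sentence).group(0)  # first run of non-digits
non_space_runs = re.findall(r"\S+", sentence)         # runs of non-white-space

print(non_word)        # [' ', ' ', ' ', ' ', '.']
print(non_digit_run)   # 'our test sentence number '
print(non_space_runs)  # ['our', 'test', 'sentence', 'number', '1.']
```

`\S+` is a handy way to split a sentence into its ‘words’, punctuation included.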

Regex quantifiers

The quantifier operators are used to specify how often the preceding character is repeated. Quantifier operators can also be applied to a group. (See grouping operators below).

Step 1. Operator: ?
Meaning: The preceding character is optional (it can be found zero or one time).
Example: ‘ab?’ will match ‘a’ and ‘ab’.

Step 2. Operator: +
Meaning: The preceding character must be found between 1 and unlimited times.
Example: ‘ab+’ will match ‘ab’, ‘abb’, ‘abbb’ but not ‘a’.

Step 3. Operator: *
Meaning: The preceding character can be repeated between zero and an infinite number of times.
Example: ‘ab*’ will match ‘a’, ‘ab’, ‘abb’, ‘abbb’, ‘abbbbbb’, etc.

Step 4. Operator: {min,max}
Meaning: Repeats the preceding selector at least ‘min’ and at most ‘max’ times. There is a slight variation: if you omit the max, you are actually saying ‘infinite’ (see the examples).
Examples:
‘\d{1,3}’ -> will match any number of 1 to 3 digits.
‘a{2,4}’ -> will match ‘aa’, ‘aaa’ and ‘aaaa’ but not ‘a’. In ‘aaaaa’ it will match only the first four characters.
‘a{2,}’ -> means at least twice, with a max of infinite.
‘a{0,6}’ -> means between zero and 6 times. Watch out: the shorthand {,6}, with the min omitted, is not portable. Some engines (Python, for example) read it as {0,6}, while others, including the .NET engine that PowerShell uses, treat it as literal text. Writing the zero explicitly is the safe choice.
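
The quantifiers can be compared side by side with a short Python sketch (Python is my choice for illustration). `accepts` is a hypothetical helper that checks whether a pattern matches a whole string, using `re.fullmatch`, so that partial matches don’t blur the picture.

```python
import re

def accepts(pattern, text):
    """Hypothetical helper: True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

b_words = ["a", "ab", "abb", "abbb"]
optional = [t for t in b_words if accepts(r"ab?", t)]      # zero or one b
one_or_more = [t for t in b_words if accepts(r"ab+", t)]   # at least one b
zero_or_more = [t for t in b_words if accepts(r"ab*", t)]  # any number of b

a_words = ["a", "aa", "aaa", "aaaa", "aaaaa"]
bounded = [t for t in a_words if accepts(r"a{2,4}", t)]     # 2 to 4 times
at_least_two = [t for t in a_words if accepts(r"a{2,}", t)] # 2 or more

# Without fullmatch, a{2,4} still finds a match inside 'aaaaa': the first four.
partial = re.search(r"a{2,4}", "aaaaa").group(0)

# \d{1,3} takes up to three digits at a time, so 12345 is split in two.
digit_groups = re.findall(r"\d{1,3}", "7 42 123 12345")

print(optional, one_or_more, bounded)
print(partial, digit_groups)
```

The last two results show why the difference between ‘matches somewhere inside the string’ and ‘matches the whole string’ matters when you validate input.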

Now we know how to select our character(s), and we have tackled the second part of the regex, the quantifier. Remember, it’s the part that follows each selector in the structure I showed above.

Grouping Selectors

This is the 20th step. It is possible to group characters together using the parenthesis operators. Grouping characters together allows you to treat an entire block as a whole and, for instance, to apply a quantifier operator to the entire group. The group itself can contain selectors and quantifier operators. See the theoretical example below.

SelectorQuantifier(SelectorQuantifier)Quantifier

Inside a grouping operator, you can set as many selectors and quantifiers as you want. (The absolute minimum would be at least one selector) It allows you to treat the block as a whole.

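(power|force)shell -> will match either ‘powershell’ or ‘forceshell’. Don’t put spaces around the pipe: inside a regex, a space is a character to match like any other. If you add a quantifier to the group, as in (power|force)?shell, the whole group becomes optional, so plain ‘shell’ will match as well.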
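
Here is a small Python sketch of grouping (Python is my choice for illustration, and the candidate strings are made up). It compares the group with and without a trailing `?`, and shows that spaces written inside the group are matched literally.

```python
import re

candidates = ["powershell", "forceshell", "shell", "power shell"]

def whole_matches(pattern):
    """Hypothetical helper: candidates that the pattern matches entirely."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

required = whole_matches(r"(power|force)shell")   # the group must be present
optional = whole_matches(r"(power|force)?shell")  # the group may be absent
spaced = whole_matches(r"(power | force)shell")   # spaces are part of the match

# A quantifier after the parentheses repeats the whole group.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

# The text matched by the group can be read back afterwards.
captured = re.search(r"(power|force)shell", "I like powershell").group(1)

print(required)  # ['powershell', 'forceshell']
print(optional)  # ['powershell', 'forceshell', 'shell']
print(spaced)    # ['power shell']
print(repeated, captured)
```

With the `?` in place, plain ‘shell’ is accepted too, which is easy to overlook when the goal is to accept only the two prefixed words.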

And that is all you need to know about grouping to get you started.

Summary

To recap: in order to create a regex, keep in mind its structure:

SelectorQuantifier

That is actually ALL you need to know (with its special syntax, of course).

In a future post, I will discuss the different ways regexes can be used in PowerShell, with parsing NETSH output or WindowsUpdate.log as examples, so stay tuned.

#Stephane