Regex is cool! It is true! Regex allows you to find specific chunks of text in a known sentence, file, document, etc. Regex is really useful to parse big log files, to identify a specific pattern, and to turn unstructured data into structured data. Regex can also be used in PowerShell to validate user input, for instance.
I use regex for both. But before we start parsing, we first need to learn the regex syntax. So let’s start!
Using regex can seem pretty difficult because of its ‘particular’ syntax. And I agree: each time you read a regex, you risk losing the sight of an eye. (Ever wonder how Cyclops from the X-Men got his cool glasses? I bet he spent a bit too much time reading regex at night in the dark!) Some things can really throw you off. But believe me, there is actually very little to learn before you can start writing really powerful and useful regexes.
I tried to cover the fundamentals and the most common use cases in 13 points to learn regex.
The 13 points to learn REGEX contain EVERYTHING you need to know about regex to get you started with the language.
Along the way, I’ll try to point out the tricky parts, so that the learning process is as easy as possible for you.
Why should you even learn regex?
First, because I don’t like to have chunks of characters that I don’t understand the meaning of in my code.
Second, because of the whole new possibilities that regexes give us for log parsing and user input validation.
Once you are comfortable with regex, head over to this article where I cover the major points on how to use regex efficiently in PowerShell.
Before we even start to look at the regex syntax, let’s have a look at what the basic structure of a regex is:
The two words above are stuck together on purpose. Using regex, you first have to use a selector then apply (or not) a quantifier to the selector to multiply it. The selector will actually select the character you are searching for, and the quantifier says how often that character must be repeated. Together they will match (or not) a specific portion of a string.
In the end, a long regex is nothing more than a selector with a quantifier, repeated over and over and over again, as showcased in the example below:
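To make the selector-plus-quantifier idea concrete, here is a minimal sketch. It uses Python's `re` module purely for illustration (the same patterns should behave the same way with PowerShell's `-match`), and the log line is made up for the example.

```python
import re

# \d is the selector (any digit); {4} is the quantifier (exactly four times).
year_pattern = re.compile(r"\d{4}")

# A longer regex is just more selector + quantifier pairs in a row:
# four digits, a literal dash, two digits, a literal dash, two digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

log_line = "2021-03-15 service started"  # hypothetical log line

year = year_pattern.search(log_line).group()
date = date_pattern.search(log_line).group()

print(year)
print(date)
```

A selector without a quantifier, like the dash here, simply matches exactly once.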
It is also possible to create groupings around our selectors to address entities that belong together as a whole. I cover groupings a bit lower in this article, but it makes sense to first learn the basics of the regex syntax.
We saw here how to select single characters, digits, whitespace or punctuation. But it can be a bit tedious to use ‘only’ the selectors above. Luckily, the regex syntax provides a way to address a whole group of characters together, like all the digit characters, or all the letters of the alphabet, for example. These types of selectors are called ‘character class selectors’.
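As a quick preview of why grouping matters, the sketch below (in Python, chosen only for illustration) shows that a quantifier applies to the single preceding character unless parentheses group several characters into one unit. The sample string is invented for the example.

```python
import re

sample = "ababab"  # made-up sample string

# Without a group, + only repeats the "b": an "a" followed by one or more "b".
no_group = re.fullmatch(r"ab+", sample)

# With a group, + repeats the whole "ab" unit.
with_group = re.fullmatch(r"(ab)+", sample)

matched_without_group = no_group is not None
matched_with_group = with_group is not None

print(matched_without_group)
print(matched_with_group)
```

Only the grouped version matches the whole string, because `ab+` stops being satisfied after the first `ab`.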
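Here is a small sketch of the most common character class selectors, assuming the usual shorthand: `\d` for digits, `\w` for word characters (letters, digits and underscore) and `\s` for whitespace. It is written in Python for illustration and runs against the first test sentence used in this article.

```python
import re

sample = "our test sentence number 1."

# \d selects any digit character.
digits = re.findall(r"\d", sample)

# \w selects any "word" character: letters, digits and underscore.
word_chars = re.findall(r"\w", sample)

# \s selects any whitespace character (space, tab, newline...).
spaces = re.findall(r"\s", sample)

print(digits)
print(len(word_chars))
print(len(spaces))
```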
The first thing we want to do is to select something using our regex selector. This could be a digit, an alphabetical character, any space character, or even any character. There is a way to select each of these, using specific syntax that I describe below:
The examples will be demonstrated using the two sentences below:
“our test sentence number 1.”
“our pest sentence number 2.”
(yeaaaaah, I know. ‘pest’ doesn’t make much sense in this sentence, but I couldn’t come up with another idea ^^).
Please read this table through, line by line. Don’t skip it!
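To try a few character selectors against the two test sentences yourself, here is a minimal sketch in Python (used only for illustration). The patterns are my own picks for the demo: a literal word, the `.` wildcard, and a `[tp]` character set; `matches` is a hypothetical helper, not a library function.

```python
import re

sentences = [
    "our test sentence number 1.",
    "our pest sentence number 2.",
]

def matches(pattern):
    """Return, for each test sentence, whether the pattern is found in it."""
    return [bool(re.search(pattern, s)) for s in sentences]

literal = matches(r"test")      # plain characters match themselves
any_char = matches(r".est")     # . selects any single character
char_set = matches(r"[tp]est")  # [tp] selects either "t" or "p"
literal_dot = matches(r"2\.")   # \. selects a real dot, not "any character"

print(literal, any_char, char_set, literal_dot)
```

The literal `test` only matches the first sentence, while `.est` and `[tp]est` match both.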
Analyzing words or sentences using only character selectors can be pretty tedious. That is why ‘character class’ selectors also exist.
We have normal class selectors, but also the possibility to select their opposites using negated regex character class selectors. This is nothing too complicated, since the negated regex character class selectors are simply the UPPERCASE versions of the regular regex character class selectors.
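The uppercase rule can be checked with a short sketch, again in Python for illustration: `\D`, `\W` and `\S` select everything that `\d`, `\w` and `\s` do not.

```python
import re

sample = "our test sentence number 1."

# \D is the opposite of \d: every character that is NOT a digit.
non_digits = re.findall(r"\D", sample)

# \W is the opposite of \w: here, the spaces and the final dot.
non_word_chars = re.findall(r"\W", sample)

# \S is the opposite of \s: every character that is NOT whitespace.
non_spaces = re.findall(r"\S", sample)

print(len(non_digits))
print(non_word_chars)
print(len(non_spaces))
```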
The quantifier operators are used to specify how often the preceding character is repeated. Quantifier operators can also be applied to a group. (See grouping operators below).
Now that we know how to select our character(s), we have to tackle the second part of the regex: the quantifier. Remember, it’s the blue part of the regex, as I showed above.
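As a rough preview, the sketch below runs through the standard quantifiers: `?` (zero or one), `+` (one or more), `*` (zero or more), `{n}` (exactly n) and `{n,m}` (between n and m). It is written in Python for illustration, and all the sample words and numbers are made up.

```python
import re

# ? : the preceding character is optional (zero or one time).
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ["color", "colour", "colouur"]]

# + : the preceding character appears one or more times.
one_or_more = re.findall(r"a+", "caaat cat ct")

# * : the preceding character appears zero or more times.
zero_or_more = [bool(re.fullmatch(r"ca*t", w)) for w in ["ct", "cat", "caaat"]]

# {3} : the preceding character appears exactly three times.
exactly_three = re.findall(r"\d{3}", "12 123 1234")

# {2,3} : the preceding character appears two to three times.
two_to_three = re.findall(r"\d{2,3}", "1 12 123 1234")

print(optional, one_or_more, zero_or_more, exactly_three, two_to_three)
```

Note that `\d{3}` finds `123` inside `1234` too: a quantifier limits how much one match consumes, not what surrounds it.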