Support Portal

for ProcessRobot and WinAutomation

Regex Lookarounds tutorial

ADMIN

A standard way to search for a particular phrase anywhere in a text is to use regex lookarounds.


There are two types of lookarounds. In both, the lookaround text is only checked; it is not included in the match:


  • Lookbehind, which is used to match a phrase that is preceded by user-specified text.
    • Positive lookbehind is written as (?<=a)something and can be combined with any other regex construct.
      The above expression matches any "something" that is immediately preceded by "a".

    • Negative lookbehind is written as (?<!a)something and matches a "something" that is not immediately preceded by "a".

  • Lookahead, which is used to match a phrase that is followed by user-specified text.
    • Positive lookahead is written as something(?=a) and matches a "something" that is immediately followed by "a".

    • Negative lookahead is written as something(?!a) and matches a "something" that is not immediately followed by "a".
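
The four forms above can be tried out with a short script. This is only a sketch that assumes Python's built-in re module; the regex engine used inside WinAutomation or ProcessRobot may differ in some details. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text: "something" appears four times, with
# "a" or "b" placed directly before or after it.
text = "asomething bsomething somethinga somethingb"

def match_positions(pattern, subject):
    """Return the start index of every match of pattern in subject."""
    return [m.start() for m in re.finditer(pattern, subject)]

# Positive lookbehind: "something" immediately preceded by "a"
positive_behind = match_positions(r"(?<=a)something", text)

# Negative lookbehind: "something" NOT immediately preceded by "a"
negative_behind = match_positions(r"(?<!a)something", text)

# Positive lookahead: "something" immediately followed by "a"
positive_ahead = match_positions(r"something(?=a)", text)

# Negative lookahead: "something" NOT immediately followed by "a"
negative_ahead = match_positions(r"something(?!a)", text)

print(positive_behind, negative_behind, positive_ahead, negative_ahead)
```

Each positive form and its negative counterpart split the four occurrences between them, and the matched text is always just "something": the "a" is checked but never becomes part of the match.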


For example:


Given the text below, we need to extract the value that corresponds to each name:

Provider: .NET Runtime
Level 2
Task 0
Keywords 0x80000000000000


Creating one regex for each name, we have:

  • (?<=Provider:\s).+(?=\n)
    Matches anything ".+" that is preceded by "Provider:(space character)" and followed by "(new line character)". The backslash "\" introduces special characters and character classes, such as "\s" (whitespace) and "\n" (new line).

  • (?<=Level\s).+(?=\n)
    Matches anything ".+" that is preceded by "Level(space)" and followed by "(new line char)". This expression can also be written as (?<=Level\s)\d+(?=\n)
    where the expression "\d+" means one or more (+) decimal digits.

  • (?<=Task\s).+(?=\n)
    Also matches anything preceded by "Task(space)" and followed by "(new line char)".

  • (?<=Keywords\s).+(?=)
    Here ".+" is enough to get everything until the end of the string. We don't look ahead for (new line char) because there is none after the last value. The empty lookahead "(?=)" always succeeds, so it can also be omitted.
    The expression can also be written as (?<=Keywords\s)\d+x\d+ which matches one or more decimal digits, the "x" character, and then one or more decimal digits. The "x" is written without a backslash, because "\x" starts a hexadecimal escape in most regex flavors. However, this works only if we know the structure of the value.
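
The expressions above can be checked against the sample text with a short script. This is a minimal sketch that assumes Python's built-in re module rather than the engine used by WinAutomation or ProcessRobot; the variable names are made up for illustration. It uses the "\d+" variant for Level and the "\d+x\d+" variant for Keywords.

```python
import re

# The sample text from the example; the last line has no trailing new line.
text = (
    "Provider: .NET Runtime\n"
    "Level 2\n"
    "Task 0\n"
    "Keywords 0x80000000000000"
)

# One lookaround expression per name.
patterns = {
    "Provider": r"(?<=Provider:\s).+(?=\n)",
    "Level": r"(?<=Level\s)\d+(?=\n)",
    "Task": r"(?<=Task\s).+(?=\n)",
    "Keywords": r"(?<=Keywords\s)\d+x\d+",
}

values = {}
for name, pattern in patterns.items():
    match = re.search(pattern, text)
    # re.search returns None when nothing matches, so guard before .group()
    values[name] = match.group(0) if match else None

for name, value in values.items():
    print(name, "->", value)
```

Note that only the value is returned for each name: the name itself and the new line character are checked by the lookarounds but left out of the match.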




        

A useful tool for testing regular expressions is the following:
https://www.regextester.com/97304

