Support Portal

for ProcessRobot and WinAutomation


Regular Expression - Extracting Street Addresses

Hey Guys,

I'm desperately trying to figure out why regex isn't parsing street addresses out of a data table. I previously OCR'd some PDFs and built a table in Excel from the output. I then tried the Parse Text action with a regular expression to find the street addresses, lot numbers, block numbers, and unit numbers, but it does not seem to work.

Has anyone had any success with this?

I've attached an Excel file with the example output I'm looking for. Any help would be appreciated.


Best Answer

It's hard to know what your exact problem is, but I could make it work. Check the file.
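The attached file is not reproduced in this thread, so the patterns actually used are unknown. Purely as an illustration of the kind of expressions that can pull these four fields out of OCR text, here is a minimal Python sketch. The pattern names, the list of street suffixes, and the sample line are all assumptions and would need adjusting to the real data (WinAutomation itself uses .NET regular expressions, where named groups are written `(?<name>...)` instead of `(?P<name>...)`).

```python
import re

# Hypothetical patterns; the suffix list and field labels are assumptions
# about how the OCR'd text is laid out, not taken from the attached file.
ADDRESS_RE = re.compile(
    r"\b(?P<number>\d{1,6})\s+"
    r"(?P<street>(?:[A-Za-z0-9.]+\s+){0,4}?"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln|Court|Ct|Place|Pl|Way))\b",
    re.IGNORECASE,
)
LOT_RE = re.compile(r"\bLot\s*(?:No\.?|#)?\s*(?P<lot>\d+[A-Za-z]?)\b", re.IGNORECASE)
BLOCK_RE = re.compile(r"\bBlock\s*(?:No\.?|#)?\s*(?P<block>\d+[A-Za-z]?)\b", re.IGNORECASE)
UNIT_RE = re.compile(
    r"\b(?:Unit|Apt|Suite|Ste)\b\.?\s*(?:No\.?|#)?\s*(?P<unit>[A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def extract_fields(text):
    """Return the first address, unit, block and lot found in text (None if absent)."""

    def first(pattern, group):
        match = pattern.search(text)
        return match.group(group) if match else None

    addr = ADDRESS_RE.search(text)
    return {
        "address": f"{addr.group('number')} {addr.group('street')}" if addr else None,
        "unit": first(UNIT_RE, "unit"),
        "block": first(BLOCK_RE, "block"),
        "lot": first(LOT_RE, "lot"),
    }


if __name__ == "__main__":
    # Made-up sample line, only to show the call.
    print(extract_fields("123 Main Street, Unit 4B, Block 12, Lot 7"))
```

Keeping one pattern per field, rather than one large expression, makes it easier to see which field fails when the OCR output is noisy.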





That worked perfectly, thank you! However, as I OCR more documents with Tesseract, it keeps crashing somewhere between documents 15 and 18. I have tried changing the order of the PDFs and removing the ones it crashes on, which are all approximately the same size (1-2 pages). Do you know why this might happen?

I was wondering about the heartbeat check every 5 seconds, RAM, etc. I am going as far as creating a new engine each time to see if it's a virtual memory issue.

ocr crash.png
(93.4 KB)

You're welcome. 

Try running eventvwr (Event Viewer) on the computer and check for errors logged just before the process crashes.


Faulting application name: WinAutomation.Process.exe, version:, time stamp: 0x5d47617c

Faulting module name: libtesseract400.dll, version:, time stamp: 0x5b7d4fc4

Exception code: 0xc0000005

Fault offset: 0x00000000000a342e

Faulting process id: 0x3ae0

Faulting application start time: 0x01d62d3500561dc4

Faulting application path: C:\Program Files\WinAutomation\WinAutomation.Process.exe

Faulting module path: C:\Program Files\WinAutomation\x64\libtesseract400.dll

Report Id: 007cb5be-1817-426b-93ef-3e063491f237

Faulting package full name:

Faulting package-relative application ID: 

Application: WinAutomation.Process.exe

Framework Version: v4.0.30319

Description: The process was terminated due to an unhandled exception.

Exception Info: System.AccessViolationException

   at InteropRuntimeImplementer.TessApiSignaturesInstance.TessApiSignaturesImplementation.BaseApiRecognize(System.Runtime.InteropServices.HandleRef, System.Runtime.InteropServices.HandleRef)

   at Tesseract.Page.Recognize()

   at Tesseract.Page.GetText()

   at WinAutomation.Actions.Runtime.TesseractOCREngineFacadeToVariant.GetText(System.Collections.Generic.List`1<System.String>)

   at WinAutomation.Actions.Runtime.OCRActions.ExtractTextWithOCRFromPDFImages(WinAutomation.Shared.Runtime.Variants.OCREngineVariant, System.Collections.Generic.List`1<System.String>)

   at WinAutomation.Actions.Runtime.PDFActions.ExtractTextFromPDFWithOCR2(WinAutomation.Shared.Runtime.Variants.Variant, WinAutomation.Shared.Runtime.Variants.Variant, WinAutomation.Shared.Runtime.Variants.Variant, WinAutomation.Shared.Runtime.Variants.Variant, WinAutomation.Shared.Runtime.Variants.Variant, WinAutomation.Shared.Runtime.Variants.Variant, WinAutomation.Shared.Runtime.Variants.Variant ByRef, Boolean, System.String, Int32, Int32)

   at WinAutomation.Actions.Runtime.ActualCompiledJob+<>c__DisplayClassa.<Execute>b__8(Boolean)

   at WinAutomation.Actions.Runtime.ActualCompiledJob.Execute()

   at WinAutomation.Robot.Runner.(Int32)

   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)

   at System.Threading.ThreadHelper.ThreadStart()
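Two of the hex fields in the crash report above can be decoded mechanically. The sketch below assumes that the "Exception code" is a Windows NTSTATUS value and that "Faulting application start time" is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC), which is the usual layout for Application Error events. The helper names and the small lookup table are illustrative, not part of any WinAutomation tooling.

```python
from datetime import datetime, timedelta, timezone

# A few common NTSTATUS codes; illustrative only, far from exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def describe_exception(hex_code):
    """Map an exception code such as '0xc0000005' to its NTSTATUS name."""
    return NTSTATUS_NAMES.get(int(hex_code, 16), "unknown")


def filetime_to_utc(hex_value):
    """Convert a hex FILETIME (assumed format) to a timezone-aware UTC datetime."""
    ticks = int(hex_value, 16)  # 100 ns ticks since 1601-01-01 UTC
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


if __name__ == "__main__":
    print(describe_exception("0xc0000005"))
    print(filetime_to_utc("0x01d62d3500561dc4").isoformat())
```

An access violation inside libtesseract400.dll points at the native OCR library reading or writing invalid memory, which matches the System.AccessViolationException at the top of the managed stack trace.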


I just opened your WAJ file.

My guess is that you need to put your Create OCR action before the loop.

I've done it both ways, with the same result. No idea what to try next.
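One general way to narrow down a crash like this, outside WinAutomation itself, is to run OCR on each page in its own child process with a timeout, so that a native crash or a hang on one file is recorded instead of taking the whole batch down. The following is only a sketch under stated assumptions: the function names are hypothetical, the Tesseract command-line tool is assumed to be installed and on the PATH, and the PDF pages are assumed to have already been rendered to image files (the `tesseract` CLI reads images, not PDFs).

```python
import subprocess


def run_isolated(cmd, timeout_s=120):
    """Run one command in a child process; return (ok, output_or_reason)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    except OSError as exc:
        return False, f"could not start: {exc}"
    if proc.returncode != 0:
        # A crash in the child shows up here as a non-zero exit code.
        return False, f"exit code {proc.returncode}"
    return True, proc.stdout


def tesseract_cmd(image_path):
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout"]


def ocr_batch(image_paths, build_cmd=tesseract_cmd, timeout_s=120):
    """OCR every path in isolation; return (texts, failures) keyed by path."""
    texts, failures = {}, {}
    for path in image_paths:
        ok, payload = run_isolated(build_cmd(path), timeout_s)
        (texts if ok else failures)[str(path)] = payload
    return texts, failures
```

If the same files always land in `failures` regardless of order, the problem is likely in those inputs; if the failures move around with the batch order, that points more toward resource exhaustion in a long-running engine.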

How else would you suggest I fix this? Can I pay someone for assistance? I'm in a bind and I don't expect anything for free.

Thank you

+ System 

  - Provider 

   [ Name]  Application Error 
  - EventID 1000 

   [ Qualifiers]  0 
   Level 2 
   Task 100 
   Keywords 0x80000000000000 
  - TimeCreated 

   [ SystemTime]  2020-05-20T10:27:59.121655600Z 
   EventRecordID 20440 
   Channel Application 
   Computer DESKTOP-SL6BFC9 

- EventData 

   C:\Program Files\WinAutomation\WinAutomation.Process.exe 
   C:\Program Files\WinAutomation\x64\libtesseract400.dll 

Auto-cleanup of Processes with no lifesigns Additional Data: The following Processes got terminated due to not having given any lifesigns for a certain period of time:


JUST OCR ALL PDFS (instance id: 657ce95e-3109-4278-a24f-e0294597360c)



The following Processes got terminated due to not having given any lifesigns for a certain period of time:



Can anyone help me? Can I pay WinAutomation to fix this problem? I don't expect anything for free, but I've been a fish out of water for two days and these are desperate times. Any help will be compensated.


Does anyone know if this message about Softomotive merging with Microsoft means we should seek support on the Power Automate support page?

My process crashes as well, on pdf number 23...

You need to email Softomotive support to figure out what's happening with the OCR engine.
