Support Portal

for ProcessRobot and WinAutomation

Answered

Extract data from PDF Forms

I have a folder of filled-in PDF forms. Each one is a multi-page document containing text, tables, and embedded PDF form fields (some of them drop-down lists of values).


I've tried the following, without success:

  • Extract Text from PDF - extracts everything except the form fields
  • Extract Text from PDF with OCR (Tesseract) - the result is unreadable
  • Extract Text from PDF with OCR (MODI) - does not work

Is there any way to extract the data from the PDF form fields?


Best Answer

1. If the PDF is a fillable PDF (one with form fields), you can use iText 7 for .NET: https://github.com/itext/itext7-dotnet

2. If the PDF contains plain text, you can use the built-in action to read the text, but you will still need regular expressions to capture the unstructured data.

3. Alternatively, you can use a third-party intelligent document processing application such as FlexiCapture.
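
To illustrate option 2, here is a minimal sketch of the regular-expression step, written in Python only to show the idea. It assumes the text has already been extracted from the PDF; the sample text, the field labels, and the extract_fields helper are all hypothetical, so adjust the patterns to the labels that actually appear in your documents.

```python
import re

# Hypothetical text standing in for the output of a text-extraction step.
sample_text = """Invoice No: INV-0042
Customer Name: Jane Doe
Total Amount: 1,250.00
"""

# One pattern per field. The labels are assumptions made for illustration.
# Each pattern captures whatever follows the label on the same line.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"^Invoice No:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
    "customer_name": re.compile(r"^Customer Name:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
    "total_amount": re.compile(r"^Total Amount:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
}


def extract_fields(text):
    """Return a dict of field name -> captured value, or None if not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(sample_text)
print(fields)
```

A field whose label is missing comes back as None rather than raising an error, which makes it easier to spot documents that do not follow the expected layout.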


Thanks, Supratman.


How can I use itext7-dotnet from WinAutomation?
