Power Automate - AI Builder
Unanswered

Extracting data from semi-structured pdf?

Hi all, 
 
Wanted to see if anyone has run into this problem before. I have a set of PDFs which are semi-structured (attached), and I'm hoping to extract data from the 'pesticide production information' section. The issue is that each PDF can have a varying number of pages and a varying number of sections per page (a new section starts when "42." is the first cell in the table). Is there any way to build an AI model that will extract the pesticide production information regardless of the PDF's layout, or will I have to train the model with a bunch of examples of varying types? Is it possible to use Power Automate to do this? Thanks in advance!
  • Suggested answer
    takolota1
    I recommend using OCR & GPT in this case, like the set-up in this template:
    You can ask GPT to output an array containing one JSON object for each section it finds in your PDF, however many there are, just like the Product Lines in the invoice example.
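To make the idea concrete, here is a minimal Python sketch of the logic only; it is not the linked template and not a Power Automate flow. It assumes the OCR step has already returned the PDF text as a list of lines. The function names, the prompt wording, and the `product_name` key in the usage example are all hypothetical, and no real OCR or GPT service is called.

```python
import json
import re

# Assumption taken from the question: a new section starts when "42." is the first cell.
SECTION_START = re.compile(r"^\s*42\.")


def split_sections(ocr_lines):
    """Group OCR'd lines into sections; lines before the first '42.' are ignored."""
    sections = []
    current = None
    for line in ocr_lines:
        if SECTION_START.match(line):
            current = [line]          # start a new section
            sections.append(current)
        elif current is not None:
            current.append(line)      # continue the open section
    return sections


def build_prompt(ocr_text):
    """Hypothetical prompt asking the model for one JSON object per section."""
    return (
        "Extract every pesticide production information section from the text "
        "below. Return only a JSON array with one object per section.\n\n"
        + ocr_text
    )


def parse_sections(model_reply):
    """Parse the model's reply and check that it is a JSON array."""
    data = json.loads(model_reply)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of section objects")
    return data


if __name__ == "__main__":
    lines = ["Page header", "42. Product A", "43. Some value", "42. Product B"]
    print(len(split_sections(lines)))
    print(parse_sections('[{"product_name": "A"}, {"product_name": "B"}]'))
```

Splitting on the "42." marker before calling the model is optional; it can help keep each request small, but the model can also be asked to find the sections itself, as described above. In a flow, the parsed array could then be looped over with an Apply to each action.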
