SUCCESS STORY
Nested Tables & Machine Drawing
Text Extraction For An
Oil & Gas Company
DOMAIN
Oil & Gas Industry

KEY HIGHLIGHTS
4x faster automated text extraction using teX.ai.
The need for human intervention was reduced by over 80%.
The quality of their process increased by over 75%.

TECHNOLOGIES
The solution was built leveraging Python and several of its libraries.
OCR: Tesseract, Tesserocr, OCRmyPDF, PyTesseract
Preprocessing and Post Processing Tools: xPDF, Poppler, OpenCV, Pandas, Json
Table Detection and Extraction: Camelot, OpenCV, LSD (line segment detection),
csv, TensorFlow, FCN (Fully Convolutional Networks), CNN (Convolutional Neural
Networks)
Application Deployment: Flask, Docker
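
As a rough illustration of the post-processing stage, the sketch below turns the rows of an extracted two-column table into key-value pairs and serializes them as JSON, using only Python's standard csv and json modules. The function name rows_to_key_values, the has_header parameter, and the sample element names and values are hypothetical; the case study does not describe how teX.ai implements this step.

```python
import csv
import io
import json


def rows_to_key_values(csv_text, has_header=True):
    """Convert two-column CSV text (name, value) into a dict.

    Hypothetical helper: assumes an earlier OCR/table step already
    produced clean CSV text with one "name,value" pair per row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # discard the header row if present
    pairs = {}
    for row in reader:
        # Skip blank or malformed rows rather than guessing their meaning.
        if len(row) != 2:
            continue
        key, value = row[0].strip(), row[1].strip()
        if key:
            pairs[key] = value
    return pairs


# Made-up sample standing in for an extracted composition table.
sample = "Element,Percent\nC,0.25\nMn,1.10\n\nSi,0.30\n"
result = rows_to_key_values(sample)
print(json.dumps(result, indent=2))
```

Values are kept as strings here so that OCR output is not silently coerced; a downstream step could validate and convert them to numbers.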
CUSTOMER BACKGROUND
The client is one of the pioneers in the oil and gas business, with a focus on
innovation to help its customers fuel progress in agriculture, industry,
medicine, science, space, technology, and transportation. The combination of
engineering disciplines, computer science, geophysics, and metallurgy helps
create a winning formula for all stakeholders in such projects.

BUSINESS REQUIREMENTS
Given the document-intensive nature of the business, the client had to deal
with numerous PDF documents containing complex drilling machine parts diagrams,
data in nested tables, and various other formats. Their requirement was to
extract the data and save it in a format that could facilitate further analysis
downstream.

CHALLENGES
The client had hundreds of PDF documents, each ranging from 2 to 100 pages. In
some cases, the required data was not present on every page of a document.
There were 5 different document formats, consisting of engineering drawings,
nested tables, un-demarcated tables, etc. This required creating a model for
each document format.

OBJECTIVE
To leverage teX.ai for the automated text extraction process with an accuracy
target of over 80% while requiring less than 50% of the time currently taken.

SOLUTION OVERVIEW
Quality File Validation
Extraction of the chemical composition file and conversion to key-value pairs.
These chemical composition PDFs are 10 pages long.
Survey Files
Automatic identification of survey tables from multi-page documents, followed
by extraction.
Well Schematics
Identification and extraction of the nested tables as separate entities. These
documents had a combination of nested tables and complex drilling equipment
drawings.

APPROACH & IMPLEMENTATION
teX.ai was leveraged to process text for all 3 use cases.
Quality File Validation
The Analysis table, which contained the chemical composition details, was
identified in the document and extracted using OCR.
Extraction takes just a few seconds, with an accuracy of more than 85%.
Public Files (Surveys)
The survey tables were first isolated using an OCR-based keyword search.
Survey details were then extracted using tools such as Tabula or Camelot.
Well Schematics
All the nested tables were extracted as separate tables and saved in CSV format.
The nested tables were extracted in 2 stages: an FCN model at stage 1, and
OpenCV at the next stage to detect rows in the table.
Deployment
Once the AI models were built and the required accuracy and performance tuning
was complete, Indium deployed teX.ai with an admin interface built using Flask
and containerized using Docker.

BUSINESS IMPACT
4x faster text extraction from the source documents by leveraging teX.ai in the
automated process flow.
The need for human intervention was reduced by over 80%.
The quality of their process increased by over 75%.
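
The keyword-based isolation of survey tables described for the Public Files (Surveys) use case can be sketched as follows. The sketch assumes each page's OCR text is already available as a string (for example, produced by Tesseract) and simply reports which pages mention enough survey-related keywords. The keyword list, the min_hits threshold, and the function name find_survey_pages are illustrative assumptions, not details from the project.

```python
import re

# Hypothetical keywords; a real list would be derived from the survey files.
SURVEY_KEYWORDS = ("survey", "inclination", "azimuth", "measured depth")


def find_survey_pages(page_texts, keywords=SURVEY_KEYWORDS, min_hits=2):
    """Return 1-based page numbers whose text contains at least
    `min_hits` distinct keywords (case-insensitive)."""
    matches = []
    for page_number, text in enumerate(page_texts, start=1):
        # Collapse whitespace so multi-word keywords survive OCR line breaks.
        normalized = re.sub(r"\s+", " ", text).lower()
        hits = sum(1 for kw in keywords if kw in normalized)
        if hits >= min_hits:
            matches.append(page_number)
    return matches


# Made-up page texts standing in for per-page OCR output.
pages = [
    "Well summary and general remarks",
    "DIRECTIONAL SURVEY\nMeasured\nDepth  Inclination  Azimuth",
    "Equipment drawing, no tables",
]
print(find_survey_pages(pages))
```

Only the pages returned by such a filter would then need to be passed to a table extractor such as Tabula or Camelot, which keeps the more expensive extraction step away from pages that contain no survey data.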
© 2022 All Rights Reserved
About Indium
Indium is a Digital Engineering Services leader and Full Spectrum Integrator that helps
customers embrace and navigate the Cloud-native world with Certainty. With deep expertise
across Applications, Data & Analytics, AI, DevOps, Security, and Digital Assurance, we “Make
technology work” and accelerate business value while adding scale and velocity to our
customers’ digital journey on AWS.
Make Technology Work
USA: Cupertino | Princeton. Toll-free: +1-888-207-5969
INDIA: Chennai | Bengaluru | Mumbai. Toll-free: 1800-123-1191
UK: London. Ph: +441420300014
SINGAPORE: Singapore. Ph: +6568127888
https://www.indiumsoftware.com
For Sales Inquiries: sales@indiumsoftware.com
For General Inquiries: info@indiumsoftware.com
https://www.facebook.com/indiumsoftware/
https://twitter.com/indiumsoft?lang=en
https://www.linkedin.com/company/indiumsoftware/?originalSubdomain=in