AKY King of Life
2 Mar 2022
Python
I can't share the original scanned PDF file. It has columns, but it is a non-grid PDF file. I want to extract the data and separate it into different columns.
Rakshit
2 Mar 2022
I am assuming your table does not contain any images or logos.
I think you can do this easily with the Tabula API (the tabula-py package in Python).
Start with the code snippet below:
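import tabula  # installed as the tabula-py package; needs Java

file = "your_scan_file_name.pdf"
# Directory path, including the trailing separator
path = "<enter your directory path here>" + file
# With multiple_tables=True, read_pdf returns a list of DataFrames
tables = tabula.read_pdf(path, pages="1", multiple_tables=True)
for table in tables:
    print(table)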
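Since your PDF has no grid lines, here is a minimal sketch of how the call could be adjusted, assuming you can measure where each column starts. It uses the `stream`, `guess` and `columns` parameters of `tabula.read_pdf`. The helper names `stream_options` and `read_non_grid` and any column positions you pass in are my own placeholders, not something from your file.

```python
def stream_options(column_edges):
    """Build keyword arguments for tabula.read_pdf for a table without ruling lines.

    column_edges: x coordinates (in points) of the column boundaries.
    These are hypothetical values that you have to measure on your own page.
    """
    return {
        "stream": True,   # rely on whitespace between columns, not on drawn lines
        "guess": False,   # do not let tabula guess the table region itself
        "columns": sorted(float(x) for x in column_edges),
    }


def read_non_grid(path, page, column_edges):
    """Read one page of a non-grid PDF and split it at the given column edges."""
    import tabula  # imported here so the helper above works without Java installed

    return tabula.read_pdf(path, pages=page, **stream_options(column_edges))
```

Something like `read_non_grid(path, 1, [120.5, 300])` should then give you a list of DataFrames with the text split at those x positions, but I have not tried it on your file, so the edges will probably need some trial and error.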
If you want to extract a particular table, you need the coordinates of that table:
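files = ["your_scan_file_name.pdf"]  # PDF file names to process
for file in files:
    path = "<enter your directory path here>" + file
    # area is (top, left, bottom, right), measured in points
    df = tabula.read_pdf(path, area=(234.019, 38.991, 313.638, 555.396), pages=1)
    print(df)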
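If you only know the table position in inches (for example from a ruler tool in a PDF viewer), a small helper can turn it into the `area` tuple. This is just a sketch, assuming the measurements are taken from the top-left corner of the page and that the area is given in points (72 points per inch). The name `area_from_inches` is my own, it is not part of tabula.

```python
POINTS_PER_INCH = 72.0  # PDF pages are measured in points, 72 per inch


def area_from_inches(top, left, bottom, right):
    """Convert a box in inches (from the page's top-left corner) to points.

    Returns a (top, left, bottom, right) tuple in the order used for `area`.
    """
    if bottom <= top or right <= left:
        raise ValueError("bottom must be below top and right must be beyond left")
    return tuple(round(v * POINTS_PER_INCH, 3) for v in (top, left, bottom, right))


# Hypothetical measurements, replace them with your own
area = area_from_inches(1, 0.5, 2, 8)
print(area)
```

You can then pass the result as `area=area` in the `read_pdf` call.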
I hope this approach helps. Let me know if it does not work for you!
© 2024 Copyrights reserved for web-brackets.com