AKY King of Life
Published in : 2022-03-02
I can't share the original scanned PDF file. It has columns, but it is a non-grid PDF. My problem is that I want to extract the data and separate it into different columns.
I am assuming your table does not contain any images or logos.
I think you can do this fairly easily with tabula-py, the Python wrapper for Tabula. Start with the code snippet below:
    import pandas as pd
    import tabula

    file = "your_scan_file_name.pdf"
    path = "<enter your directory path here>" + file
    # With multiple_tables=True, read_pdf returns a list of DataFrames, one per table found
    df = tabula.read_pdf(path, pages="1", multiple_tables=True)
    print(df)
If you want to extract a particular table, you need the coordinates of that table. The area argument takes them as (top, left, bottom, right), measured in PDF points:
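    # files is a list of PDF file names that you define yourself
    for file in files:
        path = "<enter your directory path here>" + file
        # area is (top, left, bottom, right) in PDF points
        df = tabula.read_pdf(path, area=(234.019, 38.991, 313.638, 555.396), pages=1)
        print(df)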
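If you measured the table's box on a page image (for example in an image viewer) rather than in PDF points, you have to rescale it before passing it as area. Here is a minimal sketch of that conversion, assuming the image was rendered at a known DPI. The helper names image_box_to_tabula_area and read_region are my own, not part of tabula, and the stream/columns options are only a suggestion for tables without grid lines, so treat it as a starting point:

```python
def image_box_to_tabula_area(top_px, left_px, bottom_px, right_px, dpi=72.0):
    """Convert a box measured in pixels on a page image rendered at `dpi`
    into an area tuple (top, left, bottom, right) in PDF points.

    Hypothetical helper, not part of tabula-py.
    """
    scale = 72.0 / dpi  # one PDF point is 1/72 inch
    return tuple(round(v * scale, 3) for v in (top_px, left_px, bottom_px, right_px))


def read_region(path, area, columns=None, page=1):
    """Read one region of one page with tabula-py (hypothetical wrapper).

    `columns` is an optional list of x coordinates (in points) where the
    column boundaries are, which can help when the table has no grid lines.
    """
    import tabula  # imported here so the converter above works without Java installed

    options = {"pages": page, "area": area, "guess": False, "stream": True}
    if columns:
        options["columns"] = columns
    return tabula.read_pdf(path, **options)
```

For example, image_box_to_tabula_area(300, 150, 600, 1500, dpi=150) gives the box in points, which you can then pass to read_region together with your own file path.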
I hope this approach helps. Let me know if it does not work for you!