Feb-21-2019, 04:33 AM
Hi,
I am new to Python and need help with data cleaning.
The objective is to scrape tables from a PDF file. That has been done with the tabula package, and I now have a CSV file.
In the original PDF file, the description can be long (up to 3-4 lines), as shown in the picture below.
![[Image: PDF_table.jpg]](https://i5.photobucket.com/albums/y190/fongwee1/PDF_table.jpg)
After scraping, this is what I get in my DataFrame.
![[Image: Data_frame.jpg]](https://i5.photobucket.com/albums/y190/fongwee1/Data_frame.jpg)
I need to combine the rows that belong to the same description.
Example: I need to combine index 4 and 5 so that the result reads as follows:
Index S/N Code Description Table
4 5 Description Change Breast, Lumps, Imaging Guided Vacuum Assisted Biopsy, Single lesion 2B
It should also delete the index 5 row after combining the two. Finally, I need a function that finds and merges such rows across the whole DataFrame.
Please help.
Thanks
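One possible approach, sketched under assumptions since the screenshots are not reproduced here: it assumes that a wrapped continuation row has an empty `S/N` cell and carries only the overflow text in `Description`. The helper name `merge_continuation_rows`, the sample values in `demo` (including the `Code` entries and where the description wraps), and the split of the text across rows are all made up for illustration. Rows are numbered into records with `notna().cumsum()`, then each record is collapsed with `groupby`.

```python
import pandas as pd


def merge_continuation_rows(df, key_col="S/N", text_col="Description"):
    """Fold rows whose key_col is empty into the nearest row above that has a value."""
    # A row starts a new record when key_col holds a value (assumption);
    # the cumulative sum gives every row the number of the record it belongs to.
    record_id = df[key_col].notna().cumsum().rename("record")

    def join_text(parts):
        # Join the non-empty fragments of a wrapped cell with single spaces.
        return " ".join(str(p).strip() for p in parts.dropna())

    # The wrapped text column is joined; every other column keeps its
    # first non-null value within the record.
    agg = {col: (join_text if col == text_col else "first") for col in df.columns}
    return df.groupby(record_id).agg(agg).reset_index(drop=True)


# Hypothetical data imitating the problem: the last row is a continuation row.
demo = pd.DataFrame(
    {
        "S/N": ["4", "5", None],
        "Code": ["A1", "B2", None],  # placeholder codes, not from the real file
        "Description": [
            "Short description",
            "Breast, Lumps, Imaging Guided Vacuum",
            "Assisted Biopsy, Single lesion",
        ],
        "Table": ["1A", "2B", None],
    }
)

cleaned = merge_continuation_rows(demo)
print(cleaned)
```

Because the continuation rows are absorbed into their parent row, no separate delete step is needed, and the same call handles descriptions that wrap over three or four lines. If the empty cells in the real CSV are empty strings rather than NaN, they would need to be converted first, for example with `df.replace("", pd.NA)`.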