Python Forum
For loops & DataFrames
#1
I'm running regular expressions over a 13-page PDF file in a Jupyter notebook and I want to display the results in a DataFrame. However, when I execute the code below, the DataFrame shows only the results for the last page of the PDF.

Is it possible to make the DataFrame show the regex results for all 13 pages while keeping the code in separate cells, as below? (Sorry, I can't share the PDF as it's confidential.)


import PyPDF2
import re
import pandas as pd
#new cell
file = open(r'C:\file.pdf', 'rb')
doc = PyPDF2.PdfFileReader(file)
#new cell
for i in range(0,13):
    text = doc.getPage(i).extractText()
    #print(text)                            
    
    loc_re = re.compile(r'S\d+_\d+_DOG', re.IGNORECASE)
    loc = loc_re.findall(text)
    #print(loc)
       
    easting_re = re.compile(r'E[ ]*\d{6}')
    easting = easting_re.findall(text)
    #print(easting)
    
    northing_re = re.compile(r'N[ ]*\d{7}')
    northing = northing_re.findall(text)
    #print(northing)
#new cell
df = {'LOC': loc, 'Easting':easting, 'Northing': northing}
df = pd.DataFrame(df)
df.head()
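
The loop above rebinds loc, easting and northing on every iteration, so once it finishes only the lists from the last page are left. Below is a minimal sketch of one possible fix: accumulate the matches from every page in lists created before the loop. It is not a confirmed solution from this thread. The names collect_matches and sample_pages are hypothetical, and the sample strings are made up to stand in for the output of doc.getPage(i).extractText(), since the real PDF is not available.

```python
import re

# Patterns copied from the post; compiled once, outside the loop.
LOC_RE = re.compile(r'S\d+_\d+_DOG', re.IGNORECASE)
EASTING_RE = re.compile(r'E[ ]*\d{6}')
NORTHING_RE = re.compile(r'N[ ]*\d{7}')


def collect_matches(page_texts):
    """Accumulate matches from every page instead of overwriting them."""
    results = {'LOC': [], 'Easting': [], 'Northing': []}
    for text in page_texts:
        # extend() appends this page's matches to what was already found
        results['LOC'].extend(LOC_RE.findall(text))
        results['Easting'].extend(EASTING_RE.findall(text))
        results['Northing'].extend(NORTHING_RE.findall(text))
    return results


# Hypothetical sample text, one string per page (assumption: made-up data)
sample_pages = [
    'S1_01_DOG E 123456 N 1234567',
    's2_02_dog E654321 N7654321',
]
results = collect_matches(sample_pages)
```

In the notebook, the page texts could come from a generator such as (doc.getPage(i).extractText() for i in range(13)), and the final cell could then call pd.DataFrame(results). Note that pd.DataFrame raises a ValueError when the three lists have different lengths, so this sketch assumes every page yields the same number of LOC, easting and northing matches.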
Messages In This Thread
For loops & DataFrames - by pprod - Feb-17-2021, 10:54 AM
RE: For loops & DataFrames - by maurom82 - Feb-18-2021, 02:51 PM
RE: For loops & DataFrames - by pprod - Feb-22-2021, 04:58 PM
RE: For loops & DataFrames - by nilamo - Feb-24-2021, 07:12 PM
RE: For loops & DataFrames - by nilamo - Feb-24-2021, 07:16 PM
RE: For loops & DataFrames - by pprod - Feb-25-2021, 08:23 AM