Python Forum
For loops & DataFrames
#3
Thanks, maurom82. I generated the df inside the for loop and it worked. However, I read that df.append() copies all of the existing data on every call, which makes it inefficient when looping through files with many pages. My file has 130 pages and it took 46 s to append the DataFrames generated in each iteration. That's acceptable, but is there a better or more efficient way of doing this? Any suggestions? Thanks!
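To put a rough number on the copying claim: if every append builds a new frame holding all rows collected so far, the total work grows with the square of the page count. Collecting the pieces first and concatenating once copies each row roughly once. The sketch below only counts rows under that simplified model. The figure of 10 rows per page is made up for illustration and is not a measurement from this file.

```python
def rows_copied_growing(rows_per_page):
    """Rows copied if the result is rebuilt after every page (simplified model).

    Append number i copies every row gathered so far, not just the new page.
    """
    total = 0
    running = 0
    for n in rows_per_page:
        running += n      # rows in the accumulated frame after this page
        total += running  # the whole accumulated frame is copied again
    return total


def rows_copied_concat_once(rows_per_page):
    """Rows copied if the per-page pieces are kept in a list and concatenated once."""
    return sum(rows_per_page)


pages = [10] * 130  # hypothetical: 130 pages with 10 rows each
print(rows_copied_growing(pages))      # 85150
print(rows_copied_concat_once(pages))  # 1300
```

With n pages of k rows each, that is k·n(n+1)/2 row copies versus k·n, so the gap widens quickly as the page count grows.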

Here's my code after moving the df generation inside the for loop:

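import re
import pandas as pd

# doc is assumed to be a PyPDF2 PdfFileReader created earlier (not shown)
# Compile the patterns once, outside the loop
loc_re = re.compile(r'S\d+_\d+_DOG', re.IGNORECASE)  # location IDs such as S1_2_DOG
easting_re = re.compile(r'E[ ]*\d{6}')               # 'E', optional spaces, 6 digits
northing_re = re.compile(r'N[ ]*\d{7}')              # 'N', optional spaces, 7 digits

df_all = pd.DataFrame()

for i in range(0, 13):
    text = doc.getPage(i).extractText()

    loc = loc_re.findall(text)
    easting = easting_re.findall(text)
    northing = northing_re.findall(text)

    # One DataFrame per page; pandas raises ValueError if the three lists differ in length
    df = pd.DataFrame({'LOC': loc, 'Easting': easting, 'Northing': northing})
    # append() returns a new frame with all existing rows copied on every call
    # (deprecated in pandas 1.4 and removed in 2.0)
    df_all = df_all.append(df, ignore_index=True)

print(df_all)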
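A common alternative is to keep the per-page frames in an ordinary Python list (appending to a list is cheap) and call pd.concat once after the loop. The sketch below is self-contained: the hypothetical page_texts strings stand in for doc.getPage(i).extractText(), and extract_coordinates is an illustrative helper name, not part of any library. It also works on pandas 2.x, where DataFrame.append no longer exists.

```python
import re

import pandas as pd

COLUMNS = ['LOC', 'Easting', 'Northing']
LOC_RE = re.compile(r'S\d+_\d+_DOG', re.IGNORECASE)
EASTING_RE = re.compile(r'E[ ]*\d{6}')
NORTHING_RE = re.compile(r'N[ ]*\d{7}')


def extract_coordinates(page_texts):
    """Build one DataFrame from an iterable of page texts (hypothetical helper)."""
    frames = []  # plain Python list: appending here copies nothing
    for text in page_texts:
        frames.append(pd.DataFrame({
            'LOC': LOC_RE.findall(text),
            'Easting': EASTING_RE.findall(text),
            'Northing': NORTHING_RE.findall(text),
        }))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=COLUMNS)
    # A single concatenation at the end instead of one per page
    return pd.concat(frames, ignore_index=True)


# Hypothetical page texts standing in for doc.getPage(i).extractText()
page_texts = [
    "S1_1_DOG E 123456 N 1234567",
    "S1_2_DOG E 234567 N 2345678 s2_1_dog E345678 N3456789",
]
df_all = extract_coordinates(page_texts)
print(df_all)
```

With the real PDF, the generator expression (doc.getPage(i).extractText() for i in range(doc.getNumPages())) could be passed in place of page_texts. Whether this noticeably reduces the 46 s depends on where the time is actually spent; PDF text extraction may be the larger cost, so it is worth timing that step on its own.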


Messages In This Thread
For loops & DataFrames - by pprod - Feb-17-2021, 10:54 AM
RE: For loops & DataFrames - by maurom82 - Feb-18-2021, 02:51 PM
RE: For loops & DataFrames - by pprod - Feb-22-2021, 04:58 PM
RE: For loops & DataFrames - by nilamo - Feb-24-2021, 07:12 PM
RE: For loops & DataFrames - by nilamo - Feb-24-2021, 07:16 PM
RE: For loops & DataFrames - by pprod - Feb-25-2021, 08:23 AM
