Thanks, maurom82. I generated the df inside the for loop and it worked. However, I read that df.append() copies all the data on every call, which makes the process inefficient when looping through files with many pages. My file has 130 pages and it took 46s to append the data frames generated in each loop, which is fine, but I'd like to ask whether there is a better/more efficient way of doing this? Any suggestions? Thanks!
Here's my code after moving the df generation inside the for loop:
import re
import pandas as pd

df_all = pd.DataFrame()
for i in range(0, 13):
    text = doc.getPage(i).extractText()
    #print(text)
    loc_re = re.compile(r'S\d+_\d+_DOG', re.IGNORECASE)
    loc = loc_re.findall(text)
    easting_re = re.compile(r'E[ ]*\d{6}')
    easting = easting_re.findall(text)
    #print(easting)
    northing_re = re.compile(r'N[ ]*\d{7}')
    northing = northing_re.findall(text)
    #print(northing)
    df = pd.DataFrame({'LOC': loc, 'Easting': easting, 'Northing': northing})
    df_all = df_all.append(df, ignore_index=True)
print(df_all)
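For reference, the usual way to avoid the repeated copying that df.append() does is to collect each page's frame in a list and call pd.concat() once at the end. Below is a minimal sketch of that pattern; `frames_from_pages` and its `pages` argument (an iterable of page text strings, standing in for `doc.getPage(i).extractText()`) are hypothetical names for illustration:

```python
import re
import pandas as pd

# Compile the patterns once instead of on every page.
LOC_RE = re.compile(r'S\d+_\d+_DOG', re.IGNORECASE)
EASTING_RE = re.compile(r'E[ ]*\d{6}')
NORTHING_RE = re.compile(r'N[ ]*\d{7}')

def frames_from_pages(pages):
    # Build one small DataFrame per page, then concatenate a single time.
    frames = []
    for text in pages:
        frames.append(pd.DataFrame({
            'LOC': LOC_RE.findall(text),
            'Easting': EASTING_RE.findall(text),
            'Northing': NORTHING_RE.findall(text),
        }))
    return pd.concat(frames, ignore_index=True)
```

This keeps the cost linear in the number of pages, since each page's data is copied only once during the final concat instead of once per append.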