Python Forum

Full Version: Errors if an xlsx file has blank rows in the beginning….
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Greetings!
This is my first try using Pandas.
The sniped below works file if the files(.XLSX) do not have blank lines in the beginning.
I thought by using 'dropna' I'll fix the problem but it does not:
exlf = exlf.dropna(axis = 0, how = 'all')
Snipped:
import pandas as pd

exlf = pd.read_excel('C:/01/XLS_Files/SHORT_2.xlsx')
exlf = exlf.dropna(axis = 0, how = 'all')
#print(f" Blank Row Removed.. {exlf}")
four_col = exlf.iloc[: , [0,4,9,17]].copy()    # <--- Filtering 4 Columns
#print('\n. Only 4(Four) Columns \n :',four_col) 

filtered_l = four_col[(four_col['VR Site'] =='ORVCC') & (four_col['VR Duration(Days)'] >=449)]
print(filtered_l)
Any help is appreciated.
What do you mean by "fix", and how is your code not fixing the problem?

One problem I can see is that you go from using column locations to using column names. If you know column names, why use iloc? If you don't know column names, why not?
four_col = exlf.iloc[: , [0,4,9,17]].copy()    # using location
#print('\n. Only 4(Four) Columns \n :',four_col) 
 
filtered_l = four_col[(four_col['VR Site'] =='ORVCC') & (four_col['VR Duration(Days)'] >=449)]  # Using column names
Is not knowing column names the problem, or is it that the dataframe does not have column names? (same problem as here?)

https://python-forum.io/thread-40610.html

Why are you using copy()? This creates a new DataFrame. No need to make a copy.
exlf.iloc[: , [0,4,9,17]]
As you can see here:
import pandas as pd
from random import choices

df = pd.DataFrame({c: choices(range(1, 101), k=5) for c in "ABCDEFGH"})
df2 = df.iloc[:, [1, 3, 5, 7]]
print(df2)
Output:
B D F H 0 10 32 21 49 1 60 82 8 25 2 32 62 64 96 3 98 93 100 66 4 13 73 21 74
And this shows selecting columns using column names.
import pandas as pd
from random import choices

df = pd.DataFrame({c: choices(range(1, 101), k=5) for c in "ABCDEFGH"})
df2 = df[["B", "D", "F", "H"]]
print(df2)
Output:
B D F H 0 71 27 84 100 1 45 39 96 43 2 53 20 75 86 3 95 82 16 83 4 15 8 56 29