Python Forum

Full Version: read complex file with both pandas and not
Dear all,

I have this complex file:

# ID ,116
# Localita  ,SB16
# Lon/Lat ,11.138574,46.886774
# Quota ,1839
DATA ORA, T, RH,PSFC,DIR,VEL10, PREC, RAD, CC,FOG
yyyy-mm-dd hh:mm, °C, %, hPa, °N, m/s, mm/h,W/m², %,-
2012-01-01 06:00, -0.1,100, 815,313, 2.6, 0.0, 0, 0,0
2012-01-01 07:00, -1.2, 93, 814,314, 4.8, 0.0, 0, 0,0
2012-01-01 08:00, 1.7, 68, 815,308, 7.5, 0.0, 41, 11,0
2012-01-01 09:00, 2.4, 65, 815,308, 7.4, 0.0, 150, 33,0
2012-01-01 10:00, 3.0, 64, 816,305, 8.4, 0.0, 170, 44,0
2012-01-01 11:00, 2.6, 65, 816,303, 6.3, 0.0, 321, 22,0
....
....

I would like to read the value 1839 and store it in a variable. After that, I would like to read all the data after the lines starting with # with pandas and store it in a DataFrame. However, I would like to use "DATA ORA, T, RH,PSFC,DIR,VEL10, PREC, RAD, CC,FOG" as the header and not the sixth line.

I am able to read it with pandas, but I have to skip the first rows and delete the sixth line from the file.

What do you think? Is it better to move to a simpler file?
Thanks in advance for any help,

Diedro
This is quite simple. You only need pandas if you want to save in a different format or want a better-looking report:
** Note ** Uses f-strings and requires Python 3.6 or newer
import csv
import os


# I need the following so the file is read from the script's directory
os.chdir(os.path.abspath(os.path.dirname(__file__)))

def read_data(filename, delimiter=','):
    # the delimiter can be overridden if necessary
    headerfound = False
    with open(filename) as csvdata:
        reader = csv.reader(csvdata, delimiter=delimiter)
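        for row in reader:
            if not row:
                continue                            # skip blank lines (row[0] would fail)
            if '#' in row[0]:
                key = row[0].strip()                # tolerate extra spaces around the key
                if key == '# ID':
                    station_id = row[1].strip()     # This is id (avoids shadowing built-in id)
                    print(f'\nId = {station_id}')
                elif key == '# Quota':
                    quota = row[1].strip()          # This is quota
                    print(f'Quota: {quota}')
                    headerfound = True              # the next non-# row is the header
                continue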
            elif headerfound:
                # this is header for pandas
                print('---------------------------------------------------------------------' \
                      '---------------------------------------------------------------------' \
                      '-----------')
                for item in row:
                    print(f'{item:16}', end = '')
                print('\n---------------------------------------------------------------------' \
                      '---------------------------------------------------------------------' \
                      '-----------')
                headerfound = False
            else:
                data = row              # each iteration here is a row to insert into pandas
                for item in row:
                    print(f'{item:16}', end = '')
                print()


if __name__ == '__main__':
    read_data('cvsdata.csv')
Output:

Id = 116
Quota: 1839
-----------------------------------------------------------------------------------------------------------------------------------------------------
DATA ORA         T               RH             PSFC            DIR             VEL10            PREC            RAD             CC             FOG
-----------------------------------------------------------------------------------------------------------------------------------------------------
yyyy-mm-dd hh:mm °C              %               hPa             °N              m/s             mm/h           W/m²             %              -
2012-01-01 06:00 -0.1           100              815            313              2.6             0.0             0               0              0
2012-01-01 07:00 -1.2            93              814            314              4.8             0.0             0               0              0
2012-01-01 08:00 1.7             68              815            308              7.5             0.0             41              11             0
2012-01-01 09:00 2.4             65              815            308              7.4             0.0             150             33             0
2012-01-01 10:00 3.0             64              816            305              8.4             0.0             170             44             0
2012-01-01 11:00 2.6             65              816            303              6.3             0.0             321             22             0
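
For the pandas part of the question, here is one possible sketch, not taken from the thread. It assumes the file is laid out exactly like the sample: the # lines come first, then the column names, then one units line. The function name read_station_file, the metadata dict and the utf-8 default are my own assumptions; change the encoding if your file was saved differently. The # lines are split off by hand, and the remaining text is passed to pandas.read_csv with skiprows=[1] so the units line is dropped and "DATA ORA, T, RH, ..." becomes the header.

```python
import io

import pandas as pd


def read_station_file(path, encoding="utf-8"):
    """Return (metadata dict, DataFrame) for a file laid out like the sample.

    Hypothetical helper: the name, the dict layout and the encoding
    default are assumptions, not something given in the thread.
    """
    metadata = {}
    table_lines = []
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.startswith("#"):
                # "# Quota ,1839" -> key "Quota", value "1839"
                key, _, value = line.lstrip("#").partition(",")
                metadata[key.strip()] = value.strip()
            elif line.strip():
                # keep every non-blank, non-# line for pandas
                table_lines.append(line)

    # After filtering, line 0 holds the column names and line 1 the units,
    # so skiprows=[1] drops the units line and keeps the names as header.
    df = pd.read_csv(
        io.StringIO("".join(table_lines)),
        skiprows=[1],
        skipinitialspace=True,   # ignore the blanks after each comma
    )
    df.columns = df.columns.str.strip()
    df["DATA ORA"] = pd.to_datetime(df["DATA ORA"], format="%Y-%m-%d %H:%M")
    return metadata, df
```

With this, metadata, df = read_station_file('cvsdata.csv') followed by quota = int(metadata['Quota']) stores the quota in a variable, and df holds the hourly data with numeric columns, so the original file does not have to be edited by hand.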