Python Forum
read complex file with both pandas and not

read complex file with both pandas and not - Diedro - Jan-29-2019

Dear all,

I have this complex file:

# ID ,116
# Localita  ,SB16
# Lon/Lat ,11.138574,46.886774
# Quota ,1839
DATA ORA, T, RH,PSFC,DIR,VEL10, PREC, RAD, CC,FOG
yyyy-mm-dd hh:mm, °C, %, hPa, °N, m/s, mm/h,W/m², %,-
2012-01-01 06:00, -0.1,100, 815,313, 2.6, 0.0, 0, 0,0
2012-01-01 07:00, -1.2, 93, 814,314, 4.8, 0.0, 0, 0,0
2012-01-01 08:00, 1.7, 68, 815,308, 7.5, 0.0, 41, 11,0
2012-01-01 09:00, 2.4, 65, 815,308, 7.4, 0.0, 150, 33,0
2012-01-01 10:00, 3.0, 64, 816,305, 8.4, 0.0, 170, 44,0
2012-01-01 11:00, 2.6, 65, 816,303, 6.3, 0.0, 321, 22,0
....
....

I would like to read the value 1839 and store it in a variable. After that, I would like to read all the data after the # lines with pandas and store it in a dataframe. However, I would like to use "DATA ORA, T, RH,PSFC,DIR,VEL10, PREC, RAD, CC,FOG" as the header and discard the sixth line (the units).

I am able to read it with pandas, but only if I skip the first rows and delete the sixth line from the file.

What do you think? Would it be better to switch to a simpler file format?
Thanks in advance for any help,

Diedro
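A minimal pandas sketch of what is asked above, assuming the layout is always exactly four `#` lines, then the header line, then the units line. The function name `read_station` and the `SAMPLE` text (copied from the post, shortened to two data rows) are hypothetical. It reads the `# Quota` value in a first pass, rewinds, and lets `pd.read_csv` drop the metadata and units lines by their 0-based file line numbers, so the file itself does not need editing:

```python
import io

import pandas as pd

# Hypothetical sample: the file from the post, shortened to two data rows
SAMPLE = """\
# ID ,116
# Localita  ,SB16
# Lon/Lat ,11.138574,46.886774
# Quota ,1839
DATA ORA, T, RH,PSFC,DIR,VEL10, PREC, RAD, CC,FOG
yyyy-mm-dd hh:mm, °C, %, hPa, °N, m/s, mm/h,W/m², %,-
2012-01-01 06:00, -0.1,100, 815,313, 2.6, 0.0, 0, 0,0
2012-01-01 07:00, -1.2, 93, 814,314, 4.8, 0.0, 0, 0,0
"""


def read_station(f):
    """Return (quota, DataFrame) from an open station file (hypothetical helper)."""
    quota = None
    # Pass 1: scan the '#' metadata lines at the top for the Quota value
    for line in f:
        if not line.startswith('#'):
            break                       # first non-'#' line: metadata is over
        key, _, value = line.lstrip('#').partition(',')
        if key.strip() == 'Quota':
            quota = int(value)          # int() tolerates the trailing newline
    # Pass 2: rewind and let pandas parse the table
    f.seek(0)
    df = pd.read_csv(
        f,
        skiprows=[0, 1, 2, 3, 5],       # four '#' lines + the units line (assumed fixed layout)
        skipinitialspace=True,          # ' T' -> 'T', ' -0.1' -> -0.1
        parse_dates=['DATA ORA'],       # 'yyyy-mm-dd hh:mm' -> datetime64
    )
    return quota, df


if __name__ == '__main__':
    quota, df = read_station(io.StringIO(SAMPLE))
    print(quota)
    print(df)
```

With a real file, `with open('station.csv') as f: quota, df = read_station(f)` should behave the same way (`station.csv` is a placeholder name); because of the `°` and `²` characters you may need to pass the file's actual `encoding=` to `open()`.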


RE: read complex file with both pandas and not - Larz60+ - Jan-29-2019

This is quite simple; you only need pandas if you want to save to a different format or want a better-looking report:
** Note ** Uses f-strings and requires Python 3.6 or newer.
import csv
import os


# Change to this script's directory so the relative filename below resolves
os.chdir(os.path.abspath(os.path.dirname(__file__)))

def read_data(filename, delimiter=','):
    # delimiter can be overridden if necessary
    station_id = quota = None   # avoid shadowing the built-in id()
    header = None               # first non-comment row: column names for pandas
    rows = []                   # remaining rows: data to insert into pandas
    with open(filename, newline='') as csvdata:
        reader = csv.reader(csvdata, delimiter=delimiter)
        for row in reader:
            if not row:
                continue                        # skip blank lines
            if row[0].startswith('#'):
                key = row[0].lstrip('#').strip()
                if key == 'ID':
                    station_id = row[1].strip()  # this is the ID
                    print(f'\nId = {station_id}')
                elif key == 'Quota':
                    quota = row[1].strip()       # this is the quota (elevation)
                    print(f'Quota: {quota}')
                continue
            if header is None:
                # this is the header for pandas
                header = row
                print('-' * 149)
                for item in row:
                    print(f'{item:16}', end='')
                print('\n' + '-' * 149)
            else:
                rows.append(row)    # each row here is a row to insert into pandas
                for item in row:
                    print(f'{item:16}', end='')
                print()
    return station_id, quota, header, rows


if __name__ == '__main__':
    read_data('cvsdata.csv')
Output:
Id = 116
Quota: 1839
-----------------------------------------------------------------------------------------------------------------------------------------------------
DATA ORA         T               RH              PSFC            DIR             VEL10           PREC            RAD             CC              FOG
-----------------------------------------------------------------------------------------------------------------------------------------------------
yyyy-mm-dd hh:mm °C              %               hPa             °N              m/s             mm/h            W/m²            %               -
2012-01-01 06:00 -0.1            100             815             313             2.6             0.0             0               0               0
2012-01-01 07:00 -1.2            93              814             314             4.8             0.0             0               0               0
2012-01-01 08:00 1.7             68              815             308             7.5             0.0             41              11              0
2012-01-01 09:00 2.4             65              815             308             7.4             0.0             150             33              0
2012-01-01 10:00 3.0             64              816             305             8.4             0.0             170             44              0
2012-01-01 11:00 2.6             65              816             303             6.3             0.0             321             22              0
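To keep the values rather than only print them, here is a stdlib-only sketch along the same lines. The name `read_station_csv` and the sample text are hypothetical. It assumes the `#` block comes first and that the line right after the header is always the units line, which it discards; `csv.DictReader` with `skipinitialspace=True` drops the blank after each comma, so the keys come out as `T`, `RH`, and so on:

```python
import csv
import io

# Hypothetical sample: the file from the first post, shortened to two data rows
SAMPLE = """\
# ID ,116
# Localita  ,SB16
# Lon/Lat ,11.138574,46.886774
# Quota ,1839
DATA ORA, T, RH,PSFC,DIR,VEL10, PREC, RAD, CC,FOG
yyyy-mm-dd hh:mm, °C, %, hPa, °N, m/s, mm/h,W/m², %,-
2012-01-01 06:00, -0.1,100, 815,313, 2.6, 0.0, 0, 0,0
2012-01-01 07:00, -1.2, 93, 814,314, 4.8, 0.0, 0, 0,0
"""


def read_station_csv(f):
    """Return (quota, rows) where rows is a list of dicts keyed by the header."""
    lines = f.read().splitlines()
    quota = None
    body_start = len(lines)
    # The '#' metadata block is assumed to come first
    for i, line in enumerate(lines):
        if not line.startswith('#'):
            body_start = i
            break
        key, _, value = line.lstrip('#').partition(',')
        if key.strip() == 'Quota':
            quota = int(value)
    # Assumed layout: header line, units line (discarded), then data lines
    header_line, _units_line, *data_lines = lines[body_start:]
    reader = csv.DictReader([header_line, *data_lines], skipinitialspace=True)
    return quota, list(reader)


if __name__ == '__main__':
    quota, rows = read_station_csv(io.StringIO(SAMPLE))
    print(quota)          # 1839
    print(rows[0]['T'])   # -0.1
```

Every value is still a string here; `pd.DataFrame(rows)` turns the list of dicts into a DataFrame if you later decide you need pandas.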