assigning columns according to data range python - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: assigning columns according to data range python (/thread-7678.html) |
assigning columns according to data range python - alamather - Jan-20-2018 I am very new to this and trying to use it for some of my tasks. I have a list of raw data files without any extension, the files are having more than 100 columns. I successfully read all those files using following code and converted into one txt file. import glob path = '/Documents/Data/*' read_files = glob.glob(path) with open("result.txt", "wb") as outfile: for f in read_files: with open(f, "rb") as infile: outfile.write(infile.read())My second task is to assign column names which I am unable to do as some of the data fields are merged with each other so I need to define the range of each column before assigning column names. I easily did this task in SAS using macros. The snippet is as follows: %do i = 1 %to &nameM.; Data myData ; infile "&path.\&&flt2_&i" missover dlm='09'x firstobs = 2 lrecl=4096 ; input NBR $ 1-40 DT $ 41-72 TYP $ 73-112 POST $ 113 CURR $ 114-116 DECI $ 117-156 ORIG $ 157-196 .... ...... ..... run; %end; %mend; Since, I am very new to Python (which sadly is the only challenge), any help will be much appreciated. thanks RE: assigning columns according to data range python - buran - Jan-20-2018 you can use slicing - check https://python-forum.io/Thread-Basic-Strings-index-and-slicing RE: assigning columns according to data range python - alamather - Jan-20-2018 thanks for sharing this, can you please guide using code example. RE: assigning columns according to data range python - alamather - Jan-21-2018 no one here to help? RE: assigning columns according to data range python - buran - Jan-21-2018 Something like this (not tested as I don't have the files): import glob import csv from collections import OrderedDict # make an ordered dict for the fields # for each field add #(FIELD, (start, end)) # note that index starts at 0 FIELDS_MAP = OrderedDict([('NBR', (0, 40)), ('DT', (40,72)), ('TYP', (72, 112)), ('POST', (112, 113)), ('CURR', (113, 116)), ('DECI', (116, 156)), ('ORIG', (156, ))]) # last field has no end index def parse_line(line): my_data = {} for key, value in FIELDS_MAP.items(): if len(value) == 2: start, end = value my_data[key] = line[start:end] # slice from index start to index end-1 else: start = value[0] my_data[key] = line[start:] # slice from index start to the end return my_data path = '/Documents/Data/*' read_files = glob.glob(path) with open("result.txt", "wb") as outfile: wrtr = csv.DictWriter(outfile, fieldnames = FIELDS_MAP.keys()) wrtr.writeheader() for f in read_files: with open(f, "rb") as infile: for line in infile: wrtr.writerow(parse_line(line))I use csv module, but it's not mandatory also OrderedDict from collections to make it easier to create map of fieldnames and indexex |