Python Forum
assigning columns according to data range python - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: assigning columns according to data range python (/thread-7678.html)



assigning columns according to data range python - alamather - Jan-20-2018

I am very new to this and trying to use it for some of my tasks. I have a list of raw data files without any extension, the files are having more than 100 columns. I successfully read all those files using following code and converted into one txt file.

import glob
path = '/Documents/Data/*'
read_files = glob.glob(path)
with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
My second task is to assign column names which I am unable to do as some of the data fields are merged with each other so I need to define the range of each column before assigning column names. I easily did this task in SAS using macros. The snippet is as follows:

%do i = 1 %to &nameM.;
Data myData ;
infile "&path.\&&flt2_&i"
missover dlm='09'x firstobs = 2 lrecl=4096 ;
input
NBR $ 1-40
DT $ 41-72
TYP $ 73-112
POST $ 113
CURR $ 114-116
DECI $ 117-156
ORIG $ 157-196
....
......
.....
run;
%end;
%mend;

Since, I am very new to Python (which sadly is the only challenge), any help will be much appreciated.

thanks


RE: assigning columns according to data range python - buran - Jan-20-2018

you can use slicing - check https://python-forum.io/Thread-Basic-Strings-index-and-slicing


RE: assigning columns according to data range python - alamather - Jan-20-2018

thanks for sharing this, can you please guide using code example.


RE: assigning columns according to data range python - alamather - Jan-21-2018

no one here to help?


RE: assigning columns according to data range python - buran - Jan-21-2018

Something like this (not tested as I don't have the files):
import glob
import csv
from collections import OrderedDict

# make an ordered dict for the fields
# for each field add
#(FIELD, (start, end))
# note that index starts at 0
FIELDS_MAP = OrderedDict([('NBR', (0, 40)),
                          ('DT', (40,72)),
                          ('TYP', (72, 112)),
                          ('POST', (112, 113)),
                          ('CURR', (113, 116)),
                          ('DECI', (116, 156)),
                          ('ORIG', (156, ))]) # last field has no end index


def parse_line(line):
    my_data = {}
    for key, value in FIELDS_MAP.items():
        if len(value) == 2:
            start, end = value
            my_data[key] = line[start:end] # slice from index start to index end-1
        else:
            start = value[0]
            my_data[key] = line[start:] # slice from index start to the end
    return my_data


path = '/Documents/Data/*'
read_files = glob.glob(path)
with open("result.txt", "wb") as outfile:
    wrtr = csv.DictWriter(outfile, fieldnames = FIELDS_MAP.keys())
    wrtr.writeheader()
    for f in read_files:
        with open(f, "rb") as infile:
            for line in infile:
                wrtr.writerow(parse_line(line))
I use csv module, but it's not mandatory
also OrderedDict from collections to make it easier to create map of fieldnames and indexex