Python Forum
assigning columns according to data range python
#1
I am very new to Python and am trying to use it for some of my tasks. I have a set of raw data files without any extension, and each file has more than 100 columns. I successfully read all of those files using the following code and combined them into one txt file.

import glob
path = '/Documents/Data/*'
read_files = glob.glob(path)
with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
My second task is to assign column names, which I am unable to do because some of the data fields run into each other, so I need to define the column range of each field before assigning names. I did this easily in SAS using macros. The snippet is as follows:

%do i = 1 %to &nameM.;
Data myData ;
infile "&path.\&&flt2_&i"
missover dlm='09'x firstobs = 2 lrecl=4096 ;
input
NBR $ 1-40
DT $ 41-72
TYP $ 73-112
POST $ 113
CURR $ 114-116
DECI $ 117-156
ORIG $ 157-196
....
......
.....
run;
%end;
%mend;

Since I am very new to Python (which sadly is the only challenge), any help will be much appreciated.

thanks
#2
You can use slicing - check https://python-forum.io/Thread-Basic-Str...nd-slicing
#3
Thanks for sharing this. Can you please guide me with a code example?
#4
Is no one here to help?
#5
Something like this (not tested as I don't have the files):
import glob
import csv
from collections import OrderedDict

# make an ordered dict for the fields
# for each field add
# (FIELD, (start, end))
# note that index starts at 0 and the end index is exclusive,
# so SAS columns 41-72 become (40, 72)
FIELDS_MAP = OrderedDict([('NBR', (0, 40)),
                          ('DT', (40, 72)),
                          ('TYP', (72, 112)),
                          ('POST', (112, 113)),
                          ('CURR', (113, 116)),
                          ('DECI', (116, 156)),
                          ('ORIG', (156, ))]) # last field has no end index; use (156, 196) if more fields follow


def parse_line(line):
    line = line.rstrip('\r\n') # drop the line ending so it doesn't end up in the last field
    my_data = {}
    for key, value in FIELDS_MAP.items():
        if len(value) == 2:
            start, end = value
            my_data[key] = line[start:end] # slice from index start to index end-1
        else:
            start = value[0]
            my_data[key] = line[start:] # slice from index start to the end
    return my_data


path = '/Documents/Data/*'
read_files = glob.glob(path)
# in Python 3 the csv module needs a text-mode file opened with newline=''
with open("result.txt", "w", newline='') as outfile:
    wrtr = csv.DictWriter(outfile, fieldnames=list(FIELDS_MAP.keys()))
    wrtr.writeheader()
    for f in read_files:
        with open(f, "r") as infile: # text mode, so each line is a str
            for line in infile:
                wrtr.writerow(parse_line(line))
I use the csv module, but it's not mandatory.
I also use OrderedDict from collections to make it easier to create the map of field names and indexes.
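If it helps, here is a small sketch of how the SAS column ranges from post #1 could be turned into Python slices without recalculating each index by hand. SAS ranges are 1-based and include the last column, while Python slices are 0-based and exclude the end, so `41-72` becomes `line[40:72]`. The names `SAS_RANGES`, `sas_to_slice` and `parse_fixed_width` are made up for this illustration, and the sample line is invented data padded to the widths in the SAS snippet.

```python
# SAS column ranges are 1-based and inclusive; Python slices are 0-based
# and exclude the end index, so "41-72" becomes line[40:72].
SAS_RANGES = {          # copied from the SAS INPUT statement in post #1
    "NBR": (1, 40),
    "DT": (41, 72),
    "TYP": (73, 112),
    "POST": (113, 113),  # single column, written as "113" in SAS
    "CURR": (114, 116),
    "DECI": (117, 156),
    "ORIG": (157, 196),
}


def sas_to_slice(first, last):
    """Convert a 1-based inclusive column range to a Python slice."""
    return slice(first - 1, last)


def parse_fixed_width(line, ranges=SAS_RANGES):
    """Cut one line into named fields and strip the padding spaces."""
    line = line.rstrip("\r\n")
    return {name: line[sas_to_slice(first, last)].strip()
            for name, (first, last) in ranges.items()}


# Made-up sample line, padded to the widths above (not real data).
sample = ("A123".ljust(40) + "2018-01-31".ljust(32) + "SALE".ljust(40)
          + "Y" + "USD" + "2".ljust(40) + "STORE7".ljust(40) + "\n")
record = parse_fixed_width(sample)
print(record["NBR"], record["POST"], record["CURR"])
```

The SAS step also uses `firstobs = 2`, i.e. it starts reading at the second line of each file. If your files have a header line like that, calling `next(infile, None)` before looping over the lines would be one way to mirror it.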