Python Forum

I have space-separated data that I am trying to load into my program efficiently. The only way I know to split each row (which is a string) is to create an empty list for each column and then add the split items to those lists one row at a time. I was hoping there might be a more efficient way of doing this, like there is for a single list (example_var = [f.split()[0] for f in example_file]). Below are the first few lines of the input file (I don't need every column) and the relevant code segment:

# ID RADeg DecDeg Mag Col d Stat Vr EVr Band Pri sbV Sample Select SlitPA Len1 Len2 Vmag RA Dec
10001 275.945495610 -30.390691760 18.710 0.840 2.250 0 0.000 0.000 V 7990 21.777 1 0 275.0 1.2 1.3 18.710 18:23:46.919 -30:23:26.490
10002 275.934997560 -30.389574050 15.815 1.086 1.900 0 0.000 0.000 V 37559 21.311 1 0 275.0 1.2 1.3 15.815 18:23:44.399 -30:23:22.467
10003 275.931854250 -30.389314650 15.955 1.002 1.820 0 0.000 0.000 V 37122 21.197 1 0 275.0 1.2 1.3 15.955 18:23:43.645 -30:23:21.533
10004 275.929718020 -30.390192030 16.167 1.061 1.830 2 59.303 0.691 V 33445 21.211 1 0 275.0 1.2 1.3 16.167 18:23:43.132 -30:23:24.691

# Files and path manipulation
import os
# Joining paths
import os.path as path
# Useful argument parsing with optional arguments
import argparse

target_file = open(path.join(args.target_input_directory,'target_combined.dat'),'r')

target_id = []
target_ra_deg = []
target_dec_deg = []
target_mag = []
target_colour = []
target_d = []
target_stat = []
target_vel = []
target_vel_err = []
target_band = []

for line in target_file:
    # Make sure that the title line isn't being added
    if line.split()[0] != '#':
        target_id += [line.split()[0]]
        target_ra_deg += [line.split()[1]]
        target_dec_deg += [line.split()[2]]
        target_mag += [line.split()[3]]
        target_colour += [line.split()[4]]
        target_d += [line.split()[5]]
        target_stat += [line.split()[6]]
        target_vel += [line.split()[7]]
        target_vel_err += [line.split()[8]]
        target_band += [line.split()[9]]

Thank you for your help!

I think the separate variables (target_id etc.) are the reason the code looks so long. If the code generates the lists automatically, it will look shorter, IMO.

variables = 10
# You have 10 variables (target_id ... target_band), so 10 lists are created in nestedlist.
# A comprehension is used because [[]] * variables would repeat one shared list.
nestedlist = [[] for _ in range(variables)]
for line in target_file:
    splitted = line.split()  # split each line only once
    if splitted[0] != '#':  # skip the header line
        for i in range(variables):
            nestedlist[i].append(splitted[i])

for index, variable in enumerate(nestedlist):
    print(index, variable)

The results are stored in nestedlist. Each index in the printed output corresponds, in order, to one of your variables (target_id, target_ra_deg, and so on).
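
If looking columns up by position gets awkward, here is a minimal sketch of tying each index back to a name. The column names come from the header line of the data in the question. `sample` and `by_name` are hypothetical names used only for illustration, and `io.StringIO` stands in for the real file, with the rows trimmed to the first ten columns.

```python
import io

# Two rows copied from the question, trimmed to the first ten columns.
# io.StringIO is only a stand-in for the real target_combined.dat file.
sample = io.StringIO(
    "# ID RADeg DecDeg Mag Col d Stat Vr EVr Band\n"
    "10001 275.945495610 -30.390691760 18.710 0.840 2.250 0 0.000 0.000 V\n"
    "10004 275.929718020 -30.390192030 16.167 1.061 1.830 2 59.303 0.691 V\n"
)

names = ["ID", "RADeg", "DecDeg", "Mag", "Col", "d", "Stat", "Vr", "EVr", "Band"]
nestedlist = [[] for _ in names]  # one independent list per column

for line in sample:
    fields = line.split()
    # Skip blank lines and the header line that starts with '#'
    if not fields or fields[0] == "#":
        continue
    for i in range(len(names)):
        nestedlist[i].append(fields[i])

# Look columns up by name instead of by position
by_name = dict(zip(names, nestedlist))
print(by_name["ID"])  # ['10001', '10004']
print(by_name["Vr"])  # ['0.000', '59.303']
```

The values are still strings at this point, so anything numeric would need float() or int() before doing arithmetic on it.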

You can try this

with open(path.join(args.target_input_directory,'target_combined.dat'),'r') as target_file:
    next(target_file) # <- skip header line
    # split(None, 10) splits on whitespace at most 10 times; [:10] keeps the first 10 fields
    columns = list(zip(*(line.split(None, 10)[:10] for line in target_file)))
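
As a rough sketch of how the zip(*...) idea could feed the original variable names directly, assuming the file layout shown in the question: the names on the left are the ones from the question, `sample` is a hypothetical stand-in for the opened file, and the float conversion at the end is an optional extra step rather than something the thread asked for.

```python
import io

# Header and two full rows copied from the question; io.StringIO stands in
# for the opened target_combined.dat file.
sample = io.StringIO(
    "# ID RADeg DecDeg Mag Col d Stat Vr EVr Band Pri sbV Sample Select SlitPA Len1 Len2 Vmag RA Dec\n"
    "10001 275.945495610 -30.390691760 18.710 0.840 2.250 0 0.000 0.000 V 7990 21.777 1 0 275.0 1.2 1.3 18.710 18:23:46.919 -30:23:26.490\n"
    "10004 275.929718020 -30.390192030 16.167 1.061 1.830 2 59.303 0.691 V 33445 21.211 1 0 275.0 1.2 1.3 16.167 18:23:43.132 -30:23:24.691\n"
)

next(sample)  # skip the header line

# split(None, 10) splits on whitespace at most 10 times, so the unwanted
# trailing columns stay together in one last item that [:10] discards.
rows = (line.split(None, 10)[:10] for line in sample if line.strip())

# zip(*rows) turns rows into columns; each name receives one tuple of strings.
(target_id, target_ra_deg, target_dec_deg, target_mag, target_colour,
 target_d, target_stat, target_vel, target_vel_err, target_band) = zip(*rows)

# Optional: convert a numeric column from strings to floats
target_vel = [float(v) for v in target_vel]

print(target_id)   # ('10001', '10004')
print(target_vel)  # [0.0, 59.303]
```

Note that the unpacking needs at least one data row; with an empty file zip(*rows) yields nothing and the assignment raises a ValueError.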

Curtnos

ka06059

Gribouillis