numpy dtype anomaly - Python Forum, Data Science (https://python-forum.io/thread-13900.html)
numpy dtype anomaly - bluefrog - Nov-05-2018

I'm attempting to load 2 arrays from 2 columns read from a file. The file is comma-delimited, and I'm using numpy's loadtxt() function to load the arrays, like so:

```python
#!/usr/bin/python3
import sys
import numpy as np
import os.path as op
from datetime import datetime, date, time
from io import StringIO

sample_data = StringIO("AAPL,28-01-2011, ,344.17,344.4,333.53,336.1,21144800\n"
                       "AAPL,31-01-2011, ,335.8,340.04,334.3,339.32,13473000\n"
                       "AAPL,01-02-2011, ,341.3,345.65,340.98,345.03,15236800\n"
                       "AAPL,02-02-2011, ,344.45,345.25,343.55,344.32,9242600\n"
                       "AAPL,03-02-2011, ,343.8,344.24,338.55,343.44,14064100\n"
                       "AAPL,04-02-2011, ,343.61,346.7,343.51,346.5,11494200")

def usage():
    # basename() takes a single argument; 'filename' belongs to format()
    print("usage: {} {}".format(op.basename(sys.argv[0]), 'filename'))

def get_weekday(date_str):
    return datetime.strptime(date_str, "%d-%m-%Y").date().weekday()

def load_arrays(data_file, *col_tuple):
    a1 = a2 = None
    rec_type = np.dtype([('stock_code', '|S4'), ('cob_date', '|S10'), ('filler', '|S1'),
                         ('low_price', 'f4'), ('high_price', 'f4'), ('close_price', 'f4'),
                         ('valuation', 'f4'), ('volume', 'uint')])
    try:
        a1, a2 = np.loadtxt(data_file, dtype=rec_type, usecols=col_tuple,
                            delimiter=',', unpack=True)
        # a1, a2 = np.loadtxt(data_file, usecols=col_tuple, delimiter=',', unpack=True)
    except IOError as e:
        usage()  # failed to open file
    except Exception as e:
        print(e)
    return a1, a2

try:
    # data_file = sys.argv[1]   (sys.argv[0] is the script itself)
    data_file = sample_data
    c, v = load_arrays(data_file, 5, 6)
except IndexError:
    usage()

print("Closing price array:\n{}".format(c))
print("\nValuation array:\n{}".format(v))
```

When I load the arrays without any data types defined, the load is successful, i.e. using

```python
a1, a2 = np.loadtxt(data_file, usecols=col_tuple, delimiter=',', unpack=True)
```

but when I apply data types by specifying

```python
a1, a2 = np.loadtxt(data_file, dtype=rec_type, usecols=col_tuple, delimiter=',', unpack=True)
```

I get the following output:

```
list index out of range
Closing price array:
None

Valuation array:
None
```

Can anybody suggest why there is a difference, or what I am specifying incorrectly in the data type specification?

RE: numpy dtype anomaly - stullis - Nov-06-2018

Hmm... If the list index is out of range, that suggests to me that your arrays have a different length than your rec_type. I would guess that rec_type has more fields than the arrays, so the interpreter cannot find a corresponding index in your arrays to match up with each field of rec_type.

RE: numpy dtype anomaly - bluefrog - Nov-07-2018

I've reduced the number of columns to 4, and the error still occurs if usecols is specified. If it isn't, each column loads into an array successfully. I've taken out the blank filler column, which was column 2 in the previous post. I've also added a decode('ascii') on the byte string for the date, which is the 2nd column.
So without usecols, it works fine, as follows:

```python
import sys
import numpy as np
import os.path as op
from datetime import datetime, date, time
from io import StringIO

sample_data = StringIO("AAPL,28-01-2011,344.17,344.4\n"
                       "AAPL,31-01-2011,335.8,340.04\n"
                       "AAPL,01-02-2011,341.3,345.65\n"
                       "AAPL,02-02-2011,344.45,345.25\n"
                       "AAPL,03-02-2011,343.8,344.24\n"
                       "AAPL,04-02-2011,343.61,346.7")

def usage():
    print("usage: {} {}".format(op.basename(sys.argv[0]), 'filename'))

def get_weekday(date_str):
    # Depending on the NumPy version and the loadtxt encoding, the converter
    # may receive bytes or str, so only decode when it is bytes.
    if isinstance(date_str, bytes):
        date_str = date_str.decode('ascii')
    return datetime.strptime(date_str, "%d-%m-%Y").date().weekday()

def load_arrays(data_file, *col_tuple):
    a1 = a2 = a3 = a4 = None
    # rec_type = np.dtype([('stock_code', 'S4'), ('cob_date', 'S10'), ('close_price', 'f4')])
    try:
        a1, a2, a3, a4 = np.loadtxt(data_file,
                                    dtype={'names': ('stock_code', 'cob_date', 'high_price', 'low_price'),
                                           'formats': ('S4', 'S10', 'f4', 'f4')},
                                    converters={1: get_weekday},
                                    delimiter=',', unpack=True)
    except IOError as e:
        usage()  # failed to open file
    except Exception as e:
        print(e)
    return a1, a2, a3, a4

try:
    data_file = sample_data
    s, d, h, l = load_arrays(data_file)
except IndexError:
    usage()

print("Stock code array:\n{}".format(s))
print("\nClose of Business date array:\n{}".format(d))
print("\nHigh price array:\n{}".format(h))
print("\nLow price array:\n{}".format(l))
```

But when I want to load only, say, columns 0 and 2, I get "list index out of range". The code is:

```python
def load_arrays(data_file, *col_tuple):
    a1 = a2 = None
    try:
        a1, a2 = np.loadtxt(data_file,
                            dtype={'names': ('stock_code', 'cob_date', 'high_price', 'low_price'),
                                   'formats': ('S4', 'S10', 'f4', 'f4')},
                            converters={1: get_weekday},
                            delimiter=',', usecols=(0, 2), unpack=True)
    except IOError as e:
        usage()  # failed to open file
    except Exception as e:
        print(e)
    return a1, a2

try:
    data_file = sample_data
    s, h = load_arrays(data_file)
except IndexError:
    usage()
```

The number of columns in the file matches what is specified in dtype.
RE: numpy dtype anomaly - stullis - Nov-07-2018

According to the documentation, the usecols parameter does this:

Quote: usecols : int or sequence, optional

So you're telling it to use only column 0 and column 2, but your dtype lists four columns. Have you tried usecols with the same number of columns as the dtype?

RE: numpy dtype anomaly - bluefrog - Nov-07-2018

Thanks, it now works. So basically you have to specify the record column names and data types each time you extract an arbitrary set of columns. Mmm, a bit clunky.
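A minimal sketch of the fix stullis describes, assuming the four-column sample file from bluefrog's second post. The helper `sub_dtype` is hypothetical (it is not part of NumPy): it derives a dtype containing only the fields for the requested column positions, so the dtype and `usecols` always describe the same number of columns and the field list does not have to be retyped for every selection. It assumes the full dtype lists one field per file column, in file order; the date converter is left out for brevity.

```python
from io import StringIO

import numpy as np

# Full record layout: one field per file column, in file order (assumption).
FULL_DTYPE = np.dtype([('stock_code', 'S4'), ('cob_date', 'S10'),
                       ('high_price', 'f4'), ('low_price', 'f4')])


def sub_dtype(full_dtype, cols):
    """Hypothetical helper: keep only the fields at the given column
    positions, in the order they appear in `cols`."""
    names = [full_dtype.names[c] for c in cols]
    # Indexing a structured dtype by field name returns that field's dtype.
    return np.dtype([(name, full_dtype[name]) for name in names])


def load_columns(data_file, cols):
    """Load the selected columns; dtype and usecols now have the same length."""
    return np.loadtxt(data_file, dtype=sub_dtype(FULL_DTYPE, cols),
                      delimiter=',', usecols=cols, unpack=True)


sample = StringIO("AAPL,28-01-2011,344.17,344.4\n"
                  "AAPL,31-01-2011,335.8,340.04\n")
codes, highs = load_columns(sample, (0, 2))
print(codes)
print(highs)
```

Alternatively, loading the whole file once with the full dtype (no usecols, unpack=False) returns a structured array whose columns can be picked by name afterwards, e.g. `data['high_price']`, at the cost of parsing every column.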