Homework using only the numpy package

kirito85 · Dec-13-2018, 06:49 AM

Hi, I'm having difficulty with my loop and would appreciate any help.

My isnumeric() call is not working; I just need to show that isnumeric is False.

My loop is meant to count the total number of unique values in every column of a CSV file.
Thanks.

import numpy as np

### Read the hdb resale price index csv file with the genfromtxt() function
hdbrpi = "CA1data/housing-and-development-board-resale-price-index-1q2009-100-quarterly.csv"
data = np.genfromtxt(hdbrpi, delimiter=",", skip_header=1, dtype=[('quarter', 'U50'), ('index', 'U50')])

### Print out total rows and columns of data in the file
print("***HDB Resale Price Index***")
print()
print(f"There are {len(data)} rows and {len(data[0])} columns of data in this dataset {hdbrpi}")
print()

### Print out the names of the columns in the file
print("The names of the columns are:")

with open(hdbrpi) as data:
    data = np.genfromtxt(hdbrpi, delimiter=",", skip_header=1, dtype=[('quarter', 'U50'), ('index', 'U50')])
    line_count = 0
    for line_count in data:
        if line_count >= 0:
            print(row[line_count], type(row[line_count]) , "isnumeric:", row[line_count].isnumeric())
            #unique_elements, counts_elements = np.unique(data, return_counts=True)
            #print(unique_elements, counts_elements)
            #print(np.unique(row[line_count], return_counts = true))
        line_count += 1
        print(line_count)

**scidam** · Dec-18-2018, 07:00 AM

You can use pandas to load the file and count the unique values in each column, e.g.:

import pandas as pd
data = pd.read_csv('path_to_your_csv_file.csv')
uniques_by_column = {col: len(data.loc[:, col].unique()) for col in data.columns}
print(uniques_by_column) # this dictionary contains the number of unique values in each column

kirito85 · Dec-20-2018, 05:09 AM

Hi,

Thanks for the reply. I forgot to mention this, but I can only use the numpy package for this homework. I am not allowed to use the pandas package.

**scidam** · Dec-20-2018, 09:58 PM

The pandas package relies heavily on NumPy,
so the solution to the problem will be almost the same:

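import numpy as np
# skiprows=1 skips the header row; dtype=str keeps text columns (e.g. quarter labels) loadable
data = np.loadtxt('path_to_your_csv_file.csv', delimiter=',', skiprows=1, dtype=str) # check delimiter
# data is assumed to be a 2D array
uniques_by_column = {j: len(np.unique(data[:, j])) for j in range(data.shape[-1])}
print(uniques_by_column) # number of unique values in each column, keyed by column index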
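
If the file is instead read with np.genfromtxt and a structured dtype, as in the first post, the result is a 1D structured array, so columns are addressed by field name rather than by data[:, j]. Below is a minimal sketch of that variant. The inline sample rows are made up and stand in for the real CSV file, and the names sample_csv and numeric_flags are only illustrative. It also shows why isnumeric comes out False here: str.isnumeric() rejects any string containing characters such as '.' or '-'.

```python
import io
import numpy as np

# Hypothetical sample rows standing in for the real CSV file (not the actual data)
sample_csv = io.StringIO(
    "quarter,index\n"
    "1990-Q1,24.3\n"
    "1990-Q2,24.4\n"
    "1990-Q3,25.0\n"
    "1990-Q4,24.4\n"
)

# Same call shape as in the first post: every column is read as text
data = np.genfromtxt(
    sample_csv,
    delimiter=",",
    skip_header=1,
    dtype=[("quarter", "U50"), ("index", "U50")],
)

# A structured array is 1D, so iterate over field names instead of column indices
uniques_by_column = {
    name: len(np.unique(data[name])) for name in data.dtype.names
}
print(uniques_by_column)

# np.char.isnumeric applies str.isnumeric() element-wise;
# a column counts as numeric here only if every value passes
numeric_flags = {
    name: bool(np.char.isnumeric(data[name]).all()) for name in data.dtype.names
}
print(numeric_flags)
```

To treat the index column as numbers, one option is to convert it with data["index"].astype(float) after loading rather than relying on isnumeric().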

kirito85 · Dec-21-2018, 08:23 AM

Hi,

Thank you so much, I will use your code for my homework.
