May-16-2017, 11:53 AM
Good morning. I have just found this forum and I am really very glad.
I have a project related to unsupervised machine learning, so I mainly use Python packages and tools such as numpy, scipy and pandas.
However, I have been stuck for a long time at a key point: how to access a very large file without crashing my system.
Let me explain.
The initial CSV file contains some data, and I read it with loadtxt as a 4,664,604 x 3 array.
    import numpy as np
    B_init = np.loadtxt(open("out.munmun_twitterex_ut"), skiprows=1, usecols=(0,1,2)).astype(int)
The next step has given me a lot of trouble. I want to keep in a second array only those rows which fulfill some conditions.
I have thought of various ways. The last one was to create two separate lists from the first and second columns respectively (I am interested in these two columns), to reduce the number of elements, and then to reconstruct the new array. But my system crashes!
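The filtering step described above can also be written with vectorized NumPy operations instead of Python lists. What follows is only a minimal sketch under assumptions: the function name `filter_rows_by_user_count`, the `min_count` parameter and the small `demo` array are hypothetical stand-ins, column 1 is assumed to hold the user ids, and the condition is assumed to be "the user appears more than a given number of times".

```python
import numpy as np

def filter_rows_by_user_count(pairs, min_count=80):
    """Keep the rows whose user (column 1) appears more than min_count times."""
    users = pairs[:, 1]
    # inverse maps every row to the position of its user among the unique users,
    # counts holds how often each unique user appears
    _, inverse, counts = np.unique(users, return_inverse=True, return_counts=True)
    # per-row boolean mask: True where the row's user is frequent enough
    mask = counts[inverse] > min_count
    return pairs[mask]

# Tiny hypothetical stand-in for the real (tag, user, value) array
demo = np.array([[10, 1, 0],
                 [11, 1, 0],
                 [12, 1, 0],
                 [10, 2, 0]])
filtered = filter_rows_by_user_count(demo, min_count=2)
```

On the real data the call would presumably be `filter_rows_by_user_count(B_init, min_count=80)`. The result stays a NumPy array, so no intermediate Python lists of several million elements are built.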
I read that pandas DataFrames make access easier, but again it is a very large file.
    from collections import Counter
    import pandas as pd

    # the second column holds the users
    user_list = [pairs[1] for pairs in A]
    # use a Counter to count how many times each user appears
    user_counter1 = Counter(user_list)
    #print user_counter1, "\n"
    # the first column holds the tags
    tags_list = [pairs[0] for pairs in A]
    #print np.unique(tags_list)
    # use a Counter to count how many times each tag appears
    tags_counter1 = Counter(tags_list)
    #print tags_counter1, "\n"
    # keep the users with count > 80, i.e. the users who have more than 80 tags
    final_users1 = [key for key, value in user_counter1.items() if value > 80]
    #print len(final_users1)
    # create the desired table. The system crashes at this point!
    df = pd.DataFrame(index=np.unique(tags_list), columns=final_users1)
    for s in A:
        df.loc[s[0], s[1]] = 1
    df = df.fillna(0)
    C = df.values
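A dense tags x users table can need far more memory than the input rows themselves, which may be why the last step crashes. One possible alternative, sketched here under assumptions, is to store only the nonzero entries with `scipy.sparse.csr_matrix`. The function name `build_tag_user_matrix`, the `min_count` parameter and the `demo` array are hypothetical, and the sketch assumes the intended table has one row per unique tag and one column per user with more than `min_count` tags.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_tag_user_matrix(pairs, min_count=80):
    """Return a sparse 0/1 matrix (tags x kept users) plus its row and column labels."""
    tags, users = pairs[:, 0], pairs[:, 1]
    # count how often each user appears and mark the rows of frequent users
    _, user_idx, user_counts = np.unique(users, return_inverse=True, return_counts=True)
    keep = user_counts[user_idx] > min_count
    # row labels: all unique tags; tag_idx gives the row position of every pair
    uniq_tags, tag_idx = np.unique(tags, return_inverse=True)
    # column labels: only the kept users; col_idx gives their column positions
    kept_users, col_idx = np.unique(users[keep], return_inverse=True)
    row_idx = tag_idx[keep]
    data = np.ones(row_idx.shape[0], dtype=np.int32)
    matrix = csr_matrix((data, (row_idx, col_idx)),
                        shape=(uniq_tags.size, kept_users.size))
    # repeated (tag, user) pairs are summed during construction; reset them to 1
    matrix.data[:] = 1
    return matrix, uniq_tags, kept_users

# Tiny hypothetical stand-in for the real (tag, user, value) array
demo = np.array([[10, 1, 0],
                 [11, 1, 0],
                 [12, 1, 0],
                 [10, 2, 0]])
C_sparse, tag_labels, user_labels = build_tag_user_matrix(demo, min_count=2)
```

Many scipy and scikit-learn routines accept such a sparse matrix directly. Calling `C_sparse.toarray()` would bring back the dense form, and the memory cost with it.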
Is there any way to access and compare the elements of an array of this size? I am very confused.
I don't know what else to think of or try.
Thank you anyway,
Angelika


