Python Forum
Read Data with multiprocessing - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Read Data with multiprocessing (/thread-10947.html)



Read Data with multiprocessing - RomanRettich - Jun-14-2018

Hey guys,

I have been trying to get into 'multiprocessing' for a few days now. I have a few hundred files, each containing a few thousand numbers, which I want to save in a matrix. I found some code on the internet that seems applicable to my problem, but I have not fully understood it yet (I am relatively new to Python). Here is my modified version of the code (explanation further down):

import numpy as np
import multiprocessing
import math

def writedata(rs, out_q):
    data_sim = np.zeros((4281, 80))
    for kkk in rs:
        rs_str = str(kkk)
        fnrs = '../OUTPUT_FILES' + '/Datei' + rs_str
        data_sim_rec = np.genfromtxt(fnrs)
        data_sim_rec = np.array([data_sim_rec[:, 1]]).T
        data_sim[:, kkk] = data_sim_rec[:, 0]
    out_q.put(data_sim)

rs = np.arange(1, 80, 1)
nprocs = 10
out_q = multiprocessing.Queue()
chunksize = int(math.ceil(len(rs) / float(nprocs)))
procs = []

for i in range(nprocs):
    p = multiprocessing.Process(target=writedata,
                                args=(rs[chunksize*i:chunksize*(i+1)], out_q))
    procs.append(p)
    p.start()

resultdict = {}

for i in range(nprocs):
    resultdict.update(out_q.get())  # <-- this line raises the ValueError

for p in procs:
    p.join()

data_sim = resultdict

I define a function called writedata which goes into a folder with, for example, 80 files, reads the data of each file into data_sim_rec, and saves it in the appropriate column of the matrix data_sim. This matrix shall then be put into a queue. I distribute the jobs to the processes via a list of numbers for the loop. Then the data shall be collected in resultdict. The error message is:
 dictionary update sequence element #0 has length 80; 2 is required.
I know that I am probably mixing data types and that the 'update' line is wrong as well, but I have already tried many things that did not work.
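For reference, the error can be reproduced without any data files: dict.update() iterates its argument and expects each element to be a (key, value) pair of length 2, but iterating a 2-D array yields whole rows of length 80. A minimal sketch with the same shapes as above:

```python
import numpy as np

resultdict = {}
data_sim = np.zeros((4281, 80))

try:
    # dict.update() treats a non-mapping argument as a sequence of
    # (key, value) pairs; iterating the 2-D array yields rows of
    # length 80 instead, which is what the ValueError complains about
    resultdict.update(data_sim)
except ValueError as exc:
    print(exc)
```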

I would be very happy about an answer!

Thanks in advance and best regards,

Max

Sorry, here's the whole traceback:

Traceback (most recent call last):
  File "FORUM.py", line 31, in <module>
    resultdict.update((out_q.get()))
ValueError: dictionary update sequence element #0 has length 80; 2 is required



RE: Read Data with multiprocessing - woooee - Jun-16-2018

Use a Manager list or dictionary to communicate between/to/from processes. A list would possibly work in this case, with the function appending each result to a Manager list. The program would then check the list every second or so and print it, removing the items as they are printed if you want, or just reprinting all of them otherwise. See "Sharing State Between Processes" at https://www.cs.colorado.edu/~kena/classes/5828/s10/presentations/ali_alzabarah_se_presentati.pdf for an example. Post back if you have problems.