Thanks for the reply. I thought that map iterates over the RDD and applies the function to each element, so I assumed the lambda simply replaces the named function. I am new to both PySpark and lambdas, so I may need to do a bit more reading before I fully understand how to replace the function with a lambda.
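For what it's worth, here is a minimal sketch of that equivalence using Python's built-in map (the names double and nums are just illustrative; the same idea applies to RDD.map in PySpark):

```python
def double(x):
    # a named function...
    return x * 2

nums = [1, 2, 3]

with_func = list(map(double, nums))             # map with the named function
with_lambda = list(map(lambda x: x * 2, nums))  # ...replaced by a lambda

print(with_func)    # [2, 4, 6]
print(with_lambda)  # [2, 4, 6]
```

Both calls produce the same result; the lambda is just an inline, anonymous version of the named function.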
the following output occurs with using map(reGrpLst):
[('file:/home/big_data/code/files/mansfield_park.txt', [23500, 17735, 9735, 16784, 14154, 16389, 12905, 27261, 7562, 17959]), ('file:/home/big_data/code/files/kjv.txt', [106189, 109173, 71421, 88498, 69612, 96175, 53168, 167502, 51475, 77898]), ('file:/home/big_data/code/files/hamlet.txt', [3941, 3298, 2460, 3922, 3227, 3581, 2671, 4767, 1974, 3211])]

whereas this is the error generated with the comprehensions:
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-7-0c76eff16ea6> in <module>()
     46 f_wcL2_RDD = f_wcL_RDD.reduceByKey(add) #<<< create [(w,c), ... ,(w,c)] lists per file
     47 f_wVec_RDD = f_wcL2_RDD.map(lambda f_wc: (f_wc[0],hashing_vectorizer(f_wc[1],N)))
---> 48 print(f_wVec_RDD.top(3))

Apologies for not posting the error to begin with, but as you can see the error does not occur on the map(lambda) line itself, but rather on the print statement that follows.
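That location makes sense if Spark's laziness is the reason: transformations like map() don't run anything when they are defined, so a bug inside the mapped function only surfaces when an action such as top() forces evaluation. A plain-Python analogy using a generator (bad_transform is a hypothetical name, not from the original job) shows the same deferred-error behaviour:

```python
def bad_transform(items):
    # lazily yields a value per item; subscripting an int will fail
    for x in items:
        yield x["missing_key"]

lazy = bad_transform([1, 2, 3])   # no error here -- nothing has executed yet

try:
    list(lazy)                    # consuming the generator triggers the error
except TypeError as e:
    print("error raised only at consumption:", e)
```

So in the Spark job, the real fault is likely inside the lambda (or hashing_vectorizer), even though the traceback points at the print(f_wVec_RDD.top(3)) line where evaluation actually happens.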