Feb-18-2017, 11:30 PM
(Feb-18-2017, 07:43 PM)bluefrog Wrote: I thought that a "map" iterates over the RDD and the function is applied to each element.
Yes. More generally, map() applies a function to all elements of an iterable.
(Feb-18-2017, 07:43 PM)bluefrog Wrote: hence, I assumed the lambda replaces the function
If the lambdas evaluated to the same thing as the function, then they could replace it. But they don't. Here's another refactoring:
    import traceback

    def reGrpLst(fw_c):
        fw,c = fw_c
        f,w = fw
        return (f,[(w,c)])

    def reGrpLst_firstLambda(fw_c):
        return [(fw[0], (fw[1], fw_c[1])) for fw in fw_c[0]]

    def reGrpLst_secondLambda(fw_c):
        return [(f, (c, fw_c[1])) for f, c in fw_c[0]]

    INPUT_LIST = [((1, 2), 3)]

    functions = (reGrpLst, reGrpLst_firstLambda, reGrpLst_secondLambda)
    for f in functions:
        print "Trying", f
        try:
            print map(f, INPUT_LIST)
        except:
            traceback.print_exc()
        print
Output:
    Trying <function reGrpLst at 0x7fa960914578>
    [(1, [(2, 3)])]
    Trying <function reGrpLst_firstLambda at 0x7fa9609145f0>
    Traceback (most recent call last):
      File "testit.py", line 20, in <module>
        print map(f, INPUT_LIST)
      File "testit.py", line 9, in reGrpLst_firstLambda
        return [(fw[0], (fw[1], fw_c[1])) for fw in fw_c[0]]
    TypeError: 'int' object has no attribute '__getitem__'
    Trying <function reGrpLst_secondLambda at 0x7fa960914668>
    Traceback (most recent call last):
      File "testit.py", line 20, in <module>
        print map(f, INPUT_LIST)
      File "testit.py", line 12, in reGrpLst_secondLambda
        return [(f, (c, fw_c[1])) for f, c in fw_c[0]]
    TypeError: 'int' object is not iterable
This has nothing to do with pyspark. I used Python's built-in map() here instead of the RDD's map(). Don't worry about pyspark until you've figured this out with regular Python, since pyspark is a complicating factor.
A lambda is generally just like a regular function. Above, I turned your lambdas into full functions. Can you see that the full functions aren't all the same?
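To make the difference concrete, here is a rough sketch (not taken from your code) that calls the functions directly, with no map() involved. The names reGrpLst_as_lambda, flat_record and nested_record are made up for this illustration, and the nested input shape is only my guess at what the comprehension version would accept. I've stuck to syntax that runs on both Python 2 and Python 3.

```python
def reGrpLst(fw_c):
    fw, c = fw_c
    f, w = fw
    return (f, [(w, c)])

def reGrpLst_firstLambda(fw_c):
    return [(fw[0], (fw[1], fw_c[1])) for fw in fw_c[0]]

# Hypothetical lambda that really does evaluate to the same thing as
# reGrpLst: it indexes into the argument instead of unpacking it.
reGrpLst_as_lambda = lambda fw_c: (fw_c[0][0], [(fw_c[0][1], fw_c[1])])

flat_record = ((1, 2), 3)      # ((f, w), c): the shape reGrpLst expects
nested_record = ([(1, 2)], 3)  # ([(f, w), ...], c): the shape the comprehension loops over

flat_result = reGrpLst(flat_record)
lambda_result = reGrpLst_as_lambda(flat_record)
nested_result = reGrpLst_firstLambda(nested_record)

print(flat_result)    # (1, [(2, 3)])
print(lambda_result)  # (1, [(2, 3)])
print(nested_result)  # [(1, (2, 3))]
```

The comprehension loops over fw_c[0]. With flat_record that is the tuple (1, 2), so the loop variable is a bare int, and that is what both tracebacks are complaining about. Even with an input it accepts, the comprehension version returns a list of (f, (w, c)) pairs rather than a single (f, [(w, c)]) tuple, so it is still not a drop-in replacement.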
Comprehensions are an intermediate-difficulty Python feature. They are themselves like syntactic sugar for maps. If you do a map with a lambda that contains a comprehension, then it's like nested mapping, or a nested loop. If you keep struggling with this, I highly recommend just not using a comprehension. They're not strictly necessary. Stick to simpler code, and tackle comprehensions again later on.
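In case it helps, here is a small sketch of that equivalence in plain Python. The data and the names (double, numbers, records and so on) are invented for the example. The list() around map() is only there so the code also behaves on Python 3, where map() returns an iterator instead of a list.

```python
numbers = [1, 2, 3]

def double(x):
    return x * 2

# Three spellings of the same operation.
with_map = list(map(double, numbers))
with_comprehension = [double(x) for x in numbers]
with_loop = []
for x in numbers:
    with_loop.append(double(x))

print(with_map)            # [2, 4, 6]
print(with_comprehension)  # [2, 4, 6]
print(with_loop)           # [2, 4, 6]

# A map whose lambda contains a comprehension is a loop inside a loop.
records = [([(1, 2), (3, 4)], 5), ([(6, 7)], 8)]
nested_map = list(map(lambda rec: [(f, (w, rec[1])) for f, w in rec[0]], records))

# The same thing written out as explicit nested loops.
nested_loop = []
for rec in records:
    inner = []
    for f, w in rec[0]:
        inner.append((f, (w, rec[1])))
    nested_loop.append(inner)

print(nested_map == nested_loop)  # True
```

If the explicit nested loop is the version you can read most easily, write that one first and only shorten it once it works.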