Feb-18-2017, 05:56 PM
The named function you're using doesn't have a comprehension in it, but the lambdas you've written do. You seem to expect them to be equivalent, but they're not. This looks like a regular Python issue rather than anything specific to pyspark. It would have helped if you had included the error you got, but here's a demonstration that your lambdas behave differently from your function:
>>> def reGrpLst(fw_c):
...     fw, c = fw_c
...     f, w = fw
...     return (f, [(w, c)])
...
>>> reGrpLst([[1, 2], 3])
(1, [(2, 3)])
>>>
>>> (lambda fw_c: [ (fw[0], (fw[1], fw_c[1])) for fw in fw_c[0] ])([[1, 2], 3])
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    (lambda fw_c: [ (fw[0], (fw[1], fw_c[1])) for fw in fw_c[0] ])([[1, 2], 3])
  File "<pyshell#4>", line 1, in <lambda>
    (lambda fw_c: [ (fw[0], (fw[1], fw_c[1])) for fw in fw_c[0] ])([[1, 2], 3])
TypeError: 'int' object has no attribute '__getitem__'
>>>
>>> (lambda fw_c: [ (f, (c, fw_c[1])) for f, c in fw_c[0] ])([[1, 2], 3])
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    (lambda fw_c: [ (f, (c, fw_c[1])) for f, c in fw_c[0] ])([[1, 2], 3])
  File "<pyshell#8>", line 1, in <lambda>
    (lambda fw_c: [ (f, (c, fw_c[1])) for f, c in fw_c[0] ])([[1, 2], 3])
TypeError: 'int' object is not iterable
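If you really want a lambda that matches the named function, it has to index into the nested argument rather than loop over it with a comprehension. A sketch of one equivalent form (the name `reGrpLst_lambda` is just for illustration):

```python
# Equivalent to reGrpLst: unpack [[f, w], c] by indexing,
# not by iterating over fw_c[0].
reGrpLst_lambda = lambda fw_c: (fw_c[0][0], [(fw_c[0][1], fw_c[1])])

print(reGrpLst_lambda([[1, 2], 3]))  # (1, [(2, 3)])
```

That said, for anything beyond a one-liner, the named function with explicit tuple unpacking is easier to read, and it works the same when passed to pyspark's map.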