Python Forum

Error with Anaconda - collectToPython
Hi guys!

I create a Spark DataFrame:

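from pyspark.sql.types import (
    StructType, StructField, StringType, DoubleType, IntegerType
)

# All four columns are declared as non-nullable (third argument False)
schema = StructType([
    StructField("escolaridade", StringType(), False),
    StructField("estado_civil", StringType(), False),
    StructField("salario", DoubleType(), False),
    StructField("total_acessos", IntegerType(), False)
])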

df = spark.createDataFrame(pd_df, schema)


where pd_df is a pandas DataFrame.

Then I define the function below and apply it to the DataFrame as a UDF:

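from pyspark.sql.functions import UserDefinedFunction, col

# Map the education level to a numeric code; anything else becomes -1.0
def v_col_escola(s):
    if s == 'Basico':
        return 0.0
    elif s == 'Graduacao':
        return 1.0
    else:
        return -1.0

# Wrap the function as a UDF returning a double, then keep only known levels
rot = UserDefinedFunction(v_col_escola, DoubleType())
ldata = df.select(rot(col('escolaridade')).alias('escolaridade'), col('estado_civil')).where('escolaridade >= 0')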
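
For reference, the mapping function can be checked on its own in plain Python, without Spark, to help rule out the function body as the cause of a failure. This is only a sketch: the sample values are made up for illustration and are not taken from the real pd_df.

```python
def v_col_escola(s):
    # Same mapping as above: known education levels get a code, the rest -1.0
    if s == 'Basico':
        return 0.0
    elif s == 'Graduacao':
        return 1.0
    else:
        return -1.0

# Hypothetical sample values (assumption: not the actual column contents)
samples = ['Basico', 'Graduacao', 'Mestrado', None]
codes = [v_col_escola(s) for s in samples]
print(codes)  # [0.0, 1.0, -1.0, -1.0]
```

If this behaves as expected, the mapping logic is less likely to be the problem, and the cause is more likely somewhere in how Spark runs the UDF.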


When I try to read the new DataFrame (ldata):

ldata.take(1)

Py4JJavaError Traceback (most recent call last)
<ipython-input-102-83ca7fdb585c> in <module>
----> 1 labeledData.take(1)

~\anaconda3\envs\curso_pandas\lib\site-packages\pyspark\sql\dataframe.py in take(self, num)
502 [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
503 """
--> 504 return self.limit(num).collect()
505
506 @since(1.3)

~\anaconda3\envs\curso_pandas\lib\site-packages\pyspark\sql\dataframe.py in collect(self)
464 """
465 with SCCallSiteSync(self._sc) as css:
--> 466 sock_info = self._jdf.collectToPython()
467 return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))


Can anyone help with this? I am using Jupyter in Anaconda 2.1.1...