Integration of apache spark and Kafka on eclipse pyspark - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: General Coding Help (https://python-forum.io/forum-8.html)
+--- Thread: Integration of apache spark and Kafka on eclipse pyspark (/thread-32701.html)
Integration of apache spark and Kafka on eclipse pyspark - aupres - Feb-27-2021

These are my development environments for integrating Kafka and Spark:

IDE: Eclipse 2020-12
Python: Anaconda 2020.02 (Python 3.7)
Kafka: 2.13-2.7.0
Spark: 3.0.1-bin-hadoop3.2

My Eclipse configuration reference site is here. Simple PySpark code works without errors, but the integration of Kafka and Spark Structured Streaming throws errors. This is the code:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("appName").getOrCreate()

df = spark.read.format("kafka")\
    .option("kafka.bootstrap.servers", "localhost:9092")\
    .option("subscribe", "topicForMongoDB")\
    .option("startingOffsets", "earliest")\
    .load()\
    .selectExpr("CAST(value AS STRING) as column")

df.printSchema()
df.show()
```

The errors thrown are

So I inserted Python code that binds the related jar files:

```python
import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.0,org.apache.spark:spark-streaming-kafka-0-10_2.12:3.1.0'
```

But this time other errors occur, and I am stuck here. My Eclipse configuration and PySpark code must have some issues, but I have no idea what causes the errors. Kindly inform me of the integration configuration of Kafka and Spark with PySpark. Any reply will be welcomed.
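For what it's worth, two details often matter with this setup: the Kafka connector version should match the installed Spark release (3.0.1 here, not 3.1.0), and `PYSPARK_SUBMIT_ARGS` must be set before any `SparkSession` is created and conventionally ends with `pyspark-shell` when the script is launched directly from Python (as Eclipse does). A minimal sketch of the environment setup under those assumptions:

```python
import os

# Set BEFORE creating the SparkSession. Pin the connector to the installed
# Spark version (3.0.1) with the matching Scala suffix (2.12 for
# spark-3.0.1-bin-hadoop3.2), and end with 'pyspark-shell', which
# PYSPARK_SUBMIT_ARGS requires when the script is run via plain Python.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 '
    'pyspark-shell'
)
```

After this, the `SparkSession` creation and the `spark.read.format("kafka")` code from the post can run unchanged; only `spark-sql-kafka-0-10` is needed for Structured Streaming, so the separate `spark-streaming-kafka-0-10` package (the old DStream API) can be dropped.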
RE: Integration of apache spark and Kafka on eclipse pyspark - Serafim - Feb-27-2021

Removed, no new info.