Python Forum
Integration of apache spark and Kafka on eclipse pyspark
This is my development environment for integrating Kafka and Spark.

IDE: Eclipse 2020-12
Python: Anaconda 2020.02 (Python 3.7)
Kafka: 2.13-2.7.0
Spark: 3.0.1-bin-hadoop3.2

My Eclipse configuration follows the reference site here. Simple PySpark code runs without errors, but integrating Kafka with Spark Structured Streaming raises errors. This is the code.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("appName").getOrCreate()
df = spark.read.format("kafka")\
            .option("kafka.bootstrap.servers", "localhost:9092")\
            .option("subscribe", "topicForMongoDB")\
            .option("startingOffsets", "earliest")\
            .load()\
            .selectExpr("CAST(value AS STRING) as column")
df.printSchema()
df.show()
The error thrown is:

Error:
pyspark.sql.utils.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
So I added Python code to pull in the related JAR files.

import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.0,org.apache.spark:spark-streaming-kafka-0-10_2.12:3.1.0'
But this time a different error occurs.

Error:
Error: Missing application resource.

Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, k8s://https://host:port, or local (Default: local[*]).
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of jars to include on the driver and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.
I am stuck here. Something is wrong with my Eclipse configuration or my PySpark code, but I cannot tell what causes the errors. Could someone explain how to configure the Kafka and PySpark integration? Any reply is welcome.
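A possible direction, offered as a sketch rather than a confirmed fix: when PYSPARK_SUBMIT_ARGS is set from inside a Python script, the value has to end with the token pyspark-shell, otherwise spark-submit reports "Missing application resource". The package version should also match the installed Spark (3.0.1 here, not 3.1.0). The helper names build_submit_args and read_kafka_topic are hypothetical, and the Scala suffix 2.12 is an assumption based on the prebuilt Spark 3.0.1 distribution.

```python
import os

# Assumptions: Spark 3.0.1 as listed in the post, built against Scala 2.12.
SPARK_VERSION = "3.0.1"
SCALA_VERSION = "2.12"


def build_submit_args(spark_version, scala_version):
    """Build a PYSPARK_SUBMIT_ARGS value that pulls in the Kafka source."""
    package = (
        f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"
    )
    # The trailing "pyspark-shell" is the application resource that
    # spark-submit expects when it is launched from a Python process.
    return f"--packages {package} pyspark-shell"


# Must be set before the SparkSession (and its JVM) is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(SPARK_VERSION, SCALA_VERSION)


def read_kafka_topic(topic="topicForMongoDB", servers="localhost:9092"):
    """Batch-read a Kafka topic, mirroring the code from the post."""
    # Imported here so the environment variable above is already in place.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]").appName("appName").getOrCreate()
    )
    df = (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        .selectExpr("CAST(value AS STRING) as column")
    )
    df.printSchema()
    df.show()
    return df
```

Calling read_kafka_topic() with a Kafka broker running on localhost:9092 should then resolve the kafka data source. The first run needs network access so that the package can be downloaded from Maven Central.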


Messages In This Thread
Integration of apache spark and Kafka on eclipse pyspark - by aupres - Feb-27-2021, 06:53 AM
