Python Forum
KafkaUtils module not found on spark 3 pyspark
#1
I use Hadoop 3.3.0 and Spark 3.0.1-bin-hadoop3.2, and my Python IDE is Eclipse 2020-12. I am trying to develop a Python application with the KafkaUtils PySpark module. I configured PySpark and Eclipse by following this site. Simple code like the following runs without exceptions.

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("Kafka2RDD").setMaster("local[*]")
sc = SparkContext(conf = conf)
data = [1, 2, 3, 4, 5, 6]
distData = sc.parallelize(data)    

print(distData.count())
But I found that the PySpark module in Spark 3 does not contain KafkaUtils at all. The imports below fail.

from pyspark.streaming.kafka import KafkaUtils 
from pyspark.streaming.kafka import OffsetRange
So I downgraded Spark from 3.0.1-bin-hadoop3.2 to 2.4.7-bin-hadoop2.7. Now I can successfully import KafkaUtils in the Eclipse IDE, but this time exceptions related to the Spark version are thrown continuously.

Error:
Traceback (most recent call last):
  File "/home/jhwang/eclipse-workspace/BigData_Etl_Python/com/aaa/etl/kafka_spark_rdd.py", line 36, in <module>
    print(distData.count())
  File "/usr/local/spark/python/pyspark/rdd.py", line 1055, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/usr/local/spark/python/pyspark/rdd.py", line 1046, in sum
    return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add)
  File "/usr/local/spark/python/pyspark/rdd.py", line 917, in fold
    vals = self.mapPartitions(func).collect()
  File "/usr/local/spark/python/pyspark/rdd.py", line 816, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/usr/python/anaconda3/lib/python3.7/site-packages/py4j/java_gateway.py", line 1305, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/python/anaconda3/lib/python3.7/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148)
    at org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136)
How can I import KafkaUtils and related modules on Spark 3.0.1? Where is the KafkaUtils module in PySpark for Spark 3.0.1, or how can it be installed? Any reply is welcome. Best regards.
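A note on the traceback in post #1: "Unsupported class file major version 55" usually points at the JVM rather than at KafkaUtils. Class file major version 55 corresponds to Java 11, and Spark 2.4.x is documented to run on Java 8, so the downgraded Spark was most likely started on a Java 11 JVM. The sketch below only decodes the number in the error message; the function name is hypothetical and is not part of Spark or py4j.

```python
def java_release_for_class_major(major: int) -> str:
    """Map a JVM class file major version to the Java release that produces it.

    Hypothetical helper for reading errors such as
    "Unsupported class file major version 55".
    """
    if major < 45:
        raise ValueError(f"not a valid class file major version: {major}")
    if major <= 48:
        # Major versions 45..48 belong to Java 1.1 .. 1.4.
        return f"1.{major - 44}"
    # From Java 5 (major version 49) onward the release number is major - 44.
    return str(major - 44)


# The value from the traceback, and the value a Java 8 JVM would report.
print(java_release_for_class_major(55))
print(java_release_for_class_major(52))
```

If that diagnosis holds, pointing JAVA_HOME at a Java 8 installation before launching Spark 2.4.7 should be the first thing to try; whether that matches the poster's environment is an assumption.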
#2
This can be read on spark.apache.org concerning Kafka integration in Spark 3.0.1:
Quote:The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage.
and that's just the start of it...
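For reference, a minimal sketch of the route usually taken from Python on Spark 3, assuming the Structured Streaming Kafka source instead of KafkaUtils. As far as I know, `pyspark.streaming.kafka` was tied to the older Kafka 0.8 integration, which was removed in Spark 3.0, and the 0.10 DStream integration quoted above has no Python API. The broker address `localhost:9092`, the topic `my_topic`, and the helper names `kafka_package`, `read_kafka_stream`, and `main` are placeholders, not anything from the original post.

```python
def kafka_package(spark_version: str, scala_version: str = "2.12") -> str:
    """Build the Maven coordinate of the Structured Streaming Kafka connector.

    The prebuilt Spark 3.0.1 distributions use Scala 2.12, hence the default.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def read_kafka_stream(spark, bootstrap_servers: str, topic: str):
    """Return a streaming DataFrame with the Kafka key and value as strings."""
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
        # Kafka delivers key and value as binary columns; cast them for display.
        .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    )


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Kafka2DataFrame").getOrCreate()
    # Placeholder broker and topic; replace with real values.
    stream = read_kafka_stream(spark, "localhost:9092", "my_topic")
    query = stream.writeStream.format("console").start()
    query.awaitTermination()


# Coordinate to pass to spark-submit via --packages for Spark 3.0.1.
print(kafka_package("3.0.1"))
```

To try it, call `main()` at the end of the script and launch it with `spark-submit --packages` followed by the printed coordinate, so that the connector is on the classpath. It needs a reachable Kafka broker.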
#3
you can contact the authors here: [email protected]