Aug-14-2023, 10:17 AM
Hello!
I am trying to integrate Apache Spark (PySpark) into VS Code, so I set the Python path in VS Code's settings.json file.
PySpark works without errors with this configuration. But the issue happens when I import pandas, the standard Python package.
This simple, basic statement throws the error shown further below.
Below is my dev environment:
OS: Windows 11
Python: Anaconda3
Apache Spark: 3.4.1
IDE: Visual Studio Code 1.18.1
"python.defaultInterpreterPath": "C:\\Anaconda3\\python.exe",
"python.condaPath": "C:\\Anaconda3\\Scripts\\conda.exe",
"terminal.integrated.env.windows": {
    "PYTHONPATH": "C:/spark-3.4.1-bin-hadoop3/python;C:/spark-3.4.1-bin-hadoop3/python/pyspark;C:/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip;C:/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip"
},
"python.autoComplete.extraPaths": [
    "C:\\spark-3.4.1-bin-hadoop3\\python",
    "C:\\spark-3.4.1-bin-hadoop3\\python\\pyspark",
    "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\py4j-0.10.9.7-src.zip",
    "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\pyspark.zip"
],
"python.analysis.extraPaths": [
    "C:\\spark-3.4.1-bin-hadoop3\\python",
    "C:\\spark-3.4.1-bin-hadoop3\\python\\pyspark",
    "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\py4j-0.10.9.7-src.zip",
    "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\pyspark.zip"
]
import pandas as pd
Error: import pandas as pd
File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py", line 29, in <module>
from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py", line 34, in <module>
require_minimum_pandas_version()
File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\sql\pandas\utils.py", line 37, in require_minimum_pandas_version
if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
^^^^^^^^^^^^^^^^^^
AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
As you can see, the pandas module being imported is not the standard Python pandas, but the pyspark.pandas module, which is why the code raises the error. I set the default Python interpreter to Anaconda3 in the first line of the settings.json file, but the error still occurs. Any reply would be appreciated.
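For reference, I believe the shadowing can be reproduced in isolation: Python resolves `import pandas` by scanning `sys.path` in order, so a PYTHONPATH entry containing a same-named module (here, the `pandas` package inside `C:/spark-3.4.1-bin-hadoop3/python/pyspark`) wins over the real install in site-packages. A minimal sketch (the temp directory and its `pandas.py` are purely illustrative stand-ins):

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Illustrative only: a same-named module earlier on sys.path shadows the
# real package, just as a pyspark directory on PYTHONPATH can put its
# "pandas" package ahead of Anaconda's pandas.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pandas.py").write_text("origin = 'shadow'\n")
    sys.path.insert(0, tmp)                # front of the path wins
    sys.modules.pop("pandas", None)        # forget any cached import
    try:
        spec = importlib.util.find_spec("pandas")
        print(spec.origin)                 # resolves into tmp, not site-packages
    finally:
        sys.path.remove(tmp)
```

Running this prints the path of the shadowing file, which matches what the traceback above shows: the interpreter never reaches Anaconda's pandas at all.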