Python Forum
How to set PYTHONPATH in Visual Studio Code?
#1
Hello!
Below is my dev environment:

OS : Windows 11
python : Anaconda3 
Apache Spark : 3.4.1
IDE : Visual Studio Code 1.18.1
I am trying to integrate Apache Spark (PySpark) into VS Code, so I set the Python paths in VS Code's settings.json file:


"python.defaultInterpreterPath": "C:\\Anaconda3\\python.exe",
"python.condaPath": "C:\\Anaconda3\\Scripts\\conda.exe",
"terminal.integrated.env.windows": {
	"PYTHONPATH": "C:/spark-3.4.1-bin-hadoop3/python;C:/spark-3.4.1-bin-hadoop3/python/pyspark;C:/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip;C:/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip"
},
"python.autoComplete.extraPaths": [
        "C:\\spark-3.4.1-bin-hadoop3\\python",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\pyspark",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\py4j-0.10.9.7-src.zip",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\pyspark.zip"
],
"python.analysis.extraPaths": [
        "C:\\spark-3.4.1-bin-hadoop3\\python",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\pyspark",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\py4j-0.10.9.7-src.zip",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\pyspark.zip"
]
PySpark works without errors with this configuration. But a problem appears when I import pandas, an ordinary Python package.

import pandas as pd
This simple statement throws the following error:

Error:
    import pandas as pd
  File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py", line 29, in <module>
    from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
  File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py", line 34, in <module>
    require_minimum_pandas_version()
  File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\sql\pandas\utils.py", line 37, in require_minimum_pandas_version
    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
                    ^^^^^^^^^^^^^^^^^^
AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
As you can see, the pandas module being imported is not the regular pandas package but pyspark.pandas, which causes the error. I set the default Python interpreter to Anaconda3 at the top of settings.json, but the error still occurs. Any reply would be appreciated.
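One way to confirm this kind of shadowing is to list which search-path entries contain something named pandas. The sketch below is only an illustration: the helper name find_candidates is hypothetical, and it looks at plain directories only (zip archives and compiled extension modules on the path are not inspected).

```python
import os
import sys


def find_candidates(module_name, search_paths=None):
    """Return the search-path entries that contain a top-level module or
    package called `module_name`, in lookup order.

    Simplification: only plain directories are checked; zip files and
    compiled extension modules are ignored.
    """
    if search_paths is None:
        search_paths = sys.path
    hits = []
    for entry in search_paths:
        base = entry or os.getcwd()  # an empty entry means the current directory
        package_dir = os.path.join(base, module_name)
        module_file = os.path.join(base, module_name + ".py")
        if os.path.isdir(package_dir) or os.path.isfile(module_file):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Entries are printed in lookup order; an earlier entry normally wins.
    for position, entry in enumerate(find_candidates("pandas")):
        print(position, entry)
```

If an entry under the Spark folder is printed before the site-packages entry, that entry is the likely source of the wrong pandas.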
#2
Do not mess with path settings in VS Code. Click in the bottom-right corner and choose the right environment (Python interpreter).
A conda environment looks like this: 3.10.1 (tom_env: conda).
This will work even if you start a blank editor, e.g. VSCodium, with no settings.json setup.
Also, from the command line always activate an environment when using Anaconda or Miniconda, e.g. (base), or better, make your own.
Here I activate my own (tom_env) and start VS Code and VSCodium.
#3
Thanks for your reply. In my VS Code I have only one Python interpreter, conda base, and I always activate conda. But the same error is thrown.
#4
Delete everything you showed from settings.json in your first post.
Your error message shows that pandas is loaded from:
Error:
C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py
This is wrong. With (base) Anaconda it should load from:
C:\Anaconda\Lib\site-packages\pandas\... 
If I test, e.g., with a settings.json that has no Python configuration at all:
{	
	"workbench.iconTheme": "simple-icons",
	"workbench.productIconTheme": "fluent-icons",
	"window.zoomLevel": 1,
	"workbench.colorTheme": "Visual Studio Dark",
}
Test code
import sys
import pandas as pd

print(sys.executable)
print(pd.__file__)
So running the code in e.g. 3.10.1 (tom_env: conda) gives:
Output:
G:\miniconda3\envs\tom_env\python.exe
G:\miniconda3\envs\tom_env\lib\site-packages\pandas\__init__.py
See that both point to the same root folder.
If I change to e.g. Python 3.11.3 from python.org:
Output:
C:\Python311\python.exe
C:\Python311\Lib\site-packages\pandas\__init__.py
The same pattern again. Over to e.g. 3.10.8 (sci_env: conda):
Output:
G:\miniconda3\envs\sci_env\python.exe
G:\miniconda3\envs\sci_env\lib\site-packages\pandas\__init__.py
pandas does of course need to be installed in every environment. There is no sharing or loading from other paths; if it is missing, the import will fail.
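The "same root folder" check can also be done in code rather than by eye. This is a minimal sketch; the function name loaded_from_env is hypothetical, and it uses Windows path rules (case-insensitive) because all paths in this thread are Windows paths.

```python
from pathlib import PureWindowsPath


def loaded_from_env(module_file, env_root):
    """Return True if module_file lies somewhere under env_root.

    PureWindowsPath compares case-insensitively and works on any OS,
    because it never touches the file system.
    """
    module_path = PureWindowsPath(module_file)
    root = PureWindowsPath(env_root)
    # The file is inside the environment if the root is one of its ancestors.
    return root in module_path.parents
```

In a real session one would presumably call it as loaded_from_env(pd.__file__, sys.prefix), on the assumption that sys.prefix is the root folder of the active environment.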
#5
I appreciate your reply. One of my project environments integrates Apache Spark, so I have to import the pyspark module. Many of my Python files use it. For example:

from pyspark.sql import SparkSession
Here is my dilemma. If I remove PYTHONPATH from the settings.json file,

Error:
ModuleNotFoundError: No module named 'pyspark'
is thrown. When I put PYTHONPATH back, the error above occurs. I have tried everything I can think of. For example, in the launch.json file I entered the following:

{
    "type": "python",
    "name": "Python: Current File",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "cwd": "${fileDirname}",
    "env": {
        "PYTHONPATH": "C:/spark-3.4.1-bin-hadoop3/python;C:/spark-3.4.1-bin-hadoop3/python/pyspark;C:/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip;C:/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip"
    }
}
But it does not work at all. I desperately need your advice.
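One possible reading of the traceback earlier in the thread: the PYTHONPATH entry ending in python/pyspark is itself a package directory, so its pandas subpackage becomes importable as a top-level pandas and is found before the one in site-packages. The sketch below drops such entries from a PYTHONPATH string. The helper name drop_package_dirs is hypothetical, and treating "contains __init__.py" as "is a package directory" is an assumption that fits regular packages only.

```python
import os


def drop_package_dirs(pythonpath, sep=os.pathsep):
    """Remove PYTHONPATH entries that are themselves packages.

    An entry counts as a package directory if it contains __init__.py.
    Search-path entries are meant to be the *parents* of packages.
    """
    kept = []
    for entry in pythonpath.split(sep):
        if not entry:
            continue  # skip empty fragments such as a trailing separator
        if os.path.isfile(os.path.join(entry, "__init__.py")):
            # e.g. .../python/pyspark: its subpackage `pandas` would be
            # importable as a top-level `pandas` and shadow the real one.
            continue
        kept.append(entry)
    return sep.join(kept)
```

Applied to the value above, this would presumably keep the python folder and the two zip files and drop only the python/pyspark entry.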
#6
(Aug-15-2023, 06:25 AM)aupres Wrote: But it does not works at all. Desperately need your advice.
Do not set paths in VS Code.
This should work from the command line before you use any editor.
Here is a test install of pyspark using e.g. the conda installation method:
conda install -c conda-forge pyspark
I have conda-forge set as my default channel.
# Activate environment  
G:\div_code\egg\ping
λ G:\miniconda3\Scripts\activate.bat tom_env
# Install pyspark
(tom_env) G:\div_code\egg\ping
λ conda install pyspark
Retrieving notices: ...working... done
.....
Test that it works.
(tom_env) G:\div_code\egg\ping
λ ptpython
>>> import pandas as pd
>>> pd.__file__
'G:\\miniconda3\\envs\\tom_env\\lib\\site-packages\\pandas\\__init__.py'

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()

# Testing pyspark
>>> from datetime import datetime, date
... import pandas as pd
... from pyspark.sql import Row
...
... df = spark.createDataFrame([
...     Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
...     Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
...     Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
... ])
>>> df
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]

>>> df.columns
['a', 'b', 'c', 'd', 'e']
So now everything works from the command line, without any editor or editor setup.
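The same check can be scripted without starting a REPL. A minimal sketch, assuming the standard-library importlib.util.find_spec is enough to tell where a top-level module would come from; the helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None
    if it cannot be found. The module itself is not imported."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


for name in ("pandas", "pyspark"):
    # Both should point into the active environment's site-packages.
    print(name, "->", module_origin(name))
```

A None for pyspark would correspond to the ModuleNotFoundError reported earlier in the thread.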

For editors I do have to set one thing: PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON.
This is an OS environment setting, not an editor setting.
(tom_env) G:\div_code\egg\ping
λ set PYSPARK_PYTHON=G:\miniconda3\envs\tom_env\python.exe

(tom_env) G:\div_code\egg\ping
λ set PYSPARK_DRIVER_PYTHON=G:\miniconda3\envs\tom_env\python.exe 
Then this works in VS Code with no setup in the editor, other than activating the environment.
from datetime import datetime, date
import pandas as pd
from pyspark.sql import Row
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])

print(df) 
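
The set commands above only last for the current console session. A possible alternative is to set the same two variables from inside the script, before the SparkSession is created. This is a sketch under the assumption that PySpark reads these variables when the session starts; the function name is hypothetical.

```python
import os
import sys


def use_current_interpreter_for_pyspark(env=os.environ):
    """Point PySpark's driver and workers at the running Python interpreter."""
    env["PYSPARK_PYTHON"] = sys.executable
    env["PYSPARK_DRIVER_PYTHON"] = sys.executable
    return env


use_current_interpreter_for_pyspark()
# Create the SparkSession only after this point so the settings are seen.
```

Using sys.executable avoids hard-coding a path such as G:\miniconda3\envs\tom_env\python.exe, so the script follows whichever environment is active.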

If you have messed up the (base) environment, just make a new one; this is an important point of using Anaconda.
Do not try to fix a broken environment, just make a new one.
Example: this also installs Python 3.10.8 and, since a notebook is a must-have for me, JupyterLab.
conda create --name my_env -c conda-forge pyspark jupyterlab python=3.10.8 