Python Forum
How to set PYTHONPATH in Visual Studio Code?
#1
Hello!
Below is my dev environment:

OS : Windows 11
python : Anaconda3
Apache Spark : 3.4.1
IDE : Visual Studio Code 1.18.1
I am trying to integrate Apache Spark (PySpark) with VS Code, so I set the Python paths in VS Code's settings.json file:


"python.defaultInterpreterPath": "C:\\Anaconda3\\python.exe",
"python.condaPath": "C:\\Anaconda3\\Scripts\\conda.exe",
"terminal.integrated.env.windows": {
    "PYTHONPATH": "C:/spark-3.4.1-bin-hadoop3/python;C:/spark-3.4.1-bin-hadoop3/python/pyspark;C:/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip;C:/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip"
},
"python.autoComplete.extraPaths": [
        "C:\\spark-3.4.1-bin-hadoop3\\python",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\pyspark",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\py4j-0.10.9.7-src.zip",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\pyspark.zip"
],
"python.analysis.extraPaths": [
        "C:\\spark-3.4.1-bin-hadoop3\\python",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\pyspark",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\py4j-0.10.9.7-src.zip",
        "C:\\spark-3.4.1-bin-hadoop3\\python\\lib\\pyspark.zip"
]
PySpark works without errors with this configuration. But a problem appears when I import the regular pandas package:

import pandas as pd
This simple statement throws the error below:

Error:
    import pandas as pd
  File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py", line 29, in <module>
    from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions
  File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py", line 34, in <module>
    require_minimum_pandas_version()
  File "C:\spark-3.4.1-bin-hadoop3\python\pyspark\sql\pandas\utils.py", line 37, in require_minimum_pandas_version
    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
                    ^^^^^^^^^^^^^^^^^^
AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import)
As you can see, the pandas module being imported is not the regular pandas package but pyspark.pandas, which causes the error. I set the default Python interpreter to Anaconda3 at the top of settings.json, but the error still occurs. Any reply would be appreciated.
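
For readers who want to see the mechanism in isolation, here is a minimal sketch of how a nested package can win over a top-level package of the same name. It assumes the cause is a search-path entry that points inside a package directory, which is consistent with the traceback above but is not confirmed by the thread. The names demo_pandas and demo_spark and the helpers make_package and resolve are hypothetical stand-ins; nothing here touches a real pandas or pyspark install.

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path
from typing import Optional, Sequence


def make_package(root: Path, *parts: str) -> Path:
    """Create nested package directories under root, each with an __init__.py."""
    pkg = root
    for part in parts:
        pkg = pkg / part
        pkg.mkdir(parents=True, exist_ok=True)
        (pkg / "__init__.py").write_text("")
    return pkg


def resolve(name: str, search_path: Sequence[str]) -> Optional[str]:
    """Return the file a top-level import of `name` would load, given search_path."""
    spec = PathFinder.find_spec(name, list(search_path))
    return spec.origin if spec else None


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # "site" plays the role of site-packages and holds the regular package.
    site = tmp_path / "site"
    make_package(site, "demo_pandas")
    # "framework/python" plays the role of a framework folder whose package
    # demo_spark ships its own subpackage that is also called demo_pandas.
    framework = tmp_path / "framework" / "python"
    nested = make_package(framework, "demo_spark", "demo_pandas")

    # Only the framework root is on the path: the regular package is found.
    good_path = [str(framework), str(site)]
    # The package directory itself is also on the path: its subpackage is
    # now visible as a top-level module and is found first.
    bad_path = [str(framework), str(nested.parent), str(site)]

    good_origin = resolve("demo_pandas", good_path)
    bad_origin = resolve("demo_pandas", bad_path)
    print(good_origin)
    print(bad_origin)
```

The second printed path lies inside demo_spark, in the same way the traceback shows pandas being loaded from inside pyspark.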
#2
Do not mess with path settings in VS Code. Click in the lower-right corner and choose the right environment (Python interpreter).
Note that a conda environment looks like this: 3.10.1(tom_env: conda).
[Image: b08f5m.png]
This works even if you start from a blank editor, e.g. VSCodium, with no settings.json setup.
Also, from the command line, always activate an environment when using Anaconda or Miniconda, e.g. (base), or better, one you make yourself.
Here I activate my own (tom_env) and start VS Code and VSCodium.
[Image: u4BrpA.png]
#3
Thanks for your reply. In my VS Code I have only one Python interpreter, conda base, and I always activate conda. But the same error is thrown.
   
#4
Delete everything you showed in your first post from settings.json.
Your error message shows pandas being loaded from:
Error:
C:\spark-3.4.1-bin-hadoop3\python\pyspark\pandas\__init__.py
That is wrong. With (base) Anaconda it should load from:
C:\Anaconda\Lib\site-packages\pandas\...
Here I test with a settings.json that has no Python configuration at all:
{  
    "workbench.iconTheme": "simple-icons",
    "workbench.productIconTheme": "fluent-icons",
    "window.zoomLevel": 1,
    "workbench.colorTheme": "Visual Studio Dark",
}
Test code
import sys
import pandas as pd
 
print(sys.executable)
print(pd.__file__)
Running the code above in, e.g., 3.10.1(tom_env: conda):
Output:
G:\miniconda3\envs\tom_env\python.exe
G:\miniconda3\envs\tom_env\lib\site-packages\pandas\__init__.py
Note that both point to the same root folder.
If I change to, e.g., Python 3.11.3 from python.org:
Output:
C:\Python311\python.exe
C:\Python311\Lib\site-packages\pandas\__init__.py
The same pattern again when switching to, e.g., 3.10.8(sci_env: conda):
Output:
G:\miniconda3\envs\sci_env\python.exe
G:\miniconda3\envs\sci_env\lib\site-packages\pandas\__init__.py
pandas does of course need to be installed in every environment. There is no sharing or loading from other paths; if you try that, it will fail.
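
When the import itself raises, as in the first post, pd.__file__ cannot be printed. A small variation on the test code above, offered as a sketch only, asks the import system where a module would come from without executing it. The helper name module_origin is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, without importing it.

    Returns None when the module cannot be found on the current search path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


print(sys.executable)
for name in ("pandas", "pyspark"):
    # Prints None for a package that is not installed in this environment.
    print(name, "->", module_origin(name))
```

Run it with the same interpreter that VS Code has selected. If the pandas line points anywhere other than that interpreter's site-packages folder, something on the search path is shadowing it.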
#5
I appreciate your reply. One of my projects integrates Apache Spark, so I have to import the pyspark module. Lots of my Python files import pyspark. For example:

from pyspark.sql import SparkSession
Here is my dilemma. If I erase the PYTHONPATH from the settings.json file,

Error:
ModuleNotFoundError: No module named 'pyspark'
is thrown. If I put the PYTHONPATH back, the earlier pandas error occurs. I have tried everything I can think of. For example, I added the following to the launch.json file:

{
    "type": "python",
    "name": "Python: Current File",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "cwd": "${fileDirname}",
    "env": {
        "PYTHONPATH": "C:/spark-3.4.1-bin-hadoop3/python;C:/spark-3.4.1-bin-hadoop3/python/pyspark;C:/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip;C:/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip"
    }
}
But it does not work at all. I desperately need your advice.
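
One thing worth checking in a PYTHONPATH like the one above is whether any entry is itself a package directory, because such an entry exposes that package's subpackages as top-level modules. The sketch below flags those entries. It is a diagnostic aid under that assumption, not a confirmed fix for this setup, and the helper names split_search_path and package_dirs_on_path are hypothetical. The demo uses a throwaway directory layout rather than a real Spark install.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def split_search_path(value: str, sep: str = os.pathsep) -> List[str]:
    """Split a PYTHONPATH-style string into its non-empty entries."""
    return [entry for entry in value.split(sep) if entry]


def package_dirs_on_path(entries: List[str]) -> List[str]:
    """Return the entries that are themselves packages (they contain __init__.py)."""
    return [entry for entry in entries if (Path(entry) / "__init__.py").is_file()]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout standing in for <spark>/python and <spark>/python/pyspark.
    python_dir = Path(tmp) / "python"
    package_dir = python_dir / "pyspark"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")

    search_path = os.pathsep.join([str(python_dir), str(package_dir)])
    entries = split_search_path(search_path)
    flagged = package_dirs_on_path(entries)
    expected = [str(package_dir)]
    # Only the entry that points inside the package is reported.
    print(flagged)
```

To check a real configuration, pass the value of os.environ.get("PYTHONPATH", "") to split_search_path instead of the demo string.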
#6
(Aug-15-2023, 06:25 AM)aupres Wrote: But it does not works at all. Desperately need your advice.
Do not set paths in VS Code.
This should work from the command line before you use any editor.
Here I do a test install of pyspark using, e.g., conda:
conda install -c conda-forge pyspark
I have conda-forge set as the default channel.
# Activate environment 
G:\div_code\egg\ping
λ G:\miniconda3\Scripts\activate.bat tom_env
# Install pyspark
(tom_env) G:\div_code\egg\ping
λ conda install pyspark
Retrieving notices: ...working... done
.....
Test that it works:
(tom_env) G:\div_code\egg\ping
λ ptpython
>>> import pandas as pd
>>> pd.__file__
'G:\\miniconda3\\envs\\tom_env\\lib\\site-packages\\pandas\\__init__.py'
 
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
 
# Testing pyspark
>>> from datetime import datetime, date
... import pandas as pd
... from pyspark.sql import Row
...
... df = spark.createDataFrame([
...     Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
...     Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
...     Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
... ])
>>> df
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]
 
>>> df.columns
['a', 'b', 'c', 'd', 'e']
So now everything works from the command line, without any editor or any setup in one.

For editors I have to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON.
This is an OS environment setting, not an editor setting.
(tom_env) G:\div_code\egg\ping
λ set PYSPARK_PYTHON=G:\miniconda3\envs\tom_env\python.exe
 
(tom_env) G:\div_code\egg\ping
λ set PYSPARK_DRIVER_PYTHON=G:\miniconda3\envs\tom_env\python.exe
Then this works in VS Code with no setup in the editor other than activating the environment:
from datetime import datetime, date
import pandas as pd
from pyspark.sql import Row
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
 
df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])
 
print(df)
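
If setting the two variables in the shell is inconvenient, a possible alternative is to set them from Python itself before the SparkSession is created. This is a sketch under the assumption that the script is already running in the intended conda environment; the helper name point_pyspark_at_current_interpreter is hypothetical, and the block deliberately does not import pyspark so it can be run anywhere.

```python
import os
import sys


def point_pyspark_at_current_interpreter() -> None:
    """Set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON unless they are already defined."""
    for var in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        # setdefault keeps any value that was already set in the OS environment.
        os.environ.setdefault(var, sys.executable)


point_pyspark_at_current_interpreter()
print(os.environ["PYSPARK_PYTHON"])
print(os.environ["PYSPARK_DRIVER_PYTHON"])
```

Call the helper at the top of the script, before SparkSession.builder.getOrCreate(), so the values are in place when Spark starts.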

If you have messed up the (base) environment, just make a new one. This is an important point of using Anaconda.
Do not try to fix a broken environment, just make a new one.
Example: this also installs a new Python 3.10.8 and JupyterLab, since a notebook is a must-have for me.
conda create --name my_env -c conda-forge pyspark jupyterlab python=3.10.8