(Aug-15-2023, 06:25 AM)aupres Wrote: But it does not works at all. Desperately need your advice.

Do not start by setting paths in VS Code.
This should work from the command line first, before using any editor.
If I do a test install of pyspark using e.g. conda:

```shell
conda install -c conda-forge pyspark
```

I have conda-forge set as the default channel.
```shell
# Activate environment
G:\div_code\egg\ping
λ G:\miniconda3\Scripts\activate.bat tom_env

# Install pyspark
(tom_env) G:\div_code\egg\ping
λ conda install pyspark
Retrieving notices: ...working... done
.....
```

Test that it works.
```python
(tom_env) G:\div_code\egg\ping
λ ptpython
>>> import pandas as pd
>>> pd.__file__
'G:\\miniconda3\\envs\\tom_env\\lib\\site-packages\\pandas\\__init__.py'
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()

# Testing pyspark
>>> from datetime import datetime, date
... import pandas as pd
... from pyspark.sql import Row
...
... df = spark.createDataFrame([
...     Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
...     Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
...     Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
... ])
>>> df
DataFrame[a: bigint, b: double, c: string, d: date, e: timestamp]
>>> df.columns
['a', 'b', 'c', 'd', 'e']
```

So now everything works from the command line, without any editor or any setup in one.
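As an extra check that the activated environment's interpreter is really the one running, you can ask Python itself (a small sketch; the exact paths will differ on your machine):

```python
import sys

# sys.executable is the Python binary actually running;
# in an activated conda env it should live under ...\envs\<env_name>\
print(sys.executable)

# sys.prefix is the root of the active environment
print(sys.prefix)
```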
For editors I have to set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON.
This is an OS environment setup, not a setup inside the editors.

```shell
(tom_env) G:\div_code\egg\ping
λ set PYSPARK_PYTHON=G:\miniconda3\envs\tom_env\python.exe

(tom_env) G:\div_code\egg\ping
λ set PYSPARK_DRIVER_PYTHON=G:\miniconda3\envs\tom_env\python.exe
```

Then this works in VS Code with no setup in the editor, other than activating the environment.
```python
from datetime import datetime, date

import pandas as pd
from pyspark.sql import Row
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])
print(df)
```
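If you only want to sanity-check the data itself without starting Spark, the same three rows can be built with plain pandas (a sketch; pandas is already in the environment from the install above):

```python
from datetime import datetime, date

import pandas as pd

# Same rows and columns as the Spark example, as a pandas DataFrame
pdf = pd.DataFrame({
    'a': [1, 2, 4],
    'b': [2., 3., 5.],
    'c': ['string1', 'string2', 'string3'],
    'd': [date(2000, 1, 1), date(2000, 2, 1), date(2000, 3, 1)],
    'e': [datetime(2000, 1, 1, 12, 0), datetime(2000, 1, 2, 12, 0),
          datetime(2000, 1, 3, 12, 0)],
})
print(list(pdf.columns))  # ['a', 'b', 'c', 'd', 'e']
```

`spark.createDataFrame()` also accepts a pandas DataFrame directly, so `spark.createDataFrame(pdf)` gives the same Spark DataFrame as the Row version.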
If you have messed up the (base) environment, just make a new one; this is an important point of using Anaconda. Do not try to fix a broken environment, just make a new one.
Example: this also installs a new Python (3.10.8) and JupyterLab, which for me is a must-have for notebooks.
```shell
conda create --name my_env -c conda-forge pyspark jupyterlab python=3.10.8
```
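Cleaning up afterwards can be sketched like this (the env name `broken_env` is hypothetical; note that (base) itself cannot be removed, only reinstalled):

```shell
# List environments to find the broken one
conda env list

# Remove the broken environment
conda env remove --name broken_env

# Switch to the fresh environment created above
conda activate my_env
```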