Jun-25-2023, 12:56 PM
(This post was last modified: Jun-25-2023, 03:13 PM by Gribouillis.)
import findspark
findspark.init()

from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, IntegerType, StructType, StructField, FloatType
from pyspark.sql.functions import when, col, udf

spark = SparkSession.builder.appName("exp").getOrCreate()
sc = spark.sparkContext

@udf(returnType=StringType())
def get_english_name(val):
    return val[0:val.index(" (")]

@udf(returnType=IntegerType())
def get_start_year(val):
    return int(val[1:5])

@udf(returnType=StringType())
def get_trend(x):
    if x < -3.00:
        return "strong decline"
    elif -3.00 < x < -0.50:
        return "weak decline"
    elif -0.50 < x < 0.50:
        return "no change"
    else:
        return "strong increase"

info = [("Greenfinch (Chloris chloris)", "Farmland birds", "(1970-2014)", -1.13),
        ("Siskin (Carduelis spinus)", "Woodland birds", "(1995-2014)", 2.26),
        ("European shag (Phalacrocorax artistotelis)", "Seabirds", "(1986-2014)", -2.31),
        ("Mute Swan (Cygnus olor)", "Water and wetland birds", "(1975-2014)", 1.65),
        ("Collared Dove (Streptopelia decaocto)", "other", "(1970-2014)", 5.2)]

schema1 = StructType(
    [StructField("Species", StringType()),
     StructField("Category", StringType()),
     StructField("Period", StringType()),
     StructField("Annual_percentage_change", FloatType())
    ])

rdd = sc.parallelize(info)
data = spark.createDataFrame(rdd, schema=schema1)

data2 = data.withColumn("English_Name", get_english_name(col("Species")))\
    .withColumn("start_yearn", get_start_year(col("Period")))\
    .withColumn("Trend", get_trend(col("Annual_percentage_change")))

data2.show()
spark.stop()
Gribouillis wrote Jun-25-2023, 03:13 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
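One thing that may be worth checking in the posted code: the chained comparisons in get_trend leave gaps, so a value of exactly -3.00, -0.50 or 0.50 falls through to the else branch, and every value of 0.50 or above is labelled "strong increase". Below is a minimal sketch of the three helpers as plain Python functions, so they can be tested without a Spark session. The "weak increase" band from 0.50 to 3.00 is my assumption (it simply mirrors the decline thresholds) and is not in the original post; the None check and the use of partition are also my additions.

```python
def get_english_name(val):
    # Text before the first " (" is taken as the English name.
    # partition() returns the whole string if " (" is absent,
    # whereas val.index(" (") would raise ValueError.
    head, _, _ = val.partition(" (")
    return head


def get_start_year(val):
    # Period strings look like "(1970-2014)"; characters 1..4 are the start year.
    return int(val[1:5])


def get_trend(x):
    # Spark passes None for null cells, so guard before comparing.
    if x is None:
        return None
    # Half-open bands: each boundary value belongs to exactly one band.
    if x < -3.00:
        return "strong decline"
    elif x < -0.50:
        return "weak decline"
    elif x < 0.50:
        return "no change"
    elif x < 3.00:
        # ASSUMPTION: this band is not in the original post.
        return "weak increase"
    else:
        return "strong increase"


if __name__ == "__main__":
    print(get_english_name("Greenfinch (Chloris chloris)"))
    print(get_start_year("(1970-2014)"))
    for change in (-1.13, 2.26, -2.31, 1.65, 5.2):
        print(change, get_trend(change))
```

If this matches what you intended, the plain functions can still be turned into UDFs by wrapping them, for example udf(get_trend, StringType()), instead of using the decorator, which keeps them callable in ordinary unit tests.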