Python Forum
PySpark Coding Challenge
#1
Hello Community,

I have been presented with a challenge that I'm struggling with.

The challenge is as follows:
Write three Python functions, register them as PySpark UDF functions and use them to produce an output DataFrame.
The following is a sample of the dataset, also attached:

Output:
+----------------------------------------------+-----------------------+-----------+------------------------+
|Species                                       |Category               |Period     |Annual percentage change|
+----------------------------------------------+-----------------------+-----------+------------------------+
|Greenfinch (Chloris chloris)                  |Farmland birds         |(1970-2014)|-1.13                   |
|Siskin (Carduelis spinus)                     |Woodland birds         |(1995-2014)|2.26                    |
|European shag (Phalacrocorax artistotelis)    |Seabirds               |(1986-2014)|-2.31                   |
|Mute Swan (Cygnus olor)                       |Water and wetland birds|(1975-2014)|1.65                    |
|Collared Dove (Streptopelia decaocto)         |Other                  |(1970-2014)|5.2                     |
+----------------------------------------------+-----------------------+-----------+------------------------+
The requirement is to create the following three functions:

1. get_english_name - this function should take the Species column value and return the English name.

2. get_start_year - this function should take the Period column value and return the year (an integer) when data collection began.

3. get_trend - this function should take the Annual percentage change column value and return the change trend category based on the following rules:
a. Annual percentage change less than -3.00 – return 'strong decline'
b. Annual percentage change between -3.00 and -0.50 (inclusive) – return 'weak decline'
c. Annual percentage change between -0.50 and 0.50 (exclusive) – return 'no change'
d. Annual percentage change between 0.50 and 3.00 (inclusive) – return 'weak increase'
e. Annual percentage change more than 3.00 – return 'strong increase'.

The functions then need to be registered as PySpark UDF functions so that they can be used in PySpark.

Any assistance greatly appreciated.
Reply
#2
Show us what you've done so far (Python code) and where you are having difficulty.
Reply
#3
def get_english_name(species):
    pass


def get_start_year(period):
    pass


def get_trend(annual_percentage_change):
    pass
Reply
#4
Come on, you can't seriously consider just writing the function signatures as actual effort, can you?
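That said, the rough shape the exercise is after is probably something like the sketch below. Treat it as an untested starting point, not a finished answer: the string handling assumes Species values look like "Greenfinch (Chloris chloris)" and Period values look like "(1970-2014)", exactly as in your sample, and the new column names ('English name', 'Start year', 'Trend') are placeholders I made up.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType, StringType


def get_english_name(species):
    # Assumes the format "English name (Latin name)", e.g. "Greenfinch (Chloris chloris)"
    return species.split('(')[0].strip()


def get_start_year(period):
    # Assumes the format "(1970-2014)" and returns the first year as an int
    return int(period.strip('()').split('-')[0])


def get_trend(annual_percentage_change):
    # Map the numeric change onto the five categories from the exercise rules
    change = float(annual_percentage_change)
    if change < -3.00:
        return 'strong decline'
    elif change <= -0.50:
        return 'weak decline'
    elif change < 0.50:
        return 'no change'
    elif change <= 3.00:
        return 'weak increase'
    else:
        return 'strong increase'


spark = SparkSession.builder.getOrCreate()

# Register the plain Python functions as PySpark UDFs
get_english_name_udf = udf(get_english_name, StringType())
get_start_year_udf = udf(get_start_year, IntegerType())
get_trend_udf = udf(get_trend, StringType())

# df is whatever DataFrame you load the attached sample into
# output = (df.withColumn('English name', get_english_name_udf('Species'))
#             .withColumn('Start year', get_start_year_udf('Period'))
#             .withColumn('Trend', get_trend_udf('Annual percentage change')))

If you need to call the functions from Spark SQL rather than the DataFrame API, register them with spark.udf.register() instead. Either way, work through the parsing yourself so you actually understand it.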
Reply

