Python Forum
PySpark Equivalent Code
#1
Hello Community,

I have implemented the following logic in SQL:

Join very_large_dataframe to small_product_dimension_dataframe on column [B]
Only join records from small_product_dimension_dataframe where O is greater than 10
Keep only Column [P]

SELECT
small_product_dimension_dataframe.P
FROM dbo.small_product_dimension_dataframe
INNER JOIN dbo.very_large_dataframe
ON small_product_dimension_dataframe.B = very_large_dataframe.B
WHERE small_product_dimension_dataframe.O > 10

I would like help with the equivalent code in PySpark.

I have made a start with the following:

df = very_large_dataframe.join(small_product_dimension_dataframe,
                               very_large_dataframe.B == small_product_dimension_dataframe.B)
I would like help amending this PySpark code so that it keeps only column P and applies WHERE small_product_dimension_dataframe.O > 10.
Larz60+ wrote Jan-14-2022, 10:53 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.

You can use for SQL.