PySpark Equivalent Code - cpatte7372 - Jan-14-2022

Hello Community,

I have coded the following logic in SQL:

Join very_large_dataframe to small_product_dimension_dataframe on column [B]
Only join records to small_product_dimension_dataframe where O is greater than 10
Keep only Column [P]

    SELECT small_product_dimension_dataframe.P
    FROM dbo.small_product_dimension_dataframe
    INNER JOIN dbo.very_large_dataframe
        ON small_product_dimension_dataframe.B = very_large_dataframe.B
    WHERE small_product_dimension_dataframe.O > 10

I would like help with the equivalent code in PySpark. I have made a start with the following:

    df = very_large_dataframe.join(small_product_dimension_dataframe, (very_large_dataframe.B == small_product_dimension_dataframe.B))

I would like help amending the PySpark code to keep only column P and to apply the condition WHERE small_product_dimension_dataframe.O > 10.