Hello Community,
I have coded the following logic in SQL:
Join very_large_dataframe to small_product_dimension_dataframe on column [B]
Only join records to small_product_dimension_dataframe where O is greater than 10
Keep only Column [P]
SELECT
small_product_dimension_dataframe.P
FROM dbo.small_product_dimension_dataframe
INNER JOIN dbo.very_large_dataframe
ON small_product_dimension_dataframe.B = very_large_dataframe.B
WHERE small_product_dimension_dataframe.O > 10
I would like help with the equivalent code in PySpark.
I have made a start with the following, and would like help amending the PySpark to also select column [P] and apply the WHERE small_product_dimension_dataframe.O > 10 filter:
df = very_large_dataframe.join(small_product_dimension_dataframe, very_large_dataframe.B == small_product_dimension_dataframe.B)
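My untested guess at the full equivalent, assuming both DataFrames are already loaded with columns B, O and P as above, is something like this (filter the small dimension table first, then join on B and keep only P):

from pyspark.sql import functions as F

# Keep only dimension rows where O > 10, then inner join on B
# and select just column P from the joined result
df = (
    very_large_dataframe.join(
        small_product_dimension_dataframe.where(F.col("O") > 10),
        on="B",
        how="inner",
    )
    .select("P")
)

Since small_product_dimension_dataframe is small, I wonder whether wrapping it in F.broadcast(...) would also help the join. Is this the right approach?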
Larz60+ wrote Jan-14-2022, 10:53 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
You can use [sql] tags for SQL.