(Apr-25-2023, 07:16 AM)siddhi1919 Wrote: We are looking for a solution in pyspark where we can compare/match one col4 value with the entire table's col3 values.
Next time you post, give it a try with some code first, to show some effort rather than just posting the task.
Something like this.
import pandas as pd

data = {
    'Col1': [1, 2, 3, 4],
    'Col2': ['A', 'B', 'C', 'D'],
    'Col3': [101, 102, 103, 104],
    'Col4': ['arn:aws:savingsplans::104:savingsplan/f001', '',
             'arn:aws:savingsplans::101:savingsplan/f002', '']
}
df = pd.DataFrame(data)

# Use a regex to pull the account id (e.g. 104, 101) out of the ARN in Col4
df['Col4_extracted'] = df['Col4'].str.extract(r':(\d{3}):', expand=False)

# Check whether the first extracted id appears anywhere in Col3
match = df['Col3'] == int(df['Col4_extracted'].iloc[0])
print(df['Col3'][match])
Output:
3    104
Name: Col3, dtype: int64
Spark provides a createDataFrame(pandas_dataframe) method to convert a pandas DataFrame to a Spark DataFrame.
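For example, a minimal sketch of doing the whole lookup in PySpark could look like this; it assumes an active SparkSession named spark and reuses the pandas df built above (untested, just to show the idea):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Convert the pandas DataFrame from the example above into a Spark DataFrame
sdf = spark.createDataFrame(df)

# Pull the account id out of the ARN in Col4 (empty string when there is no match)
sdf = sdf.withColumn('Col4_extracted', F.regexp_extract('Col4', r':(\d{3}):', 1))

# Collect the non-empty extracted ids and keep the rows whose Col3 matches any of them
ids = [int(r.Col4_extracted)
       for r in sdf.filter(F.col('Col4_extracted') != '').collect()]
sdf.filter(F.col('Col3').isin(ids)).select('Col3').show()

This matches every id found in Col4 against the whole Col3 column; if you only want the single value from one row, filter to that row before collecting.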