Python Forum

Hello,
The dataframe code below works fine: it reads col_Expr and applies that expression to a column where required.

from pyspark.sql.functions import expr
actual_df = actual_df.withColumn(col_name, expr(col_Expr))

For example,
col_name = Client
col_Expr = upper(Client)

Then actual_df returns the Client column values in upper case.
Question:
I am not sure how to get the above Python code to work if col_Expr is more complicated than just upper or lower.
For example, to format a date field the expression would be to_date("LoadDate", "MMM dd yyyy"). Simply putting this to_date call into col_Expr gives an error when the code above tries to apply the expression.

error:
== SQL ==
to_date("LoadDate"
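The expression in the error output stops at the first comma. One possible cause, and this is only an assumption because the post does not say where col_Expr is read from, is that the expression is cut short before it ever reaches expr(), for example by splitting a delimited line on every comma. A minimal sketch of that effect in plain Python (the config line format is hypothetical):

```python
# Hypothetical config line: column name, then the expression to apply.
line = 'LoadDate,to_date("LoadDate", "MMM dd yyyy")'

# Splitting on every comma cuts the expression at its first internal comma.
naive_fields = line.split(",")
truncated_expr = naive_fields[1]

# Splitting only once keeps everything after the column name together.
col_name, col_expr = line.split(",", 1)

print(truncated_expr)
print(col_expr)
```

If the value of col_Expr printed just before the expr() call is already truncated like this, the problem is in how the expression is read, not in expr() itself.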

Basically, I can get the code to work as follows, but I want it to work without the else branch, for any expression applied:

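from pyspark.sql.functions import date_format, expr

if row["ColumnName"] != "LoadDate":
    # col_Expr holds a full SQL expression, e.g. upper(Client)
    actual_df = actual_df.withColumn(col_name, expr(col_Expr))
else:
    # here col_Expr holds only a date format pattern
    actual_df = actual_df.withColumn(col_name, date_format(col_name, col_Expr))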

Any suggestions?

Thanks

Not sure I completely understand the question, but it sounds like apply may work. df.apply() (a pandas DataFrame method) lets you define a function, as long or complex as you like, and then apply it to the dataframe.
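Another option that avoids the if/else from the question might be to keep every rule as a complete Spark SQL expression string and hand them all to selectExpr in one call. The sketch below only builds the list of expression strings, so it runs without Spark. The helper build_select_exprs, the rules dict, and the Amount column are hypothetical names for illustration. It also assumes Spark's default SQL parsing, where double-quoted text is a string literal, so the column is written unquoted and the format pattern goes in single quotes.

```python
def build_select_exprs(columns, rules):
    """Return one SQL expression string per column, in the original order.

    columns: list of column names, e.g. actual_df.columns
    rules:   dict mapping column name -> Spark SQL expression (hypothetical)
    """
    select_exprs = []
    for name in columns:
        if name in rules:
            # Alias the expression back to the column name so it replaces the column.
            select_exprs.append(f"{rules[name]} AS `{name}`")
        else:
            # Columns without a rule pass through unchanged.
            select_exprs.append(f"`{name}`")
    return select_exprs


rules = {
    "Client": "upper(Client)",
    "LoadDate": "to_date(LoadDate, 'MMM dd yyyy')",
}
exprs = build_select_exprs(["Client", "LoadDate", "Amount"], rules)
print(exprs)

# With a real PySpark DataFrame this would be applied as:
# actual_df = actual_df.selectExpr(*exprs)
```

Because every rule is a full expression, the date column needs no special case. A rule that only holds a format pattern, as in the else branch of the question, would first have to be rewritten as a complete expression such as the to_date call above.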

arkiboys

jefsummers