Feb-10-2022, 10:41 PM
I'm trying to design a function that takes a different "oldTable", string, and column name on each call. The "withColumn" calculation below works fine, but the "withColumnRenamed" and "where" lines do not.
What I want, for example with newTable1, is "oldVar2" renamed to "string1_newVar2" and any rows with null values in the "oldVar_dropNull" variable dropped.
import pyspark.sql.functions as F

def functionName(x,y,z):
    return x.withColumn("newVar1", F.when(F.col("oldVar1") > 0, x.oldVar1*100/x.oldVar1)\
                        .otherwise(0)) \
            .withColumnRenamed("oldVar2", (y,"_newVar2")) \
            .where(F.col(z).isNotNull())

newTable1 = functionName(oldTable1,"string1","oldVar_dropNull")
newTable2 = functionName(oldTable2,"string2","oldVar_dropNull")

Some sample data:
import pandas as pd

df = {'oldVar1':['18.50', '649.27', '523.52'],
      'oldVar2':['24.56', '4564.56', '34.45'],
      'oldVar_dropNull':['12.54', '656.89', '0']
     }
oldTable1 = pd.DataFrame(df)
print(oldTable1)
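One likely cause, offered as a minimal sketch rather than a confirmed diagnosis: "withColumnRenamed" expects the new column name as a single string, while (y,"_newVar2") builds a tuple, so the chain probably fails at that step and the "where" line never gets to run. The helper name rename_and_drop_nulls below is hypothetical, it assumes its first argument is a Spark DataFrame, and it swaps in dropna(subset=...) for the where(...isNotNull()) filter. The "withColumn" step is left out because it already works.

```python
def rename_and_drop_nulls(df, prefix, null_col):
    """Hypothetical helper: rename oldVar2 and drop rows where null_col is null.

    Assumes df is a pyspark.sql.DataFrame.
    """
    # withColumnRenamed takes the new name as one string, so build it by
    # concatenation instead of passing the tuple (prefix, "_newVar2").
    new_name = prefix + "_newVar2"
    return (
        df.withColumnRenamed("oldVar2", new_name)
          # dropna(subset=[...]) removes rows that are null in the listed column
          .dropna(subset=[null_col])
    )
```

With this, rename_and_drop_nulls(oldTable1, "string1", "oldVar_dropNull") should produce a column named "string1_newVar2", provided oldTable1 is a Spark DataFrame. The pandas sample data would first need converting, for example with spark.createDataFrame(oldTable1) on an existing SparkSession (assumed here to be called spark).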