Python Forum

I'm trying to write a function that takes a different "oldTable", string, and column name on each call. The "withColumn" calculation below works fine, but the "withColumnRenamed" and "where" lines do not.

What I want, for example with newTable1, is for "oldVar2" to be renamed to "string1_newVar2" and for any rows with null values in the "oldVar_dropNull" column to be dropped.

import pyspark.sql.functions as F

def functionName(x,y,z):
    return x.withColumn("newVar1", F.when(F.col("oldVar1") > 0, x.oldVar1*100/x.oldVar1)\
                                                    .otherwise(0)) \
               .withColumnRenamed("oldVar2", (y,"_newVar2")) \
               .where(F.col(z).isNotNull())
        
newTable1 = functionName(oldTable1,"string1","oldVar_dropNull")
newTable2 = functionName(oldTable2,"string2","oldVar_dropNull")
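A likely cause of the rename failure: (y,"_newVar2") builds a tuple, while withColumnRenamed expects a single string for the new name. The "where" line may only appear broken because the chained call already fails at the rename. A minimal plain-Python sketch of the difference (renamed_column is a hypothetical helper, not part of PySpark):

```python
def renamed_column(prefix, suffix="_newVar2"):
    """Hypothetical helper: join the prefix and suffix into one column name."""
    return prefix + suffix

# What the original call passes: a 2-element tuple, not a column name
as_tuple = ("string1", "_newVar2")

# What the rename needs: a single concatenated string
as_string = renamed_column("string1")

print(type(as_tuple).__name__)  # tuple
print(as_string)                # string1_newVar2
```

Inside the function, writing y + "_newVar2" (or an f-string such as f"{y}_newVar2") in place of the tuple should give the intended name.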

Some sample data:

import pandas as pd

df = {'oldVar1':['18.50', '649.27', '523.52'],
      'oldVar2':['24.56', '4564.56', '34.45'],
      'oldVar_dropNull':['12.54', '656.89', '0']
     }
 
oldTable1 = pd.DataFrame(df)
print(oldTable1)

DrData82