Python Forum

Team,

We have three columns in a table (Mobile Number, Home Phone Number, Business Phone Number).

I am writing a Python script to identify the Valid and Invalid contact Records.

1) If any of the three phone numbers is a valid US number, we have to mark that record as Valid Number.
2) If all three phone numbers are invalid, we have to mark that record as Invalid contact.

I have used this code, but it always shows "Incorrect Number" even when one of the columns has a valid number. Can someone help me with this? Thank you!

import phonenumbers
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Define a UDF to parse and validate US phone numbers
def parse_phone_number(mobile_phone_number, home_phone_number, busn_phone_number):
    try:
        is_valid_mobile = False
        is_valid_home = False
        is_valid_busn = False
        
        if mobile_phone_number:
            parsed_mobile_number = phonenumbers.parse(mobile_phone_number, "US")
            if phonenumbers.is_valid_number(parsed_mobile_number):
                is_valid_mobile = True
        
        if home_phone_number:
            parsed_home_number = phonenumbers.parse(home_phone_number, "US")
            if phonenumbers.is_valid_number(parsed_home_number):
                is_valid_home = True
        
        if busn_phone_number:
            parsed_busn_number = phonenumbers.parse(busn_phone_number, "US")
            if phonenumbers.is_valid_number(parsed_busn_number):
                is_valid_busn = True
        
        if is_valid_mobile or is_valid_home or is_valid_busn:
            return "Correct Number"
        else:
            return "Incorrect Number"
    except Exception as e:
        return f"Error parsing phone number: {e}"

parse_phone_number_udf = udf(parse_phone_number, StringType())

Query = """ select * from datalabs.lab_tech_bi.KYC_dash_analysis where BUSN_PH_NBR = '5852330889' """

df = spark.sql(Query)

df = df.withColumn('Correct_number', parse_phone_number_udf(df['MOBILE_PH_NBR'], df['HOME_PH_NBR'], df['BUSN_PH_NBR']))

# Create a temporary view from the DataFrame
df.createOrReplaceTempView("final_output")

# Use SQL to create a table from the temporary view
spark.sql("""
    CREATE OR REPLACE TABLE datalabs.lab_tech_bi.final_output_table AS
    SELECT * FROM final_output
""")

display(df)

How do you know the phone number is correct? According to your program, a phone number is correct if phonenumbers.is_valid_number() says so. Are you saying that phonenumbers.is_valid_number() returns incorrect results?

I am using this link to check whether the number is valid or not.

https://htmlpreview.github.io/?https://g...piled.html

In the query I am passing this Business Phone number - 5852330889.

Yes, the number that goes through is_valid_number() is showing as Incorrect Number. I feel like the code is either not reaching the if conditions or not passing them.

Test this:

import phonenumbers
parsed = phonenumbers.parse("5852330889", "US")
print(parsed)
print(phonenumbers.is_valid_number(parsed))

What does it output?

Your parse_phone_number function takes three numbers. What value do you provide for the other two numbers?
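The other two values matter because the original function wraps all three columns in a single try block. phonenumbers.parse() raises NumberParseException on input it cannot parse, so one bad column skips the checks for the remaining columns and the function returns the error string instead. Below is a minimal sketch that checks each column on its own. The names classify_contact and us_validator are hypothetical, and the validator is passed in as an argument so the logic can be tried without Spark or the phonenumbers package.

```python
from typing import Callable, Iterable


def classify_contact(numbers: Iterable, is_valid: Callable[[str], bool]) -> str:
    """Return "Correct Number" if any single value passes is_valid."""
    for raw in numbers:
        if raw is None:
            continue  # NULL column in the table
        text = str(raw).strip()  # tolerate ints and stray whitespace
        if not text:
            continue  # empty string
        try:
            if is_valid(text):
                return "Correct Number"
        except Exception:
            # A failure on one column must not hide the other columns.
            continue
    return "Incorrect Number"


def us_validator(text: str) -> bool:
    """Assumed wiring to the phonenumbers package (imported lazily)."""
    import phonenumbers

    parsed = phonenumbers.parse(text, "US")
    return phonenumbers.is_valid_number(parsed)
```

Inside the UDF this would be called as classify_contact([mobile_phone_number, home_phone_number, busn_phone_number], us_validator). Whether this changes the result for the row in question depends on what the other two columns actually contain.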

What happens if you add a print statement to the top of the parse_phone_number function?

def parse_phone_number(mobile_phone_number, home_phone_number, busn_phone_number):
    print(mobile_phone_number, home_phone_number, busn_phone_number)
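A slightly more detailed variant of that print, as a sketch: showing repr() and the type of each argument makes it obvious whether a column arrives as None, as a number instead of a string, or with stray whitespace. The helper name describe_args is hypothetical. Note that output printed inside a Spark UDF may end up in the executor logs rather than in the notebook, so it can be easier to call the function directly with sample values first.

```python
def describe_args(*values) -> str:
    """Format each value as its repr plus its type name."""
    return ", ".join(f"{value!r} ({type(value).__name__})" for value in values)


# Sample values only, not taken from the real table.
print(describe_args(None, " 5852330889", 5852330889))
# None (NoneType), ' 5852330889' (str), 5852330889 (int)
```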

KiranKandavalli

deanhystad

KiranKandavalli

deanhystad