Python Forum

data validation with specific regular expression
This post is about finding a way to check whether data matches a specific regular expression.
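A minimal sketch of the core check on single strings, using Python's standard `re` module. The pattern is an assumption about what "alphanumeric - number" means here: one run of letters/digits, a hyphen (spaces around it optional), then one run of digits. The important detail is `fullmatch`. It requires the whole string to fit the pattern, whereas `search` accepts any string that merely contains a matching piece.

```python
import re

# Assumed reading of "alphanumeric - number": letters/digits, a hyphen
# (spaces around it optional), then digits.
PATTERN = re.compile(r"[A-Za-z0-9]+\s*-\s*\d+")

def is_valid(value: str) -> bool:
    # fullmatch anchors the pattern to the entire string
    return PATTERN.fullmatch(value) is not None

bad = "TST - 123 - TST - 456 - TST -999"
print(is_valid("T2ST - 353"))           # True
print(is_valid(bad))                    # False
# search only needs *some* substring to match, so it would wrongly accept bad
print(PATTERN.search(bad) is not None)  # True
```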

import pandas as pd

# Create first data frame
df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D', 'TST5 - 123'],
    'key2': ['W', 'X', 'Y', 'Z', 'T2ST - 353'],
    'value1': [1, 2, 3, 4, 'TST - 303'],
    'value2': [5, 6, 7, 8, 'TST - 103']})

# Create second data frame
df2 = pd.DataFrame({
    'key3': ['A', 'B', 'C', 'E', 'TST - 363'],
    'key4': ['W', 'X', 'Z', 'Y', 'TST - 373'],
    'value4': [9, 10, 11, 'A', 'TST - 123 - TST - 456 - TST -999'],
    'value5': [13, 14, 15, 16, 'TST - 109']})

# Place the two frames side by side (both share the index 0-4)
df1 = df1.join(df2)
print(df1)
Data validation needs to be done on the fifth entry of each column (row index 4). The expected format is "alphanumeric - number" or "string - number"; valid values are, for example, TST - 363, T2ST - 353, TST - 303, TST - 373 ... In this data, the record that does not match the regular expression is 'TST - 123 - TST - 456 - TST -999', and this record should be output.
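One way to do this over the joined frame is a sketch like the following, using pandas' vectorised `Series.str.fullmatch` (available since pandas 1.1). Two assumptions are made here. First, that the values to validate are the ones at row index 4, since the rows above hold plain letters and integers that would never fit the pattern. Second, that the pattern `[A-Za-z0-9]+\s*-\s*\d+` reflects "alphanumeric - number". The helper name `invalid_values` is hypothetical.

```python
import pandas as pd

# Data from the post, joined the same way (df1.join(df2)).
df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D', 'TST5 - 123'],
    'key2': ['W', 'X', 'Y', 'Z', 'T2ST - 353'],
    'value1': [1, 2, 3, 4, 'TST - 303'],
    'value2': [5, 6, 7, 8, 'TST - 103'],
})
df2 = pd.DataFrame({
    'key3': ['A', 'B', 'C', 'E', 'TST - 363'],
    'key4': ['W', 'X', 'Z', 'Y', 'TST - 373'],
    'value4': [9, 10, 11, 'A', 'TST - 123 - TST - 456 - TST -999'],
    'value5': [13, 14, 15, 16, 'TST - 109'],
})
df = df1.join(df2)

# Assumed pattern for "alphanumeric - number"; spaces around the hyphen are optional.
PATTERN = r"[A-Za-z0-9]+\s*-\s*\d+"

def invalid_values(values: pd.Series) -> pd.Series:
    """Hypothetical helper: return entries that do not fully match PATTERN."""
    # astype(str) turns any non-string cell into text so the check never yields NaN
    matches = values.astype(str).str.fullmatch(PATTERN)
    return values[~matches]

# Assumption: validate the fifth entry of every column (row index 4).
bad = invalid_values(df.iloc[4])
print(bad)
```

With the post's data, only the `value4` entry ('TST - 123 - TST - 456 - TST -999') is reported. To validate a column instead of a row, the same helper can be given `df['value4']`, for example.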