Jan-29-2020, 01:41 PM
Hello everybody,
I wrote this code, which takes two files, removes any letters from them so that only the phone numbers remain, removes any duplicates, and then compares the two files to find the content they have in common.
The code is this:
```python
import re
import csv

filename_list = []
file1 = input("Please input file1: ")
filename_list.append(file1)
file2 = input("Please input file2: ")
filename_list.append(file2)
duplicate_list = []


def clean_file(filename):
    # Keep only the digits on each line; numbers with 10+ digits go to <filename>_clean.csv
    with open(filename, 'r') as f:
        list1 = f.readlines()
    for ch in list1:
        result = re.sub('[^0-9]', '', ch)
        with open(('{}_clean.csv').format(filename), 'a+') as cl:
            if len(result) < 10:
                result = result.strip()  # too short to be a phone number, skipped
            else:
                cl.write(result + '\n')


def clean_duplicates(filename):
    # Copy each unique line of <filename>_clean.csv into <filename>_clean_dup.csv
    lines_seen = set()
    with open(('{}_clean_dup.csv').format(filename), 'w') as rf:
        duplicate_list.append(rf.name)
        for line in open(('{}_clean.csv').format(filename), 'r'):
            if line not in lines_seen:
                rf.write(line)
                lines_seen.add(line)


def find_common():
    # Append every line found in both de-duplicated files to results.csv
    comp_file1 = open(duplicate_list[0], "r")
    comp_file2 = open(duplicate_list[1], "r")
    result = open("results.csv", "a")
    list1 = comp_file1.readlines()
    list2 = comp_file2.readlines()
    for i in list1:
        for j in list2:
            if i == j:
                result.write(i)
    comp_file1.close()
    comp_file2.close()
    result.close()


for filename in filename_list:
    clean_file(filename)
    clean_duplicates(filename)
find_common()
```

So the code works, but I have a slight problem: the generated files get names like filename.csv_clean.csv and filename.csv_clean_dup.csv.
I tried .rsplit() and .rpartition() to drop the .csv extension from the original filename, but it doesn't work.
Can anyone help?
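A minimal sketch of one way to build the output names, assuming the earlier attempts failed because the value returned by .rsplit() or .rpartition() was never used (the attempted code isn't shown, so that is only a guess; string methods return a new string and never change the original). The helper name `clean_name` and its `tag` parameter are hypothetical, introduced here for illustration:

```python
import os
from pathlib import Path


def clean_name(filename, tag):
    """Build an output name such as 'numbers_clean.csv' from 'numbers.csv'.

    clean_name and tag are hypothetical names used for illustration only.
    """
    base = os.path.splitext(filename)[0]  # drop only the last extension
    return '{}_{}.csv'.format(base, tag)


if __name__ == "__main__":
    name = "numbers.csv"

    # str methods return a new string and leave `name` unchanged,
    # so the result has to be assigned (or used) directly.
    print(name.rsplit(".", 1)[0])    # numbers
    print(name.rpartition(".")[0])   # numbers

    print(clean_name(name, "clean"))      # numbers_clean.csv
    print(clean_name(name, "clean_dup"))  # numbers_clean_dup.csv

    # pathlib alternative: Path.stem is the final path component without
    # its suffix (note that it also drops any leading directories).
    print(Path(name).stem)           # numbers
```

In the script, `('{}_clean.csv').format(filename)` would then become `clean_name(filename, 'clean')`, and `('{}_clean_dup.csv').format(filename)` would become `clean_name(filename, 'clean_dup')`. `os.path.splitext` is used here rather than a bare `rsplit('.', 1)` because it only looks at the last path component, so a name without an extension (for example inside a folder such as `data.v2/`) is left intact instead of being cut at the folder's dot.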