"Split" file and comparison with CSV - Printable Version +- Python Forum (https://python-forum.io) +-- Forum: Python Coding (https://python-forum.io/forum-7.html) +--- Forum: General Coding Help (https://python-forum.io/forum-8.html) +--- Thread: "Split" file and comparison with CSV (/thread-12057.html) Pages:
"Split" file and comparison with CSV - morgandebray - Aug-07-2018 Hi, I'm very new to python and I need your help guys. I want to know if it's possible to "split" file (with regex ?), read the line and make a comparison and then recreate the hole file. See this, it's the kind of file I need to split. The regex would be "/##/" (so a "block" would be the first line (include) to the 5th line (exclude)) After spliting the file to get block, I'll need to compare the line of each block (so one by one) and read the line of a CSV file and compare. If a condition is true, I want to add a line of replace one precise line (but this is another story...)For now I need to "split" the file into "block", but I really don't know how to do that... I tried this : files = open(file,'r').read().split('/##/') names = ['file'+ str(num) for num in range(len(files))] for num,file in enumerate(files): open(names[num],'w').write(file)but this is not what I want (it's creating every file from any "/##/" and removing it on every file created, include the first one which doesn't have to be removed /##/PARAM/XX/YY/ZZ/X/N/N/N)" RE: "Split" file and comparison with CSV - buran - Aug-07-2018 Something like this input_file = 'input_file.txt' # '/path/to/inputfile.txt' def save_to_file(n, lines): with open('file_{}.txt'.format(n), 'w') as out_f: out_f.writelines(lines) with open(input_file) as in_file: header = next(in_file) num = 0 for line in in_file: if line.strip() == '/##/': if num: save_to_file(n=num, lines=lines) lines = [header,] num += 1 else: lines.append(line) save_to_file(n=num, lines=lines) # save the last blocklast line not needed if last block also ends with /##/
RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

Wow, thanks for the quality of the code and for the helpful reply! But I forgot to say: the files created need to be grouped at the end to recreate a new big file, and the "/##/PARAM/XX/YY/ZZ/X/N/N/N" line needs to appear only at the top of the big file. But thanks, that's a very good start! (I was told that the Python community was good, but it's better than that!)

I've changed some things (so that the header goes to a different file and "/##/" appears in each created file).

Now I have to read a CSV file, compare some values from each file, and add lines if needed. I thought about creating an array and then checking whether the line is contained in the array (as I'm used to doing in PHP). To create the array, I wrote this:

    with open("pj.csv", "r") as pj:
        lines = pj.readlines()
        reader = csv.reader(lines, delimiter=';')
        for row in reader:
            pjCSV = '\t'.join(row)

Is that a good way to start, or should I try a different way?

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

I don't really understand what you are trying to do. Do you really need to create the files, or is that just an [intermediate] step for the comparison/grouping/creating the new large file? You can read the file into memory, keep a list of lists or a list of tuples, and compare whatever you want. You don't provide specifics.

Also, in your last code snippet you specify ; as the delimiter. Yet there are no ; in your sample file, so it's not clear where it comes from.

Using the csv module to read the CSV file is OK:

    import csv

    with open("pj.csv", "r") as pj:
        reader = csv.reader(pj, delimiter=';')
        # keep every row, not just the last one
        pj_csv = ['\t'.join(row) for row in reader]

RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

Creating files is an intermediate step. I didn't mean to create files, but perhaps it's the best thing to do? Because the initial file could be big (I don't know how big), maybe the memory would fill up?
Then, after creating the files (or "blocks" in memory), I need to compare the lines of each block with the lines of my CSV file (that's where the ";" delimiter comes from). If the condition is true, then I add to / complete the current line.

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

I would read the CSV first, then read one block at a time, do the comparisons/changes, and write to the big file. There is no need for intermediate files or for reading the full file into memory (unless you need to reorder blocks). Also, I doubt the file would be THAT big to cause memory problems even if you read it all into memory.

RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

I wanted to read blocks, but I don't know how to do it, and the only way I found to create a "block" was to create a file.

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

To amend my previous example:

    input_file = 'input_file.txt'  # '/path/to/inputfile.txt'

    def do_comparison(block):
        # do comparison here
        print(block)

    with open(input_file) as in_file:
        header = next(in_file)  # first line of the file
        block = []
        for line in in_file:
            # a '/##/' line starts a new block, so hand over the finished one
            if block and line.strip() == '/##/':
                do_comparison(block=block)
                block = []
            block.append(line)  # eventually use line.strip() to remove trailing new line \n
        do_comparison(block)  # that is for the last block
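The advice to read the CSV first could look roughly like the sketch below. It is only an illustration: the thread never shows pj.csv, so the columns, the sample rows, and the idea of using the first column as the key are all assumptions, and load_lookup is a made-up name. A dict gives the "is this value in the array?" check that the PHP-style approach was after.

```python
import csv
import io

# Made-up CSV content; the real pj.csv and its columns are not shown in the thread.
csv_text = "A1;first note\nB2;second note\n"


def load_lookup(csv_file, delimiter=";"):
    """Map the first column of each row to the remaining columns.

    Assumption: the first column is the value to compare block lines against.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    return {row[0]: row[1:] for row in reader if row}  # skip empty rows


lookup = load_lookup(io.StringIO(csv_text))

print("A1" in lookup)     # membership test, like in_array() in PHP
print(lookup.get("ZZ"))   # a missing key gives None instead of raising
```

With a real file, the same function can be called as load_lookup(open("pj.csv", newline="")) inside a with block; newline="" is what the csv module documentation recommends when opening files for csv.reader.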
RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

Sorry, I don't get it... Could you explain more, please? :/

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

(Aug-07-2018, 09:56 AM)morgandebray Wrote: Could you explain more

Explain what exactly? Did you look at my example? You need to write/expand the do_comparison function with the functionality you want implemented for the comparison.
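To make "expand do_comparison" more concrete, here is a hedged sketch of one possible shape. The actual comparison rule is never stated in the thread, so the rule used here (if a line's text is a key in the lookup, add one extra line after it) is purely hypothetical, as are the process function, the lookup argument, and the sample data. In-memory text streams stand in for the real input and output files.

```python
import io

MARKER = "/##/"  # assumption: a line containing only this text starts a block


def do_comparison(block, lookup):
    """Return a new block with extra lines added (hypothetical rule).

    If a line's stripped text is a key in lookup, the mapped text is
    added as a new line right after it.
    """
    result = []
    for line in block:
        result.append(line)
        extra = lookup.get(line.strip())
        if extra is not None:
            result.append(extra + "\n")
    return result


def process(in_file, out_file, lookup):
    """Stream blocks from in_file, compare them, write one big output."""
    header = next(in_file)
    out_file.write(header)  # the header goes only at the top of the big file
    block = []
    for line in in_file:
        if block and line.strip() == MARKER:
            out_file.writelines(do_comparison(block, lookup))
            block = []
        block.append(line)
    if block:
        out_file.writelines(do_comparison(block, lookup))  # last block


# Made-up data standing in for the real input file and CSV lookup.
source = io.StringIO("/##/PARAM/XX/YY/ZZ/X/N/N/N\n/##/\na1\n/##/\nb1\n")
target = io.StringIO()
process(source, target, {"a1": "added for a1"})
print(target.getvalue())
```

Only one block is held in memory at a time, which matches the suggestion to avoid intermediate files; swapping the StringIO objects for open(...) calls would apply the same flow to real files.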