"Split" file and comparison with CSV



"Split" file and comparison with CSV - morgandebray - Aug-07-2018

Hi, I'm very new to Python and I need your help.

I want to know if it's possible to "split" a file (with a regex?), read the lines, make a comparison, and then recreate the whole file.

Here is the kind of file I need to split. The delimiter would be "/##/" (so a "block" would run from the first line, inclusive, to the 5th line, exclusive).

Output:
/##/PARAM/XX/YY/ZZ/X/N/N/N
/##/
/donnee1/XXXXXXXX
/donnee2/A
...
/##/
/donnee1/YYYYY
/donnee2/B
...

After splitting the file into blocks, I'll need to go through the lines of each block one by one and compare them with the lines of a CSV file. If a condition is true, I want to add a line or replace one specific line (but that is another story...)

For now I need to "split" the file into "blocks", but I really don't know how to do that...

I tried this:

# 'file' holds the path of the input file
with open(file, 'r') as f:
    parts = f.read().split('/##/')
for num, part in enumerate(parts):
    with open('file' + str(num), 'w') as out:
        out.write(part)

but this is not what I want: it creates a new file at every "/##/" and strips the "/##/" from every file it creates, including the first one, where it must not be removed (/##/PARAM/XX/YY/ZZ/X/N/N/N).

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

Something like this

input_file = 'input_file.txt' # '/path/to/inputfile.txt'

def save_to_file(n, lines):
    with open('file_{}.txt'.format(n), 'w') as out_f:
        out_f.writelines(lines)

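with open(input_file) as in_file:
    header = next(in_file)  # first line: /##/PARAM/...
    num = 0
    lines = []  # defined up front in case the file contains no /##/ line
    for line in in_file:
        if line.strip() == '/##/':  # a new block starts here
            if num:
                save_to_file(n=num, lines=lines)  # write the previous block
            lines = [header,]
            num += 1
        else:
            lines.append(line)
    save_to_file(n=num, lines=lines) # save the last block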

The last line is not needed if the last block also ends with /##/.
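
Since the original question asked about a regex, here is a minimal sketch of that route, assuming the whole file fits in memory and Python 3.7 or newer (older versions do not split on zero-width matches). The name split_blocks and the sample string are made up for illustration. The pattern splits just before every line that consists only of /##/, so the marker stays with its block and the /##/PARAM... header line is left untouched.

```python
import re

def split_blocks(text):
    """Return (header, blocks); every block keeps its leading '/##/' line."""
    # Zero-width split in front of each line that is exactly '/##/'.
    # The header also starts with '/##/' but has more text after it,
    # so the lookahead does not match there.
    parts = re.split(r'^(?=/##/[ \t]*$)', text, flags=re.MULTILINE)
    return parts[0], parts[1:]

# Hypothetical sample, modelled on the file shown in the first post
sample = (
    "/##/PARAM/XX/YY/ZZ/X/N/N/N\n"
    "/##/\n/donnee1/XXXXXXXX\n/donnee2/A\n"
    "/##/\n/donnee1/YYYYY\n/donnee2/B\n"
)
header, blocks = split_blocks(sample)
for block in blocks:
    print(repr(block))
```

Joining header and blocks back together gives the original text, so nothing is lost by the split. For large files the line-by-line loop above is probably the safer choice.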

RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

Wow, thanks for the quality of the code and for the quick response!

But I forgot to say that the files created need to be merged at the end to recreate one big file.
And "/##/PARAM/XX/YY/ZZ/X/N/N/N" needs to appear only at the top of the big file.

But thanks, that's a very good start!

(I was told that the Python community was good, but it's even better than that!)

I've changed a few things (so that the header goes to a separate file and "/##/" appears in each created file).

Now I have to read a CSV file, compare some values from each file, and add lines if needed. I thought about creating an array and then checking whether the line is contained in the array (as I'm used to doing in PHP).

To create the array, I wrote this:

with open("pj.csv", "r") as pj:
    lines = pj.readlines()
    reader = csv.reader(lines, delimiter=';')
    for row in reader:
        pjCSV = '\t'.join(row)

Is that a good way to start, or should I try a different approach?

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

I don't really understand what you are trying to do.
Do you really need to create the files, or is that just an [intermediate] step for the comparison/grouping/creation of the new large file?
You can read the file into memory as a list of lists or a list of tuples and compare whatever you want; you haven't provided specifics.
Also, in your last code snippet you specify ; as the delimiter, yet there are no ; in your sample file, so it's not clear where it comes from.
Using the csv module to read the CSV file is OK:

import csv

with open("pj.csv", "r", newline="") as pj:  # newline="" as the csv docs recommend
    reader = csv.reader(pj, delimiter=';')
    pj_csv = ['\t'.join(row) for row in reader]  # one tab-joined string per row
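
For the "is this value in the array" check mentioned above, a dict keyed on one column is often more convenient than a list of joined strings. This is only a sketch: the thread never shows the layout of pj.csv, so the two-column content below, the choice of the first column as key, and the name load_lookup are all assumptions.

```python
import csv
import io

def load_lookup(csv_file):
    """Map the first column of each row to the list of remaining columns."""
    reader = csv.reader(csv_file, delimiter=';')
    # 'if row' skips completely empty lines
    return {row[0]: row[1:] for row in reader if row}

# Hypothetical content standing in for pj.csv
fake_csv = io.StringIO("XXXXXXXX;doc1.pdf\nYYYYY;doc2.pdf\n")
lookup = load_lookup(fake_csv)

# Membership test, roughly what in_array() does in PHP
print('XXXXXXXX' in lookup)
print(lookup.get('ZZZ'))
```

With a real file, the same function can be called on the object returned by open("pj.csv", newline="").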

RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

Creating files is an intermediate step. I didn't set out to create files, but perhaps it's the best thing to do? The initial file could be big (I don't know how big), so maybe memory would fill up?

Then, after creating the files (or "blocks" in memory), I need to compare the lines of each block with the lines of my CSV file (that's where the ";" delimiter comes from).
If the condition is true, I add to or complete the current line.

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

I would read the CSV first, then read one block at a time, do the comparisons/changes, and write to the big file. There is no need for intermediate files or for reading the full file into memory (unless you need to reorder blocks). Also, I doubt the file would be THAT big to cause memory problems even if you read it all into memory.

RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

I wanted to read block by block, but I didn't know how, and the only way I found to create "blocks" was to create files.

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

I amended my previous example:

input_file = 'input_file.txt' # '/path/to/inputfile.txt'

def do_comparison(block):
    # do comparison here
    print(block)
    

with open(input_file) as in_file:
    header = next(in_file)
    block = []
    for line in in_file:
        if block and line.strip() == '/##/':
            do_comparison(block=block)
            block = []
        block.append(line) # optionally use line.strip() to remove the trailing newline \n
    do_comparison(block) # that is for the last block

Output:
['/##/\n', '/donnee1/XXXXXXXX\n', '/donnee2/A\n']
['/##/\n', '/donnee1/YYYYY\n', '/donnee2/B\n']

RE: "Split" file and comparison with CSV - morgandebray - Aug-07-2018

Sorry, I don't get it... Could you explain more, please? :/

RE: "Split" file and comparison with CSV - buran - Aug-07-2018

(Aug-07-2018, 09:56 AM)morgandebray Wrote: Could you explain more

Explain what exactly? Did you look at my example? You need to write/expand the do_comparison function with the functionality you want implemented for the comparison.
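
To make "expand do_comparison" a bit more concrete, here is one possible shape for the whole pipeline: header written once at the top, blocks read one at a time, each block passed through do_comparison, results written straight to the big file. The comparison rule is invented for illustration (if the /donnee1/ value is a key in the lookup, a /donnee3/ line is appended), as are the names iter_blocks and process and the lookup content. The real condition depends on the CSV, which the thread does not show.

```python
import io

def iter_blocks(lines):
    """Yield one block (a list of lines) at a time; each starts with '/##/'."""
    block = []
    for line in lines:
        if block and line.strip() == '/##/':
            yield block
            block = []
        block.append(line)
    if block:  # last block, if any
        yield block

def do_comparison(block, lookup):
    """Hypothetical rule: append a /donnee3/ line when the /donnee1/ value is known."""
    result = list(block)
    for line in block:
        if line.startswith('/donnee1/'):
            key = line.strip().split('/', 2)[2]  # text after '/donnee1/'
            if key in lookup:
                result.append('/donnee3/{}\n'.format(lookup[key]))
    return result

def process(in_file, out_file, lookup):
    out_file.write(next(in_file))  # header goes on top, once
    for block in iter_blocks(in_file):
        out_file.writelines(do_comparison(block, lookup))

# In-memory stand-ins for the input file and the big output file
source = io.StringIO(
    "/##/PARAM/XX/YY/ZZ/X/N/N/N\n"
    "/##/\n/donnee1/XXXXXXXX\n/donnee2/A\n"
    "/##/\n/donnee1/YYYYY\n/donnee2/B\n"
)
out = io.StringIO()
process(source, out, {'YYYYY': 'extra'})  # made-up lookup
print(out.getvalue())
```

Only one block is held in memory at a time, so the size of the input file should not matter much. With real files, source and out would simply be the objects returned by open().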