Python Forum
"Split" file and comparison with CSV
#1
Hi, I'm very new to Python and I need your help.

I want to know if it's possible to "split" a file (with a regex?), read its lines, make a comparison, and then recreate the whole file.

This is the kind of file I need to split. The separator would be "/##/" (so a "block" would run from the first line (included) to the 5th line (excluded)).

Output:
/##/PARAM/XX/YY/ZZ/X/N/N/N
/##/
/donnee1/XXXXXXXX
/donnee2/A
...
/##/
/donnee1/YYYYY
/donnee2/B
...
After splitting the file into blocks, I'll need to go through the lines of each block one by one, read the lines of a CSV file, and compare them. If a condition is true, I want to add a line or replace one specific line (but that is another story...).

For now I need to "split" the file into "blocks", but I really don't know how to do that.

I tried this :
files = open(file,'r').read().split('/##/')
names = ['file'+ str(num) for num in range(len(files))]
for num,file in enumerate(files):
    open(names[num],'w').write(file)
But this is not what I want: it creates a file at every "/##/" and removes the "/##/" from every file created, including the first one, where it must not be removed (/##/PARAM/XX/YY/ZZ/X/N/N/N).
Reply
#2
Something like this
input_file = 'input_file.txt' # '/path/to/inputfile.txt'

def save_to_file(n, lines):
    # write one block to its own numbered file
    with open('file_{}.txt'.format(n), 'w') as out_f:
        out_f.writelines(lines)

with open(input_file) as in_file:
    header = next(in_file) # first line: /##/PARAM/...
    num = 0
    lines = [] # defined up front so append() cannot fail before the first /##/
    for line in in_file:
        if line.strip() == '/##/':
            if num:
                save_to_file(n=num, lines=lines)
            lines = [header] # every output file starts with the header
            num += 1
        else:
            lines.append(line)
    save_to_file(n=num, lines=lines) # save the last block
The last line is not needed if the last block also ends with /##/.
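As a side note on why the first attempt lost the marker: str.split() discards the separator, and it also matches the "/##/" that opens the header line. Here is a small sketch on an in-memory sample (the sample string is made up to mirror the file in the first post) that contrasts it with comparing whole lines:

```python
# Made-up sample that mirrors the layout shown in the first post.
sample = (
    "/##/PARAM/XX/YY/ZZ/X/N/N/N\n"
    "/##/\n"
    "/donnee1/XXXXXXXX\n"
    "/donnee2/A\n"
    "/##/\n"
    "/donnee1/YYYYY\n"
    "/donnee2/B\n"
)

# str.split() removes every "/##/", including the one inside the header line.
pieces = sample.split("/##/")
print(repr(pieces[0]))  # text before the first "/##/"
print(repr(pieces[1]))  # header with its "/##/" prefix stripped

# Comparing whole lines only matches lines that are exactly "/##/".
separator_lines = [
    index
    for index, line in enumerate(sample.splitlines())
    if line.strip() == "/##/"
]
print(separator_lines)
```

That is the reason the loop above tests line.strip() == '/##/' instead of splitting the whole text.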
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
Wow, thanks for the quality of the code and for the good reaction!

But I forgot to say: the files created need to be grouped at the end to recreate a new big file.
And the "/##/PARAM/XX/YY/ZZ/X/N/N/N" line needs to appear only at the top of the big file.

But thanks, that's a very good start!

(I was told that the Python community was good, but it's better than that!)

I've changed a few things (so that the header goes to a separate file and "/##/" appears in each created file).

Now I have to read a CSV file, compare some values from each file, and add lines if needed. I thought about creating an array and then checking whether the line is contained in the array (as I'm used to doing in PHP).

To create the array, I wrote this:

with open("pj.csv", "r") as pj:
    lines = pj.readlines()
    reader = csv.reader(lines, delimiter=';')
    for row in reader:
        pjCSV = '\t'.join(row)
Is that a good way to start, or should I try a different approach?
Reply
#4
I don't really understand what you are trying to do.
Do you really need to create the files, or is it just an intermediate step for the comparison/grouping/creating the new large file?
You can read the file into memory and have a list of lists or a list of tuples and compare whatever you want; you don't provide specifics.
Also, in your last code snippet you specify ; as the delimiter, yet there is no ; in your sample file, so it's not clear where it comes from.
Using the csv module to read the CSV file is OK:

import csv

with open("pj.csv", "r", newline="") as pj:
    reader = csv.reader(pj, delimiter=';')
    pj_csv = ['\t'.join(row) for row in reader]
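For the "is this line contained in the array" check, a set of row tuples may be more convenient than a list of joined strings, since membership tests on a set are fast. This is only a sketch: the in-memory CSV text and its two columns are invented for the example, and io.StringIO stands in for the real pj.csv file.

```python
import csv
import io

# Stand-in for pj.csv; the two columns are invented for this example.
csv_text = "XXXXXXXX;A\nYYYYY;B\n"

with io.StringIO(csv_text) as pj:
    reader = csv.reader(pj, delimiter=';')
    # One tuple per CSV row, stored in a set for quick "in" checks.
    known_rows = {tuple(row) for row in reader}

print(("XXXXXXXX", "A") in known_rows)
print(("ZZZ", "C") in known_rows)
```

If only one column matters for the comparison, a dict keyed on that column would work the same way.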

Reply
#5
Creating files is an intermediate step. I didn't mean to create files, but perhaps it's the best thing to do? The initial file could be big (I don't know how big), so maybe the memory would fill up?

Then, after creating the files (or "blocks" in memory), I need to compare the lines of each block with the lines of my CSV file (that's where the ";" delimiter comes from).
If the condition is true, then I add to / complete the current line.
Reply
#6
I would read the CSV first, then read one block at a time, do the comparisons/changes and write to the big file. There is no need for intermediate files or for reading the full file into memory (unless you need to reorder blocks). Also, I doubt the file would be THAT big to cause memory problems even if you read it all into memory.
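A rough sketch of that flow, under some assumptions: iter_blocks, process and transform are names made up for this example, the comparison itself is left as a pass-through, and io.StringIO objects stand in for the real input, CSV and output files.

```python
import csv
import io

def iter_blocks(lines):
    """Yield one block (a list of lines) at a time; blocks start at a '/##/' line."""
    block = []
    for line in lines:
        if block and line.strip() == '/##/':
            yield block
            block = []
        block.append(line)
    if block:
        yield block  # last block, which has no '/##/' after it

def process(in_file, csv_file, out_file, transform):
    rows = list(csv.reader(csv_file, delimiter=';'))  # CSV is read once, up front
    out_file.write(next(in_file))                     # header is written only at the top
    for block in iter_blocks(in_file):
        # transform is a placeholder for the real comparison/changes
        out_file.writelines(transform(block, rows))

# Demo on in-memory data shaped like the sample in the first post.
text = (
    "/##/PARAM/XX/YY/ZZ/X/N/N/N\n"
    "/##/\n/donnee1/XXXXXXXX\n/donnee2/A\n"
    "/##/\n/donnee1/YYYYY\n/donnee2/B\n"
)
source = io.StringIO(text)
pj = io.StringIO("XXXXXXXX;A\n")
result = io.StringIO()
process(source, pj, result, transform=lambda block, rows: block)
print(result.getvalue())
```

Only one block is held in memory at a time, so the size of the input file should not matter much with this layout.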

Reply
#7
I wanted to read blocks, but I didn't know how to do it, and the only way I found to create a "block" was to create a file.
Reply
#8
I amended my previous example:

input_file = 'input_file.txt' # '/path/to/inputfile.txt'

def do_comparison(block):
    # do comparison here
    print(block)

with open(input_file) as in_file:
    header = next(in_file) # the /##/PARAM/... line is not part of any block
    block = []
    for line in in_file:
        # a '/##/' line starts a new block, so hand over the one collected so far
        if block and line.strip() == '/##/':
            do_comparison(block=block)
            block = []
        block.append(line) # optionally use line.strip() to remove the trailing newline \n
    if block:
        do_comparison(block) # that is for the last block
Output:
['/##/\n', '/donnee1/XXXXXXXX\n', '/donnee2/A\n']
['/##/\n', '/donnee1/YYYYY\n', '/donnee2/B\n']

Reply
#9
Sorry I don't get it... Could you explain more please :/
Reply
#10
(Aug-07-2018, 09:56 AM)morgandebray Wrote: Could you explain more
Explain what exactly? Did you look at my example? You need to write/expand the do_comparison function with the functionality you want implemented for the comparison.
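To give an idea of what an expanded do_comparison could look like, here is a sketch only. The real condition was never specified in this thread, so everything specific is an assumption: the second parameter extra_by_key (a dict built from the CSV), the rule "match on the value after /donnee1/", and the '/extra/' tag for the added line are all made up for illustration.

```python
def do_comparison(block, extra_by_key):
    """Return a new block, adding one line after a matching /donnee1/ line.

    extra_by_key is a hypothetical lookup built from the CSV:
    {value found after '/donnee1/': text to add}.
    """
    prefix = '/donnee1/'
    new_block = []
    for line in block:
        new_block.append(line)
        if line.startswith(prefix):
            key = line.strip()[len(prefix):]
            if key in extra_by_key:
                # '/extra/' is an invented tag, used only for this illustration
                new_block.append('/extra/{}\n'.format(extra_by_key[key]))
    return new_block

block = ['/##/\n', '/donnee1/XXXXXXXX\n', '/donnee2/A\n']
lookup = {'XXXXXXXX': 'found-in-csv'}
print(do_comparison(block, lookup))
```

The function returns the (possibly longer) block instead of printing it, so the caller can write it straight to the output file.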

Reply

