Python Forum
merge files
#1
I have several files where column A is a tag name made of letters and numbers (e.g. Rob_0001, ...) and column B is a value for that tag (e.g. Rob_0001 \t 89). I am trying to merge them, but not all tag names appear in all files (e.g. some files have no Rob_0005).
I have split the job into two scripts: one that generates files with gaps so that column A is the same in every file, and a second that will do the merge, which I am about to try. But it seems like I am making the scripts unnecessarily complicated, with 100 lines of code! Is there a better way? Thanks. Here is the code that introduces gaps so that the tag names in all files line up:

import os #this imports the operating system module that lets me look at files and directories
filenames = os.listdir('c:/python27/C/U') #this is the list of files in that directory
for filename in filenames:#here it loops through each file in the list
infile = open(os.path.join('c:/python27/C/U', filename), 'r') #this opens the file and makes it readable
outputfilename=filename+"_gaps.txt"#this creates a new file name for the modified files that I am about to create
outfile= open(os.path.join('c:/python27/C/U', outputfilename), 'w') #this opens these new files and makes them writable
locusnumber = 0
linenumber =0 #this is the beginning of a loop; it needs to be set back to zero for each file
newlocusnumber=1
for line in infile:#this loops through each line in the file,
if linenumber==0:#first line is the header so just write it
outfile.write(line)
elif locusnumber<10:#for lines 1-9
elements=line.split('\t')#this splits lines into different items
elementnumber=len(elements)
if elements[0]==('Cabther_A000'+ str(locusnumber)):#this is why only for lines 1-9, so that the total digit number equals 4
outfile.write(line)
else:
word=elements[0]
number=int(word[9:])
difference=number-locusnumber
locusnumber = locusnumber + difference
outfile.write('\n'*difference + line)

elif locusnumber<100:#lines 10-99
elements=line.split('\t')#this splits lines into different items
if elements[0]==('Cabther_A00'+ str(locusnumber)):#total digit number equals 4
outfile.write(line)
else:
word=elements[0]
number=int(word[9:])
difference=number-locusnumber
locusnumber = locusnumber + difference
outfile.write('\n'*difference + line)
elif locusnumber<1000:
elements=line.split('\t')#this splits lines into different items
if elements[0]==('Cabther_A0'+ str(locusnumber)):
outfile.write(line)
else:
word=elements[0]
number=int(word[9:])
difference=number-locusnumber
locusnumber = locusnumber + difference
outfile.write('\n'*difference + line)
elif locusnumber<2274:
elements=line.split('\t')#this splits lines into different items
if elements[0]==('Cabther_A'+ str(locusnumber)):
outfile.write(line)
else:
word=elements[0]
number=int(word[9:])
difference=number-locusnumber
locusnumber = locusnumber + difference
outfile.write('\n'*difference + line)
elif locusnumber>=2274:
if newlocusnumber<0:
elements=line.split('\t')#this splits lines into different items
if elements[0]==('Cabther_B000'+ str(newlocusnumber)):
outfile.write(line)
newlocusnumber=newlocusnumber+1
else:
word=elements[0]
number=int(word[9:])
difference=number-newlocusnumber
newlocusnumber = newlocusnumber + difference
outfile.write('\n'*difference + line)
newlocusnumber=newlocusnumber+1

elif newlocusnumber<100:
elements=line.split('\t')#this splits all headers into different items
if elements[0]==('Cabther_B00'+ str(newlocusnumber)):
outfile.write(line)
newlocusnumber=newlocusnumber+1
else:
word=elements[0]
number=int(word[9:])
difference=number-locusnumber
newlocusnumber = newlocusnumber + difference
outfile.write('\n'*difference + line)
newlocusnumber=newlocusnumber+1
elif newlocusnumber<800:
elements=line.split('\t')#this splits all lines into different items
if elements[0]==('Cabther_B0'+ str(newlocusnumber)):
outfile.write(line)
newlocusnumber=newlocusnumber+1
else:
word=elements[0]
number=int(word[9:])
difference=number-newlocusnumber
newlocusnumber = newlocusnumber + difference
outfile.write('\n'*difference + line)
newlocusnumber=newlocusnumber+1

linenumber = linenumber +1
locusnumber = locusnumber +1#this makes sure that the next line in the loop will not be the first one

infile.close()
outfile.close()
Reply
#2
Please repost your code with python tags. See the BBCode link in my signature below for instructions.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#3
import os  #this imports the operating system module that lets me look at files and directories
filenames = os.listdir('c:/python27/C/U') #this is the list of files in that directory
for filename in filenames:#here it loops through each file in the list
	infile = open(os.path.join('c:/python27/C/U', filename), 'r') #this opens the file and makes it readable 
	outputfilename=filename+"_gaps.txt"#this creates a new file name for the modified files that I am about to create
	outfile= open(os.path.join('c:/python27/C/U', outputfilename), 'w') #this opens these new files and makes them writable
	locusnumber = 0
	linenumber =0 #this is the beginning of a loop; it needs to be set back to zero for each file
	newlocusnumber=1
	for line in infile:#this loops through each line in the file,
		if linenumber==0:#first line is the header so just write it
			outfile.write(line)
		elif locusnumber<10:#for lines 1-9
			elements=line.split('\t')#this splits lines into different items
			elementnumber=len(elements)
			if elements[0]==('Cabther_A000'+ str(locusnumber)):#this is why only for lines 1-9, so that the total digit number equals 4
				outfile.write(line)
			else:
				word=elements[0]
				number=int(word[9:])
				difference=number-locusnumber
				locusnumber = locusnumber + difference
				outfile.write('\n'*difference + line)
				
		elif locusnumber<100:#lines 10-99
			elements=line.split('\t')#this splits lines into different items
			if elements[0]==('Cabther_A00'+ str(locusnumber)):#total digit number equals 4
				outfile.write(line)
			else:
				word=elements[0]
				number=int(word[9:])
				difference=number-locusnumber
				locusnumber = locusnumber + difference
				outfile.write('\n'*difference + line)
		elif locusnumber<1000:
			elements=line.split('\t')#this splits lines into different items
			if elements[0]==('Cabther_A0'+ str(locusnumber)):
				outfile.write(line)
			else:
				word=elements[0]
				number=int(word[9:])
				difference=number-locusnumber
				locusnumber = locusnumber + difference
				outfile.write('\n'*difference + line)
		elif locusnumber<2274:
			elements=line.split('\t')#this splits lines into different items
			if elements[0]==('Cabther_A'+ str(locusnumber)):
				outfile.write(line)
			else:
				word=elements[0]
				number=int(word[9:])
				difference=number-locusnumber
				locusnumber = locusnumber + difference
				outfile.write('\n'*difference + line)
		elif locusnumber>=2274:
			if newlocusnumber<0:
				elements=line.split('\t')#this splits lines into different items
				if elements[0]==('Cabther_B000'+ str(newlocusnumber)):
					outfile.write(line)
					newlocusnumber=newlocusnumber+1
				else:
					word=elements[0]
					number=int(word[9:])
					difference=number-newlocusnumber
					newlocusnumber = newlocusnumber + difference
					outfile.write('\n'*difference + line)
					newlocusnumber=newlocusnumber+1
				
			elif newlocusnumber<100:
				elements=line.split('\t')#this splits all headers into different items
				if elements[0]==('Cabther_B00'+ str(newlocusnumber)):
					outfile.write(line)
					newlocusnumber=newlocusnumber+1
				else:
					word=elements[0]
					number=int(word[9:])
					difference=number-locusnumber
					newlocusnumber = newlocusnumber + difference
					outfile.write('\n'*difference + line)
					newlocusnumber=newlocusnumber+1
			elif newlocusnumber<800:
				elements=line.split('\t')#this splits all lines into different items
				if elements[0]==('Cabther_B0'+ str(newlocusnumber)):
					outfile.write(line)
					newlocusnumber=newlocusnumber+1
				else:
					word=elements[0]
					number=int(word[9:])
					difference=number-newlocusnumber
					newlocusnumber = newlocusnumber + difference
					outfile.write('\n'*difference + line)
					newlocusnumber=newlocusnumber+1
		
		linenumber = linenumber +1
		locusnumber = locusnumber +1#this makes sure that the next line in the loop will not be the first one
		
infile.close()
outfile.close()
Reply
#4
You open a bunch of files (two each time through the for loop), but you only ever close the last two that you opened, because the close() calls sit outside the for loop.  Things like this are why there's a with block: whatever you open is closed for you automatically when the block ends.  It works in Python 2.7 too (the with statement has been available since 2.6), so the version your path indicates is not a problem.
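
A minimal sketch of the loop with with blocks, assuming the same "one output file per input file" layout as the script above. The function name process_directory is made up for illustration, and the inner loop just copies lines where the gap logic would go:

```python
import os

def process_directory(directory):
    """Write a <name>_gaps.txt copy of every file in directory."""
    # os.listdir returns a list up front, so the new files are not revisited.
    for filename in os.listdir(directory):
        src = os.path.join(directory, filename)
        dst = os.path.join(directory, filename + "_gaps.txt")
        # Both files are closed when the with block ends, on every pass
        # through the loop, even if an error is raised inside it.
        with open(src, 'r') as infile:
            with open(dst, 'w') as outfile:
                for line in infile:
                    outfile.write(line)  # gap-filling logic would go here
```

Nothing needs an explicit close() call any more, so there is nothing to misplace outside the loop.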

The csv module is available as well, so you don't need to do things like split each line on tabs yourself.
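
For example, csv.reader with delimiter='\t' hands back each line already split into a list. The sample lines here are invented to match the Rob_0001 format from the first post; in a real script you would pass the open file instead of a list:

```python
import csv

# Stand-in for an open file: any iterable of lines works.
lines = ["tag\tvalue\n", "Rob_0001\t89\n", "Rob_0002\t12\n"]

# Each row comes back as a list of strings, with the newline removed.
rows = list(csv.reader(lines, delimiter='\t'))

header = rows[0]  # ['tag', 'value']
for tag, value in rows[1:]:
    print(tag, value)
```

Unlike line.split('\t'), the last field does not keep its trailing newline.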

You can probably delete most of those blocks if you generate the expected label for each line a little better.  Instead of juggling locusnumber and newlocusnumber, build the label from a single counter.  It looks like the format is Cabther_[AB]NNNN, where A/B is determined by whether or not locusnumber is less than the magic number 2274.  So something like this:
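A_END = 2274  # magic number: positions below this are "A" tags, the rest are "B"

def get_row_label(locusnumber):
    if locusnumber < A_END:
        return "Cabther_A{0:04d}".format(locusnumber)
    # the original script restarts the B tags at 0001
    return "Cabther_B{0:04d}".format(locusnumber - A_END + 1)

def get_locusnumber(label):
    # inverse of get_row_label, e.g. "Cabther_B0001" -> 2274
    number = int(label[9:])
    return number if label[8] == "A" else number + A_END - 1

# call once per file, with both files already open
def write_with_gaps(infile, outfile):
    locusnumber = 0
    for line in infile:
        elements = line.split("\t")
        if locusnumber == 0 or elements[0] == get_row_label(locusnumber):
            outfile.write(line)  # header line, or the tag we expected
        else:
            # pad with one blank line per missing tag, then catch up
            difference = get_locusnumber(elements[0]) - locusnumber
            outfile.write("\n" * difference + line)
            locusnumber += difference
        locusnumber += 1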
And that's it, no more of the same block 7 times in a row.
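
Since the end goal in the first post is a merged table, here is a rough sketch of one way to do the merge without writing gap files at all: read each file into a dict keyed by tag, then fill in an empty cell wherever a file lacks a tag. The name merge_columns and the sample values are made up for illustration, and it assumes the tags are zero-padded so that a plain sort puts them in order:

```python
def merge_columns(tables):
    """Merge {tag: value} dicts into rows of [tag, value1, value2, ...].

    A tag missing from a file gets an empty string in that file's column.
    """
    all_tags = sorted(set(tag for table in tables for tag in table))
    return [[tag] + [table.get(tag, '') for table in tables]
            for tag in all_tags]

# Two invented "files"; the second has no Rob_0005.
file_a = {'Rob_0001': '89', 'Rob_0002': '12', 'Rob_0005': '7'}
file_b = {'Rob_0001': '40', 'Rob_0002': '3'}

for row in merge_columns([file_a, file_b]):
    print('\t'.join(row))
```

Each dict could be built from one input file with the csv module, skipping the header row.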
Reply
#5
Oh my,

This is beautiful and so much more elegant and it worked so well :). Thank you so much.
Reply

