Python Forum
count string occurrences of 2nd file in lines of first
#1
I need to generate permutations of the nucleotide letters (A, T, G, C) for di-composition (e.g. AA, AT, AG, AC), tri-composition (AAA, AAT, AAC, AAG), tetra, penta, etc. (one size at a time), and then count the occurrences of each permutation in a second file that contains sequences with associated values. I have generated the permutation list. Now I need to loop through the sequences only (splitting each sequence from its value), count each of the permutations generated above, and write the output to a new file. However, I am getting counts for the first sequence only, not for the other sequences.

The logic I tried to follow in the program is:

Generate the permutations of ATCG in file1 (e.g. AT AG AC AA ...)
Read the generated file1 and the sequence#value file (DNA_seq_val.txt)
Read the sequences and separate the sequences from the values
Loop through the sequences for each permutation and print its occurrence count, followed by the value (each separated by a comma), in the results file.
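Step 1 does not strictly need an intermediate file. As a minimal sketch (the helper name `make_kmers` is hypothetical, not part of the code in this post), `itertools.product` can build the list in memory. Strictly speaking these are Cartesian products with repetition rather than permutations, which is why AA and TT appear in the list:

```python
from itertools import product

def make_kmers(k, alphabet="ACGT"):
    """Return every string of length k over the alphabet, in product() order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

print(make_kmers(2))       # the 16 di-nucleotides, starting with AA, AC, AG, AT
print(len(make_kmers(3)))  # 4**3 tri-nucleotides
```

Changing `k` covers the tri, tetra and penta cases without editing anything else.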
Input test file (DNA_seq_val.txt):
AAAATTTT#99
CCCCGGGG#77
ATATATCGCGCG#88

The output I got is:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
77 CCCCGGGG
88 ATATATCGCGCG
The output I need is:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,77 CCCCGGGG
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,88 ATATATCGCGCG
(where x = the corresponding count, as in the first line)

My code is below:

from itertools import product
import os

f2 = open('TRYYY', 'a')

#********Generate the permutations start********
per = product('ACGT', repeat=2)  # ACGT = nucleotides; 2 = di-nucleotides (replace 2 with 3 for tri-nucleotides, and so on)
f = open('myfile', 'w')
p = ""
for p in per:
    p = "".join(p)
    f.write(p + "\n")
f.close()

#********Generate the permutations ENDS********

with open('DNA_seq_val.txt', 'r+') as SEQ, open('myfile', 'r+') as TET: #open two files
	SEQ_lines = sum(1 for line in open('DNA_seq_val.txt'))		#count lines in sequences file
	#print (SEQ_lines)
	compo_lines = sum(1 for line in open('myfile'))		#count lines in composition
	#print (compo_lines)
	for lines in SEQ:
		line,val1 = lines.split("#")
		val2 = val1.rstrip('\n')
		val = str(val2)
		line = line.rstrip('\n')
		length =len(line)
		#print (line)		
		#print (val)
		LIN = line, val
		#print (LIN)
		newstr = "".join((line))
		print (newstr)
		#while True:		# infinte loop
		for PER in TET:
			#print (line)
			PER = PER.rstrip('\n')
			length2 =len(PER)
			#print (length2)
			#print (line)
#			print (PER)
			C_PER  = str(line.count(PER))
#			print (C_PER)
			for R in C_PER:
				R1 = "".join(R)
				f2.write(R1+ ",")
		f2.write(val,)
		f2.write('\t')
		f2.write(line)
		f2.write('\n')
	#exit()
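
A likely cause of the missing counts: `TET` is a file object, and the inner `for PER in TET` loop reads it to the end while processing the first sequence. For the second and third sequences the file is already exhausted, so the inner loop body never runs and only the value and the sequence are written. A second, smaller issue is that `for R in C_PER` writes each digit of the count separately, so a count of 12 would come out as `1,2,`. The sketch below is one possible fix, not the only one (calling `TET.seek(0)` before the inner loop would also work): it keeps the list in memory and reuses it for every sequence. The function name `count_kmers` and its parameters are made up for illustration; the output layout (counts, then the value, then a tab and the sequence) follows the code above.

```python
from itertools import product

def count_kmers(records, k=2, alphabet="ACGT"):
    """Return one output line per 'SEQUENCE#VALUE' record."""
    # Build the list once, in memory, so it can be reused for every sequence.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    rows = []
    for record in records:
        record = record.strip()
        if not record:
            continue  # skip blank lines
        seq, val = record.split("#")
        # str.count counts non-overlapping matches, as in the original code.
        counts = [str(seq.count(kmer)) for kmer in kmers]
        rows.append(",".join(counts + [val]) + "\t" + seq)
    return rows

sample = ["AAAATTTT#99\n", "CCCCGGGG#77\n", "ATATATCGCGCG#88\n"]
for row in count_kmers(sample):
    print(row)
```

`records` can be any iterable of lines, so an open file such as `open('DNA_seq_val.txt')` can be passed directly, and each returned row can be written to the results file followed by a newline. Note that `str.count` does not count overlapping matches (`"AAAA".count("AA")` is 2, not 3); if overlapping occurrences are wanted, a sliding window over the sequence would be needed instead.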
#2
That is all of it, actually. Sorry for my naivety; I'm new to programming and to this forum.
The inputs and outputs are provided above the code.