Jan-26-2018, 10:18 AM
Hi everyone!
This is my first post here. I am originally a biologist but I have started doing bioinformatics. I have very basic programming knowledge in Perl, Python and shell.
I found a small Python script online which prints a list of all contigs in a multi-fasta file, with their lengths.
```python
#!/usr/bin/python
from Bio import SeqIO
import sys

cmdargs = str(sys.argv)
# Print "<contig id><TAB><length>" for every record in the FASTA file
# given as the first command-line argument.
for seq_record in SeqIO.parse(str(sys.argv[1]), "fasta"):
    output_line = '%s\t%i' % \
        (seq_record.id, len(seq_record))
    print(output_line)
```

I used this script to store the result in a file (that I named TestTailleContigs1.txt for now), and I wrote a script that uses this file and the original multi-fasta file to cut off all the contigs with a length < 1000 and store the result in a new file.
```python
#!/usr/bin/python
from __future__ import division

length_file = open("TestTailleContigs1.txt")
contig_file = open("2013_1056H.contigs.fa")
output_file = open("2013_1056H.contigs_filtered.fa", "w")

Contigs_over_1000 = 0
Contigs = 0
FirstContigToTrim = 0

# Count the contigs and remember the name of the first one
# that is not longer than 1000.
for line in length_file:
    column = line.split("\t")
    contigsize = int(column[1])
    if contigsize > 1000:
        Contigs_over_1000 += 1
        Contigs += 1
    else:
        Contigs += 1
        if FirstContigToTrim == 0:
            FirstContigToTrim = column[0]

print("The first contig to filter is " + str(FirstContigToTrim))
print("The number of contigs in the original file is " + str(Contigs))
print("The number of contigs remaining after filtering is " + str(Contigs_over_1000))

# Copy lines until the first short contig's name shows up; every line
# from there on is dropped, so this relies on the contigs being
# ordered by decreasing length.
testcontig = 0
for line in contig_file:
    present = line.count(str(FirstContigToTrim))
    testcontig = testcontig + present
    if testcontig == 0:
        output_file.write(line)

# close() needs the parentheses, otherwise the method is never called.
length_file.close()
contig_file.close()
output_file.close()
```

My question is:
- How can I combine these two scripts into one? I don't actually need the intermediate list of lengths; it only serves as input for my second script. As I will have quite a lot of files to process, I would like to avoid accumulating intermediate files and having to run two scripts instead of just one.
I would really like to understand how it works, more than just having the solution, as I think this is something very basic that I should know how to do.
Thanks in advance for your time and help. If there is anything I should explain differently in order for you to help me, please let me know.
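
For illustration, here is a minimal sketch of one way the two steps could be merged, not a definitive solution. The idea is that the length of each contig is known as soon as the record has been read, so the decision to keep or drop it can be made on the spot and no intermediate length file is needed. The names `read_fasta` and `filter_contigs` are made up for this sketch. It uses a small hand-written FASTA reader so that it runs without any dependency; with Biopython, `SeqIO.parse` would take the place of `read_fasta`. Two assumptions differ from the scripts above: each contig is tested on its own length, so the order of contigs in the file does not matter, and each kept sequence is written on a single line rather than wrapped.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    # Do not forget the last record of the file.
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(in_path, out_path, min_length=1000):
    """Copy contigs longer than min_length; return (total, kept)."""
    total = kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            total += 1
            # Same test as in the second script: strictly longer than 1000.
            if len(seq) > min_length:
                kept += 1
                dst.write(">%s\n%s\n" % (header, seq))
    return total, kept


# Hypothetical usage: python filter_contigs.py input.fa output.fa
if __name__ == "__main__" and len(sys.argv) == 3:
    total, kept = filter_contigs(sys.argv[1], sys.argv[2])
    print("The number of contigs in the original file is %d" % total)
    print("The number of contigs remaining after filtering is %d" % kept)
```

Because the input and output names come from the command line, the same script can be called in a shell loop over many files. The `with` statement closes both files automatically, even if an error occurs.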
