How to link two python scripts

berthenet · Jan-26-2018, 11:49 AM

Hi Larz60+

Thanks for your help with this!

I actually found the first script on this blog: https://bioexpressblog.wordpress.com/201...asta-file/

It doesn't say there that it's slow, so I had no idea. I am working with bacterial DNA, though, which is much smaller, so it runs fast enough on my files. The format issue is worth fixing, though (I was trying to solve my problem on my own, and the output format of the first script was giving me a hard time).

I don't really know how Bio.SeqIO.parse works; I can try to have a look. I stopped at "it does what I wanted it to do".
The headers I am working with at the moment look like this:
>NODE_1_length_340169_cov_104.531

You'll notice that the length is actually mentioned in the header, but I still want to check the length myself, as I will also work with differently formatted data, such as this:
>C16BOV0002_c17

Basically, headers can be very different from one project to another. The only rule is that a header starts with ">" and ends with a line break, after which the sequence begins.
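Because the only reliable rule is the leading ">", a parser can ignore the header text entirely and measure each sequence itself, which also covers headers that don't carry a length. Below is a minimal plain-Python sketch of that idea (no Biopython needed); the function name read_fasta is hypothetical and it assumes a well-formed multi-fasta. For comparison, Biopython's Bio.SeqIO.parse(handle, "fasta") does the same job and yields SeqRecord objects, where len(record) is the sequence length.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a multi-fasta text stream.

    A header is recognised only by its leading '>', so the text after it can
    follow any naming scheme; lengths come from the sequence lines themselves.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


if __name__ == "__main__":
    # Two headers in different styles; the second sequence is wrapped.
    demo = io.StringIO(">C16BOV0002_c17\nACGT\n>NODE_1_length_6_cov_1.0\nACG\nTAC\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq))
```

The length is computed with len(seq), so a header such as >NODE_1_length_340169_cov_104.531 is never trusted for it.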
Not sure if you mean at the end of the 'length' script or at the end of my series of scripts. In the end, I want a new multi-fasta containing only the large sequences (more than 1000 base pairs), plus a summary file stating how many contigs are in the original fasta and how many are in the filtered fasta. I might later run the script on a list of files; in that case, a single summary file for the whole list will be enough instead of one per file, but I am not there in my journey yet. Was that what you were asking?
Here is some example data: a multi-fasta with only two sequences, one over the limit (1090 bp) and one under it (906 bp). I wasn't sure of the best way to transfer them to you, so I put them in a spoiler:

>NODE_30_length_1090_cov_54656.2
TTATGGAATATCTATTAGAGCAAAAAAGAGATTTTACGCAATTAAAATTTAGCGATATAC
AGCAAATGAAATCAGCTTATAGCATAAGAATTTATAATATGCTACTTTGTGAATTAAAAC
AAAACAGACAAAATCTTAAAATAAATCTTTCAGTATTGCAAAATCTTTTAGAAGTTCCGA
AAAATTATGAAGAAAGATGGGCTGATTTTAATCGTTTTGTATTAAAACAAGCAGAAAAAG
ATATAAATAGCAAATCTAATTTAGTTTTATTAGATATTAAAACTTATAAAACAGGGCGTA
AAATAACAGACTTAGAGTTTATTTTTGATTATAAAAATAACGATAAGCGTATCGCACAGG
AAAAACTAAAAGAAGAAAATTTATTTAAAAAACTCAAAGAAATATTAAGTTCTTACATAG
GCAAATCAATTTATGATGATAGATTTGGCGAAATGATTATAAGTCATTACGAACATAATG
AAGAAAATAAAAAGATTTTAATTATCGCCCAGAGAAAAAGCGATGATAAATTTGTTTGCT
TTGGTGTTAAAAACTTCAAAGATATTAAAAGTTTAGAAAAGCTAAAAGATAAAGCAGAAG
AGTTGTTTTATTTAGATAAACAAAGAGTTTTAAAAGCAAAAGAAGCTCAAAAATATAGAA
ATCTTTTTAATTGATTGTATTTTAAAAATTATAAAAATAAAAGAGATATTAAAAGGCTTG
ATTGATAAAAATAATTCTTAAGCTCTAATATCTATGCTTTTTTGTGTAGAATTTAAAGAA
AGAATTTTATTAAATTCCCCTGTATTATCATCGCTAAATTTCATACCAAAAAGAATTTCT
AGCTCATCGCTTGTGCCAAATTTATTTTCCAGTAGCTTTTTTAAAAGCTCATTCATTTTA
TTATCATCTTTATAGGTTTCGCTTTTACTTTCTGCTTGTATAGGTTTAAAAGGCTTTTTT
TTGTCTTCTTCTGAAGTTTCTTTGTTATTTGTATTTTTTAAAGGATTGCTATAATCTACA
CCTTTTGCCTTTTCTGCTTCTTCTAGTGATTTTACAAACCCATCGTGTCTTTGTTTAAAA
TCAAGATATT
>NODE_31_length_906_cov_422.889
TTCTTCTTAACATCTTCTAAGATATTATTAGCTATATCACTTACTGTACTAGAAATAATA
GCTGATTCATTAGCAATTTCAACATTATCTTTAGTAGTTTGATCAATTTGAGCTACGCTA
TCATTGATTTGAGTGATACCTGCAGTTTGTTCTTTAATACTTTCTGCCATATCATTGATA
GATTGAACAAGTAAATTAGTATTAGCTTCAATTTCTGATAAAGACTTTTGAGTTCTTTCA
GCTAACTTTCTAACTTCATCAGCTACCACTGCAAAGCCTCTACCATGTTCTCCAGCACGA
GCTGCTTCAATAGCTGCATTTAAAGCTAAAAGATTGATTTGATCTGCAATATCACCTATA
ATACCTGTAACATTTTTAATCTCTTCAGATTGAGTGATAACATCACTAGTTTTAACTGAA
ACATTTTGCATAGAAGAAGTGATCTCTTCTAAAGCTGCTGCAGTTTCTTCTAAAGATTGA
GCTTGAGAATTTGAAGAAGTGGTTAAGCTTTGAACAGCAGTTTGTAATTTTCCACTTTCA
TTAGCTAAAGCATTAGCAAAGTCTGAACTTTGTTTTAGCATTTTAACTATTTCATCACCT
AAAGCATTAGTAGTTAATTCTACACTACCGCTAGCATTTTCTAATTTATTTCTAAAGTCT
AAGCTTTTGTATTCTTCAAAAATTTTATGAATAGCATTCATATCAGAACCTACTCTAGCT
TGTAAAACATCAAGAAGTTTATTTAGAACATTTTTAAGTTCAATAAGTTGTGGGTTTCTT
GGATTAGCAGTAATTCTTGCTGTTAAATTACCACCTTCTACAACTGATACGGTTTGAACT
GATTCTTTAACGGCTTGATTGTCTTGTTCTAAGCCTCTTTTAGTAGCAAGAATGTTTTCA
TTGATA

There is one more thing I still need to add to my second script (or to the new script, if we end up rewriting one): sorting the multi-fasta file by decreasing sequence size. The files I am working with at the moment are already ordered, so my second script works, but I am aware that the way I designed it only works for an ordered multi-fasta file.
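Once the records are held as (header, sequence) pairs, sorting them by decreasing length is a one-liner with Python's built-in sorted(). A small sketch, assuming the whole file fits in memory (usually fine for bacterial assemblies); sort_by_length_desc is a hypothetical name:

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def sort_by_length_desc(records: List[Record]) -> List[Record]:
    """Return records ordered from longest to shortest sequence.

    sorted() is stable even with reverse=True, so records of equal
    length keep their original relative order.
    """
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)


if __name__ == "__main__":
    recs = [("NODE_31", "A" * 906), ("NODE_30", "C" * 1090), ("C16BOV0002_c17", "G" * 50)]
    for header, seq in sort_by_length_desc(recs):
        print(header, len(seq))
```

A script that sorts first can keep relying on "stop at the first short sequence"; alternatively, checking each record's length individually removes the need for sorted input, and sorting is then only about the order of the output file.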

I hope I gave enough information!

Thanks again for answering me!
