Jan-16-2019, 08:40 PM
I am working on a small project where I need to extract a nucleotide region from a bacterial genome. The genome file has 30000000 characters, and I need to extract nucleotides 94442 to 95255. I have no programming experience, but I am learning. I used the following code to perform the extraction:
>>> first = open(r"C:\Users\cepo\Desktop\Python\AvinosumDSM180.txt","r")
>>> first.seek(94443)
94443
>>> sep = first.read(95255-94443)
>>> print(sep)
This code seemed to work, and I got the following result:
CCGACTGCCATGTCCTCGG
GCGTTTGCCCGCGACCCATCTGCTGCTGCATCGCAACGGCGCCGCGCCCTGGTTCATCCTGGTTCCTGAA
ACCGATCTGGCCAACCTCCTGGATCTGCCGGCCGCGCACCGTGATGCCGTCCTAGCCGACTGCACGCGCG
TTTCGGATGCACTGGGCACGCTGGGTTATCCCAAGATCAACGTCGCCTGGATCGGTAATCTGGTGCCACA
GCTCCACATCCATGTCATCGGGCGTCGTCCCGGCGATGCCTGTTGGCCGCGACCGGTGTGGGGGCATCTG
CCGGCAGAGCGGGACTATGCCGAGCACGAAATCACGGCGCTCCGCGCGGCGGTCCTGGATTGAGAGCGCC
GGCTCCATCGTCCACTGACCTGTTCAGACGCAACGGAGGAACCGCGCGTTCTGACCGGCCATCACCCCAG
CTCGCCATCGAGATAGAACCAGCGCCCGTGCTCGCGCACGAAGCGACTGCGCTCCTGGAGGCGCTGGGCA
CGGCCCTGGAGCTTGGAGCGGGCCACGAACGTCACCCAGCCCTCCTGGTCCGTTGCGCCTCCGGCTTCGG
TGCTCAGGATCTTGAGACCGAGCCAGCGAAGTCCCGGCTCCAGGGTCAGCGTGGCCGGACGGGTTGTCGG
ATGCCAGGTGGCGAGCAGATAGTCAGCCTGCCCGGTGGCAAAGGCGCTGTAGCGCGAGCGCATCAGGGCC
TCGGCTGTCGGTGCGATGGTACGGGCGGACAGATGAGGACCGCAGCAGTCGTCGAAAGGGCGGCCGGAGC
CGCAGAGACAG
The problem is that 95255-94443 equals 812, so I should have gotten 812
characters, but I got only 800. I am at a complete loss as to why Python
is discarding 12 characters, which I need in order to find the protein
this DNA sequence encodes.
Please advise.
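
A possible explanation, offered as a guess rather than a confirmed diagnosis: the printed result spans 13 lines, so it contains 12 line breaks, and read() counts each line break as a character (800 bases + 12 line breaks = 812). The sketch below shows one way to index by base position instead, by stripping line breaks before slicing. The function name extract_region and the toy data are hypothetical, and it assumes the file holds only sequence lines (no FASTA header) and that positions are 1-based and inclusive.

```python
import io

def extract_region(handle, start, end):
    """Return bases start..end (1-based, inclusive), ignoring line breaks.

    Hypothetical helper: assumes every line in the file is sequence data.
    """
    # Remove the line break from every line, then join into one long string
    # so that string positions correspond to base positions only.
    sequence = "".join(line.strip() for line in handle)
    return sequence[start - 1:end]

# Toy stand-in for the genome file: three lines of 10 bases each.
demo = io.StringIO("ACGTACGTAC\nGGGGCCCCAA\nTTTTAAAACC\n")
region = extract_region(demo, 8, 23)
print(region, len(region))  # TACGGGGCCCCAATTT 16
```

With the real file, the same function could be called on the object returned by open(), for example extract_region(first, 94442, 95255). The seek() offset would also be shifted by the line breaks that come before the start position, so the region returned this way may begin at a different base than the one printed above.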