Python Forum
Help with output from if statement
#1
Hi,

I recently started learning Python and am developing my skills by working on various scripts to help me in my work as a biologist.

I have the following file :
Quote:
CruSTS5_GC_30000 AUGUSTUS gene 13036 15467 0.24 - . g4
CruSTS5_GC_30000 AUGUSTUS transcript 13036 15467 0.24 - . g4.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS terminal 13036 13498 0.57 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS internal 13555 14512 0.97 - 2 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS internal 14722 14816 0.96 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS initial 14953 15467 0.59 - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 13499 13554 1 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 14513 14721 0.81 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 14817 14952 0.99 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 13039 13498 0.57 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 13555 14512 0.97 - 2 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 14722 14816 0.96 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 14953 15467 0.59 - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS start_codon 15465 15467 . - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS gene 15900 17819 0.36 - . g5
CruSTS5_GC_30000 AUGUSTUS transcript 16909 17819 0.19 - . g5.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 16909 16911 . - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS terminal 16909 17176 0.27 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17232 17345 0.99 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17404 17492 1 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17549 17669 1 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS initial 17728 17819 0.69 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17177 17231 0.99 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17346 17403 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17493 17548 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17670 17727 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 16912 17176 0.27 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17232 17345 0.99 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17404 17492 1 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17549 17669 1 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17728 17819 0.69 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS start_codon 17817 17819 . - 0 transcript_id "g5.t1"; gene_id "g5";

It is a .gtf file with DNA data, and I would like to extract some specific data from it.
I have seen that there are existing scripts to handle gtf files, but for learning purposes I would like to write my own.

I want to extract the lines with "start_codon" and "stop_codon", combine each pair of successive lines (which can be start_codon + stop_codon or stop_codon + start_codon), then run a small if statement to tell me the orientation (start --> stop or stop <-- start) and generate a small table, potentially as a csv file, including the name of the gene, its orientation and the positions of the stop and start codons.

What I have done as a beginner is to open my .gtf file and treat it like any .txt file: read the lines, remove the characters that would be a problem (; and "), convert each line to a list, select the elements I want to keep based on their index, and print out the "start_codon" and "stop_codon" lines.

import re

DataFile = "MyGTFFile.gtf"

with open (DataFile, "r") as GCData:
    Data = GCData.readlines()
        
    for Lines in Data:
        Lines = Lines.strip() #Remove return to line at the end
        Lines = re.sub(r"\s+", "\t", Lines) #Replace any run of whitespace by a tabulation (raw string avoids an invalid escape warning)
        Lines = re.sub(";", "", Lines) #Replace ; by nothing ("")
        Lines = re.sub('"', "", Lines) #Replace " by nothing ("")

        if re.findall(("start_codon|stop_codon"), Lines): #sorting using "|" as "or"
            Lines = Lines.split("\t") #Convert string to list
            IndexToKeep = [2, 3, 4, 9] #List of index to keep
            Lines = [Field for Position, Field in enumerate(Lines) if Position in IndexToKeep] #Keep fields by position (Lines.index() only finds the first occurrence of a value)
            if "start_codon" in Lines:
                StartLine = Lines
                print(StartLine)
            else:
                StopLine = Lines
                print(StopLine)
I have several questions about my challenge:

- Is converting the lines to lists the right approach, or should I use a different one?

- With my small script, I would like to combine each pair of successive lists, "start_codon" with "stop_codon" or "stop_codon" with "start_codon". I would like to merge them in the order they appear in the file, because that would give me the orientation and make the data easier to analyse afterwards. I could not find a way to merge two outputs that each come from a different branch of an if statement. What would be the best solution?

- After generating this single line, am I right to plan on running several if statements that use the value indexes to extract data and generate a summary .csv file?

Thank you in advance.
Reply
#2
Is this what you are trying to do?
file = 'MyGTFFile.gtf'

mylist = []

with open(file, 'r') as data:
    lines = data.readlines()
    for line in lines:
        if 'start_codon' or 'stop_codon' in line:
            mylist.append(line.replace('"', '').replace(';', '').strip('\n'))

for item in mylist:
    print(item)
Output:
CruSTS5_GC_30000 AUGUSTUS gene 13036 15467 0.24 - . g4
CruSTS5_GC_30000 AUGUSTUS transcript 13036 15467 0.24 - . g4.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS terminal 13036 13498 0.57 - 1 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS internal 13555 14512 0.97 - 2 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS internal 14722 14816 0.96 - 1 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS initial 14953 15467 0.59 - 0 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS intron 13499 13554 1 - . transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS intron 14513 14721 0.81 - . transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS intron 14817 14952 0.99 - . transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS CDS 13039 13498 0.57 - 1 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS CDS 13555 14512 0.97 - 2 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS CDS 14722 14816 0.96 - 1 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS CDS 14953 15467 0.59 - 0 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS start_codon 15465 15467 . - 0 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS gene 15900 17819 0.36 - . g5
CruSTS5_GC_30000 AUGUSTUS transcript 16909 17819 0.19 - . g5.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 16909 16911 . - 0 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS terminal 16909 17176 0.27 - 1 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS internal 17232 17345 0.99 - 1 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS internal 17404 17492 1 - 0 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS internal 17549 17669 1 - 1 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS initial 17728 17819 0.69 - 0 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS intron 17177 17231 0.99 - . transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS intron 17346 17403 1 - . transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS intron 17493 17548 1 - . transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS intron 17670 17727 1 - . transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS CDS 16912 17176 0.27 - 1 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS CDS 17232 17345 0.99 - 1 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS CDS 17404 17492 1 - 0 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS CDS 17549 17669 1 - 1 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS CDS 17728 17819 0.69 - 0 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS start_codon 17817 17819 . - 0 transcript_id g5.t1 gene_id g5

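Never mind, I just saw that I didn't get only the specified lines: the condition 'start_codon' or 'stop_codon' in line is always true, because the non-empty string 'start_codon' is truthy on its own.
Now only the specified lines are stored in the list: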
file = 'MyGTFFile.gtf'

mylist = []

with open(file, 'r') as data:
    lines = data.readlines()
    for line in lines:
        if 'start_codon' in line or 'stop_codon' in line:
            mylist.append(line.replace('"', '').replace(';', '').strip('\n'))

for item in mylist:
    print(item)
Output:
CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS start_codon 15465 15467 . - 0 transcript_id g4.t1 gene_id g4
CruSTS5_GC_30000 AUGUSTUS stop_codon 16909 16911 . - 0 transcript_id g5.t1 gene_id g5
CruSTS5_GC_30000 AUGUSTUS start_codon 17817 17819 . - 0 transcript_id g5.t1 gene_id g5
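The difference between the two conditions comes down to how Python groups the expression. A minimal sketch, using a made-up sample line that contains neither keyword (the variable names are only for illustration):

```python
# Hypothetical sample line that contains neither keyword
line = "CruSTS5_GC_30000 AUGUSTUS intron 13499 13554 1 - ."

# Parsed as: 'start_codon' or ('stop_codon' in line).
# A non-empty string is truthy, so this expression is truthy for every line.
first_version = 'start_codon' or 'stop_codon' in line

# Here each keyword is tested against the line separately.
second_version = 'start_codon' in line or 'stop_codon' in line

print(bool(first_version))   # True
print(second_version)        # False
```

So the first script appended every line of the file, while the second one keeps only the codon lines.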
Reply
#3
@menator01: there is no actual need to use .readlines() (which takes additional memory). One can iterate directly over the file object (for line in data:). Looking at the underlying data, the if-condition can be simplified as well: if "_codon" in line:. Printing out the list can be done a little shorter and without repeating 'item': print(*mylist, sep='\n')
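Put together, the three suggestions could look like the sketch below. To keep it runnable without a file on disk, an io.StringIO object with three sample lines stands in for the opened .gtf file; that stand-in is an assumption for the example, not part of the original script.

```python
import io

# Stand-in for the opened file; with a real file this would be:
#     with open('MyGTFFile.gtf') as data:
data = io.StringIO(
    'CruSTS5_GC_30000 AUGUSTUS gene 13036 15467 0.24 - . g4\n'
    'CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id "g4.t1"; gene_id "g4";\n'
    'CruSTS5_GC_30000 AUGUSTUS start_codon 15465 15467 . - 0 transcript_id "g4.t1"; gene_id "g4";\n'
)

mylist = []
# Iterating over the file object yields one line at a time,
# so the whole file is never held in memory as a list.
for line in data:
    if "_codon" in line:
        mylist.append(line.replace('"', '').replace(';', '').strip('\n'))

# The * unpacks the list into separate arguments, printed one per line.
print(*mylist, sep='\n')
```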
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#4
Hi menator01,

Thank you for your help.

With my current script, my output is the following:
Output:
['stop_codon', '13036', '13038', 'g4.t1']
['start_codon', '15465', '15467', 'g4.t1']
['stop_codon', '16909', '16911', 'g5.t1']
['start_codon', '17817', '17819', 'g5.t1']
['stop_codon', '18965', '18967', 'g6.t2']
['start_codon', '22307', '22309', 'g6.t2']
['start_codon', '22846', '22848', 'g7.t1']
['stop_codon', '24171', '24173', 'g7.t1']
Now I would like to combine the first line with the second, the third with the fourth, and so on through all the "start_codon" and "stop_codon" lines, to get an output like this:

Output:
['stop_codon', '13036', '13038', 'g4.t1', 'start_codon', '15465', '15467', 'g4.t1']
['stop_codon', '16909', '16911', 'g5.t1', 'start_codon', '17817', '17819', 'g5.t1']
['stop_codon', '18965', '18967', 'g6.t2', 'start_codon', '22307', '22309', 'g6.t2']
['start_codon', '22846', '22848', 'g7.t1', 'stop_codon', '24171', '24173', 'g7.t1']
or, merging the identical values (gX.tX)

Output:
['stop_codon', '13036', '13038', 'g4.t1', 'start_codon', '15465', '15467']
['stop_codon', '16909', '16911', 'g5.t1', 'start_codon', '17817', '17819']
['stop_codon', '18965', '18967', 'g6.t2', 'start_codon', '22307', '22309']
['start_codon', '22846', '22848', 'g7.t1', 'stop_codon', '24171', '24173']
The next step would be to generate a short text output and a .csv file based on the list indexes, to get something like:

Output:
Your gene g4.t1 is antisense (<--), starts at position 15467 and ends at 13036
Your gene g5.t1 is antisense (<--), starts at position 16909 and ends at 17819
Your gene g6.t1 is antisense (<--), starts at position 18965 and ends at 22309
Your gene g7.t1 is sense (-->), starts at position 22846 and ends at 24173
As I have only recently started learning Python, and given the structure of my .gtf file and how I would like to export the data, I thought that formatting my data as lists would be the best approach. Am I right? Or would converting them to strings make them easier to manipulate for export?

The main challenge I am now facing is how to append two lines (start_codon with stop_codon) when each one is produced by a different branch of the same if statement (if it is a start_codon it becomes StartLine, else it becomes StopLine).

For now, I would like to try to code the part that generates the text and .csv file by myself, but I would definitely appreciate help merging "start/stop_codon" or "stop/start_codon" when they are both output from the same if statement.

As a beginner in Python programming, I am always open to any remarks or ideas, especially on my coding approach.

Thanks again for your help.
Reply
#5
(Jan-17-2022, 11:21 AM)fgaascht Wrote: Now, I would like to combine the first line with the second line and the third line with the fourth line, and so on with all the "start_codon" and "stop_codon" lines to get an output like this:
zip() is commonly used for this.
>>> new_lst = zip(lst[0::2], lst[1::2])
# Or fancier 
#new_lst = zip(*(iter(lst),) * 2)
>>> new_lst
<zip object at 0x0000020471CE7080>
>>> for item in new_lst:
...     print(item)    
...     
(['stop_codon', '13036', '13038', 'g4.t1'], ['start_codon', '15465', '15467', 'g4.t1'])
(['stop_codon', '16909', '16911', 'g5.t1'], ['start_codon', '17817', '17819', 'g5.t1'])
(['stop_codon', '18965', '18967', 'g6.t2'], ['start_codon', '22307', '22309', 'g6.t2'])
(['start_codon', '22846', '22848', 'g7.t1'], ['stop_codon', '24171', '24173', 'g7.t1'])
Now the two lists of each pair need to be joined together; this can be done with + or extend.
>>> l1 = [1, 2]
>>> l2 = [3, 4]
>>> l1 + l2
[1, 2, 3, 4]
>>> l1.extend(l2)
>>> l1
[1, 2, 3, 4]
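Combining the two steps, a minimal sketch could look like this. It assumes lst holds the per-codon rows in file order; the four rows are copied from the output in post #4.

```python
# First four rows of the output shown in post #4
lst = [
    ['stop_codon', '13036', '13038', 'g4.t1'],
    ['start_codon', '15465', '15467', 'g4.t1'],
    ['stop_codon', '16909', '16911', 'g5.t1'],
    ['start_codon', '17817', '17819', 'g5.t1'],
]

# zip pairs row 0 with row 1, row 2 with row 3, and so on.
# first + second builds a new list and leaves the original rows untouched.
merged = [first + second for first, second in zip(lst[0::2], lst[1::2])]

for row in merged:
    print(row)
```

Using + rather than extend here avoids modifying the rows inside lst, which matters if they are needed again later.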
(Jan-17-2022, 11:21 AM)fgaascht Wrote: or, merging the identical values (gX.tX)
Like this to preserve the ordering.
>>> lst = ['stop_codon', '13036', '13038', 'g4.t1', 'start_codon', '15465', '15467', 'g4.t1']
>>> list(dict.fromkeys(lst))
['stop_codon', '13036', '13038', 'g4.t1', 'start_codon', '15465', '15467']
This works because dictionaries are guaranteed to keep insertion order from Python 3.7 onward, unlike set(), which is still unordered. Note that dict.fromkeys() removes every repeated value, not only the repeated gX.tX.
>>> set(lst)
{'15467', '13038', '13036', 'start_codon', '15465', 'g4.t1', 'stop_codon'} 
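If two positions in a pair ever happened to be equal, dict.fromkeys() would drop one of them as well. A more targeted sketch, assuming the ID is always the last element of each row, drops only the second ID when both rows share it (merge_pair is a hypothetical helper name, not something from the thread):

```python
first = ['stop_codon', '13036', '13038', 'g4.t1']
second = ['start_codon', '15465', '15467', 'g4.t1']


def merge_pair(first, second):
    """Join two rows, dropping the second ID only when both rows share it."""
    if first[-1] == second[-1]:
        # second[:-1] is the second row without its last element (the ID)
        return first + second[:-1]
    return first + second


print(merge_pair(first, second))
```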
Reply
#6
from operator import itemgetter


def unquote(text):
    # remove quotes and the trailing ';' of a GTF attribute value
    return text.replace("'", "").replace('"', "").rstrip(";")


def combine_codons(data):

    # using itemgetter to access the `fields`
    # itemgetter returns a callable
    # calling the callable with an object will return the selected elements of the object
    # itemgetter(1)(some_list) -> will return the second element from `some_list`
    get_first = itemgetter(0, 1, 2, 3, 4, -3)
    get_second = itemgetter(2, 3, 4, -3)
    # negative indices start on the right side of the list
    # -1 is the last element in the list
    # -2 is the second last element ...

    buffer = []
    for line in data.splitlines():
        row = line.split()
        if len(row) == 12 and row[2] in ("start_codon", "stop_codon"):
            buffer.append(row)

    if len(buffer) % 2 != 0:
        # what should happen, if your data is incomplete?
        print("WARNING: Count of start_codon + stop_codon is odd")

    results = []
    # pair the rows two by two: (0, 1), (2, 3), ...
    # itertools.pairwise would yield overlapping pairs (0, 1), (1, 2), ...
    # and would therefore also combine codons of different genes
    # zip stops at the shorter slice, so an unpaired last row is skipped
    for first, second in zip(buffer[0::2], buffer[1::2]):
        # get_first and get_second are the itemgetters
        # the * will unpack the elements
        # you can do it more than once in a list
        combined_row = [*get_first(first), *get_second(second)]

        # optional
        # unquoting the 6th element and the last element
        # 1st element is at index 0
        # so 6th element is at index 5
        # counting starts with 0
        combined_row[5] = unquote(combined_row[5])
        combined_row[-1] = unquote(combined_row[-1])

        # combined result is ready, put it into results
        results.append(combined_row)

    return results


# data as string
data = """CruSTS5_GC_30000 AUGUSTUS gene 13036 15467 0.24 - . g4
CruSTS5_GC_30000 AUGUSTUS transcript 13036 15467 0.24 - . g4.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS terminal 13036 13498 0.57 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS internal 13555 14512 0.97 - 2 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS internal 14722 14816 0.96 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS initial 14953 15467 0.59 - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 13499 13554 1 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 14513 14721 0.81 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 14817 14952 0.99 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 13039 13498 0.57 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 13555 14512 0.97 - 2 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 14722 14816 0.96 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 14953 15467 0.59 - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS start_codon 15465 15467 . - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS gene 15900 17819 0.36 - . g5
CruSTS5_GC_30000 AUGUSTUS transcript 16909 17819 0.19 - . g5.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 16909 16911 . - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS terminal 16909 17176 0.27 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17232 17345 0.99 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17404 17492 1 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17549 17669 1 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS initial 17728 17819 0.69 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17177 17231 0.99 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17346 17403 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17493 17548 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17670 17727 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 16912 17176 0.27 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17232 17345 0.99 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17404 17492 1 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17549 17669 1 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17728 17819 0.69 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS start_codon 17817 17819 . - 0 transcript_id "g5.t1"; gene_id "g5";"""

if __name__ == "__main__":
    results = combine_codons(data)

    for result in results:
        # result is a list
        # you can convert lists to a string with spaces between elements
        result_text = " ".join(result)
        print(result_text)
Reply
#7
I have to make assumptions about the desired result, as my understanding of this subject is quite hazy. I would try to collect the data into a dictionary instead of a list. Why? Better readability, and dictionaries in modern Python are guaranteed to keep insertion order.

So I would:

- read file line by line
- process line if there is '_codon' by:
  - splitting line (sample data indicates, that _codon lines have similar structure),
  - picking and converting needed values
  - adding values to dictionary

In code it could look like:

sequences = dict()

with open('dna_data', 'r') as f:
    for line in f:
        if '_codon' in line:
            record = line.split()
            end = record[2]
            values = (int(record[3]), int(record[4]))
            identity = record[9].strip(';').strip('"')
            
            try:
                sequences[identity][end] = values 
            except KeyError:
                sequences[identity] = {end: values}

print(sequences)
Output:
{'g4.t1': {'stop_codon': (13036, 13038), 'start_codon': (15465, 15467)}, 'g5.t1': {'stop_codon': (16909, 16911), 'start_codon': (17817, 17819)}}
Now I can iterate over the dictionary and output the data I want. As mentioned earlier, the ordering of stop and start follows insertion, i.e. the first one encountered comes first and the second one encountered comes second. In the sample data, stop_codon came first in both cases. If I need a single value, I can use min and max respectively.
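As a rough sketch of that last step, the loop below derives an orientation and the outermost positions from the dictionary. The describe helper and the rule "start codon before stop codon means sense" are assumptions made for this example; the strand column of the .gtf file (the "-" in the sample lines) could serve the same purpose.

```python
# Dictionary in the shape printed above (values copied from that output)
sequences = {
    'g4.t1': {'stop_codon': (13036, 13038), 'start_codon': (15465, 15467)},
    'g5.t1': {'stop_codon': (16909, 16911), 'start_codon': (17817, 17819)},
}


def describe(identity, codons):
    """Hypothetical helper: return (id, orientation, lowest, highest position)."""
    start = codons['start_codon']
    stop = codons['stop_codon']
    # Assumption: a start codon located before the stop codon means sense (-->).
    orientation = '-->' if min(start) < min(stop) else '<--'
    # start + stop concatenates the two tuples into one tuple of four positions
    low = min(start + stop)
    high = max(start + stop)
    return identity, orientation, low, high


for identity, codons in sequences.items():
    print(*describe(identity, codons))
```

A transcript that has only one of the two codons would raise a KeyError here, so real data may need a check before calling the helper.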
Reply

