Python Forum
Help with output from if statement
Hi,

I recently started learning Python and am developing my skills by working on various scripts to help me in my work as a biologist.

I have the following file:
CruSTS5_GC_30000 AUGUSTUS gene 13036 15467 0.24 - . g4
CruSTS5_GC_30000 AUGUSTUS transcript 13036 15467 0.24 - . g4.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS terminal 13036 13498 0.57 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS internal 13555 14512 0.97 - 2 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS internal 14722 14816 0.96 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS initial 14953 15467 0.59 - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 13499 13554 1 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 14513 14721 0.81 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS intron 14817 14952 0.99 - . transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 13039 13498 0.57 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 13555 14512 0.97 - 2 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 14722 14816 0.96 - 1 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS CDS 14953 15467 0.59 - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS start_codon 15465 15467 . - 0 transcript_id "g4.t1"; gene_id "g4";
CruSTS5_GC_30000 AUGUSTUS gene 15900 17819 0.36 - . g5
CruSTS5_GC_30000 AUGUSTUS transcript 16909 17819 0.19 - . g5.t1
CruSTS5_GC_30000 AUGUSTUS stop_codon 16909 16911 . - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS terminal 16909 17176 0.27 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17232 17345 0.99 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17404 17492 1 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS internal 17549 17669 1 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS initial 17728 17819 0.69 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17177 17231 0.99 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17346 17403 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17493 17548 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS intron 17670 17727 1 - . transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 16912 17176 0.27 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17232 17345 0.99 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17404 17492 1 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17549 17669 1 - 1 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS CDS 17728 17819 0.69 - 0 transcript_id "g5.t1"; gene_id "g5";
CruSTS5_GC_30000 AUGUSTUS start_codon 17817 17819 . - 0 transcript_id "g5.t1"; gene_id "g5";

It is a .gtf file containing DNA data, and I would like to extract some specific data from it.
I have seen that scripts for handling GTF files already exist, but for learning purposes I would like to write my own.

I want to extract the lines containing "start_codon" and "stop_codon" and combine each successive pair of lines, which can be either start_codon + stop_codon or stop_codon + start_codon. Then I want to run a small if statement to tell me the orientation (start --> stop or stop <-- start) and generate a small table, possibly as a CSV file, containing the name of the gene, its orientation, and the positions of the stop and start codons.

What I have done so far, as a beginner, is to open my .gtf file and treat it like any .txt file: read the lines, remove the characters that would cause problems (; and "), convert each line to a list, select the elements I want to keep based on their index, and print the "start_codon" and "stop_codon" lines.

import re

DataFile = "MyGTFFile.gtf"
IndexToKeep = [2, 3, 4, 9]  # Feature, start, end, transcript ID

with open(DataFile, "r") as GCData:
    for Line in GCData:  # Iterate over the file one line at a time
        Line = Line.strip()  # Remove the newline at the end
        Line = re.sub(r"\s+", "\t", Line)  # Replace each run of whitespace with one tab
        Line = re.sub('[;"]', "", Line)  # Remove ; and "

        if re.search("start_codon|stop_codon", Line):  # "|" means "or"
            Fields = Line.split("\t")  # Convert string to list
            # Select by position with enumerate(); list.index() returns only the
            # first match, so repeated values could select the wrong fields
            Fields = [Field for Position, Field in enumerate(Fields) if Position in IndexToKeep]
            if "start_codon" in Fields:
                StartLine = Fields
                print(StartLine)
            else:
                StopLine = Fields
                print(StopLine)
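For comparison, the same field selection can be written by splitting each line once and reading the columns by position. This is only a sketch: it assumes the column order seen in the sample above, and `parse_codon_line` is a hypothetical helper name, not part of any library.

```python
import re


def parse_codon_line(line):
    """Return (feature, start, end, transcript_id) for codon lines, else None.

    Hypothetical helper: assumes the column order seen in the sample file.
    """
    fields = line.split()  # split on any run of whitespace
    if len(fields) < 9 or fields[2] not in ("start_codon", "stop_codon"):
        return None
    # The attributes look like: transcript_id "g4.t1"; gene_id "g4";
    match = re.search(r'transcript_id\s+"([^"]+)"', line)
    transcript_id = match.group(1) if match else ""
    return fields[2], int(fields[3]), int(fields[4]), transcript_id


sample = 'CruSTS5_GC_30000 AUGUSTUS stop_codon 13036 13038 . - 0 transcript_id "g4.t1"; gene_id "g4";'
print(parse_codon_line(sample))  # ('stop_codon', 13036, 13038, 'g4.t1')
```

Testing the feature column directly, rather than searching the whole line, avoids matching the words if they ever appear in another column.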
I have several questions about how to solve this:

- Is converting the lines to lists the right approach, or should I use a different one?

- With my small script, I would like to combine each "start_codon" list with the "stop_codon" list that follows it, or the other way round. I would like to merge them in the order in which they appear in the file, because that gives me the orientation and the data will be easier to analyse afterwards. I could not find a way to merge two outputs that each come from a different branch of the if statement. What would be the best solution?

- After generating this combined line, is it reasonable to run several if statements that use the value indexes to extract data and generate a summary .csv file?
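One possible way to approach the second and third points, offered as a sketch rather than a definitive answer: instead of trying to merge the output of two separate `if` branches, remember the previous codon record and emit a combined row when the next one arrives. The helper names (`pair_codons`, `write_summary`), the CSV column names, and the orientation labels are assumptions made for illustration. The records are `[feature, start, end, transcript]` lists like those printed by the script above.

```python
import csv
import io


def pair_codons(records):
    """Combine consecutive codon records that share a transcript into one row.

    Each record is [feature, start, end, transcript_id], in file order.
    """
    rows = []
    previous = None
    for record in records:
        if previous is not None and previous[3] == record[3] and previous[0] != record[0]:
            # File order gives the orientation: start first means start --> stop
            start_first = previous[0] == "start_codon"
            start, stop = (previous, record) if start_first else (record, previous)
            rows.append({
                "transcript": record[3],
                "orientation": "start --> stop" if start_first else "stop <-- start",
                "start_codon": f"{start[1]}-{start[2]}",
                "stop_codon": f"{stop[1]}-{stop[2]}",
            })
            previous = None  # pair consumed
        else:
            previous = record  # wait for the matching codon
    return rows


def write_summary(rows, handle):
    """Write the paired rows as CSV to an open text handle."""
    fieldnames = ["transcript", "orientation", "start_codon", "stop_codon"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Records in the order they appear in the sample file
records = [
    ["stop_codon", "13036", "13038", "g4.t1"],
    ["start_codon", "15465", "15467", "g4.t1"],
    ["stop_codon", "16909", "16911", "g5.t1"],
    ["start_codon", "17817", "17819", "g5.t1"],
]
rows = pair_codons(records)
buffer = io.StringIO()  # swap for open("summary.csv", "w", newline="") to write a file
write_summary(rows, buffer)
print(buffer.getvalue())
```

The strand column of the GTF file (the `-` in the seventh field of the sample) records the orientation as well, so it may serve as a cross-check on the order-based result.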

Thank you in advance.


Messages In This Thread
Help with output from if statement - by fgaascht - Jan-17-2022, 09:00 AM
