Python Forum
Split and organize my Pandas Dataframe
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Split and organize my Pandas Dataframe
#1
Star 
Hello guys,

I'm using the following code in order to collect the cancelled flights from some airlines:
import pandas as pd

url_ini = 'https://flightaware.com/live/fleet/'
url_fim = '/cancelled'
    
ncia = ['SKU', 'TAM']

for i in range(len(ncia)):
    url = url_ini + str(ncia[i]) + url_fim
    df = pd.read_html(url)[3]
        
    df.to_csv(r'C:\Users\bruno\Desktop\Teste.txt', sep=';',  mode='a', header=None, encoding = 'utf-8', index=False)
    print('Atualizado apra a ' + str(ncia[i]) )
And the output is something like this:
Output:
SKU326;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Int'l Diego Aracena (IQQ / SCDA);Sex 17:43 -04;;; SKU250;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Int'l El Loa (CJC / SCCF);Sex 18:29 -04;;; SKU801;A320;Jorge Chávez Int'l (LIM / SPJC);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 17:44 -05;;; SKU329;A320;Int'l Diego Aracena (IQQ / SCDA);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 20:47 -04;;; SKU253;A320;Int'l El Loa (CJC / SCCF);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 21:11 -04;;; SKU433;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);El Tepual Int'l (PMC / SCTE);Sáb 10:55 -04;;;
But I would like to better organize my data and split the content from column 2 and 3, ("Int'l Comodoro Arturo Merino Benítez (SCL / SCEL)" and "Int'l Diego Aracena (IQQ / SCDA)") to just SCL IQQ. So basically I just need the airport code, I don't need the airport name.

My best output would be something like this:
Output:
SKU326;A320;SCL;IQQ;Sex 17:43 -04;;; SKU250;A320;SCL;CJC;Sex 18:29 -04;;; SKU801;A320;LIM;SCL;Sex 17:44 -05;;; SKU329;A320;IQQ);SCL;Sex 20:47 -04;;; SKU253;A320;CJC;SCL;Sex 21:11 -04;;; SKU433;A320;SCL;PMC;Sáb 10:55 -04;;;
How can I do that?

Thank you guys.
Reply
#2
Hope this helps

import pandas as pd
 
url_ini = 'https://flightaware.com/live/fleet/'
url_fim = '/cancelled'
     
ncia = ['SKU', 'TAM']
 
for i in range(len(ncia)):
    url = url_ini + str(ncia[i]) + url_fim
    df = pd.read_html(url)[3]
    df.columns = df.columns.droplevel(0)
    df['Origin'] = df['Origin'].str.split('(').str[1].str[:3] 
    df['Destination'] = df['Destination'].str.split('(').str[1].str[:3]
         
    df.to_csv(r'C:\Users\Kelum desktop PC\Desktop\Teste.txt', sep=';',  mode='a',  encoding = 'utf-8', index=False)
    print('Atualizado apra a ' + str(ncia[i]) )
Output:
Ident;Type;Origin;Destination;ScheduledDeparture Time;Unnamed: 5_level_1;Unnamed: 6_level_1;Unnamed: 7_level_1 SKU329;A320;IQQ;SCL;Fri 08:47PM -04;;; SKU253;A320;CJC;SCL;Fri 09:11PM -04;;; SKU433;A320;SCL;PMC;Sat 10:55AM -04;;; SKU433;A320;PMC;BBA;Sat 01:42PM -04;;; SKU434;A320;BBA;PMC;Sat 03:40PM -04;;; SKU434;A320;PMC;SCL;Sat 05:19PM -04;;; SKU121;A320;SCL;ZAL;Sun 11:46AM -04;;; SKU122;A320;ZAL;SCL;Sun 02:14PM -04;;; SKU101;A320;SCL;ZOS;Sun 02:57PM -04;;; SKU147;A320;SCL;ZCO;Sun 04:35PM -04;;; SKU102;A320;ZOS;SCL;Sun 05:14PM -04;;; SKU304;A320;SCL;ARI;Sun 05:30PM -04;;; SKU148;A320;ZCO;SCL;Sun 06:30PM -04;;; SKU326;A320;SCL;IQQ;Sun 06:47PM -04;;; SKU163;A320;SCL;CCP;Sun 07:27PM -04;;; Ident;Type;Origin;Destination;ScheduledDeparture Time;Unnamed: 5_level_1;Unnamed: 6_level_1;Unnamed: 7_level_1 TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;; TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;; TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;
brunolelli likes this post
Reply
#3
Hello!
Thanks per your kindly response...
It's working now!

Would you mind helping me with another issue?
As you can see, the last column shows the "date vale", like "Fri 20:47 -04".

I'm not sure if it's formated as text or date, so, how can I convert this "Fri 20:47 -04" to something like this "16/04/21 20:47"

Is it possible to be done?

Thanks
Reply
#4
Date can not be determine certainty as date is not shown.

Ex. below entries are belong to March month and it could be any Saturday within March month.

TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;
brunolelli likes this post
Reply
#5
(Apr-18-2021, 02:56 AM)klllmmm Wrote: Date can not be determine certainty as date is not shown.

Ex. below entries are belong to March month and it could be any Saturday within March month.

TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;

Thank you!
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  How do I organize my simple ftp modules? blobdx7 3 458 Jan-05-2024, 01:23 PM
Last Post: Gribouillis
  Python Alteryx QS-Passing pandas dataframe column inside SQL query where condition sanky1990 0 690 Dec-04-2023, 09:48 PM
Last Post: sanky1990
  Question on pandas.dataframe merging two colums shomikc 4 783 Jun-29-2023, 11:30 AM
Last Post: snippsat
  Pandas AttributeError: 'DataFrame' object has no attribute 'concat' Sameer33 5 5,302 Feb-17-2023, 06:01 PM
Last Post: Sameer33
  help how to get size of pandas dataframe into MB\GB mg24 1 2,231 Jan-28-2023, 01:23 PM
Last Post: snippsat
  pandas dataframe into csv .... exponent issue mg24 10 1,708 Jan-20-2023, 08:15 PM
Last Post: deanhystad
  How to assign a value to pandas dataframe column rows based on a condition klllmmm 0 799 Sep-08-2022, 06:32 AM
Last Post: klllmmm
  How to retrieve records in a DataFrame (Python/Pandas) that contains leading or trail mmunozjr 3 1,696 Sep-05-2022, 11:56 AM
Last Post: Pedroski55
  "Vlookup" in pandas dataframe doug2019 3 1,796 May-09-2022, 01:35 PM
Last Post: snippsat
  How can I organize my code according to output that I want ilknurg 1 1,141 Mar-11-2022, 09:24 AM
Last Post: perfringo

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020