Python Forum
Split and organize my Pandas Dataframe
#1
Hello guys,

I'm using the following code in order to collect the cancelled flights from some airlines:
import pandas as pd

url_ini = 'https://flightaware.com/live/fleet/'
url_fim = '/cancelled'
    
ncia = ['SKU', 'TAM']

for cia in ncia:
    url = url_ini + cia + url_fim
    # Take the fourth table on the page (index 3)
    df = pd.read_html(url)[3]

    # Append each airline's rows to the same file, without a header row
    df.to_csv(r'C:\Users\bruno\Desktop\Teste.txt', sep=';', mode='a', header=None, encoding='utf-8', index=False)
    print('Updated for ' + cia)
And the output is something like this:
Output:
SKU326;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Int'l Diego Aracena (IQQ / SCDA);Sex 17:43 -04;;;
SKU250;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Int'l El Loa (CJC / SCCF);Sex 18:29 -04;;;
SKU801;A320;Jorge Chávez Int'l (LIM / SPJC);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 17:44 -05;;;
SKU329;A320;Int'l Diego Aracena (IQQ / SCDA);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 20:47 -04;;;
SKU253;A320;Int'l El Loa (CJC / SCCF);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 21:11 -04;;;
SKU433;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);El Tepual Int'l (PMC / SCTE);Sáb 10:55 -04;;;
But I would like to organize my data better by reducing the origin and destination columns (for example, "Int'l Comodoro Arturo Merino Benítez (SCL / SCEL)" and "Int'l Diego Aracena (IQQ / SCDA)") to just SCL and IQQ. Basically, I only need the airport code, not the airport name.

My ideal output would be something like this:
Output:
SKU326;A320;SCL;IQQ;Sex 17:43 -04;;;
SKU250;A320;SCL;CJC;Sex 18:29 -04;;;
SKU801;A320;LIM;SCL;Sex 17:44 -05;;;
SKU329;A320;IQQ;SCL;Sex 20:47 -04;;;
SKU253;A320;CJC;SCL;Sex 21:11 -04;;;
SKU433;A320;SCL;PMC;Sáb 10:55 -04;;;
How can I do that?

Thank you guys.
#2
Hope this helps

import pandas as pd
 
url_ini = 'https://flightaware.com/live/fleet/'
url_fim = '/cancelled'
     
ncia = ['SKU', 'TAM']
 
for cia in ncia:
    url = url_ini + cia + url_fim
    df = pd.read_html(url)[3]
    # Drop the top level of the two-level column header
    df.columns = df.columns.droplevel(0)
    # Keep only the three characters after "(", i.e. the airport code
    df['Origin'] = df['Origin'].str.split('(').str[1].str[:3]
    df['Destination'] = df['Destination'].str.split('(').str[1].str[:3]

    # No header=None here, so the header row is written on every append
    df.to_csv(r'C:\Users\Kelum desktop PC\Desktop\Teste.txt', sep=';', mode='a', encoding='utf-8', index=False)
    print('Updated for ' + cia)
Output:
Ident;Type;Origin;Destination;ScheduledDeparture Time;Unnamed: 5_level_1;Unnamed: 6_level_1;Unnamed: 7_level_1
SKU329;A320;IQQ;SCL;Fri 08:47PM -04;;;
SKU253;A320;CJC;SCL;Fri 09:11PM -04;;;
SKU433;A320;SCL;PMC;Sat 10:55AM -04;;;
SKU433;A320;PMC;BBA;Sat 01:42PM -04;;;
SKU434;A320;BBA;PMC;Sat 03:40PM -04;;;
SKU434;A320;PMC;SCL;Sat 05:19PM -04;;;
SKU121;A320;SCL;ZAL;Sun 11:46AM -04;;;
SKU122;A320;ZAL;SCL;Sun 02:14PM -04;;;
SKU101;A320;SCL;ZOS;Sun 02:57PM -04;;;
SKU147;A320;SCL;ZCO;Sun 04:35PM -04;;;
SKU102;A320;ZOS;SCL;Sun 05:14PM -04;;;
SKU304;A320;SCL;ARI;Sun 05:30PM -04;;;
SKU148;A320;ZCO;SCL;Sun 06:30PM -04;;;
SKU326;A320;SCL;IQQ;Sun 06:47PM -04;;;
SKU163;A320;SCL;CCP;Sun 07:27PM -04;;;
Ident;Type;Origin;Destination;ScheduledDeparture Time;Unnamed: 5_level_1;Unnamed: 6_level_1;Unnamed: 7_level_1
TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;
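If the split on '(' ever feels fragile, a regular expression is another way to pull out the code. This is only a sketch, not tested against the live page: it assumes every value looks like "Name (XXX / YYYY)" as in the output from post #1, and it uses a small hand-made DataFrame with the 'Origin' and 'Destination' column names instead of fetching the site. The helper name airport_code is made up for the example. Rows that do not match the pattern would come back as NaN.

```python
import pandas as pd

# Sample rows copied from the output in post #1, so no network access is needed.
df = pd.DataFrame({
    'Ident': ['SKU326', 'SKU801'],
    'Origin': ["Int'l Comodoro Arturo Merino Benítez (SCL / SCEL)",
               "Jorge Chávez Int'l (LIM / SPJC)"],
    'Destination': ["Int'l Diego Aracena (IQQ / SCDA)",
                    "Int'l Comodoro Arturo Merino Benítez (SCL / SCEL)"],
})


def airport_code(col):
    # Hypothetical helper: capture the three capital letters that sit
    # between "(" and " / "; values without that pattern become NaN.
    return col.str.extract(r'\(([A-Z]{3}) / ', expand=False)


df['Origin'] = airport_code(df['Origin'])
df['Destination'] = airport_code(df['Destination'])
print(df)
```

The regex is stricter than slicing three characters after the bracket, so an unexpected value shows up as a missing code instead of as three arbitrary characters.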
#3
Hello!
Thanks for your kind response.
It's working now!

Would you mind helping me with another issue?
As you can see, the last column shows the date value, like "Fri 20:47 -04".

I'm not sure whether it's formatted as text or as a date. How can I convert "Fri 20:47 -04" to something like "16/04/21 20:47"?

Is that possible?

Thanks
#4
The date cannot be determined with certainty, because the data shows only the weekday and not the date.

For example, the entries below show only "Sat", so they could fall on any Saturday of the month.

TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;
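That said, if you are willing to assume that each flight falls within a few days of the day you scraped the page, you can guess the date from the weekday. Below is a rough sketch under that assumption only. The function name resolve_date and the reference date are made up for the example, the UTC offset ("-03") is simply ignored, and the English weekday and AM/PM names from my output are assumed (the %p parsing relies on the default English/C locale).

```python
from datetime import datetime, timedelta

WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']


def resolve_date(text, reference):
    """Turn 'Sat 07:25AM -03' into 'dd/mm/yy HH:MM'.

    Picks the matching weekday closest to `reference` (within 3 days
    either side). That choice is an assumption, not something the data says.
    """
    day, clock, _offset = text.split()
    time = datetime.strptime(clock, '%I:%M%p').time()
    # Signed day difference in the range -3..+3 around the reference date.
    delta = (WEEKDAYS.index(day) - reference.weekday() + 3) % 7 - 3
    date = reference.date() + timedelta(days=delta)
    return datetime.combine(date, time).strftime('%d/%m/%y %H:%M')


reference = datetime(2021, 4, 18)  # hypothetical scrape date (a Sunday)
print(resolve_date('Sat 07:25AM -03', reference))
```

With that reference date, "Fri 08:47PM -04" resolves to 16/04/21 20:47, but a flight more than three days away from the scrape date would be assigned to the wrong week.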
#5
(Apr-18-2021, 02:56 AM)klllmmm Wrote: The date cannot be determined with certainty, because the data shows only the weekday and not the date.

For example, the entries below show only "Sat", so they could fall on any Saturday of the month.

TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;

Thank you!