Python Forum

Full Version: Split and organize my Pandas Dataframe
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hello guys,

I'm using the following code in order to collect the cancelled flights from some airlines:
import pandas as pd

url_ini = 'https://flightaware.com/live/fleet/'
url_fim = '/cancelled'
    
ncia = ['SKU', 'TAM']

for i in range(len(ncia)):
    url = url_ini + str(ncia[i]) + url_fim
    df = pd.read_html(url)[3]
        
    df.to_csv(r'C:\Users\bruno\Desktop\Teste.txt', sep=';',  mode='a', header=None, encoding = 'utf-8', index=False)
    print('Atualizado apra a ' + str(ncia[i]) )
And the output is something like this:
Output:
SKU326;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Int'l Diego Aracena (IQQ / SCDA);Sex 17:43 -04;;; SKU250;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Int'l El Loa (CJC / SCCF);Sex 18:29 -04;;; SKU801;A320;Jorge Chávez Int'l (LIM / SPJC);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 17:44 -05;;; SKU329;A320;Int'l Diego Aracena (IQQ / SCDA);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 20:47 -04;;; SKU253;A320;Int'l El Loa (CJC / SCCF);Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);Sex 21:11 -04;;; SKU433;A320;Int'l Comodoro Arturo Merino Benítez (SCL / SCEL);El Tepual Int'l (PMC / SCTE);Sáb 10:55 -04;;;
But I would like to better organize my data and split the content from column 2 and 3, ("Int'l Comodoro Arturo Merino Benítez (SCL / SCEL)" and "Int'l Diego Aracena (IQQ / SCDA)") to just SCL IQQ. So basically I just need the airport code, I don't need the airport name.

My best output would be something like this:
Output:
SKU326;A320;SCL;IQQ;Sex 17:43 -04;;; SKU250;A320;SCL;CJC;Sex 18:29 -04;;; SKU801;A320;LIM;SCL;Sex 17:44 -05;;; SKU329;A320;IQQ);SCL;Sex 20:47 -04;;; SKU253;A320;CJC;SCL;Sex 21:11 -04;;; SKU433;A320;SCL;PMC;Sáb 10:55 -04;;;
How can I do that?

Thank you guys.
Hope this helps

import pandas as pd
 
url_ini = 'https://flightaware.com/live/fleet/'
url_fim = '/cancelled'
     
ncia = ['SKU', 'TAM']
 
for i in range(len(ncia)):
    url = url_ini + str(ncia[i]) + url_fim
    df = pd.read_html(url)[3]
    df.columns = df.columns.droplevel(0)
    df['Origin'] = df['Origin'].str.split('(').str[1].str[:3] 
    df['Destination'] = df['Destination'].str.split('(').str[1].str[:3]
         
    df.to_csv(r'C:\Users\Kelum desktop PC\Desktop\Teste.txt', sep=';',  mode='a',  encoding = 'utf-8', index=False)
    print('Atualizado apra a ' + str(ncia[i]) )
Output:
Ident;Type;Origin;Destination;ScheduledDeparture Time;Unnamed: 5_level_1;Unnamed: 6_level_1;Unnamed: 7_level_1 SKU329;A320;IQQ;SCL;Fri 08:47PM -04;;; SKU253;A320;CJC;SCL;Fri 09:11PM -04;;; SKU433;A320;SCL;PMC;Sat 10:55AM -04;;; SKU433;A320;PMC;BBA;Sat 01:42PM -04;;; SKU434;A320;BBA;PMC;Sat 03:40PM -04;;; SKU434;A320;PMC;SCL;Sat 05:19PM -04;;; SKU121;A320;SCL;ZAL;Sun 11:46AM -04;;; SKU122;A320;ZAL;SCL;Sun 02:14PM -04;;; SKU101;A320;SCL;ZOS;Sun 02:57PM -04;;; SKU147;A320;SCL;ZCO;Sun 04:35PM -04;;; SKU102;A320;ZOS;SCL;Sun 05:14PM -04;;; SKU304;A320;SCL;ARI;Sun 05:30PM -04;;; SKU148;A320;ZCO;SCL;Sun 06:30PM -04;;; SKU326;A320;SCL;IQQ;Sun 06:47PM -04;;; SKU163;A320;SCL;CCP;Sun 07:27PM -04;;; Ident;Type;Origin;Destination;ScheduledDeparture Time;Unnamed: 5_level_1;Unnamed: 6_level_1;Unnamed: 7_level_1 TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;; TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;; TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;
Hello!
Thanks per your kindly response...
It's working now!

Would you mind helping me with another issue?
As you can see, the last column shows the "date vale", like "Fri 20:47 -04".

I'm not sure if it's formated as text or date, so, how can I convert this "Fri 20:47 -04" to something like this "16/04/21 20:47"

Is it possible to be done?

Thanks
Date can not be determine certainty as date is not shown.

Ex. below entries are belong to March month and it could be any Saturday within March month.

TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;
(Apr-18-2021, 02:56 AM)klllmmm Wrote: [ -> ]Date can not be determine certainty as date is not shown.

Ex. below entries are belong to March month and it could be any Saturday within March month.

TAM3595;A319;SDU;CGH;Sat 07:25AM -03;;;
TAM8146;B763;GRU;LIS;Sat 04:50PM -03;;;
TAM3585;A319;CGH;SDU;Sat 05:40PM -03;;;

Thank you!