Python Forum
Trying to Tabulate Information from an Aircraft Website Link(s) - Printable Version



RE: Trying to Tabulate Information from an Aircraft Website Link(s) - snippsat - Jun-17-2019

You have to click and drag the mouse over the code in the cell; when it is all highlighted blue, copy it.
There is also a View as Code button at the top, which shows the code as plain text.


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-17-2019

Quote:You have to mouse and drag over code in cell,when all is blue copy it.

Yes, that is what I did, and it doesn't work. I suppose I may have to type all the code out?


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - snippsat - Jun-17-2019

Quote:Yes that is what I did, and it doesn't work. I suppose I may have to type, all the Code out ?
What doesn't work? This is just a standard copy of text, and no, you should not have to type any code at all.
You mark the code so it is blue, then press Ctrl+C or right-click over the text and choose Copy.
Then press Ctrl+V or right-click and choose Paste.
These are basic commands that work everywhere.
If you want only the text in the browser, click the View as Code button at the top.
From there it is also a standard copy and paste of text/code.


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-17-2019

Hi snippsat,

Copying the text, then holding down Ctrl and pressing V, worked.

Many thanks

Eddie

Hi snippsat,

I have got the gist of what I needed to do :-

import pandas as pd
import requests
from bs4 import BeautifulSoup
#from tabulate import tabulate

res = requests.get("http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/june07.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]

df = pd.read_html(str(table))
#print( tabulate(df[0], headers='keys', tablefmt='psql') )

# Clean up: put the index (Date, Location, ...) at the top and delete the first 2 rows
df = df[1]
df = df.rename(columns=df.iloc[0])
df = df.iloc[2:]
df.head(15)

# Lydd - Display, and that only, had the Spitfire, Hurricane and Dakota booked
# Here Lydd -- Spitfire Hurricane; there were none where all were booked
Southport = df[(df['Dakota'] == "D") & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H')]
Southport
So I got the data for the month of June, for example, showing only events with the Spitfire, Hurricane and Dakota all booked, and no other combinations of booked appearances. How do I show only the events with - Display next to them? And how do I get the dates to show for all the events? Most have NaN next to them.

Eddie


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-18-2019

Tried adding on to the end :-

# Lydd - Display, and that only, had the Spitfire, Hurricane and Dakota booked
# Here Lydd -- Spitfire Hurricane; there were none where all were booked
Southport = df[(df['Dakota'] == "D") & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H'].str.contains("- Display")]
Southport
I wanted only Locations with - Display next to them to show. It didn't work when I ran the code, though: "Invalid Syntax".


I have sorted out part of the code so that it only shows the Displays. Here is the end part of the code :-

Southport = df[df['Location'].str.contains('- Display') & (df['Dakota'] == "D") & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H')]  
Southport



RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-18-2019

I want the table not to show LSHD, i.e. Lancaster, Spitfire, Hurricane and Dakota: locations that have all four booked. Only SHD displays should be shown in the table, that is, rows where the Lancaster column value is NaN (nothing). Here is the end part of the code :-

Southport = df[df['Location'].str.contains('- Display') & df[df['Lancaster'].str.contains('') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H')] 
Southport          
But I get the following traceback error :-

Error:
  File "<ipython-input-1-d313402a8b64>", line 23
    Southport
            ^
SyntaxError: invalid syntax

Where have I gone wrong?

Eddie


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-18-2019

I found the following part of a code :-

urlbase = "https://www.olx.in/coimbatore/?&page="

for x in range(4)[1:]:
    res = requests.get(urlbase + str(x))
How can I adapt that code so that requests will go through all the links and produce the necessary data, i.e. display only the SHD bookings for the whole year?

I have :-

res = requests.get("http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/september07.html")
The ending is the only part that is different in each URL. There are 7 months, i.e. March to September, and the URLs differ only in their endings: march07.html, april07.html, may07.html, etc.

Eddie


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-18-2019

I also found that the following code works to delete the Lancaster column :-

del df['Lancaster']



RE: Trying to Tabulate Information from an Aircraft Website Link(s) - eddywinch82 - Jun-19-2019

Hi, can anyone help me?


RE: Trying to Tabulate Information from an Aircraft Website Link(s) - snippsat - Jun-19-2019

Hi, I have been busy lately and don't have much time for answers.
I can take a little now.
(Jun-18-2019, 01:14 PM)eddywinch82 Wrote: But I get the following Traceback Error :-
You have to be careful with the () and [] count when the line is this long.
Southport = df[(df['Location'].str.contains('- Display')) & (df['Lancaster'].str.contains('L')) | (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H')] 
Southport
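I did throw in a | (means or) to get some matches.
& means and. Note that & binds more tightly than |, so the line above reads as (Display and Lancaster) or (Dakota and Spitfire and Hurricane).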
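To make the grouping explicit, and to get at the original goal of SHD displays without the Lancaster, here is a minimal sketch. The rows are made up for illustration and are not real BBMF data, and it assumes an empty Lancaster cell comes through as NaN, so `isna()` can be used instead of `str.contains('')`. Naming each condition first keeps the bracket count manageable.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the scraped table (not real BBMF data).
df = pd.DataFrame({
    "Location":  ["Lydd - Display", "Southport - Display",
                  "Duxford - Flypast", "Cosford - Display"],
    "Lancaster": [np.nan, "L", np.nan, np.nan],
    "Spitfire":  ["S", "S", "S", "S"],
    "Hurricane": ["H", "H", "H", np.nan],
    "Dakota":    ["D", "D", "D", "D"],
})

# One named mask per condition, so each line has few brackets to count.
# regex=False treats '- Display' as plain text; na=False maps missing values to False.
is_display = df["Location"].str.contains("- Display", regex=False, na=False)
no_lancaster = df["Lancaster"].isna()
shd = (df["Spitfire"] == "S") & (df["Hurricane"] == "H") & (df["Dakota"] == "D")

# SHD displays only: every condition must hold.
shd_displays = df[is_display & no_lancaster & shd]
print(shd_displays["Location"].tolist())

# & binds tighter than |, so this is (display AND has Lancaster) OR shd.
mixed = df[is_display & ~no_lancaster | shd]
print(mixed["Location"].tolist())
```

With these made-up rows, only Lydd passes the first filter, while the second one also lets through Southport (Lancaster booked) and Duxford (a flypast), which is why mixing | into a long & chain can return more rows than expected.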

Quote:There are 7 months i.e. March to September, and the Url differs only by the following ending, i.e. march07.html april07.html may07.html etc.
Make the months.
>>> import calendar
>>> 
>>> months = filter(None, calendar.month_name)
>>> months = list(months) 
>>> months
['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']

>>> months[2:5]
['March', 'April', 'May']
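One detail to watch: `calendar.month_name` gives capitalised names, while the page endings quoted above are lowercase (march07.html and so on). A small sketch of building all seven endings, March to September, assuming an English locale and that lowercase is what the archived site expects:

```python
import calendar

# Base URL taken from the posts above.
BASE = "http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/"

# calendar.month_name[0] is an empty string, so March..September are indices 3..9.
season = [calendar.month_name[i] for i in range(3, 10)]

# Lowercase each name to match endings such as march07.html.
urls = [f"{BASE}{name.lower()}07.html" for name in season]

for url in urls:
    print(url)
```

This only builds the strings and does not fetch anything, so it can be checked before any requests are made.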
Use them in the URL.
import pandas as pd
import requests
from bs4 import BeautifulSoup
import calendar

months = filter(None, calendar.month_name)
months = list(months)

new_table = []
for month in months[2:5]:
    # Page names on the site are lowercase (march07.html), so lowercase the month name
    res = requests.get(f"http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/{month.lower()}07.html")
    soup = BeautifulSoup(res.content,'lxml')
    # Keep the first table on each page
    table = soup.find_all('table')[0]
    new_table.append(table)

print(new_table)
In Pandas, e.g. the month of May.
import pandas as pd
import requests
from bs4 import BeautifulSoup
import calendar

months = filter(None, calendar.month_name)
months = list(months)

new_table = []
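for month in months[2:5]:
    # Page names on the site are lowercase (march07.html), so lowercase the month name
    res = requests.get(f"http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/{month.lower()}07.html")
    soup = BeautifulSoup(res.content,'lxml')
    # Keep the first table on each page
    table = soup.find_all('table')[0]
    new_table.append(table)

# new_table[2] is May (index 0 is March, index 1 is April)
df = pd.read_html(str(new_table[2]))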
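Rather than picking one month by index, the per-month tables could be stacked into a single DataFrame. The sketch below uses small made-up frames in place of the scraped ones, so the names and values are hypothetical. It also assumes that a NaN in the Date column means "same date as the row above", which would need checking against the actual pages before relying on `ffill()`.

```python
import pandas as pd

# Hypothetical, already cleaned per-month tables (stand-ins for the scraped ones).
monthly = {
    "march": pd.DataFrame({
        "Date": ["3rd", None],
        "Location": ["A - Display", "B - Flypast"],
    }),
    "april": pd.DataFrame({
        "Date": ["7th"],
        "Location": ["C - Display"],
    }),
}

frames = []
for month, frame in monthly.items():
    frame = frame.copy()
    # Assumption: an empty Date repeats the date of the row above it.
    frame["Date"] = frame["Date"].ffill()
    # Record which page each row came from before stacking.
    frame.insert(0, "Month", month)
    frames.append(frame)

# One table for the whole season, with a fresh 0..n-1 index.
season = pd.concat(frames, ignore_index=True)
print(season)
```

Any filter written for a single month can then be applied once to `season` instead of once per page.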

Quote:How can I adapt that Code, so requests will go through all links, to produce the necessary Data, i.e. display all SHD Bookings only, for the Whole year ?
The same way as I showed you above with that other URL. As this is a new task, you should try it on your own or make a new thread.
With these tasks (not the easiest) you will get stuck a lot when basic Python knowledge is missing.