Python using BS scraper - Printable Version

Python using BS scraper - paulfearn100 - Feb-07-2020

Hello, please can someone point me in the right direction?
I have been dabbling in and learning Python with Beautiful Soup and web scraping.
I would like to create a program that can extract multiple web links, e.g.:
<a href="/car_racing/" class="win">1pm</a>
<a href="/car_racing2/" class="win">2pm</a>
<a href="/car_racing3/" class="win">3pm</a>
<a href="/car_racing4/" class="win">4pm</a>

Store these in JSON, CSV or something else (please advise on the best storage to use),
then add the main link onto these (www.carracing.com/car_racing/profile/), open each link and extract another set of links:

<a href="/profile/" class="win">red</a>
<a href="/profile1/" class="win">blue</a>
<a href="/profile2/" class="win">green</a>
<a href="/profile3/" class="win">white</a>

Again, store these,
then open each link and extract the data for each one: car driver name, car type, car engine, car make, etc.

Then present the data in a readable format.

RE: Python using BS scraper - snippsat - Feb-07-2020

Look at web-scraping part-1, part-2

from bs4 import BeautifulSoup

html = '''\
<a href="/car_racing/" class="win">1pm</a>
<a href="/car_racing2/" class="win">2pm</a>
<a href="/car_racing3/" class="win">3pm</a>
<a href="/car_racing4/" class="win">4pm</a>'''

soup = BeautifulSoup(html, 'lxml')

Usage:

>>> all_a = soup.find_all('a', class_="win")
>>> all_a
[<a class="win" href="/car_racing/">1pm</a>,
 <a class="win" href="/car_racing2/">2pm</a>,
 <a class="win" href="/car_racing3/">3pm</a>,
 <a class="win" href="/car_racing4/">4pm</a>]

>>> for tag in all_a:
...     print(tag.text)
...     
1pm
2pm
3pm
4pm
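
To get the href of each link rather than its text, and store the result, something along these lines might work. It is only a sketch: the https://www.carracing.com/ base URL is assumed from the placeholder address in the question, extract_links and links_to_csv are made-up helper names, and html.parser is used instead of lxml so that nothing extra needs installing. CSV is probably fine for a flat list like this; JSON may suit better once the data is nested.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: base URL built from the placeholder domain in the question.
BASE_URL = 'https://www.carracing.com/'

html = '''\
<a href="/car_racing/" class="win">1pm</a>
<a href="/car_racing2/" class="win">2pm</a>
<a href="/car_racing3/" class="win">3pm</a>
<a href="/car_racing4/" class="win">4pm</a>'''


def extract_links(html, base_url):
    """Return one dict per <a class="win"> tag with its text and absolute URL."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tag in soup.find_all('a', class_='win'):
        # urljoin turns the relative href into a full address.
        rows.append({'text': tag.text, 'url': urljoin(base_url, tag['href'])})
    return rows


def links_to_csv(rows):
    """Serialise the rows to CSV text (written to a string here, not a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['text', 'url'])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


links = extract_links(html, BASE_URL)
csv_text = links_to_csv(links)
print(csv_text)
```

To write a real file, the same DictWriter can be pointed at open('links.csv', 'w', newline='') instead of the StringIO buffer. Each stored URL could then be fetched in turn and parsed the same way to pull out the next level of links.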