Aug-02-2019, 11:25 PM
I'm trying to list all the section titles (the h2 headings) from a specific Wikipedia page.
For some reason, when I call .get('id') on the h2 tags that BeautifulSoup returns, to get all the ids, it returns None.
This is my code:
import requests
from bs4 import BeautifulSoup


def spider(max_pages):
    page = 1
    while page <= max_pages:
        main_page = 'https://wikipedia.org/wiki/'
        search = input("Enter your search: ")
        page_to_search = main_page + str(search)
        # Download the article and parse its HTML
        source_code = requests.get(page_to_search)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features="html.parser")
        # Print every h2 heading, then its 'id' attribute
        for title in soup.findAll('h2'):
            print(title)
            ids = title.get('id')
            print(ids)
        page += 1


spider(1)

This is the output:
Enter your search: >? train
<h2>Contents</h2>
None
<h2><span class="mw-headline" id="Types">Types</span></h2>
None
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
None
<h2><span class="mw-headline" id="Motive_power">Motive power</span></h2>
None
<h2><span class="mw-headline" id="Passenger_trains">Passenger trains</span></h2>
None
<h2><span class="mw-headline" id="Freight_trains">Freight trains</span></h2>
None
<h2><span class="mw-headline" id="See_also">See also</span></h2>
None
<h2><span class="mw-headline" id="References">References</span></h2>
None
<h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
None
<h2><span class="mw-headline" id="External_links">External links</span></h2>
None
<h2>Navigation menu</h2>
None
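A likely cause, judging from the output above: the h2 tags have no id attribute of their own. The id sits on the nested <span class="mw-headline">, so Tag.get('id') called on the h2 returns None. The sketch below reproduces this offline with a fragment copied from that output, then reads the id from the span instead. The helper name heading_ids is hypothetical, and the check for an id on the h2 itself is an assumption meant to cover newer Wikipedia markup, which may differ from the 2019 HTML shown here.

```python
from bs4 import BeautifulSoup

# Fragment copied from the output above (2019-era Wikipedia heading markup).
html = """
<h2>Contents</h2>
<h2><span class="mw-headline" id="Types">Types</span></h2>
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
<h2>Navigation menu</h2>
"""

soup = BeautifulSoup(html, "html.parser")

for h2 in soup.find_all("h2"):
    # The <h2> tag itself carries no id attribute, so this prints None every time.
    print(h2.get("id"))


def heading_ids(soup):
    """Hypothetical helper: collect the id of each <h2> section heading."""
    ids = []
    for h2 in soup.find_all("h2"):
        # Assumption: newer markup may put the id directly on the <h2>.
        heading_id = h2.get("id")
        if heading_id is None:
            # 2019 markup: the id lives on the nested <span class="mw-headline">.
            span = h2.find("span", class_="mw-headline")
            if span is not None:
                heading_id = span.get("id")
        if heading_id is not None:
            ids.append(heading_id)
    return ids


print(heading_ids(soup))  # ['Types', 'Bogies']
```

Headings such as "Contents" and "Navigation menu" have no span, so h2.find(...) returns None there; the helper skips them instead of calling .get on None, which would raise AttributeError.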
I've also tried getting the links instead, and the same thing happens. I searched online and tried changing the code a little, but nothing seems to change.
Python 3.7
Windows 10
PyCharm Community Edition 2019.2
BeautifulSoup4
requests
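The links attempt probably fails for the same reason, although the exact links code isn't shown: an href belongs to <a> tags, not to the h2 that contains them, so .get('href') on an h2 also returns None. A minimal sketch, using a made-up HTML fragment and an assumed page URL (BASE_URL) so that urllib.parse.urljoin can turn relative Wikipedia paths into absolute ones:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed page URL, used only to resolve relative links.
BASE_URL = "https://en.wikipedia.org/wiki/Train"

# Hypothetical fragment: a heading followed by a paragraph with links.
html = """
<h2><span class="mw-headline" id="Types">Types</span></h2>
<p>See <a href="/wiki/Locomotive">locomotive</a> and
<a href="/wiki/Multiple_unit">multiple unit</a>.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# An <h2> has no href attribute, so this is None, just like .get('id').
print(soup.find("h2").get("href"))  # None

# The href lives on <a> tags; href=True skips anchors without one.
links = [urljoin(BASE_URL, a["href"]) for a in soup.find_all("a", href=True)]
print(links)
```

Filtering with href=True means a["href"] can be indexed directly without risking a KeyError.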