Aug-02-2019, 11:25 PM
I'm trying to list all the section titles from a specific Wikipedia page.
For some reason, when I apply the .get method to the BeautifulSoup object to get all the 'id's, it returns None.
This is my code:
import requests
from bs4 import BeautifulSoup

main_page = "https://en.wikipedia.org/wiki/"  # assumed value: main_page is not defined in the posted snippet


def spider(max_pages):
    page = 1
    while page <= max_pages:
        search = input("Enter your search: ")
        page_to_search = main_page + str(search)
        source_code = requests.get(page_to_search)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features="html.parser")
        for title in soup.findAll('h2'):
            print(title)
            ids = title.get('id')  # reads the id attribute of the <h2> tag itself
            print(ids)
        page += 1


spider(1)
Output:
Enter your search: >? train
<h2>Contents</h2>
None
<h2><span class="mw-headline" id="Types">Types</span></h2>
None
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
None
<h2><span class="mw-headline" id="Motive_power">Motive power</span></h2>
None
<h2><span class="mw-headline" id="Passenger_trains">Passenger trains</span></h2>
None
<h2><span class="mw-headline" id="Freight_trains">Freight trains</span></h2>
None
<h2><span class="mw-headline" id="See_also">See also</span></h2>
None
<h2><span class="mw-headline" id="References">References</span></h2>
None
<h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
None
<h2><span class="mw-headline" id="External_links">External links</span></h2>
None
<h2>Navigation menu</h2>
None
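Judging by the printed markup, the id attribute sits on the inner <span class="mw-headline">, not on the <h2> itself, which would explain the None for every heading. A minimal sketch, assuming the 2019-era markup shown in the output (Wikipedia's current heading markup may differ) and using a hypothetical helper name section_ids, checks the <h2> first and then falls back to that span:

```python
from bs4 import BeautifulSoup

# Three headings copied from the output printed above (2019-era Wikipedia markup).
SAMPLE_HTML = """
<h2>Contents</h2>
<h2><span class="mw-headline" id="Types">Types</span></h2>
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
"""


def section_ids(html):
    """Return (title, id) pairs for every <h2>; id is None when neither tag has one."""
    soup = BeautifulSoup(html, features="html.parser")
    results = []
    for h2 in soup.find_all("h2"):
        # Tag.get() only looks at the <h2>'s own attributes.
        heading_id = h2.get("id")
        if heading_id is None:
            # Fall back to the inner <span class="mw-headline"> that carries the id.
            span = h2.find("span", class_="mw-headline")
            if span is not None:
                heading_id = span.get("id")
        results.append((h2.get_text(strip=True), heading_id))
    return results


soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")
print(soup.find_all("h2")[1].get("id"))  # None: the id is not on the <h2>
print(section_ids(SAMPLE_HTML))
# → [('Contents', None), ('Types', 'Types'), ('Bogies', 'Bogies')]
```

Checking the <h2> first keeps the helper working if a page puts the id directly on the heading instead of on a child span.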
I've tried getting the links instead and it behaves the same way. I also searched online and tried changing the code a little, but nothing seems to change.
Python 3.7
Windows 10
PyCharm Community Edition 2019.2
BeautifulSoup4
requests