Python Forum
get function returns None from Beautifulsoup object - Printable Version

get function returns None from Beautifulsoup object - DeanAseraf1 - Aug-02-2019

I'm trying to list all the titles from a specific Wikipedia page.
For some reason, when I call .get() on the BeautifulSoup tags to get their ids, it returns None.

This is my code:
import requests
from bs4 import BeautifulSoup


def spider(max_pages):
    page = 1
    while page <= max_pages:
        main_page = 'https://wikipedia.org/wiki/'
        search = input("Enter your search: ")
        page_to_search = main_page + str(search)
        source_code = requests.get(page_to_search)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features="html.parser")
        for title in soup.findAll('h2'):
            print(title)
            ids = link.get('id')
            print(ids)
        page += 1


spider(1)
This is the output:
Output:
Enter your search: >? train
<h2>Contents</h2>
None
<h2><span class="mw-headline" id="Types">Types</span></h2>
None
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
None
<h2><span class="mw-headline" id="Motive_power">Motive power</span></h2>
None
<h2><span class="mw-headline" id="Passenger_trains">Passenger trains</span></h2>
None
<h2><span class="mw-headline" id="Freight_trains">Freight trains</span></h2>
None
<h2><span class="mw-headline" id="See_also">See also</span></h2>
None
<h2><span class="mw-headline" id="References">References</span></h2>
None
<h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
None
<h2><span class="mw-headline" id="External_links">External links</span></h2>
None
<h2>Navigation menu</h2>
None
I've tried getting the links instead, and it behaves the same way.
I also searched online and tried changing the code a little, but it doesn't seem to change anything.

Python 3.7
Windows 10
Pycharm community edition 2019.2
BeautifulSoup4
requests


RE: get function returns None from Beautifulsoup object - Larz60+ - Aug-03-2019

As shown, this code will not run: link is not defined.
Which ids are you looking for, and in which tags?
The only element that you are finding is the title.


RE: get function returns None from Beautifulsoup object - DeanAseraf1 - Aug-03-2019

I'm trying to make a program where you input what you want to search for on Wikipedia and it prints out all of the titles on the page. The ids are the names of the titles in the Wikipedia source; they sit on elements with the class "mw-headline".

I also tried this instead:
for title in soup.findAll('h2', {"class": "mw-headline"}):
    print(title)
    ids = title.get('id')
    print(ids)
Example of what I want:
Output:
enter your search:train
Types
Bogies
etc...



RE: get function returns None from Beautifulsoup object - DeanAseraf1 - Aug-03-2019

I managed to fix the code. Here is the new one:
import requests
from bs4 import BeautifulSoup


def spider(max_pages):
    page = 1
    while page <= max_pages:
        main_page = 'https://wikipedia.org/wiki/'
        search = input("Enter your search: ")
        page_to_search = main_page + str(search)
        source_code = requests.get(page_to_search)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features="html.parser")
        for menu in soup.findAll('span', class_='toctext'):
            print(menu.text)
        page += 1  # advance the counter so the while loop terminates


spider(1)
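
For anyone landing on this thread later: judging only from the output posted above, the id attribute sits on the <span class="mw-headline"> nested inside each <h2>, not on the <h2> itself, which would explain why .get('id') on the <h2> returned None. The class is also on the span, so searching for an <h2> with that class likely matches nothing. Below is a minimal sketch of that idea. It runs against a small hard-coded sample copied from the posted output rather than a live page, and the helper name headline_ids is only an illustration, not something from the original code. Wikipedia's markup may differ from what was posted here.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample of the markup shown in the posted output (not fetched live).
SAMPLE_HTML = """
<h2>Contents</h2>
<h2><span class="mw-headline" id="Types">Types</span></h2>
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
<h2>Navigation menu</h2>
"""


def headline_ids(html):
    """Return the id of every <span class="mw-headline"> nested in an <h2>.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for h2 in soup.find_all("h2"):
        # The <h2> tag itself has no id attribute, so h2.get("id") is None.
        # The id lives on the nested span, so look that up first.
        span = h2.find("span", class_="mw-headline")
        if span is not None:
            ids.append(span.get("id"))
    return ids


if __name__ == "__main__":
    for headline_id in headline_ids(SAMPLE_HTML):
        print(headline_id)
```

Headings without the nested span (such as "Contents" and "Navigation menu" in the sample) are skipped instead of printing None.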