get function returns None from BeautifulSoup object
#1
I'm trying to list all the section titles from a specific Wikipedia page. For some reason, when I apply the .get function to the BeautifulSoup object to get all the ids, it returns None.

This is my code:
import requests
from bs4 import BeautifulSoup


def spider(max_pages):
    page = 1
    while page <= max_pages:
        main_page = 'https://wikipedia.org/wiki/'
        search = input("Enter your search: ")
        page_to_search = main_page + str(search)
        source_code = requests.get(page_to_search)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features="html.parser")
        for title in soup.findAll('h2'):
            print(title)
            ids = link.get('id')
            print(ids)
        page += 1


spider(1)

This is the output:
Enter your search: >? train
<h2>Contents</h2>
None
<h2><span class="mw-headline" id="Types">Types</span></h2>
None
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
None
<h2><span class="mw-headline" id="Motive_power">Motive power</span></h2>
None
<h2><span class="mw-headline" id="Passenger_trains">Passenger trains</span></h2>
None
<h2><span class="mw-headline" id="Freight_trains">Freight trains</span></h2>
None
<h2><span class="mw-headline" id="See_also">See also</span></h2>
None
<h2><span class="mw-headline" id="References">References</span></h2>
None
<h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
None
<h2><span class="mw-headline" id="External_links">External links</span></h2>
None
<h2>Navigation menu</h2>
None
I've tried getting the links instead and the same thing happens. I also searched online and tried changing the code a little, but nothing seems to change.

Python 3.7
Windows 10
PyCharm Community Edition 2019.2
BeautifulSoup4
requests
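A minimal sketch of why .get('id') prints None, using a small hand-written HTML fragment shaped like the <h2> lines in the output above (the fragment is illustrative, not fetched from a live page). Tag.get returns None when the tag itself lacks the attribute, and in that output the id sits on the inner <span>, not on the <h2>:

```python
from bs4 import BeautifulSoup

# Illustrative fragment modelled on the printed <h2> tags (not fetched live).
SAMPLE_HTML = """
<h2>Contents</h2>
<h2><span class="mw-headline" id="Types">Types</span></h2>
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
types_heading = soup.find_all("h2")[1]  # the <h2> wrapping "Types"

# The <h2> has no attributes of its own, so .get() falls back to None.
h2_id = types_heading.get("id")
# .get() also accepts a default value, like dict.get().
h2_id_or_default = types_heading.get("id", "no id on <h2>")
# The id lives on the child <span>, so read it from there instead.
span_id = types_heading.span.get("id")

print(h2_id)             # None
print(h2_id_or_default)  # no id on <h2>
print(span_id)           # Types
```

The same applies to any attribute: .get only looks at the tag you call it on, never at its children.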
#2
As shown, this code will not run: link is not defined.
What ids are you looking for, and in which tags?
The only elements you are finding are the <h2> titles.
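A minimal sketch of the original loop with the undefined link replaced by the loop variable title, assuming the goal is the id of the headline <span> inside each <h2>. The helper name heading_ids is hypothetical, and the HTML is a hand-written sample rather than the live page:

```python
from bs4 import BeautifulSoup


def heading_ids(html):
    """Return the id of the headline <span> inside each <h2>, skipping
    headings (such as "Contents") that have no such span."""
    soup = BeautifulSoup(html, "html.parser")
    ids = []
    for title in soup.find_all("h2"):
        # Use the loop variable (title), not an undefined name like link,
        # and search inside it for the span that actually carries the id.
        headline = title.find("span", class_="mw-headline")
        if headline is not None:
            ids.append(headline.get("id"))
    return ids


SAMPLE_HTML = """
<h2>Contents</h2>
<h2><span class="mw-headline" id="Types">Types</span></h2>
<h2><span class="mw-headline" id="See_also">See also</span></h2>
<h2>Navigation menu</h2>
"""

print(heading_ids(SAMPLE_HTML))  # ['Types', 'See_also']
```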
#3
I'm trying to make a program where you enter what you want to search for on Wikipedia and it prints out all of the titles on that page. The ids are the names of the titles in the Wikipedia source, on the elements with the class "mw-headline".

I also tried this instead:
for title in soup.findAll('h2', {"class": "mw-headline"}):
    print(title)
    ids = title.get('id')
    print(ids)
Example of what I want:
Enter your search: train
Types
Bogies
etc...
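The attempt above filters <h2> tags by class "mw-headline", but in the printed output that class sits on the inner <span>, so the filter matches nothing. A sketch on a hand-written sample fragment (not the live page) comparing that filter with searching for the <span> directly, plus an equivalent CSS selector via soup.select:

```python
from bs4 import BeautifulSoup

# Illustrative fragment shaped like the headings printed in post #1.
SAMPLE_HTML = """
<h2>Contents</h2>
<h2><span class="mw-headline" id="Types">Types</span></h2>
<h2><span class="mw-headline" id="Bogies">Bogies</span></h2>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The class is on the <span>, not the <h2>, so this filter matches nothing.
h2_matches = soup.find_all("h2", {"class": "mw-headline"})

# Searching for the <span> itself finds every headline; .text is the visible title.
span_titles = [span.text for span in soup.find_all("span", class_="mw-headline")]

# Equivalent CSS selector: a span.mw-headline somewhere inside an <h2>.
css_titles = [span.get_text() for span in soup.select("h2 span.mw-headline")]

print(h2_matches)   # []
print(span_titles)  # ['Types', 'Bogies']
print(css_titles)   # ['Types', 'Bogies']
```

For the desired output, .text is arguably a better fit than .get('id'): a multi-word heading such as "See also" has the id "See_also", while its text is "See also".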
#4
I managed to fix the code. Here is the new version:
import requests
from bs4 import BeautifulSoup


def spider(max_pages):
    page = 1
    while page <= max_pages:
        main_page = 'https://wikipedia.org/wiki/'
        search = input("Enter your search: ")
        page_to_search = main_page + str(search)
        source_code = requests.get(page_to_search)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, features="html.parser")
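        # Each table-of-contents title sits in a <span class="toctext">.
        for menu in soup.find_all('span', class_='toctext'):
            print(menu.text)
        # Advance the counter; without this the while loop never ends.
        page += 1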


spider(1)
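Keeping the fetch (requests.get) separate from the parsing makes the parsing testable without a network connection. A minimal sketch, assuming the page still uses the toctext class from the code above: the helper name toc_titles is hypothetical, and SAMPLE_TOC is a hand-written stand-in for a table of contents, not markup copied from Wikipedia:

```python
from bs4 import BeautifulSoup


def toc_titles(html):
    """Return the text of every table-of-contents entry (span.toctext).

    Assumes the toctext class used above; if the markup differs,
    this simply returns an empty list.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [entry.get_text(strip=True)
            for entry in soup.find_all("span", class_="toctext")]


# Hypothetical fragment shaped like a table-of-contents list.
SAMPLE_TOC = """
<ul>
  <li><a href="#Types"><span class="tocnumber">1</span> <span class="toctext">Types</span></a></li>
  <li><a href="#Bogies"><span class="tocnumber">2</span> <span class="toctext">Bogies</span></a></li>
</ul>
"""

for title in toc_titles(SAMPLE_TOC):
    print(title)
# Types
# Bogies
```

Inside spider(), the loop body could then become something like `for title in toc_titles(source_code.text): print(title)`.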
