Python Forum
Problem with scraping the Title from a web page
#1
Hi. I'm using Python 3.10 on Windows.

I have a script like this:


    src_url = get_url_by_pagenumber(page)
    res = requests.get(src_url)
    soup = BeautifulSoup(res.text,'lxml')
    internal_pages = soup.select('h4.title a')
    records = []
    for internal_page in internal_pages:
      page_url = get_url_by_href(internal_page['href'])
      s.title = sanitize_filename(internal_page['title'])
      headers = {
        'user-agent':'*user used*'
      }
      res2 = requests.get(page_url,headers=headers)
      soup2 = BeautifulSoup(res2.text,'lxml')
      try:
        s.title = sanitize_filename(internal_page['title'])
I didn't attach the whole script, just the relevant part.
What I want to do with this script is enter the URL of a homepage and have the script visit all the internal pages and extract the title of each web page.
The problem is that if I run the script as it is, I get a KeyError for 'title'. This is the full error:

Error:
    Traceback (most recent call last):
      File "C:\Users\Administrator\Desktop\script.py", line 175, in <module>
        s.title = internal_page['title']
      File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\bs4\element.py", line 1486, in __getitem__
        return self.attrs[key]
    KeyError: 'title'
If I replace internal_page['title'] with internal_page['href'], the whole script works fine. Any idea why 'title' gives me this error?

EDIT: I forgot to add that the title is present in the HTML, but it sits below all the meta property tags rather than at the top of the page as usual.
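
For reference, here is a minimal sketch of what the traceback seems to point at. In BeautifulSoup, `tag['title']` looks up a `title` attribute on that tag (here the `<a>` link), not the `<title>` element of the linked page, so it raises KeyError whenever the link has no such attribute. The HTML snippets and the variable names `listing_html`, `page_html`, `link_title`, `page_title` and `og_title` below are made up for illustration, and the real pages may be structured differently. `html.parser` is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical listing page: the link has an href but no title attribute.
listing_html = '<h4 class="title"><a href="/post/1">First post</a></h4>'

# Hypothetical internal page: <title> comes after the meta property tags.
page_html = (
    "<html><head>"
    '<meta property="og:title" content="First post (og)">'
    "<title>First post | Example</title>"
    "</head><body></body></html>"
)

link = BeautifulSoup(listing_html, "html.parser").select_one("h4.title a")

# Subscripting a tag reads one of its attributes, so a missing attribute
# raises KeyError, exactly as in the traceback.
try:
    link["title"]
except KeyError:
    print("the <a> tag has no title attribute")

# .get() returns None instead of raising when the attribute is absent.
link_title = link.get("title")
# The visible link text is available regardless of attributes.
link_text = link.get_text(strip=True)

# The page's own title lives in the fetched page (soup2 in the script),
# wherever the <title> element sits inside the document.
soup2 = BeautifulSoup(page_html, "html.parser")
page_title = soup2.title.string if soup2.title else None

# Alternative: read a meta property such as og:title, if the page has one.
og = soup2.find("meta", property="og:title")
og_title = og["content"] if og else None

print(link_title, link_text, page_title, og_title, sep=" | ")
```

In the script above, that would mean reading the title from `soup2` after the second request rather than from `internal_page`, assuming the internal pages really do contain a `<title>` element or a suitable meta tag.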
