Python Forum
Extract text from tag content using regular expression
Hello,

Here is the tag from which I want to extract a text fragment (in bold):
<a class="a-link-normal a-text-normal" href="/Cybersecurity-Intelligent-Systems-Reference-Library/dp/3319988417/ref=sr_1_1?keywords=9783319988412&qid=1574431833&sr=8-1">.

Here is my code:
import re
import urllib.request
from urllib.error import ContentTooShortError, HTTPError, URLError

from bs4 import BeautifulSoup


def download(url, user_agent='wswp', num_retries=2):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        html = urllib.request.urlopen(request)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors, keeping the same user agent
                return download(url, user_agent, num_retries - 1)
    return html

html = download('http://www.amazon.com/s?k=9783319988412&ref=nb_sb_noss')
bs = BeautifulSoup(html.read(), 'lxml')
nameList = bs.find_all('a', {'href':re.compile('"/.*(keywords).*')})
print(len(nameList))
for name in nameList:
    print(name.get_text())
This doesn't work.
Any suggestions?
Thanks.
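
One likely cause is the leading `"` in the pattern passed to `find_all`. BeautifulSoup matches a compiled regex against the attribute value only, and the surrounding quotes are not part of that value, so `'"/.*(keywords).*'` can never match. Note also that `get_text()` returns the link's visible text, not anything from `href`. The sketch below is a minimal illustration on the tag from the post, not a tested scraper for the live page. It assumes the wanted fragment is the first path segment before `/dp/`; `SAMPLE_HTML`, its link text, and the names `HREF_PATTERN`, `SLUG_PATTERN` and `extract_slugs` are made up for the example, and `html.parser` is used only so that `lxml` is not required.

```python
import re

from bs4 import BeautifulSoup

# Tag copied from the post; the link text is invented for this demo.
SAMPLE_HTML = (
    '<a class="a-link-normal a-text-normal" '
    'href="/Cybersecurity-Intelligent-Systems-Reference-Library/dp/3319988417/'
    'ref=sr_1_1?keywords=9783319988412&amp;qid=1574431833&amp;sr=8-1">'
    'Sample link text</a>'
)

# No leading '"': the pattern is searched against the attribute value itself.
HREF_PATTERN = re.compile(r'^/.*keywords')

# Assumption: the wanted fragment is the first path segment, before "/dp/".
SLUG_PATTERN = re.compile(r'^/([^/]+)/dp/')


def extract_slugs(html):
    """Return the first href path segment of every matching <a> tag."""
    soup = BeautifulSoup(html, 'html.parser')
    slugs = []
    for link in soup.find_all('a', href=HREF_PATTERN):
        # Read the attribute directly; get_text() would give the link text.
        match = SLUG_PATTERN.match(link['href'])
        if match:
            slugs.append(match.group(1))
    return slugs


print(extract_slugs(SAMPLE_HTML))
```

If a different part of the `href` is wanted (for example the ISBN after `keywords=`), only `SLUG_PATTERN` should need to change.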

Messages In This Thread
Extract text from tag content using regular expression - by Pavel_47 - Nov-22-2019, 02:20 PM
