Python Forum
Extract text from tag content using regular expression
import requests
from bs4 import BeautifulSoup

url = 'http://www.amazon.com/s?k=9783319988412&ref=nb_sb_noss'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,*/*',
    'Accept-Language': 'en,en-US;q=0.7,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive'}

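resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()  # stop early on a 4xx/5xx response (e.g. a blocked request)
soup = BeautifulSoup(resp.text, 'lxml')  # needs the lxml package; 'html.parser' also works

# using find: the class string must match the attribute value exactly (same class order);
# find returns None if nothing matches
a = soup.find('a', {'class': 'a-link-normal a-text-normal'})
if a is not None:
    print(a.get('href'))

# using CSS selectors: both classes required, in any order; returns a (possibly empty) list
a1 = soup.select('a.a-link-normal.a-text-normal')
if a1:
    print(a1[0].get('href'))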
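The two lookups in the script above are not quite equivalent: passing a space-separated class string to find/find_all compares it with the whole class attribute, while the CSS selector only requires both classes to be present. The sketch below runs both against a small, made-up HTML snippet (the markup and hrefs are hypothetical, not Amazon's real page) using the built-in html.parser, so it works offline and without lxml. It also uses get_text() to pull the visible text out of each matched link.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page.
# The second link lists the same two classes in the opposite order.
HTML = """
<div>
  <a class="a-link-normal a-text-normal" href="/first-book/dp/111"> First book </a>
  <a class="a-text-normal a-link-normal" href="/second-book/dp/222">Second book</a>
  <a class="a-link-normal" href="/help">Help</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A space-separated class string is compared with the whole class attribute,
# so only a tag whose classes appear in exactly this order matches.
by_string = [a["href"] for a in soup.find_all("a", {"class": "a-link-normal a-text-normal"})]

# The CSS selector requires both classes but does not care about their order.
by_css = [a["href"] for a in soup.select("a.a-link-normal.a-text-normal")]

# get_text(strip=True) returns each link's text without surrounding whitespace.
texts = [a.get_text(strip=True) for a in soup.select("a.a-link-normal.a-text-normal")]

print(by_string)  # ['/first-book/dp/111']
print(by_css)     # ['/first-book/dp/111', '/second-book/dp/222']
print(texts)      # ['First book', 'Second book']
```

If the site ever reorders the classes, the find version silently stops matching, which is one reason to prefer the CSS selector form.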
I'm not sure why you get a malformed selector error, but you probably don't want to select using part of the href attribute anyway (would you change the selector for every page?). Selecting by the link's classes, as above, works for any search results page.
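On the href point above: the original failing selector isn't shown, so this is only a guess at the error. In CSS an attribute value such as /dp/ is not a valid identifier and has to be quoted, and soupsieve (the selector engine behind select()) rejects the unquoted form with a SelectorSyntaxError. Since the thread is about regular expressions, the sketch also shows find_all(href=re.compile(...)), which BeautifulSoup supports for attribute filters. The markup and the /dp/<digits> URL pattern are hypothetical; any such fragment is still tied to one site's URL layout, which is why selecting by class is usually the sturdier choice.

```python
import re

import soupsieve
from bs4 import BeautifulSoup

# Hypothetical markup; only the first two links have product-style URLs.
HTML = """
<div>
  <a href="/first-book/dp/111">First book</a>
  <a href="/second-book/dp/222">Second book</a>
  <a href="/help">Help</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Regex filter: BeautifulSoup runs pattern.search() against each tag's href value.
by_regex = [a["href"] for a in soup.find_all("a", href=re.compile(r"/dp/\d+$"))]

# CSS substring match; the value is quoted because "/dp/" is not a CSS identifier.
by_css = [a["href"] for a in soup.select('a[href*="/dp/"]')]


def is_valid_selector(selector):
    """Return False if soupsieve rejects the selector as malformed."""
    try:
        soup.select(selector)
    except soupsieve.SelectorSyntaxError:
        return False
    return True


print(by_regex)                              # ['/first-book/dp/111', '/second-book/dp/222']
print(by_css)                                # ['/first-book/dp/111', '/second-book/dp/222']
print(is_valid_selector('a[href*="/dp/"]'))  # True
print(is_valid_selector('a[href*=/dp/]'))    # False
```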
RE: Extract text from tag content using regular expression - by buran - Nov-25-2019, 11:57 AM