Extract text from tag content using regular expression

Pavel_47 · (This post was last modified: Nov-25-2019, 10:07 AM by Pavel_47.)

(Nov-22-2019, 08:32 PM)Fre3k Wrote: Hi :)

If this is your entire string, then you can do it like this with a regular expression:
import re

# Single quotes on the outside, so the double quotes inside the HTML need no escaping.
string = '<a class="a-link-normal a-text-normal" href="/Cybersecurity-Intelligent-Systems-Reference-Library/dp/3319988417/ref=sr_1_1?keywords=9783319988412&qid=1574431833&sr=8-1">'

# Capture only what sits between the quotes of the href attribute.
url = re.search(r'href="([^"]*)"', string)
print(url.group(1))

# or:
url_ = re.findall(r'href="([^"]*)"', string)
print(url_)
Here is the regexHelper I use :)
-> RegexHelper

Kr

In my case the object being searched isn't a string but a BeautifulSoup object.
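
For a BeautifulSoup object, one possible approach is to pass a compiled pattern as the `href` filter to `find`, which then matches the pattern against each tag's attribute value. This is only a minimal sketch: `html_doc` is a hypothetical, shortened stand-in for the anchor tag quoted above, and the `html.parser` backend is an illustrative choice rather than what the original script uses.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical input: a shortened version of the anchor tag quoted in the thread.
html_doc = (
    '<a class="a-link-normal a-text-normal" '
    'href="/Cybersecurity-Intelligent-Systems-Reference-Library/dp/3319988417/">'
    'Title</a>'
)

soup = BeautifulSoup(html_doc, "html.parser")

# A compiled regex used as an attribute filter: the first <a> tag whose
# href matches the pattern is returned, or None if there is no match.
link = soup.find("a", href=re.compile(r"^/Cybersecurity"))
href = link["href"] if link else None
print(href)
```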

(Nov-23-2019, 01:11 AM)buran Wrote: https://python-forum.io/Thread-Learning-...2#pid96702
Also look at @snippsat's answer in the same thread:
https://python-forum.io/Thread-Learning-...9#pid96789

Thanks. I tried it without a regexp, but it doesn't work.

html = download('http://www.amazon.com/s?k=9783319988412&ref=nb_sb_noss')
bs = BeautifulSoup(html.read(), 'lxml')
link = bs.select('a[href^⁼"/Cybersecurity"]')
print(link)

Here is the output:

Traceback (most recent call last):
  File "/home/pavel/python_code/BeautifulSoup_test1.py", line 23, in <module>
    link = bs.select('a[href^⁼"/Cybersecurity"]')
  File "/home/pavel/.local/lib/python3.6/site-packages/bs4/element.py", line 1358, in select
    return soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/__init__.py", line 114, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/__init__.py", line 63, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 206, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 1062, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 911, in parse_selectors
    key, m = next(iselector)
  File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 1055, in selector_iter
    raise SelectorSyntaxError(msg, self.pattern, index)
  File "<string>", line None
soupsieve.util.SelectorSyntaxError: Malformed attribute selector at position 1
  line 1:
a[href^⁼"/Cybersecurity"]
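
The selector shown in the traceback has a superscript equals sign (`⁼`, U+207C) after the caret rather than an ASCII `=`, which would explain the `SelectorSyntaxError`. Below is a minimal sketch of the same query written with `^=`. It assumes a small stand-in document instead of the downloaded Amazon page: `sample_html` is hypothetical, and `html.parser` replaces `lxml` only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
sample_html = """
<a href="/Cybersecurity-Intelligent-Systems-Reference-Library/dp/3319988417/">Book</a>
<a href="/other/page">Other</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "^=" is the CSS "attribute value starts with" operator (plain ASCII "=").
links = soup.select('a[href^="/Cybersecurity"]')
hrefs = [a["href"] for a in links]
print(hrefs)
```

`select` returns a list of matching tags, so the `href` values are read from each element rather than from the result as a whole.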
