Python Forum
Extract text from tag content using regular expression
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Extract text from tag content using regular expression
#4
(Nov-22-2019, 08:32 PM)Fre3k Wrote: Hi :)

If this is your entire string then you can do it liek this with re. expr.

import re

string = "<a class="a-link-normal a-text-normal" href="/Cybersecurity-Intelligent-Systems-Reference-Library/dp/3319988417/ref=sr_1_1?keywords=9783319988412&qid=1574431833&sr=8-1">

url = re.search("href=\"(.*)>", string)
print(url.group(1))

#or:
url_ = re.findall("href=\"(.*)>", string)
print(url_)
Here is the regexHelper I use :)
-> RegexHelper

Kr

In my case the searching object isn't string, but BeaurtifulSoup.

(Nov-23-2019, 01:11 AM)buran Wrote: https://python-forum.io/Thread-Learning-...2#pid96702
Also look at @snppsat answer in same thread:
https://python-forum.io/Thread-Learning-...9#pid96789

Thanks. Tried without regexp. Doesn't work.

html = download('http://www.amazon.com/s?k=9783319988412&ref=nb_sb_noss')
bs = BeautifulSoup(html.read(), 'lxml')
link = bs.select('a[href^⁼"/Cybersecurity"]')
print(link)
Here is output:
Output:
Traceback (most recent call last): File "/home/pavel/python_code/BeautifulSoup_test1.py", line 23, in <module> link = bs.select('a[href^⁼"/Cybersecurity"]') File "/home/pavel/.local/lib/python3.6/site-packages/bs4/element.py", line 1358, in select return soupsieve.select(selector, self, namespaces, limit, **kwargs) File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/__init__.py", line 114, in select return compile(select, namespaces, flags, **kwargs).select(tag, limit) File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/__init__.py", line 63, in compile return cp._cached_css_compile(pattern, namespaces, custom, flags) File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 206, in _cached_css_compile CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(), File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 1062, in process_selectors return self.parse_selectors(self.selector_iter(self.pattern), index, flags) File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 911, in parse_selectors key, m = next(iselector) File "/home/pavel/.local/lib/python3.6/site-packages/soupsieve/css_parser.py", line 1055, in selector_iter raise SelectorSyntaxError(msg, self.pattern, index) File "<string>", line None soupsieve.util.SelectorSyntaxError: Malformed attribute selector at position 1 line 1: a[href^⁼"/Cybersecurity"]
Reply


Messages In This Thread
RE: Extract text from tag content using regular expression - by Pavel_47 - Nov-25-2019, 09:49 AM

Possibly Related Threads…
Thread Author Replies Views Last Post
  Regular Expression rakhmadiev 6 5,399 Aug-21-2023, 01:52 PM
Last Post: Gribouillis
  Extract Href URL and Text From List knight2000 2 9,124 Jul-08-2021, 12:53 PM
Last Post: knight2000
  Selenium extract id text xzozx 1 2,138 Jun-15-2020, 06:32 AM
Last Post: Larz60+
  BeautifulSoup : how to have a html5 attribut searched for in a regular expression ? arbiel 2 2,640 May-09-2020, 03:05 PM
Last Post: arbiel
  Extract text between bold headlines from HTML CostasG 1 2,346 Aug-31-2019, 10:53 AM
Last Post: snippsat
  Extract Anchor Text (Scrapy) soothsayerpg 2 8,351 Jul-21-2018, 07:18 AM
Last Post: soothsayerpg
  webscraping - failing to extract specific text from data.gov rontar 2 3,217 May-19-2018, 08:01 AM
Last Post: rontar
  web scraping with python regular expression dbpython2017 6 9,261 Sep-26-2017, 02:16 AM
Last Post: dbpython2017

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020