 Extract Anchor Text (Scrapy)
#1
I tried the search box but didn't find any related post. I also tried Googling it but didn't find an answer.

How can I extract only the anchor text in a given hyperlink?

Quote:e.g. <a href='mydomain.com'>my anchor text</a>

Quote:example:
<div class="blog_next_page">
<a class="next_page" href="mydomain.com/page/2">my anchor text</a>
</div>

I opened the page with
scrapy shell 'website url'
Using
response.css('div.blog_next_page > a::attr(href)').extract_first()
I can extract the link, but how can I get "my anchor text"?

Many thanks for the help!
#2
Try:
response.css('div.blog_next_page > a::text').extract_first()
Scrapy Selectors docs
#3
(Jul-21-2018, 06:26 AM)buran Wrote: try
response.css('div.blog_next_page > a::text').extract_first()
Scrapy Selectors docs

It works!

I was messing around with putting 'text' inside attr(), or writing a::text(), geez...

So does 'text' on its own just mean the string inside the tag, or does it sniff out a string at the a tag?

Thanks again!