Sep-30-2019, 08:21 PM
`span.text::text` tries selecting the text of a span with class `text`. Such an element doesn't exist; the text is placed directly in the div. A CSS selector that would work here would be simply `::text`. This technically selects all the text nodes inside the div (including the author), but `.extract_first()` will give you only the thing you are after. An alternative is using an XPath such as `./text()`.

A couple of non-selector-related notes:
- Your `allowed_domains` is being ignored since it contains full URLs instead of domains (it's optional, so your code still works).
- You should use `.get()` instead of `.extract_first()`; that's been the recommended API for a while now.
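
To make the points above concrete, here is a rough sketch, not your actual spider. The HTML snippet, the `div.quote` class, the `example.com` URL, and the helper names `first_quote_text` and `to_domain` are all made up for illustration; the real page markup may differ. It uses `parsel`, the selector library Scrapy is built on, so the same `.css()` / `.xpath()` / `.get()` calls should behave the same on a Scrapy `response`.

```python
from urllib.parse import urlparse

# Hypothetical markup: the quote text sits directly inside the div,
# followed by a span holding the author.
SAMPLE_HTML = '<div class="quote">Some quote text<span class="author">Someone</span></div>'


def first_quote_text(html):
    """Return the div's first text node, once via CSS and once via XPath."""
    # parsel ships with Scrapy; imported here so to_domain() works without it.
    from parsel import Selector

    div = Selector(text=html).css("div.quote")[0]
    # "::text" matches every text node under the div (author included);
    # .get() returns only the first match, or None if there is none.
    via_css = div.css("::text").get()
    # "./text()" matches only the text nodes that are direct children of the div.
    via_xpath = div.xpath("./text()").get()
    return via_css, via_xpath


def to_domain(value):
    """Reduce a full URL to its host; leave a bare domain unchanged."""
    return urlparse(value).hostname or value
```

With the sample markup, `first_quote_text(SAMPLE_HTML)` should give the quote text for both selectors, and something like `allowed_domains = [to_domain("https://example.com/quotes/")]` would hold a bare domain rather than a full URL.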