Feb-27-2018, 03:45 AM
(This post was last modified: Feb-27-2018, 03:45 AM by digitalmatic7.)
(Feb-27-2018, 03:16 AM)snippsat Wrote: Why are you trying to parse CSS comments?
Can do it like this: first find the <style> tag, then write a regex for the comments.
import requests
from bs4 import BeautifulSoup
import re
from pprint import pprint

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us')
html = scrape.content
soup = BeautifulSoup(html, 'lxml')
style = soup.find('style')
css_comments = re.findall(r'\/\*(.*)\*\/', str(style))
pprint(css_comments)
Output: ['houzz page', 'legacy-header', '==== ARTICLE ======', 'story strip article ad', ' cssUpdates branch', ' cssUpdates branch', ' Buzz widget ', ' TERMS OF SERVICE LINK - under viafoura comments submit button ', ' TOUT MID ARTICLE PLAYER ', ' MOBILE article story stack ', ' margin: 0 3vw 0 0; ']
Thanks! This method works for me.
I should have done a better job of explaining what I'm trying to do in my OP (my bad). I just need to scan the entire page source, including CSS comments, for a keyword (in this case "houzz") and take an action if it's found.
I had a script that was working for lots of keywords, but since this specific keyword is located in CSS comments, it didn't work.
Here's the working code if anyone comes across this thread and needs it:
import requests
from bs4 import BeautifulSoup
import re

# Fetch the page with a browser-like user-agent header
scrape = requests.get(
    'http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us',
    headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'lxml')

# Search the full markup (CSS comments included) for the keyword
css_comments = re.findall("houzz", str(soup))

if len(css_comments) > 0:
    print("houzz keyword found")
else:
    print("houzz keyword not found")
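Side note: since re.findall is only being used here as a yes/no check, a plain substring test on the raw response text should give the same result, and BeautifulSoup isn't strictly needed for it. A minimal sketch of that variant, reusing the same URL and user-agent from the code above:

import requests

url = 'http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us'
headers = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"}

# The raw response text contains everything in the markup, CSS comments
# included, so a simple membership test is enough for a presence check.
scrape = requests.get(url, headers=headers)

if "houzz" in scrape.text:
    print("houzz keyword found")
else:
    print("houzz keyword not found")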