BS4 Not Able To Find Text In CSS Comments

digitalmatic7 · (This post was last modified: Feb-26-2018, 11:35 PM by digitalmatic7.)

Random example:

import requests
from bs4 import BeautifulSoup
import re

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us', headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'html.parser')

'''

if you search manually in the source (soup) you can find the string "houzz page"

but when I use find_all, it returns nothing

'''

# print(soup)

comment_search = soup.body.find_all(string=re.compile("houzz page", re.IGNORECASE))
if len(comment_search) > 0:
    print("houzz found")
else:
    print("houzz not found")

Also, is my technique OK for checking the results (if len > 0)?

**Larz60+** · Feb-27-2018, 12:20 AM

try:

soup = BeautifulSoup(html, 'lxml')
styles = soup.find_all('style')
for style in styles:
    # filter out what you want here
    print (style.text, style.next_sibling)

digitalmatic7 · (This post was last modified: Feb-27-2018, 02:02 AM by digitalmatic7.)

I can't get it to work. I always get this error:

"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

styles = soup.find_all('style')

for style in styles.find_all(string=re.compile("houzz", re.IGNORECASE)):
    print(style.text)

styles = soup.find_all('style')

for style in styles:
    styles.find_all(string=re.compile("houzz", re.IGNORECASE))
    print(style.text)
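
A note for later readers: the AttributeError above comes from calling find_all() on the ResultSet itself. A ResultSet behaves like a list of tags, so the search has to run on each tag inside the loop. Below is a minimal sketch of that idea, assuming a small inline HTML sample in place of the real page (sample_html, css_text and matches are made-up names for illustration):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not the real site markup).
sample_html = """
<html><head>
<style>/* houzz page */ .a { color: red; }</style>
<style>/* legacy-header */ .b { color: blue; }</style>
</head><body><p>Hello</p></body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
pattern = re.compile("houzz", re.IGNORECASE)

matches = []
# find_all() returns a ResultSet (list-like): iterate over it,
# and work on each individual <style> tag.
for style in soup.find_all("style"):
    css_text = style.string or ""  # text inside this <style> tag
    if pattern.search(css_text):
        matches.append(css_text)

print(len(matches), "style block(s) mention houzz")
```

Here the regex is applied to each tag's text directly, which sidesteps the question of how find_all(string=...) treats stylesheet strings.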

***snippsat*** · (This post was last modified: Feb-27-2018, 03:16 AM by snippsat.)

Why are you trying to parse CSS comments?
You can do it like this: first find the style tag, then write a regex for the comments.

import requests
from bs4 import BeautifulSoup
import re
from pprint import pprint

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us')
html = scrape.content
soup = BeautifulSoup(html, 'lxml')
style = soup.find('style')
css_comments = re.findall(r'\/\*(.*)\*\/', str(style))
pprint(css_comments)

Output:
['houzz page',
 'legacy-header',
 '==== ARTICLE ======',
 'story strip article ad',
 ' cssUpdates branch',
 ' cssUpdates branch',
 ' Buzz widget ',
 '  TERMS OF SERVICE LINK - under viafoura comments submit button ',
 ' TOUT MID ARTICLE PLAYER ',
 ' MOBILE article story stack ',
 ' margin: 0 3vw 0 0; ']
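
One caveat with the pattern above: `.*` is greedy and `.` does not match newlines by default, so two comments on the same line get merged into one match and comments spanning several lines are missed. A hedged sketch of a variant that handles both cases, run on a made-up CSS string rather than the real page (the css sample and comment_re name are assumptions):

```python
import re

# Made-up CSS for illustration: two comments on one line, plus a multi-line one.
css = """/* houzz page */ .a { color: red; } /* second */
/* multi
   line */
.b { margin: 0; }
"""

# Non-greedy (.*?) stops at the first closing */;
# re.DOTALL lets . match newlines so multi-line comments are captured.
comment_re = re.compile(r"/\*(.*?)\*/", re.DOTALL)
css_comments = [c.strip() for c in comment_re.findall(css)]
print(css_comments)
# → ['houzz page', 'second', 'multi\n   line']
```

This is still a regex heuristic, not a CSS parser: a `/*` inside a quoted CSS string would be treated as a comment start.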

digitalmatic7 · (This post was last modified: Feb-27-2018, 03:45 AM by digitalmatic7.)

Thanks! This method works for me.

I should have explained what I'm trying to do more clearly in my original post (my bad). I just need to scan the entire source code, including CSS comments, for a keyword (in this case "houzz") and take an action if it exists.

I had a script that was working for lots of keywords, but since this specific keyword is located in CSS comments it didn't work.

Here's the working code if anyone comes across this thread and needs it:

import requests
from bs4 import BeautifulSoup
import re

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us', headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'lxml')

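# search the whole serialized document (tags, CSS and comments) for the keyword
keyword_matches = re.findall("houzz", str(soup), re.IGNORECASE)

if len(keyword_matches) > 0: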
    print("houzz keyword found")
else:
    print("houzz keyword not found")
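
Since the goal is only to test whether a keyword appears anywhere in the source, the same check can probably be done on the raw text without parsing it first. On the earlier question about `if len(...) > 0`: it works, though an empty list is already falsy, and `re.search` stops at the first hit instead of collecting every match. A minimal sketch, assuming a hypothetical helper contains_keyword and an inline sample string standing in for the downloaded page text:

```python
import re

def contains_keyword(source: str, keyword: str) -> bool:
    """Return True if keyword occurs anywhere in source, ignoring case."""
    # re.escape keeps the keyword literal even if it contains regex characters.
    return re.search(re.escape(keyword), source, re.IGNORECASE) is not None

# Hypothetical sample; in the script above this would be the page source text.
sample_source = "<style>/* Houzz Page */ .a { color: red; }</style>"

if contains_keyword(sample_source, "houzz"):
    print("houzz keyword found")
else:
    print("houzz keyword not found")
```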
