Python Forum
BS4 Not Able To Find Text In CSS Comments
#1
Random example:

import requests
from bs4 import BeautifulSoup
import re

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us', headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'html.parser')

# If you search the page source manually (print(soup)), you can find the
# string "houzz page", but when I use find_all it returns nothing.

# print(soup)

comment_search = soup.body.find_all(string=re.compile("houzz page", re.IGNORECASE))
if len(comment_search) > 0:
    print("houzz found")
else:
    print("houzz not found")
Also is my technique ok for returning the results (if len > 0)?
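On the last question: an empty list is falsy in Python, so `if comment_search:` does the same job as `if len(comment_search) > 0:`. The sketch below also illustrates one possible reason for an empty result, stated as an assumption because the page itself is not reproduced here: `soup.body.find_all(...)` only walks the `<body>` subtree, so a `<style>` block that sits in `<head>` is never visited. The sample HTML and the helper name `contains_keyword` are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page: the CSS comment sits in <head>, not in <body>.
SAMPLE_HTML = """
<html>
  <head><style>/* houzz page */ .promo { color: red; }</style></head>
  <body><p>Plain article text.</p></body>
</html>
"""

def contains_keyword(root, keyword):
    """Return True if any text node under `root` mentions `keyword`."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = root.find_all(string=pattern)
    return bool(hits)  # an empty result list is falsy

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

print(contains_keyword(soup.body, "houzz page"))  # searches <body> only
print(contains_keyword(soup, "houzz page"))       # searches the whole document
```

If the two calls disagree on a real page, the keyword is outside `<body>`, and searching from `soup` instead of `soup.body` may be enough.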
#2
Try:
soup = BeautifulSoup(html, 'lxml')
styles = soup.find_all('style')
for style in styles:
    # filter out what you want here
    print(style.text, style.next_sibling)
#3
(Feb-27-2018, 12:20 AM)Larz60+ Wrote: try:

Can't get it to work. I always get this error:

"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

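# Attempt 1: calls find_all() on the ResultSet itself, which raises the AttributeError
styles = soup.find_all('style')

for style in styles.find_all(string=re.compile("houzz", re.IGNORECASE)):
    print(style.text)

# Attempt 2: still calls styles.find_all() (the ResultSet) inside the loop, same error
styles = soup.find_all('style')

for style in styles:
    styles.find_all(string=re.compile("houzz", re.IGNORECASE))
    print(style.text)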
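
The error message points at the cause: `soup.find_all('style')` returns a `ResultSet` (a list of tags), and `find_all` only exists on the individual tags inside it. A minimal sketch of looping over the tags and testing each one's text, using made-up sample HTML and a hypothetical helper name `styles_mentioning`:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page with two <style> blocks.
SAMPLE_HTML = """
<html><head>
<style>/* houzz page */ .a { color: red; }</style>
<style>.b { margin: 0; }</style>
</head><body><p>text</p></body></html>
"""

def styles_mentioning(soup, keyword):
    """Return the CSS text of every <style> tag that mentions `keyword`."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    matching = []
    for style in soup.find_all("style"):  # iterate the ResultSet
        css = style.string or ""          # .string is None for an empty tag
        if pattern.search(css):
            matching.append(css)
    return matching

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for css in styles_mentioning(soup, "houzz"):
    print(css)
```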
#4
Why are you trying to parse CSS comments?
You can do it like this: first find the style tag, then write a regex for the comments.
import requests
from bs4 import BeautifulSoup
import re
from pprint import pprint

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us')
html = scrape.content
soup = BeautifulSoup(html, 'lxml')
# find() returns only the first <style> tag in the document
style = soup.find('style')
# Capture the text between /* and */ (non-greedy, single-line comments only)
css_comments = re.findall(r'/\*(.*?)\*/', str(style))
pprint(css_comments)
Output:
['houzz page', 'legacy-header', '==== ARTICLE ======', 'story strip article ad', ' cssUpdates branch', ' cssUpdates branch', ' Buzz widget ', ' TERMS OF SERVICE LINK - under viafoura comments submit button ', ' TOUT MID ARTICLE PLAYER ', ' MOBILE article story stack ', ' margin: 0 3vw 0 0; ']
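
The pattern above relies on `.`, which by default does not match newlines, so a comment spanning several lines would be skipped. A sketch of a variant, assuming multi-line comments should be captured as well; the sample CSS and the name `extract_css_comments` are illustrative and not taken from the page:

```python
import re

# Non-greedy match between /* and */; DOTALL lets "." cross line breaks.
CSS_COMMENT = re.compile(r"/\*(.*?)\*/", re.DOTALL)

def extract_css_comments(css):
    """Return the body of every CSS comment, with outer whitespace removed."""
    return [body.strip() for body in CSS_COMMENT.findall(css)]

# Made-up stylesheet for demonstration.
sample_css = """
/* houzz page */
.promo { color: red; } /* inline note */
/* a comment
   spanning two lines */
"""

for comment in extract_css_comments(sample_css):
    print(repr(comment))
```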
#5
(Feb-27-2018, 03:16 AM)snippsat Wrote: Why are you trying to parse CSS comments?
You can do it like this: first find the style tag, then write a regex for the comments.

Thanks! This method works for me.

I should have explained what I'm trying to do better in my OP (my bad). I just need to scan the entire page source, including CSS comments, for a keyword (in this case "houzz") and take an action if it exists.

I had a script that was working for lots of keywords, but since this specific keyword is located in CSS comments it didn't work.

Here's the working code if anyone comes across this thread and needs it:

import requests
from bs4 import BeautifulSoup
import re

scrape = requests.get('http://www.seacoastonline.com/news/20171113/lets-not-let-politics-divide-us', headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"})
html = scrape.content
soup = BeautifulSoup(html, 'lxml')

# Search the whole serialized document, so text inside <style> is included.
# Note: this pattern is case-sensitive.
matches = re.findall("houzz", str(soup))

if matches:  # an empty list is falsy
    print("houzz keyword found")
else:
    print("houzz keyword not found")
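
Since the goal is only to know whether the keyword appears anywhere in the page source, parsing may not be needed at all. A sketch under that assumption: search the raw text directly (for example `scrape.text` from the request above). The helper name `has_keyword` and the sample string are hypothetical.

```python
def has_keyword(page_source, keyword):
    """Case-insensitive substring check over the raw page source."""
    return keyword.casefold() in page_source.casefold()

# Made-up stand-in for the downloaded page source.
sample_source = "<style>/* Houzz page */ .promo { color: red; }</style>"

if has_keyword(sample_source, "houzz"):
    print("houzz keyword found")
else:
    print("houzz keyword not found")
```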