Python Forum
Can't get elements by class on this website
#1
Hi Everyone,

I have been trying to scrape this website, and I am running into some NoneType errors whose root cause I'm having a really hard time figuring out.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://firstmode.bamboohr.com/jobs/'
page = requests.get(url)
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find_all(id='resultDiv')
listing = soup.find('div', class_='ResAts__card-content ResAts__listing')
I am able to run everything up to line 9 in my debugger and see that the variable "results" is populated with information from the page. My goal is to pull individual listings from this webpage, and after inspecting the page I think the class I have on line 10 is the one I'm after.

Unfortunately, every time I try to pull that class into a variable so that I can iterate over the listings, it either spits out a NoneType error or gives me an empty list [].
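For context, both symptoms are easy to reproduce once you know the listings aren't in the HTML that requests downloads (they are injected by JavaScript). A minimal sketch with toy markup standing in for the real response:

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for what requests actually received:
# the job listings are added later by JavaScript, so they are absent.
html = "<div id='resultDiv'><div>Loading...</div></div>"
soup = BeautifulSoup(html, 'html.parser')

# find() returns None when nothing matches, so chaining off the result
# raises "'NoneType' object has no attribute ...".
listing = soup.find('div', class_='ResAts__card-content ResAts__listing')
print(listing)    # None

# find_all() returns an empty list instead of None.
listings = soup.find_all('li', class_='ResAts__listing')
print(listings)   # []

# Guard before using the result:
if listing is None:
    print('listing not in the downloaded HTML (likely rendered by JavaScript)')
```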

To take a step back: the core of what I'm trying to do is pull the title of each position, the location of that position, and the type of position (full-time, etc.) from this website: First Mode Careers,
looping through the listings until I have captured all the individual job positions.
If people have an idea of how to do that with bs4 that is completely different from what I have currently, I am all ears; I'm just trying to learn as much as possible.

Thanks for any help!
CK
#2
Turn off JavaScript in the browser and see👀 what the result is.
JavaScript, why do I not get all content
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
url = "https://firstmode.bamboohr.com/jobs/"
browser.get(url)
time.sleep(2)
soup = BeautifulSoup(browser.page_source, 'lxml')
ul_list = soup.select_one('#resultDiv > div > ul')
Test.
>>> first = soup.select_one('#resultDiv > div > ul > li:nth-child(1)')
>>> first
<li class="ResAts__card-content ResAts__listing"> <div class="ResAts__listing-department branded-text">
.....
     
>>> ' '.join(first.text.split())
'Engineering Battery Systems Engineer Seattle Washington Engineering Full-Time'
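Once the rendered page_source is in a soup, grabbing every listing is just select() plus the same whitespace cleanup. A minimal sketch against toy markup (the li and div class names come from the snippet above; the inner structure is a simplified assumption, and in real use the HTML would come from browser.page_source):

```python
from bs4 import BeautifulSoup

# Toy markup shaped like the rendered page; in real use this HTML
# would come from browser.page_source after the JavaScript has run.
html = """
<div id="resultDiv"><div><ul>
  <li class="ResAts__card-content ResAts__listing">
    <div class="ResAts__listing-department branded-text">Engineering</div>
    Battery Systems Engineer Seattle Washington Full-Time
  </li>
  <li class="ResAts__card-content ResAts__listing">
    <div class="ResAts__listing-department branded-text">Operations</div>
    Facilities Manager Seattle Washington Full-Time
  </li>
</ul></div></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# One row of cleaned-up text per job listing.
rows = [' '.join(li.text.split())
        for li in soup.select('#resultDiv li.ResAts__listing')]
for row in rows:
    print(row)
```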
#3
Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?

Thanks so much!
CK
#4
(Mar-09-2021, 04:24 PM)Cknutson575 Wrote: Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?

Thanks so much!
CK

I would suggest reading through snippsat's web scraping tutorials
(part I: https://python-forum.io/Thread-Web-Scraping-part-1)

Pretty well explained, and broken down to the basics for easy understanding :)!
#5
(Mar-09-2021, 04:24 PM)Cknutson575 Wrote: Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?
Look at the link posted above: lxml is a fast parser, and with that parameter BeautifulSoup will use lxml as its parser.
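To illustrate the parser argument: the second argument to BeautifulSoup just selects the parser backend. html.parser ships with Python, while lxml is a faster third-party drop-in (pip install lxml). A quick sketch:

```python
from bs4 import BeautifulSoup

html = '<ul><li>one</li><li>two</li></ul>'

# The second argument selects the parser backend.
soup_std = BeautifulSoup(html, 'html.parser')    # stdlib, always available
# soup_lxml = BeautifulSoup(html, 'lxml')        # faster, needs: pip install lxml

items = [li.text for li in soup_std.find_all('li')]
print(items)   # ['one', 'two']
```

For well-formed pages both parsers give the same result; the choice mainly affects speed and how malformed HTML is handled.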

Selenium can also parse the page itself, without handing the source code over to BS at all.
With Selenium you can use a CSS selector or XPath; here is an example with XPath.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
url = "https://firstmode.bamboohr.com/jobs/"
browser.get(url)
time.sleep(2)
ul_list = browser.find_elements_by_xpath('//*[@id="resultDiv"]/div/ul')
first = ul_list[0].find_elements_by_xpath('//*[@id="resultDiv"]/div/ul/li[1]')
print(first[0].text)
Output:
Engineering Battery Systems Engineer Seattle Washington Engineering Full-Time
#6
(Mar-09-2021, 07:08 PM)Fre3k Wrote:
(Mar-09-2021, 04:24 PM)Cknutson575 Wrote: Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?

Thanks so much!
CK

I would suggest reading through snippsat's web scraping tutorials
(part I: https://python-forum.io/Thread-Web-Scraping-part-1)

Pretty well explained, and broken down to the basics for easy understanding :)!

OK perfect thanks!

