Python Forum
Can't get elements by class on this website
#1
Hi Everyone,

I have been trying to scrape this website, and I am running into some NoneType errors whose root cause I'm having a really hard time figuring out.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://firstmode.bamboohr.com/jobs/'
page = requests.get(url)
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')

results = soup.find_all(id='resultDiv')
listing = soup.find('div', class_='ResAts__card-content ResAts__listing')
I am able to run everything up to line 9 in my debugger and see that the variable "results" is populated with information from the page. My goal is to pull individual listings from this webpage, and after inspecting the page I think the class I have on line 10 is the one I'm after.

Unfortunately, every time I try to pull that class into a variable so that I can iterate over the listings, it either spits out a NoneType error or gives me an empty list [].
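For context, both symptoms are easy to reproduce once you know the listings aren't in the HTML that requests downloads (they are injected by JavaScript). A minimal sketch with toy markup standing in for the real response:

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for what requests actually received:
# the job listings are added later by JavaScript, so they are absent.
html = "<div id='resultDiv'><div>Loading...</div></div>"
soup = BeautifulSoup(html, 'html.parser')

# find() returns None when nothing matches, so chaining off the result
# raises "'NoneType' object has no attribute ...".
listing = soup.find('div', class_='ResAts__card-content ResAts__listing')
print(listing)    # None

# find_all() returns an empty list instead of None.
listings = soup.find_all('li', class_='ResAts__listing')
print(listings)   # []

# Guard before using the result:
if listing is None:
    print('listing not in the downloaded HTML (likely rendered by JavaScript)')
```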

To take a step back: the core of what I'm trying to do is pull the title of each position, the location of that position, and the type of position (full-time, etc.) from this website: First Mode Careers,
looping through the listings until I have captured all the individual job positions.
If people have an idea of how to do that with bs4 that is completely different from what I have currently, I am all ears; I'm just trying to learn as much as possible.

Thanks for any help!
CK
#2
Turn off JavaScript in the browser and see👀 what the result is.
JavaScript, why do I not get all content
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
url = "https://firstmode.bamboohr.com/jobs/"
browser.get(url)
time.sleep(2)
soup = BeautifulSoup(browser.page_source, 'lxml')
ul_list = soup.select_one('#resultDiv > div > ul')
Test.
>>> first = soup.select_one('#resultDiv > div > ul > li:nth-child(1)')
>>> first
<li class="ResAts__card-content ResAts__listing"> <div class="ResAts__listing-department branded-text">
.....
     
>>> ' '.join(first.text.split())
'Engineering Battery Systems Engineer Seattle Washington Engineering Full-Time'
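Once the rendered page_source is in a soup, grabbing every listing is just select() plus the same whitespace cleanup. A minimal sketch against toy markup (the li and div class names come from the snippet above; the inner structure is a simplified assumption, and in real use the HTML would come from browser.page_source):

```python
from bs4 import BeautifulSoup

# Toy markup shaped like the rendered page; in real use this HTML
# would come from browser.page_source after the JavaScript has run.
html = """
<div id="resultDiv"><div><ul>
  <li class="ResAts__card-content ResAts__listing">
    <div class="ResAts__listing-department branded-text">Engineering</div>
    Battery Systems Engineer Seattle Washington Full-Time
  </li>
  <li class="ResAts__card-content ResAts__listing">
    <div class="ResAts__listing-department branded-text">Operations</div>
    Facilities Manager Seattle Washington Full-Time
  </li>
</ul></div></div>
"""
soup = BeautifulSoup(html, 'html.parser')

# One row of cleaned-up text per job listing.
rows = [' '.join(li.text.split())
        for li in soup.select('#resultDiv li.ResAts__listing')]
for row in rows:
    print(row)
```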
#3
Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?

Thanks so much!
CK
#4
(Mar-09-2021, 04:24 PM)Cknutson575 Wrote: Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?

Thanks so much!
CK

I would suggest reading through snippsat's web scraping tutorials
(part I: https://python-forum.io/Thread-Web-Scraping-part-1)

Pretty well explained, and broken down to the basics for easy understanding :)!
#5
(Mar-09-2021, 04:24 PM)Cknutson575 Wrote: Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?
Look at the link posted above: lxml is a fast parser, and with that parameter BeautifulSoup will use lxml as its parser.
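To illustrate the parser argument: the second argument to BeautifulSoup just selects the parser backend. html.parser ships with Python, while lxml is a faster third-party drop-in (pip install lxml). A quick sketch:

```python
from bs4 import BeautifulSoup

html = '<ul><li>one</li><li>two</li></ul>'

# The second argument selects the parser backend.
soup_std = BeautifulSoup(html, 'html.parser')    # stdlib, always available
# soup_lxml = BeautifulSoup(html, 'lxml')        # faster, needs: pip install lxml

items = [li.text for li in soup_std.find_all('li')]
print(items)   # ['one', 'two']
```

For well-formed pages both parsers give the same result; the choice mainly affects speed and how malformed HTML is handled.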

Selenium can also parse the page itself, without handing the source code over to BS at all.
With Selenium you can use a CSS selector or XPath; here is an example with XPath.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
url = "https://firstmode.bamboohr.com/jobs/"
browser.get(url)
time.sleep(2)
ul_list = browser.find_elements_by_xpath('//*[@id="resultDiv"]/div/ul')
first = ul_list[0].find_elements_by_xpath('//*[@id="resultDiv"]/div/ul/li[1]')
print(first[0].text)
Output:
Engineering Battery Systems Engineer Seattle Washington Engineering Full-Time
#6
(Mar-09-2021, 07:08 PM)Fre3k Wrote:
(Mar-09-2021, 04:24 PM)Cknutson575 Wrote: Can you explain what you did on line 17, and why you chose to parse with lxml for your soup?

Thanks so much!
CK

I would suggest reading through snippsat's web scraping tutorials
(part I: https://python-forum.io/Thread-Web-Scraping-part-1)

Pretty well explained, and broken down to the basics for easy understanding :)!

OK perfect thanks!

