error installing requests_html

davidm · Mar-04-2020, 05:30 PM

Hi, I'm trying to run "pip install requests_html" and I get this error:
Executed command: pip install libxml2-python3
Error occurred: Non-zero exit code (1)

I'm running Python 3.8 on Windows 10 and want to use BeautifulSoup on dynamic JavaScript sites.
Any ideas how I can solve this? Thanks for any help.

***snippsat*** · (This post was last modified: Mar-04-2020, 09:34 PM by snippsat.)

I tried installing requests_html for Python 3.8 on Windows 10 and got no error on install (Install tutorial).
When I tested it earlier it was not stable, and now it also lacks updates.
Users say the same in the issue tracker.

Quote:So I'm guessing that this project is abandoned.

Using Selenium with a web driver (Chrome or Firefox) is a stable solution that works in most cases.
There is an example in your previous thread.
Yahoo Finance is not an easy site to parse; here is how it looks with the tools mentioned above.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
# Uncomment these lines to run Chrome without a visible window
#options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
#options.add_argument('--disable-gpu')
# Selenium 3 API; Selenium 4 uses Service(...) and find_elements(By.XPATH, ...)
browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
#--| Parse or automation
url = 'https://finance.yahoo.com/quote/BARC.L/key-statistics?p=BARC.L'
browser.get(url)
# Click the accept button on the consent page
accept = browser.find_elements_by_xpath('//*[@id="consent-page"]/div/div/div/div[3]/div/form/button[1]')
accept[0].click()
# Give the quote page time to load after the click
time.sleep(2)
# Parse the rendered page, in case you want to search it with BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
main_value = browser.find_elements_by_xpath('//*[@id="quote-header-info"]/div[3]/div/div/span[1]')
print(main_value[0].text)

Output:
138.54

Uncomment the --headless line (remove the #) and the browser window will not be shown.
If you are new to this, leave the browser visible so you can see what happens, such as the button being clicked.

davidm · Mar-05-2020, 01:52 AM

Thanks Snippsat
Super helpful
I did succeed in installing requests_html using

pip3 install requests_html

from the command prompt
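
A possible reason the command prompt worked is that the first attempt used a different pip than the interpreter running the script; that is an assumption, the thread does not confirm it. A minimal sketch for building an install command tied to the current interpreter (pip_command is a made-up helper name, not part of pip):

```python
import sys


def pip_command(package):
    """Build a pip install command bound to the interpreter running this script."""
    # "python -m pip" makes sure the package lands in this interpreter's site-packages
    return [sys.executable, "-m", "pip", "install", package]


# Show which interpreter is in use and the matching install command
print(sys.executable)
print(" ".join(pip_command("requests_html")))
```

Run this inside the project (for example from the PyCharm venv) and paste the printed command into the command prompt.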

davidm · (This post was last modified: Mar-06-2020, 12:02 AM by davidm.)

Hi Snippsat, I tried running your code and get this error:
soup = BeautifulSoup(browser.page_source, 'lxml')
File "C:\Users\david\PycharmProjects\yahoo-finance\venv\lib\site-packages\bs4\__init__.py", line 225, in __init__
raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
I tried installing the lxml package, but get this error:
ERROR: b"'xslt-config' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n"
Any ideas?
Thanks

***snippsat*** · Mar-06-2020, 03:23 PM

(Mar-06-2020, 12:01 AM)davidm Wrote: Tried installing lxml package, but get error:

If pip install lxml fails, use the prebuilt lxml wheel from Gohlke's unofficial Windows binaries.
Example:

pip install lxml-4.5.0-cp38-cp38-win_amd64.whl

You can also use another parser; with the built-in one it would be:

soup = BeautifulSoup(browser.page_source, 'html.parser')
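
If the script should run whether or not lxml is installed, the parser name can be chosen at runtime. This is only a sketch: pick_parser is a hypothetical helper, and the names it returns ("lxml", "html5lib", "html.parser") are the parser names BeautifulSoup accepts.

```python
import importlib.util


def pick_parser(preferred=("lxml", "html5lib")):
    """Return the first installed third-party parser, else the built-in one."""
    for name in preferred:
        # find_spec returns None when the top-level module is not installed
        if importlib.util.find_spec(name) is not None:
            return name
    return "html.parser"


parser = pick_parser()
print(parser)
# Then, in the Selenium script:
# soup = BeautifulSoup(browser.page_source, parser)
```

This avoids the FeatureNotFound error on machines without lxml, at the cost of possibly slightly different parse trees between parsers.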
