Selenium Parsing (unable to Parse page after loading)

oneclick · Jan-07-2019, 03:08 AM

I'm trying to scrape a torrentz URL, but I get the "page loading" HTML instead of the search-result HTML. I tried adding a sleep, but it didn't work.

Does anyone know how to do this?

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests

browser = webdriver.Firefox()
browser.get("https://torrentz.eu/")
time.sleep(10)

#class selenium.webdriver.support.expected_conditions.title_contains(Torrent)

search = browser.find_element_by_id('thesearchbox')
search.send_keys('xxxxx')
search.send_keys(Keys.RETURN) # hit return after you enter search text
time.sleep(10)
    
tempurl = browser.current_url
print(tempurl)
tempcont = requests.get(tempurl, timeout=10)
soup = BeautifulSoup(tempcont.content, "html.parser")

print(soup.prettify())

hbknjr · Jan-07-2019, 07:34 AM

Make sure your ISP doesn't block torrent sites.

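You can wait until the element is present instead of sleeping. Try an explicit wait:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# WebDriverWait(DRIVER, TIMEOUT_SECONDS)
search = WebDriverWait(browser, 5).until(lambda x: x.find_element(By.ID, 'thesearchbox'))
browser.find_element(By.ID, 'thesearchbutton').click()

WebDriverWait keeps retrying and ignores NoSuchElementException until the timeout expires; if the element never appears, it raises TimeoutException.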
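The retry behaviour described above can be pictured with a small pure-Python sketch. This is not Selenium's source code, only a simplified stand-in: `wait_until` and `find_search_box` are hypothetical names, and `LookupError` plays the role that NoSuchElementException plays in Selenium.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5, ignored=(LookupError,)):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # the element is not there yet, so try again
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a page whose search box only appears on the third lookup.
attempts = {"count": 0}


def find_search_box():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not in the page yet")
    return "thesearchbox"


element = wait_until(find_search_box, timeout=5, poll_interval=0.1)
print(element, "found after", attempts["count"], "attempts")
```

Unlike a fixed time.sleep(10), the loop returns as soon as the lookup succeeds and only uses the full timeout when the element never shows up.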

oneclick · (This post was last modified: Jan-08-2019, 03:10 AM by oneclick.)

I am able to run the search and see the result page; the code executes perfectly up to

print(tempurl)

The next two lines do not give me the parsed HTML of the search results; instead I get the HTML of the loading page.

You can try this code for yourself.

Is there any way around this?

***metulburr*** · Jan-08-2019, 04:15 AM

torrentz.eu is no longer active

https://tribune.com.pk/story/1156409/tor...one-knows/

Quote:Although the home page of Torrentz.eu is still active, it has completely disabled its search functionality and has removed all torrent links. It is still not clear why the website has been shut down.

oneclick · Jan-11-2019, 04:53 AM

I have tried with torrentz2.eu and get the same problem.

***metulburr*** · Jan-11-2019, 03:39 PM

Your code does work with that site for me; it printed out the HTML. However, you don't need requests or BeautifulSoup if you use Selenium. Selenium drives the browser, which makes the requests, and it can locate elements in the rendered page itself. You can also run it headless, so it doesn't bring up a browser window. You should also use an explicit wait instead of time.sleep; it returns as soon as the element appears, so it will be faster.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get("https://torrentz2.eu/")
# wait for the search form instead of sleeping
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, 'search')))

# find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium releases removed
search = browser.find_element(By.ID, 'thesearchbox')
search.send_keys('xxxxx')
search.send_keys(Keys.RETURN)  # hit return after you enter search text
# wait for the results container before reading the page
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'results')))

oneclick · Jan-13-2019, 03:10 AM

Thank you.

By the time I read your post, I had found another way around:

code = browser.page_source

This gave me the HTML content of the loaded page.
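A likely reason this works where requests.get(tempurl) did not: page_source comes from the browser session that already loaded and rendered the page, while requests starts a separate request with none of that state. Once the HTML is a string, any parser can take it from there. The sketch below uses the standard library's html.parser on made-up markup standing in for browser.page_source; the `LinkCollector` class and the sample HTML are illustrative assumptions, not the site's real structure.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up markup standing in for: code = browser.page_source
code = """
<div class="results">
  <dl><dt><a href="/abc123">First result</a></dt></dl>
  <dl><dt><a href="/def456">Second result</a></dt></dl>
</div>
"""

collector = LinkCollector()
collector.feed(code)
for href, text in collector.links:
    print(href, text)
```

BeautifulSoup(browser.page_source, "html.parser") accepts the same string, so the original parsing code can stay if only the source of the HTML is swapped.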

Thanks for your generous help.

tomalex · Oct-30-2020, 08:13 PM

Amazing one, sir. I have used it before, but it seems this website is not working any more. Could you please create one for this site? torrentzeu.org

Thank You.
