Posts: 7
Threads: 3
Joined: Dec 2018
I'm trying to scrape a torrentz URL, but I get the loading-page HTML instead of the search-result HTML. I tried adding sleep time, but it didn't work out.
Does anyone know how to do this?
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests
browser = webdriver.Firefox()
browser.get("https://torrentz.eu/")
time.sleep(10)
#class selenium.webdriver.support.expected_conditions.title_contains(Torrent)
search = browser.find_element_by_id('thesearchbox')
search.send_keys('xxxxx')
search.send_keys(Keys.RETURN) # hit return after you enter search text
time.sleep(10)
tempurl = browser.current_url
print(tempurl)
tempcont = requests.get(tempurl, timeout=10)
soup = BeautifulSoup(tempcont.content, "html.parser")
print(soup.prettify())
Posts: 101
Threads: 7
Joined: Aug 2017
Make sure your ISP doesn't block torrent sites.
You can wait until the element is visible.
Try waits:
search = WebDriverWait(browser, 5).until(lambda x: x.find_element_by_id('thesearchbox'))
# where WebDriverWait(DRIVER, TIMEOUT_SECONDS)
browser.find_element_by_id('thesearchbutton').click()
It keeps retrying through the NoSuchElementException error for the specified number of seconds.
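The retry behaviour described above can be sketched in plain Python. This is only a toy model of what WebDriverWait(driver, timeout).until(...) does internally; the NoSuchElementException class and find_search_box function below are made-up stand-ins, not Selenium's own objects:

```python
import time

class NoSuchElementException(Exception):
    """Stand-in for selenium's exception of the same name."""

def wait_until(condition, timeout=5, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Mirrors the idea behind WebDriverWait: swallow NoSuchElementException
    and retry on an interval until the deadline, then give up.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except NoSuchElementException:
            pass  # element not there yet; keep polling
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)

# Toy "page" that only exposes the element after a short delay
start = time.monotonic()
def find_search_box():
    if time.monotonic() - start < 1:
        raise NoSuchElementException("thesearchbox not rendered yet")
    return "thesearchbox"

print(wait_until(find_search_box, timeout=5))  # prints: thesearchbox
```

The point is that the wait keeps absorbing the "not found" error while the page loads, instead of failing immediately the way a plain find_element call does.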
Posts: 7
Threads: 3
Joined: Dec 2018
Jan-08-2019, 03:10 AM
(This post was last modified: Jan-08-2019, 03:10 AM by oneclick.)
I am able to get the search result and see the result page. The code executes perfectly up to print(tempurl), but the next two lines do not give me the parsed HTML of the search result; instead I get the HTML of the loading page.
You can try this code for yourself.
Is there any way around this?
(Jan-07-2019, 07:34 AM)hbknjr Wrote: Make sure your ISP doesn't block torrent sites.
You can wait until the element is visible.
Try waits:
search = WebDriverWait(browser, 5).until(lambda x: x.find_element_by_id('thesearchbox'))
# where WebDriverWait(DRIVER, TIMEOUT_SECONDS)
browser.find_element_by_id('thesearchbutton').click()
It keeps retrying through the NoSuchElementException error for the specified number of seconds.
Posts: 5,043
Threads: 385
Joined: Sep 2016
torrentz.eu is no longer active
https://tribune.com.pk/story/1156409/tor...one-knows/
Quote:Although the home page of Torrentz.eu is still active, it has completely disabled its search functionality and has removed all torrent links. It is still not clear why the website has been shut down.
Posts: 7
Threads: 3
Joined: Dec 2018
(Jan-08-2019, 04:15 AM)metulburr Wrote: torrentz.eu is no longer active https://tribune.com.pk/story/1156409/tor...one-knows/ Quote:Although the home page of Torrentz.eu is still active, it has completely disabled its search functionality and has removed all torrent links. It is still not clear why the website has been shut down.
I have tried with torrentz2.eu; same problem.
Posts: 5,043
Threads: 385
Joined: Sep 2016
Your code does work with that site for me; it printed out the HTML. However, you don't need requests or BeautifulSoup if you use Selenium. Selenium can make requests and parse HTML itself. You can also run it in the background (headless) so it doesn't bring up a browser window. You should also use waits instead of time.sleep(); it will be faster.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
browser.get("https://torrentz2.eu/")
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, 'thesearchbox')))
#class selenium.webdriver.support.expected_conditions.title_contains(Torrent)
search = browser.find_element_by_id('thesearchbox')
search.send_keys('xxxxx')
search.send_keys(Keys.RETURN) # hit return after you enter search text
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'results')))
Posts: 7
Threads: 3
Joined: Dec 2018
(Jan-11-2019, 03:39 PM)metulburr Wrote: Your code does work with that site for me; it printed out the HTML. However, you don't need requests or BeautifulSoup if you use Selenium. Selenium can make requests and parse HTML itself. You can also run it in the background (headless) so it doesn't bring up a browser window. You should also use waits instead of time.sleep(); it will be faster.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
browser.get("https://torrentz2.eu/")
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, 'thesearchbox')))
#class selenium.webdriver.support.expected_conditions.title_contains(Torrent)
search = browser.find_element_by_id('thesearchbox')
search.send_keys('xxxxx')
search.send_keys(Keys.RETURN) # hit return after you enter search text
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'results')))
Thank you.
By the time I read your post, I had found another workaround:
code = browser.page_source
This helped me read the HTML content.
Thanks for this generous help.
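For anyone finding this later: once you have browser.page_source, you can feed it straight to BeautifulSoup instead of re-fetching the URL with requests (which is what was returning the loading page). A minimal sketch of the parsing step, using a made-up HTML snippet in place of the real page source; the div.results markup below is an assumption for illustration, not torrentz2's actual layout:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for browser.page_source; a real results page will
# differ, but the parsing pattern is the same.
page_source = """
<html><body>
  <div class="results">
    <dl><dt><a href="/t/abc">Example torrent one</a></dt></dl>
    <dl><dt><a href="/t/def">Example torrent two</a></dt></dl>
  </div>
</body></html>
"""

soup = BeautifulSoup(page_source, "html.parser")
# Grab the link text inside the (hypothetical) results container
links = [a.get_text() for a in soup.select("div.results a")]
print(links)  # ['Example torrent one', 'Example torrent two']
```

The key difference from the original script is that page_source contains the DOM as the browser rendered it (after JavaScript ran), whereas requests.get(tempurl) fetches the raw response all over again.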
Posts: 1
Threads: 0
Joined: Oct 2020
(Jan-07-2019, 03:08 AM)oneclick Wrote: I'm trying to scrape a torrentz URL, but I get the loading-page HTML instead of the search-result HTML. I tried adding sleep time, but it didn't work out.
Does anyone know how to do this?
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests
browser = webdriver.Firefox()
browser.get("https://torrentz.eu/")
time.sleep(10)
#class selenium.webdriver.support.expected_conditions.title_contains(Torrent)
search = browser.find_element_by_id('thesearchbox')
search.send_keys('xxxxx')
search.send_keys(Keys.RETURN) # hit return after you enter search text
time.sleep(10)
tempurl = browser.current_url
print(tempurl)
tempcont = requests.get(tempurl, timeout=10)
soup = BeautifulSoup(tempcont.content, "html.parser")
print(soup.prettify())
Amazing one, sir. I have used it before, but it seems this website is not working any more. Could you please create one for torrentzeu.org?
Thank you.