Python Forum
I want to scrape a Tor site.




I want to scrape a Tor site. - Blue Dog - Sep-15-2021

Hi, I have been working on this for over a month. I can get some data from this site with requests, but I can't get any kind of reply with
BeautifulSoup. Here is my code with requests:

import socket
import socks
import urllib.request
import requests
from bs4 import BeautifulSoup

ipcheck_url = 'http://checkip.amazonaws.com/'

# Actual IP.
print(urllib.request.urlopen(ipcheck_url).read())

# Tor IP (9150 is the SOCKS port Tor Browser listens on).
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib.request.urlopen(ipcheck_url).read())

url = 'http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/'
r = requests.get(url)
print(r.url)
print(r.encoding)
print(r.raw)
print(r.status_code)
print(r.headers)

Here is my code with soup:

import socket
import socks
import urllib.request
import requests
from bs4 import BeautifulSoup


ipcheck_url = 'http://checkip.amazonaws.com/'



# Actual IP.
print(urllib.request.urlopen(ipcheck_url).read())

# Tor IP (9150 is the SOCKS port Tor Browser listens on).
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib.request.urlopen(ipcheck_url).read())

soup = BeautifulSoup(requests.get("http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/").text, 'lxml')
print(soup)

Am I doing something dumb, or does soup for some reason not work on onion sites?
I hope someone can help with this. I have a lot of work to do on the dark web.
Thank you,
Renny
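A likely cause, as far as I understand it: with the socket monkeypatch, requests (through urllib3) still looks up the host name with the local DNS resolver before it opens the connection, and .onion names cannot be resolved that way. requests can talk to a SOCKS proxy directly once PySocks is installed (pip install requests[socks]), and the socks5h:// scheme hands name resolution to the proxy, i.e. to Tor. The sketch below assumes Tor Browser is running on 127.0.0.1:9150; tor_session is a hypothetical helper name, not part of any library.

```python
import requests

# Tor Browser listens on 9150; a standalone tor daemon usually uses 9050.
TOR_SOCKS_PORT = 9150


def tor_session(port: int = TOR_SOCKS_PORT) -> requests.Session:
    """Return a Session that sends HTTP and HTTPS traffic through Tor.

    The "socks5h" scheme (note the "h") asks the SOCKS proxy to resolve the
    host name, which is what .onion addresses need: plain DNS cannot.
    """
    proxy = f"socks5h://127.0.0.1:{port}"
    session = requests.Session()
    session.proxies.update({"http": proxy, "https": proxy})
    return session
```

Usage would be something like session = tor_session() followed by r = session.get(url, timeout=60), then r.status_code and r.text as before, with no socket.socket patching. If you run the tor daemon instead of Tor Browser, pass 9050.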


RE: I want to scrape a Tor site. - Larz60+ - Sep-15-2021

I'd break up the statement on line 20 (the one-line soup = BeautifulSoup(requests.get(...).text, 'lxml')),
then check the response's status_code to see whether it is 200 (success).
The URL may also be incorrect.
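Splitting the one-liner as suggested might look like the sketch below. It assumes session is a requests.Session (or anything with a compatible get method) that is already configured to reach Tor; fetch_soup is a hypothetical helper name. It uses Python's built-in "html.parser" so the example does not depend on lxml being installed (with 'lxml', a missing lxml package makes BeautifulSoup raise an error of its own).

```python
from bs4 import BeautifulSoup


def fetch_soup(session, url, timeout=60):
    """Fetch url and parse it in separate steps, so each failure is visible."""
    response = session.get(url, timeout=timeout)  # connection errors raise here
    if response.status_code != 200:
        # Anything other than 200 means there is no page worth parsing.
        raise RuntimeError(f"{url} returned HTTP {response.status_code}")
    # Parsing only happens once we know we actually received a page.
    return BeautifulSoup(response.text, "html.parser")
```

With this shape, a proxy or DNS problem shows up as an exception from session.get, a server-side problem as the RuntimeError, and only a successful page reaches BeautifulSoup.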


RE: I want to scrape a Tor site. - Blue Dog - Sep-15-2021

I did have a status check in the code.
I did break the line up and tried it both ways.
I have been working on this for a month; there isn't much I have not tried.


RE: I want to scrape a Tor site. - Blue Dog - Sep-22-2021

Thanks for all the help; I got it working great. I am in scraper heaven.


RE: I want to scrape a Tor site. - Blue Dog - Sep-26-2021

Let me ask you this:
does a dark web site have an IP address, or just the name of the site?
I am trying to convert a name like 2635/6.onion to some kind of IP address like 122.34.127.90.
So far I have not been able to do it.
Thank you
Renny Wall
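On the last question, as far as I understand Tor's design: an onion service has no public IP address that a client can look up, and a .onion name is not a DNS name at all. For version-3 addresses (56 characters, like the one in the first post), Tor's rend-spec-v3 defines the name as base32(PUBKEY | CHECKSUM | VERSION) + ".onion", where PUBKEY is the service's 32-byte ed25519 public key, CHECKSUM is the first 2 bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION), and VERSION is the single byte 0x03. The sketch below (encode_onion_v3 and decode_onion_v3 are my own hypothetical helper names) shows that decoding a name gives back a key, not an address; roughly speaking, Tor uses that key to locate and authenticate the service through its relays, so the server's IP is never handed to the client.

```python
import base64
import hashlib

# Constants from Tor's rend-spec-v3 onion address format (version 3).
CHECKSUM_PREFIX = b".onion checksum"
VERSION = b"\x03"


def _checksum(pubkey: bytes) -> bytes:
    # First two bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION).
    return hashlib.sha3_256(CHECKSUM_PREFIX + pubkey + VERSION).digest()[:2]


def encode_onion_v3(pubkey: bytes) -> str:
    """Hypothetical helper: build a v3 .onion name from a 32-byte ed25519 key."""
    if len(pubkey) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    raw = pubkey + _checksum(pubkey) + VERSION  # 35 bytes -> 56 base32 chars
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def decode_onion_v3(hostname: str) -> bytes:
    """Hypothetical helper: return the ed25519 public key inside a v3 name.

    Raises ValueError if the name is malformed, has the wrong version byte,
    or fails the checksum. Note that no IP address appears anywhere.
    """
    name = hostname.lower()
    if name.endswith(".onion"):
        name = name[: -len(".onion")]
    if len(name) != 56:
        raise ValueError("a v3 onion name has 56 base32 characters")
    raw = base64.b32decode(name.upper())  # binascii.Error is a ValueError
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != VERSION:
        raise ValueError("not a version-3 onion address")
    if checksum != _checksum(pubkey):
        raise ValueError("onion address checksum mismatch")
    return pubkey
```

Every valid v3 name ends in "d" because its last five bits come from the version byte 0x03.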