Python Forum

I want to scrape a Tor site.
Hi, I have been working on this for over a month. I can get some data from this site with requests, but I can't get any kind of reply with
BeautifulSoup. Here is my code with requests:

import socket
import socks
import urllib2
import requests
from bs4 import BeautifulSoup


ipcheck_url = 'http://checkip.amazonaws.com/'



# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())
url = 'http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/'
r = requests.get(url)
print(r.url)
print(r.encoding)
print(r.raw)
print(r.status_code)
print(r.headers)

Here is my code with BeautifulSoup:

import socket
import socks
import urllib2
import requests
from bs4 import BeautifulSoup


ipcheck_url = 'http://checkip.amazonaws.com/'



# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())

soup = BeautifulSoup(requests.get("http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/").text, 'lxml')
print(soup)

Am I doing something dumb, or does BeautifulSoup for some reason not work on onion sites?
I hope someone can help with this. I have a lot of work to do on the dark web.
Thank you
renny
I'd break up the statement on line 20 into separate steps (fetch first, then parse),
then check the response's status_code to see if it is 200 (success).
The URL may not be correct.
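
The suggestion above could be sketched roughly like this in Python 3. This is only an illustration under some assumptions: Tor Browser's SOCKS proxy is listening on 127.0.0.1:9150 (as in the first post), requests is installed with SOCKS support, and the helper names tor_proxies, fetch_html and parse_html are made up for this example. The socks5h scheme asks the proxy to resolve the hostname, which may matter here because .onion names cannot be resolved by ordinary DNS. It is not necessarily how the original poster ended up solving the problem.

```python
def tor_proxies(host="127.0.0.1", port=9150):
    """Build a proxies mapping for requests (hypothetical helper).

    The socks5h scheme tells requests to let the SOCKS proxy resolve
    hostnames instead of resolving them locally.
    """
    address = "socks5h://{}:{}".format(host, port)
    return {"http": address, "https": address}


def fetch_html(url, timeout=60):
    """Step 1: fetch the page through Tor and check the status code.

    Returns the page text, or None when the status is not 200.
    """
    # Third-party import kept local so tor_proxies() works without it.
    # Assumes SOCKS support is installed, e.g. pip install requests[socks].
    import requests

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    print(response.status_code)
    if response.status_code != 200:
        return None
    return response.text


def parse_html(html):
    """Step 2: parse the text only after the fetch has succeeded."""
    from bs4 import BeautifulSoup  # third-party: beautifulsoup4

    # html.parser ships with Python, so lxml is not required here.
    return BeautifulSoup(html, "html.parser")
```

With the steps split like this, a None from fetch_html points at the request (proxy, URL or status) rather than at BeautifulSoup, which only ever sees text that was already downloaded.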
I did have a status check in the code.
I did break the line up. I tried it both ways.
I have been working on this for a month; there is not much that I have not tried.
Thanks for all the help, I got it working great. I am in scraper heaven.
Let me ask you this:
does a dark web site have an IP address, or just the name of the site?
I am trying to convert a name like 2635/6.onion to some type of IP address like this: 122.34.127.90.
So far I have not been able to do it.
Thank you
Renny
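
On the last question: as far as the Tor onion-service design goes, an onion name is not a DNS name and has no IP address to look up, so there is nothing to convert it to. For v3 names (the 56-character kind, like the one in the first post), the Tor rendezvous specification describes the label as the base32 encoding of a 32-byte public key, a 2-byte checksum and a 1-byte version. A minimal Python 3 sketch of that layout follows; the function names are made up for illustration, and it only takes the name apart, it does not contact the Tor network.

```python
import base64
import hashlib

ONION_SUFFIX = ".onion"
V3_VERSION = b"\x03"


def _checksum(pubkey, version):
    # CHECKSUM = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]
    digest = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()
    return digest[:2]


def encode_onion_v3(pubkey):
    """Build a v3 onion hostname from a 32-byte public key."""
    if len(pubkey) != 32:
        raise ValueError("public key must be 32 bytes")
    raw = pubkey + _checksum(pubkey, V3_VERSION) + V3_VERSION
    return base64.b32encode(raw).decode("ascii").lower() + ONION_SUFFIX


def decode_onion_v3(hostname):
    """Split a v3 onion hostname into (pubkey, version, checksum_ok)."""
    label = hostname.lower()
    if label.endswith(ONION_SUFFIX):
        label = label[: -len(ONION_SUFFIX)]
    if len(label) != 56:
        raise ValueError("a v3 onion label has 56 base32 characters")
    raw = base64.b32decode(label.upper())  # 56 characters -> 35 bytes
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:35]
    return pubkey, version[0], checksum == _checksum(pubkey, version)
```

Calling decode_onion_v3 on the hostname from the first post (without the http:// prefix or the trailing slash) gives back key bytes, a version number and a checksum flag. None of those is an address in the IP sense; the Tor client uses the key to find and connect to the service inside the network.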