Python Forum
I want to scrap a tor site.
#1
Hi, I have been working on this for over a month. I can get some data from this site with requests, but I can't get any kind of reply with
BeautifulSoup. Here is my code with requests:

import socket
import socks
import urllib2
import requests
from bs4 import BeautifulSoup


ipcheck_url = 'http://checkip.amazonaws.com/'



# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())
url = 'http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/'
r = requests.get(url)
print(r.url)
print(r.encoding)
print(r.raw)
print(r.status_code)
print(r.headers)
Here is my code with soup:

import socket
import socks
import urllib2
import requests
from bs4 import BeautifulSoup


ipcheck_url = 'http://checkip.amazonaws.com/'



# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())

soup = BeautifulSoup(requests.get("http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/").text, 'lxml')
print(soup)
Am I doing something dumb, or is there some reason BeautifulSoup does not work on onion sites?
I hope someone can help with this. I have a lot of work to do on the dark web.
Thank you
renny
#2
I'd break up the statement (line 20) into a separate request step and parse step,
then check the response's status_code to see if it is 200 (success).
The URL may not be correct.
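A minimal sketch of that suggestion, not the original poster's final code. It assumes Tor Browser's SOCKS port 9150 from the first post and that requests is installed with its SOCKS extra (pip install requests[socks]). The names tor_proxies and fetch_page are hypothetical. Instead of patching socket.socket as the first post does, it passes a socks5h proxy to requests, which asks the proxy to resolve the hostname; that is a different technique from the one in the question.

```python
def tor_proxies(host="127.0.0.1", port=9150):
    """Build a requests proxy mapping; socks5h lets the proxy resolve names."""
    proxy = "socks5h://{}:{}".format(host, port)
    return {"http": proxy, "https": proxy}


def fetch_page(url, timeout=60):
    """Fetch url through Tor, check the status, then parse it."""
    # Imported here so tor_proxies() can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    # Step 1: the request on its own, so a failure here is easy to spot.
    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)

    # Step 2: the status check; raise_for_status() raises on 4xx/5xx codes.
    print(response.status_code)
    response.raise_for_status()

    # Step 3: parse only after the request is known to have succeeded.
    return BeautifulSoup(response.text, "lxml")
```

Calling fetch_page with the .onion URL from the first post would then return the parsed page, or raise an exception that shows whether the request or the parsing is the step that fails.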
#3
I did have a status check in the code.
I did break the line up and tried it both ways.
I have been working on this for a month; there is not much that I have not tried.
#4
Thanks for all the help, I got it working great. I am in scraper heaven.
#5
Let me ask you this:
Does the dark web site have an IP address, or just the name of the site?
I am trying to convert a name like 2635/6.onion to some type of IP address like 122.34.127.90.
So far I have not been able to do it.
Thank you
Renny
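On the question above: a 56-character (version 3) .onion name such as the one in the first post is not an alias for an IP address. It is a base32 encoding of the service's public key, a 2-byte checksum and a version byte, and the Tor network routes to the service from that key, so there is no IP address to convert it to. The sketch below only illustrates that layout; decode_onion_v3 is a hypothetical helper name, not part of any library.

```python
import base64
import hashlib


def decode_onion_v3(hostname):
    """Split a v3 .onion hostname into (public_key, checksum_ok, version)."""
    label = hostname.lower()
    if label.endswith(".onion"):
        label = label[:-len(".onion")]
    if len(label) != 56:
        raise ValueError("a v3 onion label is 56 base32 characters")

    # 56 base32 characters decode to exactly 35 bytes.
    raw = base64.b32decode(label.upper())
    public_key, checksum, version = raw[:32], raw[32:34], raw[34]

    # The checksum is the first two bytes of
    # SHA3-256(".onion checksum" + public_key + version).
    digest = hashlib.sha3_256(
        b".onion checksum" + public_key + bytes([version])
    ).digest()
    return public_key, checksum == digest[:2], version


public_key, checksum_ok, version = decode_onion_v3(
    "darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion"
)
print(len(public_key), version)
```

The decoded bytes are a key and some bookkeeping, with nothing in them that resembles an address like 122.34.127.90.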