Python Forum
I want to scrape a Tor site.
Hi, I have been working on this for over a month. I can get some data from this site with requests, but I can't get any kind of reply with BeautifulSoup. Here is my code with requests:

import socket
import socks
import urllib2
import requests

ipcheck_url = 'http://checkip.amazonaws.com/'

# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP: send every new socket through the local SOCKS5 proxy on port 9150.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())

# Fetch the onion page and print what came back.
url = 'http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/'
r = requests.get(url)
print(r.url)
print(r.encoding)
print(r.raw)
print(r.status_code)
print(r.headers)
Here is my code with soup:

import socket
import socks
import urllib2
import requests
from bs4 import BeautifulSoup

ipcheck_url = 'http://checkip.amazonaws.com/'

# Actual IP.
print(urllib2.urlopen(ipcheck_url).read())

# Tor IP: send every new socket through the local SOCKS5 proxy on port 9150.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 9150)
socket.socket = socks.socksocket
print(urllib2.urlopen(ipcheck_url).read())

# Fetch the onion page and parse the HTML with BeautifulSoup.
url = 'http://darkzzx4avcsuofgfez5zq75cqc4mprjvfqywo45dfcaxrwqg6qrlfid.onion/'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
print(soup)
Am I doing something dumb, or is there some reason BeautifulSoup does not work on .onion sites?
I hope someone can help with this. I have a lot of work to do on the dark web.
Thank you
renny
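
For comparison, here is a minimal sketch of another way to route requests through Tor without patching socket.socket. It is a guess at a workaround, not a confirmed fix for the problem above. It assumes Python 3, requests installed with SOCKS support (pip install "requests[socks]"), bs4 and lxml installed, and Tor listening on 127.0.0.1:9150 as in the code above. The helper names tor_proxies and fetch_soup are made up for this example. The socks5h scheme asks the proxy to resolve hostnames, which .onion addresses need because normal DNS cannot resolve them.

```python
def tor_proxies(host="127.0.0.1", port=9150):
    """Build a requests-style proxies dict for a local Tor SOCKS port.

    Hypothetical helper. "socks5h" (note the trailing h) tells requests
    to let the proxy resolve hostnames instead of resolving them locally.
    """
    proxy_url = "socks5h://{}:{}".format(host, port)
    return {"http": proxy_url, "https": proxy_url}


def fetch_soup(url, timeout=60):
    """Fetch url through Tor and return the parsed page (hypothetical helper)."""
    # Imported here so tor_proxies() can be used without these packages.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, proxies=tor_proxies(), timeout=timeout)
    # Raise an HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.text, "lxml")
```

Calling fetch_soup() with the .onion URL from the code above should return a BeautifulSoup object if Tor is running and the site is reachable. Passing proxies per request keeps the rest of the program's sockets untouched, and BeautifulSoup only ever sees the HTML text, so an empty result most likely points at the fetch rather than the parser.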
Messages In This Thread
I want to scrap a tor site. - by Blue Dog - Sep-15-2021, 06:18 PM
RE: I want to scrap a tor site. - by Larz60+ - Sep-15-2021, 08:55 PM
RE: I want to scrap a tor site. - by Blue Dog - Sep-15-2021, 09:54 PM
RE: I want to scrap a tor site. - by Blue Dog - Sep-22-2021, 02:53 PM
RE: I want to scrap a tor site. - by Blue Dog - Sep-26-2021, 08:23 PM