May-02-2020, 02:06 AM
I'm working locally in Visual Studio on a simple Python program that prompts the user for a URL, then parses all of the anchor <a> tags. I'm using https://www.google.com to test. I suspect that it may have something to do with my imports/modules and the version of Python I'm using, which is python3. Can someone help?
I can run the code successfully in external tools, but not locally. Here is the error I receive:
Error:
Traceback (most recent call last):
  File "/Users/MYCOMPUTERNAME/Desktop/code3/_ex-12.1.py", line 9, in <module>
    from urllib.request import urlopen
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 88, in <module>
    import http.client
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1342, in <module>
    import ssl
  File "/Users/MYCOMPUTERNAME/Desktop/code3/ssl.py", line 8, in <module>
    HOST = socket.getaddrinfo(HOST, PORT)[0][4][0]
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known
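One detail in the traceback stands out: the `import ssl` inside `http/client.py` is resolved to `/Users/MYCOMPUTERNAME/Desktop/code3/ssl.py`, a file in the same folder as the script, and not to a file under the Python 3.8 library directory. The following is a minimal sketch for checking where a module name resolves on a given machine. The helper names `module_origin` and `resolves_inside` are hypothetical, made up for this illustration, and the check only reports paths; it does not prove on its own that this is the cause of the error.

```python
import importlib.util
import os


def module_origin(name):
    """Return the location a top-level module name resolves to, or None.

    For modules backed by a file this is the file path; for built-in
    modules such as `sys` it is the string "built-in".
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def resolves_inside(name, directory):
    """Return True if the module's file lives somewhere under `directory`."""
    origin = module_origin(name)
    if origin is None or not os.path.isabs(origin):
        # Not found, or not backed by a file on disk (for example built-in).
        return False
    directory = os.path.realpath(directory)
    return os.path.realpath(origin).startswith(directory + os.sep)


if __name__ == "__main__":
    # Run from the folder that contains the script being debugged.
    for name in ("ssl", "socket"):
        print(name, "->", module_origin(name))
        if resolves_inside(name, os.getcwd()):
            print("  note:", name, "resolves to a file in the current folder")
```

If the path printed for `ssl` points into the project folder, a local file with that name is being picked up ahead of the standard library module.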
I have checked to see whether all of my modules were installed correctly with pip install {module}. The names and versions are below.
Name: requests Version: 2.20.1 ...
Name: urllib3 Version: 1.24.3 ...
Name: bs4 Version: 0.0.1 ...
Name: beautifulsoup4 Version: 4.9.0
Here is my code.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    # Look at the parts of a tag
    print('TAG:', tag)
    print('URL:', tag.get('href', None))
    print('Contents:', tag.contents[0])
    print('Attrs:', tag.attrs)

I'm using https://www.google.com as my input for testing purposes. Here is what the output should be (as I have verified in external tools).
TAG: <a class="gb1" href="https://www.google.com/imghp?hl=en&tab=wi">Images</a>
URL: https://www.google.com/imghp?hl=en&tab=wi
Contents: Images
Attrs: {'class': ['gb1'], 'href': 'https://www.google.com/imghp?hl=en&tab=wi'}
TAG: <a class="gb1" href="https://maps.google.com/maps?hl=en&tab=wl">Maps</a>
URL: https://maps.google.com/maps?hl=en&tab=wl
Contents: Maps
Attrs: {'class': ['gb1'], 'href': 'https://maps.google.com/maps?hl=en&tab=wl'}
TAG: <a class="gb1" href="https://play.google.com/?hl=en&tab=w8">Play</a>
URL: https://play.google.com/?hl=en&tab=w8
Contents: Play
Attrs: {'class': ['gb1'], 'href': 'https://play.google.com/?hl=en&tab=w8'}
TAG: <a class="gb1" href="https://www.youtube.com/?gl=US&tab=w1">YouTube</a>
URL: https://www.youtube.com/?gl=US&tab=w1
Contents: YouTube
Attrs: {'class': ['gb1'], 'href': 'https://www.youtube.com/?gl=US&tab=w1'}
TAG: <a class="gb1" href="https://news.google.com/nwshp?hl=en&tab=wn">News</a>
URL: https://news.google.com/nwshp?hl=en&tab=wn
Contents: News
Attrs: {'class': ['gb1'], 'href': 'https://news.google.com/nwshp?hl=en&tab=wn'}
TAG: <a class="gb1" href="https://mail.google.com/mail/?tab=wm">Gmail</a>
URL: https://mail.google.com/mail/?tab=wm
Contents: Gmail
Attrs: {'class': ['gb1'], 'href': 'https://mail.google.com/mail/?tab=wm'}
TAG: <a class="gb1" href="https://drive.google.com/?tab=wo">Drive</a>
URL: https://drive.google.com/?tab=wo
Contents: Drive
Attrs: {'class': ['gb1'], 'href': 'https://drive.google.com/?tab=wo'}
TAG: <a class="gb1" href="https://www.google.com/intl/en/about/products?tab=wh" style="text-decoration:none"><u>More</u> »</a>
URL: https://www.google.com/intl/en/about/products?tab=wh
Contents: <u>More</u>
Attrs: {'class': ['gb1'], 'style': 'text-decoration:none', 'href': 'https://www.google.com/intl/en/about/products?tab=wh'}
TAG: <a class="gb4" href="http://www.google.com/history/optout?hl=en">Web History</a>
URL: http://www.google.com/history/optout?hl=en
Contents: Web History
Attrs: {'href': 'http://www.google.com/history/optout?hl=en', 'class': ['gb4']}
TAG: <a class="gb4" href="/preferences?hl=en">Settings</a>
URL: /preferences?hl=en
Contents: Settings
Attrs: {'href': '/preferences?hl=en', 'class': ['gb4']}
TAG: <a class="gb4" href="https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/" id="gb_70" target="_top">Sign in</a>
URL: https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/
Contents: Sign in
Attrs: {'target': '_top', 'id': 'gb_70', 'href': 'https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.com/', 'class': ['gb4']}
TAG: <a href="/search?ie=UTF-8&q=popular+Google+Doodle+games&oi=ddle&ct=153498820&hl=en&sa=X&ved=0ahUKEwjaucTPgpTpAhVkc98KHX2IAwQQPQgD"><img alt="Stay and Play at Home with Popular Past Google Doodles: Garden Gnomes (2018)" border="0" height="220" id="hplogo" src="/logos/doodles/2020/stay-and-play-at-home-with-popular-past-google-doodles-garden-gnomes-2018-6753651837108770.2-law.gif" title="Stay and Play at Home with Popular Past Google Doodles: Garden Gnomes (2018)" width="550"><br/></img></a>
URL: /search?ie=UTF-8&q=popular+Google+Doodle+games&oi=ddle&ct=153498820&hl=en&sa=X&ved=0ahUKEwjaucTPgpTpAhVkc98KHX2IAwQQPQgD
Contents: <img alt="Stay and Play at Home with Popular Past Google Doodles: Garden Gnomes (2018)" border="0" height="220" id="hplogo" src="/logos/doodles/2020/stay-and-play-at-home-with-popular-past-google-doodles-garden-gnomes-2018-6753651837108770.2-law.gif" title="Stay and Play at Home with Popular Past Google Doodles: Garden Gnomes (2018)" width="550"><br/></img>
Attrs: {'href': '/search?ie=UTF-8&q=popular+Google+Doodle+games&oi=ddle&ct=153498820&hl=en&sa=X&ved=0ahUKEwjaucTPgpTpAhVkc98KHX2IAwQQPQgD'}
TAG: <a href="/advanced_search?hl=en&authuser=0">Advanced search</a>
URL: /advanced_search?hl=en&authuser=0
Contents: Advanced search
Attrs: {'href': '/advanced_search?hl=en&authuser=0'}
TAG: <a class="NKcBbd" href="https://www.google.com/url?q=https://www.youtube.com/stayhome%3Futm_source%3Dgoogle%26utm_medium%3Dhppromo%26utm_campaign%3DHelpathomeYTUS&source=hpp&id=19017530&ct=3&usg=AFQjCNE2FZizHR5ncV3c9xnzo6f1UpGKmQ&sa=X&ved=0ahUKEwjaucTPgpTpAhVkc98KHX2IAwQQ8IcBCAU" rel="nofollow">Make the most of your time at home with tips for recipes, workouts, and more</a>
URL: https://www.google.com/url?q=https://www.youtube.com/stayhome%3Futm_source%3Dgoogle%26utm_medium%3Dhppromo%26utm_campaign%3DHelpathomeYTUS&source=hpp&id=19017530&ct=3&usg=AFQjCNE2FZizHR5ncV3c9xnzo6f1UpGKmQ&sa=X&ved=0ahUKEwjaucTPgpTpAhVkc98KHX2IAwQQ8IcBCAU
Contents: Make the most of your time at home with tips for recipes, workouts, and more
Attrs: {'class': ['NKcBbd'], 'href': 'https://www.google.com/url?q=https://www.youtube.com/stayhome%3Futm_source%3Dgoogle%26utm_medium%3Dhppromo%26utm_campaign%3DHelpathomeYTUS&source=hpp&id=19017530&ct=3&usg=AFQjCNE2FZizHR5ncV3c9xnzo6f1UpGKmQ&sa=X&ved=0ahUKEwjaucTPgpTpAhVkc98KHX2IAwQQ8IcBCAU', 'rel': ['nofollow']}
TAG: <a href="/intl/en/ads/">Advertising Programs</a>
URL: /intl/en/ads/
Contents: Advertising Programs
Attrs: {'href': '/intl/en/ads/'}
TAG: <a href="/services/">Business Solutions</a>
URL: /services/
Contents: Business Solutions
Attrs: {'href': '/services/'}
TAG: <a href="/intl/en/about.html">About Google</a>
URL: /intl/en/about.html
Contents: About Google
Attrs: {'href': '/intl/en/about.html'}
TAG: <a href="/intl/en/policies/privacy/">Privacy</a>
URL: /intl/en/policies/privacy/
Contents: Privacy
Attrs: {'href': '/intl/en/policies/privacy/'}
TAG: <a href="/intl/en/policies/terms/">Terms</a>
URL: /intl/en/policies/terms/
Contents: Terms
Attrs: {'href': '/intl/en/policies/terms/'}