urlparse to urllib.parse - the script stopped working

apollo · Oct-24-2017, 08:23 PM

Dear community,

The following code ran like a charm in Python 2.x:

import urllib
import urlparse
import re

url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></
a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)

    data = { 'url':alk, 'name':name, 'cname':capname }

    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)

    print data

I got back the following:

    
    IndentationError: Missing parentheses in call to 'print'
>>> 
>>> import urllib
>>> import urllib.parse
>>> import re
>>> 
>>> url = "http://search.cpan.org/author/?W"
>>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
>>> for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></
  File "<stdin>", line 1
    for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></
                                                                         ^
SyntaxError: EOL while scanning string literal
>>> a><br/><small>(.*?)</small>', html):
  File "<stdin>", line 1
    a><br/><small>(.*?)</small>', html):
      ^
SyntaxError: invalid syntax
>>>     alk = urlparse.urljoin(url, lk)
  File "<stdin>", line 1
    alk = urlparse.urljoin(url, lk)
    ^
IndentationError: unexpected indent
>>> 
>>>     data = { 'url':alk, 'name':name, 'cname':capname }
  File "<stdin>", line 1
    data = { 'url':alk, 'name':name, 'cname':capname }
    ^
IndentationError: unexpected indent
>>> 
>>>     phtml = urllib.urlopen(alk).read()
  File "<stdin>", line 1
    phtml = urllib.urlopen(alk).read()
    ^
IndentationError: unexpected indent
>>>     memail = re.search('<a href="mailto:(.*?)">', phtml)
  File "<stdin>", line 1
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    ^
IndentationError: unexpected indent
>>>     if memail:
  File "<stdin>", line 1
    if memail:
    ^
IndentationError: unexpected indent
>>>         data['email'] = memail.group(1)
  File "<stdin>", line 1
    data['email'] = memail.group(1)
    ^
IndentationError: unexpected indent
>>> 
>>>     print data
  File "<stdin>", line 1
    print data
    ^
IndentationError: Missing parentheses in call to 'print'
>>>

Okay, first of all I have to install the urllib.parse module,
but I guess there are some other errors waiting at the fence ...

wavic · Oct-24-2017, 08:54 PM

In Python 3, print is not a statement but a function, so in line 18 you have to enclose data in parentheses: print(data)
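
A minimal sketch of the difference. The dict below is a made-up placeholder, not real output from the script:

```python
# Python 3: print is a function, so its arguments go in parentheses.
# The values in this dict are placeholders for illustration only.
data = {'url': 'http://search.cpan.org/~example/', 'name': 'Example Author'}

print(data)                   # Python 2's `print data` is a SyntaxError in Python 3
print('name:', data['name'])  # several arguments are printed separated by a space
```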

hbknjr · Oct-25-2017, 04:15 AM

>>> import urllib
>>> import urllib.parse
>>> import re
>>> 
>>> url = "http://search.cpan.org/author/?W"
>>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'

In Python 3, urlopen is in the urllib.request module, so line 6 becomes:

html = urllib.request.urlopen(url).read()
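
For reference, a small sketch of how the two Python 2 names used in the script map to Python 3. Both urllib.request and urllib.parse ship with the standard library, so nothing has to be installed. The relative link "/~wizeazz/" is only a sample value for illustration:

```python
# Python 2 name       ->  Python 3 name
# urllib.urlopen      ->  urllib.request.urlopen
# urlparse.urljoin    ->  urllib.parse.urljoin
import urllib.request
import urllib.parse

url = "http://search.cpan.org/author/?W"

# urljoin resolves a relative link against the page URL (no network access needed).
alk = urllib.parse.urljoin(url, "/~wizeazz/")
print(alk)  # http://search.cpan.org/~wizeazz/
```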

apollo · Oct-25-2017, 05:59 AM

Hello both,

Many thanks. I got the following results:

   >>> import urllib
     ^
SyntaxError: invalid syntax
>>> >>> import urllib.parse
>>> >>> import re
>>> >>> 
>>> >>> url = "http://search.cpan.org/author/?W"
>>> >>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
>>> Traceback (innermost last):
  File "<stdin>", line 1
    Traceback (innermost last):
                            ^
SyntaxError: invalid syntax
>>>   File "<stdin>", line 1, in <module>
  File "<stdin>", line 1
    File "<stdin>", line 1, in <module>
    ^
IndentationError: unexpected indent
>>> AttributeError: 'module' object has no attribute 'urlopen'
  File "<stdin>", line 1
    AttributeError: 'module' object has no attribute 'urlopen'
                  ^
SyntaxError: invalid syntax
>>> 
>>> >>> import urllib
  File "<stdin>", line 1
    >>> import urllib
     ^
SyntaxError: invalid syntax
>>> >>> import urllib.parse
>>> >>> import re
>>> >>> 
>>> >>> url = "http://search.cpan.org/author/?W"
>>> >>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
>>> Traceback (innermost last):
  File "<stdin>", line 1
    Traceback (innermost last):
                            ^
SyntaxError: invalid syntax
>>>   File "<stdin>", line 1, in <module>
  File "<stdin>", line 1
    File "<stdin>", line 1, in <module>
    ^
IndentationError: unexpected indent
>>> AttributeError: 'module' object has no attribute 'urlopen'

hbknjr · Oct-25-2017, 07:38 AM

Read my previous comment. You're using urllib.urlopen(), but in Python 3 it's urllib.request.urlopen().

So the correct code would look like:

>>> import urllib.request
>>> url = "http://search.cpan.org/author/?W"
>>> html = urllib.request.urlopen(url).read()

Secondly, do not copy and paste the whole code into the interpreter at once; you'll lose indentation and get errors. Copy one line at a time or run it from a .py file.
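
As a rough sketch of what such a .py file could look like in Python 3: the regular expressions are taken from the first post, while the file name and the helper names (fetch, parse_authors, find_email, main) are made up for this example. One more change is needed besides print() and urllib.request: urlopen(...).read() returns bytes in Python 3, so the page is decoded to text before the str regexes are applied. UTF-8 is an assumption about the site's encoding, and the page layout may have changed since the original script was written.

```python
# cpan_authors.py (hypothetical file name) - Python 3 port of the script above
import re
import urllib.parse
import urllib.request

# Patterns copied from the original script
AUTHOR_RE = re.compile(
    r'<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>')
MAIL_RE = re.compile(r'<a href="mailto:(.*?)">')


def fetch(url):
    """Download a page and return it as text (urlopen gives bytes in Python 3)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


def parse_authors(html, base_url):
    """Return one dict per author link found in the listing page."""
    authors = []
    for lk, capname, name in AUTHOR_RE.findall(html):
        authors.append({
            'url': urllib.parse.urljoin(base_url, lk),
            'name': name,
            'cname': capname,
        })
    return authors


def find_email(html):
    """Return the first mailto: address in the page, or None if there is none."""
    memail = MAIL_RE.search(html)
    return memail.group(1) if memail else None


def main():
    url = "http://search.cpan.org/author/?W"
    for data in parse_authors(fetch(url), url):
        email = find_email(fetch(data['url']))
        if email:
            data['email'] = email
        print(data)

# main() is deliberately not called here, so the helpers can be tried
# without network access; add a call to main() to run against the live site.
```

Saved to a file, this can be run with python3 cpan_authors.py once a call to main() is added at the bottom.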

apollo · (This post was last modified: Oct-26-2017, 06:58 AM by apollo.)

Hello all,

Many thanks for the hints, very supportive. With the example above I want to dive into real-world programming topics:

1. parsing
2. storing (in a database)

With the following fix to the code from the first post I had luck in a Python 2.x environment.

Note: since I have Python 3.4.x installed on my Linux box, I needed a quick test on a 2.x testbed. I found one here: https://www.tutorialspoint.com/execute_p...online.php

import urllib
import urlparse
import re
 
url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)
 
    data = { 'url':alk, 'name':name, 'cname':capname }
 
    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)
 
    print data

The result looks like the following:

{'url': 'http://search.cpan.org/~wizeazz/', 'cname': 'WIZEAZZ', 'name': 'P. Verbaarschott', 'email': 'razor_mail%40yahoo.com'}
{'url': 'http://search.cpan.org/~wjblack/', 'cname': 'WJBLACK', 'name': 'William J. Black', 'email': 'bj%40wjblack.com'}
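
Note that the addresses come back with the @ sign percent-encoded as %40. A small sketch of how they could be decoded before storing, using urllib.parse.unquote from the Python 3 standard library (in Python 2 the same function is urllib.unquote):

```python
from urllib.parse import unquote

# Value copied from the output above
raw = 'razor_mail%40yahoo.com'

email = unquote(raw)  # turns %40 back into @
print(email)          # razor_mail@yahoo.com
```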

As I said above, I want to store the results in a database, using peewee as the database abstraction layer.

By the way, this is a separate question that has nothing to do with parsing the retrieved data.
I need to do this at the weekend, and I guess I should use the following approach:

from peewee import *
import json

db = MySQLDatabase('mydb', user='john', passwd='mypass')

class User(Model):
    name = TextField()
    name2 = TextField()
    email_address = TextField()
    url = TextField()

    class Meta:
        database = db  # this model uses the mydb database

User.create_table()  # ensure the table is created

# json.load() needs an open file object; 'data.json' is a placeholder name
with open('data.json') as f:
    data = json.load(f)  # your JSON data file here

for entry in data:  # assuming your data is an array of JSON objects
    # create() already inserts the row, so no separate save() call is needed
    User.create(name=entry["name"], name2=entry["name2"],
                email_address=entry["email-adress"], url=entry["url"])
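
To try the storing step without a MySQL server, here is a rough sketch that swaps peewee for the standard-library sqlite3 module and an in-memory database. It is a stand-in for experimenting, not the peewee approach itself. The table name users is made up, and mapping 'cname' to name2 and 'email' to email_address is my assumption about how the scraped fields line up with the model. The two rows are the results shown earlier.

```python
import sqlite3

# The two result dicts printed by the scraper
rows = [
    {'url': 'http://search.cpan.org/~wizeazz/', 'cname': 'WIZEAZZ',
     'name': 'P. Verbaarschott', 'email': 'razor_mail%40yahoo.com'},
    {'url': 'http://search.cpan.org/~wjblack/', 'cname': 'WJBLACK',
     'name': 'William J. Black', 'email': 'bj%40wjblack.com'},
]

conn = sqlite3.connect(':memory:')  # throwaway database, nothing is written to disk
conn.execute(
    "CREATE TABLE users (name TEXT, name2 TEXT, email_address TEXT, url TEXT)")

# .get('email') yields None (stored as NULL) when the scraper found no address
conn.executemany(
    "INSERT INTO users (name, name2, email_address, url) VALUES (?, ?, ?, ?)",
    [(r['name'], r['cname'], r.get('email'), r['url']) for r in rows])
conn.commit()

stored = conn.execute(
    "SELECT name2, email_address FROM users ORDER BY name2").fetchall()
print(stored)
```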

Again, many thanks for your continued help.

Greetings, apollo ;)
