Python Forum
urlparse to urllib.parse - the script stopped working
#1
Dear community,

The following code ran like a charm in Python 2.x:


import urllib
import urlparse
import re

url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></
a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)

    data = { 'url':alk, 'name':name, 'cname':capname }

    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:
        data['email'] = memail.group(1)

    print data
Under Python 3 I got back the following:


    
    IndentationError: Missing parentheses in call to 'print'
>>> 
>>> import urllib
>>> import urllib.parse
>>> import re
>>> 
>>> url = "http://search.cpan.org/author/?W"
>>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
>>> for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></
  File "<stdin>", line 1
    for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></
                                                                         ^
SyntaxError: EOL while scanning string literal
>>> a><br/><small>(.*?)</small>', html):
  File "<stdin>", line 1
    a><br/><small>(.*?)</small>', html):
      ^
SyntaxError: invalid syntax
>>>     alk = urlparse.urljoin(url, lk)
  File "<stdin>", line 1
    alk = urlparse.urljoin(url, lk)
    ^
IndentationError: unexpected indent
>>> 
>>>     data = { 'url':alk, 'name':name, 'cname':capname }
  File "<stdin>", line 1
    data = { 'url':alk, 'name':name, 'cname':capname }
    ^
IndentationError: unexpected indent
>>> 
>>>     phtml = urllib.urlopen(alk).read()
  File "<stdin>", line 1
    phtml = urllib.urlopen(alk).read()
    ^
IndentationError: unexpected indent
>>>     memail = re.search('<a href="mailto:(.*?)">', phtml)
  File "<stdin>", line 1
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    ^
IndentationError: unexpected indent
>>>     if memail:
  File "<stdin>", line 1
    if memail:
    ^
IndentationError: unexpected indent
>>>         data['email'] = memail.group(1)
  File "<stdin>", line 1
    data['email'] = memail.group(1)
    ^
IndentationError: unexpected indent
>>> 
>>>     print data
  File "<stdin>", line 1
    print data
    ^
IndentationError: Missing parentheses in call to 'print'
>>> 
Okay, first of all I have to install the urllib.parse module,
but I guess there are some more errors waiting after that ...
#2
In Python 3, print is a function rather than a statement, so on line 18 you have to enclose data in parentheses: print(data)
#3
>>> import urllib
>>> import urllib.parse
>>> import re
>>> 
>>> url = "http://search.cpan.org/author/?W"
>>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
In Python 3, urlopen is in the urllib.request module, so import urllib.request and change line 6 to:

html = urllib.request.urlopen(url).read()
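
Putting the two fixes from this thread together (print() and urllib.request), a minimal Python 3 sketch of the original script might look like the following. The names fetch, extract_authors and AUTHOR_RE, the sample HTML string, and the UTF-8 decoding are assumptions for illustration, not part of the original post. One extra point worth knowing: in Python 3, urlopen(...).read() returns bytes, so the page has to be decoded before a str regex can search it.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Same pattern as in the original post, written on a single line
AUTHOR_RE = re.compile(
    r'<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>'
)


def fetch(url):
    """Download a page and return it as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_authors(base_url, html):
    """Return one dict per author link found in the listing page."""
    authors = []
    for link, capname, name in AUTHOR_RE.findall(html):
        authors.append(
            {"url": urljoin(base_url, link), "name": name, "cname": capname}
        )
    return authors


# Offline demonstration with a hand-written sample line (hypothetical markup)
base = "http://search.cpan.org/author/?W"
sample = '<a href="/~wizeazz/"><b>WIZEAZZ</b></a><br/><small>P. Verbaarschott</small>'
authors = extract_authors(base, sample)
print(authors)
```

For the live page you would call extract_authors(base, fetch(base)) instead of passing the sample string.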
#4
Hello to you both,

Many thanks. I got the following results:

   >>> import urllib
     ^
SyntaxError: invalid syntax
>>> >>> import urllib.parse
>>> >>> import re
>>> >>> 
>>> >>> url = "http://search.cpan.org/author/?W"
>>> >>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
>>> Traceback (innermost last):
  File "<stdin>", line 1
    Traceback (innermost last):
                            ^
SyntaxError: invalid syntax
>>>   File "<stdin>", line 1, in <module>
  File "<stdin>", line 1
    File "<stdin>", line 1, in <module>
    ^
IndentationError: unexpected indent
>>> AttributeError: 'module' object has no attribute 'urlopen'
  File "<stdin>", line 1
    AttributeError: 'module' object has no attribute 'urlopen'
                  ^
SyntaxError: invalid syntax
>>> 
>>> >>> import urllib
  File "<stdin>", line 1
    >>> import urllib
     ^
SyntaxError: invalid syntax
>>> >>> import urllib.parse
>>> >>> import re
>>> >>> 
>>> >>> url = "http://search.cpan.org/author/?W"
>>> >>> html = urllib.urlopen(url).read()
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlopen'
>>> Traceback (innermost last):
  File "<stdin>", line 1
    Traceback (innermost last):
                            ^
SyntaxError: invalid syntax
>>>   File "<stdin>", line 1, in <module>
  File "<stdin>", line 1
    File "<stdin>", line 1, in <module>
    ^
IndentationError: unexpected indent
>>> AttributeError: 'module' object has no attribute 'urlopen'
#5
Read my previous comment. You're using urllib.urlopen(), but in Python 3 it's urllib.request.urlopen().

So the correct code would look like this:
>>> import urllib.request
>>> url = "http://search.cpan.org/author/?W"
>>> html = urllib.request.urlopen(url).read()
Secondly, do not copy and paste the whole code into the interpreter at once; you'll lose indentation and get errors. Copy one line at a time, or save it to a .py file and run that.
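
If the same .py file has to run on both Python 2 and Python 3, one common approach is to try the Python 3 import locations first and fall back to the Python 2 ones. This is only a sketch of that idea; the base URL and the /~wizeazz/ path are taken from this thread for the demonstration.

```python
try:
    # Python 3 locations
    from urllib.parse import urljoin
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback
    from urlparse import urljoin
    from urllib import urlopen

base = "http://search.cpan.org/author/?W"
joined = urljoin(base, "/~wizeazz/")
print(joined)
```

After this, the rest of the script can call urlopen() and urljoin() directly, without the module prefix.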
#6
Hello dear all,

Many thanks for the hints, very supportive. With the example above I want to dive into real-world programming topics:


1. parsing
2. storing (in a database)

With the following fix of the code from my first post I had luck in the Python 2.x environment.

Note: since I have Python 3.4.x installed on my Linux box, I needed a quick test on a 2.x testbed. I found one here: https://www.tutorialspoint.com/execute_p...online.php

import urllib
import urlparse
import re
 
url = "http://search.cpan.org/author/?W"
html = urllib.urlopen(url).read()
for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    alk = urlparse.urljoin(url, lk)
 
    data = { 'url':alk, 'name':name, 'cname':capname }
 
    phtml = urllib.urlopen(alk).read()
    memail = re.search('<a href="mailto:(.*?)">', phtml)
    if memail:data['email'] = memail.group(1)
 
    print data
The result looks like the following:

{'url': 'http://search.cpan.org/~wizeazz/', 'cname': 'WIZEAZZ', 'name': 'P. Verbaarschott', 'email': 'razor_mail%40yahoo.com'}
{'url': 'http://search.cpan.org/~wjblack/', 'cname': 'WJBLACK', 'name': 'William J. Black', 'email': 'bj%40wjblack.com'}
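
The %40 in the email values is the URL-encoded form of @, because the addresses are read straight out of the mailto: href. A small sketch of decoding them with the standard library (Python 3 import path; the list name raw_emails is just for illustration):

```python
from urllib.parse import unquote

# Values copied from the output above
raw_emails = ["razor_mail%40yahoo.com", "bj%40wjblack.com"]

# unquote() turns percent-escapes such as %40 back into characters
decoded = [unquote(email) for email in raw_emails]
print(decoded)
```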
As I said above, I want to store the results in a database, using peewee as the database abstraction layer.

By the way, this is another question (it has nothing to do with parsing the retrieved data).
I need to do this at the weekend, and I guess I should use the following approach:


from peewee import Model, MySQLDatabase, TextField
import json

db = MySQLDatabase('mydb', user='john', password='mypass')

class User(Model):
    name = TextField()
    name2 = TextField()
    email_address = TextField()
    url = TextField()

    class Meta:
        database = db  # this model uses the mydb database

User.create_table()  # ensure the table is created

# json.load() needs an open file object; 'data.json' is a placeholder name
with open('data.json') as f:
    data = json.load(f)

for entry in data:  # assuming the data is an array of JSON objects
    # create() inserts and saves the row, so no separate save() call is needed
    User.create(name=entry["name"], name2=entry["name2"],
                email_address=entry["email-adress"], url=entry["url"])
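
For trying out the storing step without a MySQL server, here is a rough sketch that uses the standard-library sqlite3 module as a stand-in for peewee, fed with the dicts the scraper prints. The table name author, the function name store_authors, and the choice of cname as primary key are assumptions for illustration only.

```python
import sqlite3


def store_authors(conn, records):
    """Insert scraped author dicts into an 'author' table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS author ("
        "cname TEXT PRIMARY KEY, name TEXT, email TEXT, url TEXT)"
    )
    rows = [
        # 'email' is only present when a mailto: link was found
        (r["cname"], r["name"], r.get("email"), r["url"])
        for r in records
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO author (cname, name, email, url) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


# The two records shown earlier in this post
records = [
    {"url": "http://search.cpan.org/~wizeazz/", "cname": "WIZEAZZ",
     "name": "P. Verbaarschott", "email": "razor_mail%40yahoo.com"},
    {"url": "http://search.cpan.org/~wjblack/", "cname": "WJBLACK",
     "name": "William J. Black", "email": "bj%40wjblack.com"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
stored = store_authors(conn, records)
saved = conn.execute("SELECT cname, email FROM author ORDER BY cname").fetchall()
print(stored, saved)
```

The same dict keys (url, name, cname, email) would map onto peewee model fields in much the same way.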
Again, many thanks for your continued help.

greetings apollo ;)