Python Forum
What's the error in this code in Python 3.0? It runs under Python 2.0.
#1
my question here

import urllib
import re
urls=[]
i=0
regex='<title>(.+?)</title>'
pattern=re.compile(regex)
while i<len(urls):
   htmlfile=urllib.urlopen(urls[i])
   a=htmlfile.read()
   titles=re.findall(pattern,a)
   print titles
   i=i+1
#2
Hi ekansh,

I cannot see exactly what you are asking.

If you are trying to run this script under Python 3.0 (or any later 3.x), it will fail on the print statement: in Python 3, print is a function, so whatever you print must go inside parentheses, e.g. print(titles).

But this is a wild guess as I am unsure of the exact question that you would like answered.

Kindly let us know what you would like us to look at.

Good Luck,

Bass

"The good thing about standards is that you have so many to choose from" Andy S. Tanenbaum
#3
Also note that the urllib package changed between Python 2.x and 3.x. I think you would need to use urllib.request.urlopen().
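A minimal sketch of the Python 3 spelling, assuming Python 3.4 or later (needed for `data:` URL support in `urlopen`). The `data:` URL stands in for a real web page so it runs offline; the HTML in it is made up. Note that `read()` gives back `bytes` in Python 3, not `str`.

```python
import urllib.request

# A data: URL stands in for a real page so this runs without network access
# (the HTML here is a made-up example).
URL = "data:text/html;charset=utf-8,<html><head><title>Example</title></head></html>"

# Python 2: urllib.urlopen(url)  ->  Python 3: urllib.request.urlopen(url)
with urllib.request.urlopen(URL) as response:
    raw = response.read()  # bytes in Python 3, not str
    # Use the charset from the Content-Type header, falling back to UTF-8
    charset = response.headers.get_content_charset() or "utf-8"

html = raw.decode(charset)
print(type(raw).__name__, type(html).__name__)  # bytes str
print(html)
```

With a real site you would pass its http(s) URL instead; the decode step stays the same.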
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
Reply
#4
2to3 comes with Python.
C:\python36\Tools\scripts
λ 2to3 -w url_con.py
After conversion, plus a PEP 8 fix:
import urllib.request, urllib.parse, urllib.error
import re

urls = []  # add the page URLs to fetch here
i = 0
regex = '<title>(.+?)</title>'
pattern = re.compile(regex)
while i < len(urls):
    htmlfile = urllib.request.urlopen(urls[i])
    # read() returns bytes in Python 3; decode (UTF-8 assumed) before using a str regex
    a = htmlfile.read().decode('utf-8')
    titles = re.findall(pattern, a)
    print(titles)
    i = i + 1
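2to3 does not handle one Python 3 change in this script: `read()` returns `bytes`, and a `str` regex refuses to search `bytes` (it raises `TypeError`). A small offline sketch with a made-up page body, assuming UTF-8, showing the error and two ways around it:

```python
import re

pattern = re.compile("<title>(.+?)</title>")  # a str pattern, as in the script
# Made-up page body as bytes, which is what read() returns in Python 3
page = b"<html><head><title>Example</title></head></html>"

# Mixing a str pattern with bytes data raises TypeError in Python 3
try:
    re.findall(pattern, page)
except TypeError as err:
    print("TypeError:", err)

# Fix 1: decode the bytes to str first (UTF-8 assumed here)
titles = re.findall(pattern, page.decode("utf-8"))
print(titles)  # ['Example']

# Fix 2: use a bytes pattern on bytes data; the matches come back as bytes
byte_titles = re.findall(rb"<title>(.+?)</title>", page)
print(byte_titles)  # [b'Example']
```

Decoding first is usually the simpler choice, since the rest of the code can then work with ordinary strings.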
So much for the bad stuff: using regex on HTML is a bad idea.
(There is a well-known funny answer about it.)
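A small offline illustration of why regex is fragile here, using made-up page snippets: the `<title>(.+?)</title>` pattern from this thread misses titles that an HTML parser picks up without trouble. This sketch uses the standard library's `html.parser.HTMLParser` rather than BeautifulSoup so it has no dependencies; `TitleParser` and `parse_title` are hypothetical helper names, not part of any library.

```python
import re
from html.parser import HTMLParser

TITLE_RE = re.compile(r"<title>(.+?)</title>")  # the regex from the thread

# Made-up snippets that all contain a title the regex cannot see
pages = [
    "<TITLE>Upper case</TITLE>",                # tag names are case-insensitive
    '<title lang="en">With attribute</title>',  # attributes on the tag
    "<title>\nSpans lines\n</title>",           # '.' does not match a newline
]


class TitleParser(HTMLParser):
    """Collect the text inside <title> using the stdlib HTML parser."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # HTMLParser passes tag names in lower case
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_title(html):
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


regex_results = [TITLE_RE.findall(page) for page in pages]
parser_results = [parse_title(page) for page in pages]
print(regex_results)   # [[], [], []]
print(parser_results)  # ['Upper case', 'With attribute', 'Spans lines']
```

BeautifulSoup does the same job with far less code, which is why it is the usual recommendation.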

Better, take a look at Web-Scraping-part-1.
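from bs4 import BeautifulSoup
import requests

urls = ['https://www.python.org/',
        'https://python-forum.io/',
        'http://cnn.com/']
for url in urls:
    url_get = requests.get(url)
    # The 'lxml' parser needs the lxml package installed; 'html.parser' is built in
    soup = BeautifulSoup(url_get.content, 'lxml')
    # CSS selector: the <title> element directly inside <head>
    print(soup.select('head > title')[0].text)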
Output:
Welcome to Python.org
Python Forum
CNN - Breaking News, U.S., World, Weather, Entertainment & Video News
#5
(Jul-17-2017, 07:41 PM) snippsat Wrote: 2to3 comes with Python.
#6
What was the point of simply quoting @snippsat's entire post?
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspbian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#7
(Jul-17-2017, 05:35 PM)ekansh Wrote: my question here

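It's a syntax error: print titles is a statement in Python 2, but in Python 3 print is a function and needs parentheses. If you run it, Python will tell you what the problem is.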
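You can reproduce this without touching the network. A minimal sketch that compiles the problem line from a string with the built-in `compile()`; the exact error wording differs between Python 3 versions, so it is printed rather than quoted here.

```python
# The Python 2 style lines, as source text (no network access needed)
py2_source = "titles = ['Example']\nprint titles\n"

py2_error = None
try:
    compile(py2_source, "<py2 code>", "exec")
except SyntaxError as err:
    py2_error = err  # keep it; 'err' itself is cleared after the except block
    # Python 3 rejects the print statement before running anything
    print("SyntaxError on line", err.lineno, "->", err.msg)

# The Python 3 form: print is an ordinary function call
py3_source = "titles = ['Example']\nprint(titles)\n"
namespace = {}
exec(compile(py3_source, "<py3 code>", "exec"), namespace)  # prints ['Example']
```

Because the error is raised at compile time, none of the script runs, not even the imports.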
#8
I think you should try Selenium. It drives a real browser, so it can also handle pages that build their content with JavaScript.

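from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()  # needs Google Chrome installed
browser.get("https://www.python.org/")
# Newer Selenium releases removed find_element_by_id(); use find_element(By.ID, ...)
nav = browser.find_element(By.ID, "mainnav")
print(nav.text)
browser.quit()  # close the browser window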
Check these examples: https://likegeeks.com/python-web-scraping/