my question here
import urllib
import re
urls=[]
i=0
regex='<title>(.+?)</title>'
pattern=re.compile(regex)
while i<len(urls):
    htmlfile=urllib.urlopen(urls[i])
    a=htmlfile.read()
    titles=re.findall(pattern,a)
    print titles
    i=i+1
Hi ekansh,
I cannot see exactly what you are asking.
If you are trying to run this script under Python 3, it will fail on the print statement: in Python 3, print is a function, so its arguments must be wrapped in parentheses, e.g. print(titles).
But this is a wild guess as I am unsure of the exact question that you would like answered.
Kindly let us know what you would like us to look at.
Good Luck,
Bass
Also note that the urllib package changed between Python 2.x and 3.x. I think you would need to use urllib.request.urlopen().
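To illustrate the urllib change, here is a minimal sketch, assuming the script should import on either Python version. The names TITLE_RE and find_titles are hypothetical helpers, not part of the original script, and UTF-8 is only an assumed encoding. Note that on Python 3 read() returns bytes, so the data has to be decoded before a str regex can match it.

```python
import re

# urlopen lives in a different module depending on the Python version.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2

# Same pattern as in the question, compiled once.
TITLE_RE = re.compile(r'<title>(.+?)</title>')


def find_titles(html_bytes, encoding='utf-8'):
    """Decode the raw bytes returned by read() and return all <title> contents.

    The encoding is an assumption; real pages may declare a different one.
    """
    text = html_bytes.decode(encoding, 'replace')
    return TITLE_RE.findall(text)
```

With this in place, something like find_titles(urlopen(url).read()) could replace the read/findall steps in the loop.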
2to3 comes with Python.
C:\python36\Tools\scripts
λ 2to3 -w url_con.py
After the conversion, plus a PEP 8 fix:
import urllib.request
import re

urls = []
i = 0
regex = '<title>(.+?)</title>'
pattern = re.compile(regex)
while i < len(urls):
    htmlfile = urllib.request.urlopen(urls[i])
    # read() returns bytes in Python 3; decode (assuming UTF-8) before matching a str pattern
    a = htmlfile.read().decode('utf-8', errors='replace')
    titles = re.findall(pattern, a)
    print(titles)
    i = i + 1
Now for the bad part: parsing HTML with regex. See this funny answer.
Better, take a look at Web-Scraping-part-1.
from bs4 import BeautifulSoup
import requests
urls = ['https://www.python.org/',
        'https://python-forum.io/',
        'http://cnn.com/']
for url in urls:
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    print(soup.select('head > title')[0].text)
Output:
Welcome to Python.org
Python Forum
CNN - Breaking News, U.S., World, Weather, Entertainment & Video News
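As for why regex is a poor fit for HTML, here is a small offline sketch. It swaps in the standard library's html.parser instead of BeautifulSoup so it runs without extra installs. The sample string and the names TitleParser and extract_title are made up for illustration. The title tag in the sample is upper case, has an attribute, and spans several lines, all of which the regex from the question misses.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == 'title':
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the stripped title text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()


# Hypothetical page: upper-case tag, an attribute, and line breaks in the title.
sample = '<html><head><TITLE lang="en">\n  Example page\n</TITLE></head></html>'

regex_result = re.findall('<title>(.+?)</title>', sample)  # pattern from the question
parser_result = extract_title(sample)

print(regex_result)
print(parser_result)
```

The regex could be patched with flags and a looser pattern, but each patch only covers one more special case, which is the usual argument for using a real parser.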
What was the point of simply quoting @snippsat's entire post?
(Jul-17-2017, 05:35 PM) ekansh wrote: my question here
It's a syntax error. If you run it, Python will tell you what the problem is.
I think that you should try Selenium. It's better for pages that build their content with JavaScript, since it drives a real browser.
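from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("https://www.python.org/")
# find_element_by_id was removed in Selenium 4.3; use find_element(By.ID, ...)
nav = browser.find_element(By.ID, "mainnav")
print(nav.text)
browser.quit()  # close the browser when done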
Check these examples:
https://likegeeks.com/python-web-scraping/