html parser - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: html parser (/thread-9002.html)
html parser - tjnichols - Mar-16-2018

Hello - I'm working through the book "Web Scraping with Python" by Ryan Mitchell (2015). I finally just decided to pick one and jump in, so here I am. I've got the basics (I think). There is still much to learn, I'm sure. Here's my current issue...

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
    bsOb = BeautifulSoup(html.read())
    print(bsObj.h1)

This is the error I get...

    Warning (from warnings module):
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python36\lib\site-packages\bs4\__init__.py", line 181
        markup_type=markup_type))
    UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

    The code that caused this warning is on line 1 of the file <string>. To get rid of this warning, change code that looks like this:

     BeautifulSoup(YOUR_MARKUP)

    to this:

     BeautifulSoup(YOUR_MARKUP, "html.parser")

    Traceback (most recent call last):
      File "C:\Python\Web Scraping pg 8.py", line 5, in <module>
        print(bsObj.h1)
    NameError: name 'bsObj' is not defined

So I changed it to this...

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    html = urlopen("http://www.pythonscraping.com/pages/page1.html")
    bsOb = BeautifulSoup(html.read, html.parser)
    print(bsObj.h1)

This is the error I get...

    Traceback (most recent call last):
      File "C:\Python\Web_Scraping_pg_8.py", line 4, in <module>
        bsOb = BeautifulSoup(html.read, html.parser)
    AttributeError: 'HTTPResponse' object has no attribute 'parser'

I think the reason I'm having the issue is because of the age of the book. Any help I can get would be most appreciated! Thank you!!
RE: html parser - snippsat - Mar-16-2018

(Mar-16-2018, 06:13 PM)tjnichols Wrote: I think the reason I'm having the issue is because of the age of the book. Any help I can get would be most appreciated!

Look at this more updated Web-Scraping part-1.

RE: html parser - nilamo - Mar-16-2018

(Mar-16-2018, 06:13 PM)tjnichols Wrote: bsOb = BeautifulSoup(html.read())

You define a variable named bsOb, but try to use one named bsObj. Those are not the same thing.

(Mar-16-2018, 06:13 PM)tjnichols Wrote: BeautifulSoup(YOUR_MARKUP, "html.parser")

The message is very literal. html.parser isn't a thing that exists anywhere, but the string "html.parser" is.
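Putting both of those fixes together (one consistent variable name, and the parser passed as the string "html.parser"), the parsing step can be sketched like this. The inline HTML snippet here is a made-up stand-in for the page, so the sketch runs without fetching anything:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the fetched page.
html = "<html><body><h1>An Interesting Title</h1></body></html>"

# One consistent variable name, and "html.parser" passed as a string,
# not as a bare name.
bsObj = BeautifulSoup(html, "html.parser")
print(bsObj.h1)       # <h1>An Interesting Title</h1>
print(bsObj.h1.text)  # An Interesting Title
```

With the string argument in place the UserWarning goes away, and with the name matching the NameError goes away.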
RE: html parser - buran - Mar-16-2018

Note the quotes around "html.parser".
RE: html parser - metulburr - Mar-16-2018

You should also use the requests module to get the html.

RE: html parser - tjnichols - Mar-17-2018

nilamo Wrote: [...]

I truly appreciate your time and patience with me and your ability to break things down so I can understand them! Thank you!

(Mar-16-2018, 07:38 PM)metulburr Wrote: you should also use the requests module to get the html

Can you tell me how using the requests module helps me get the html? I appreciate your help!

RE: html parser - snippsat - Mar-17-2018

(Mar-17-2018, 05:08 PM)tjnichols Wrote: Can you tell me how using the requests module helps me get the html? I appreciate your help!

Because it's better and easier to use than urllib in all parts, e.g. you get the correct encoding back and security is up to date. Here is your script using Requests; if you look at the link I gave, you see the use of Requests with BeautifulSoup and lxml.

    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.pythonscraping.com/pages/page1.html'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'html.parser')
    print(soup.find('title').text)
    print(soup.find('h1').text)
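A slightly more defensive variant of that script can be sketched as below. This is not from the thread: the function names are made up for illustration. Splitting the parsing out of the fetch makes the BeautifulSoup part easy to test on its own, and raise_for_status() surfaces HTTP errors instead of quietly parsing an error page:

```python
import requests
from bs4 import BeautifulSoup


def parse_heading(html):
    """Return the text of the first <h1> in an HTML document, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("h1")
    return tag.text if tag else None


def get_heading(url):
    """Fetch a page with requests and return its first <h1> text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    # response.text is already decoded using the encoding requests detected
    return parse_heading(response.text)
```

Calling get_heading('http://www.pythonscraping.com/pages/page1.html') would then fetch and parse the book's example page in one step.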
RE: html parser - tjnichols - Mar-17-2018

(Mar-17-2018, 05:55 PM)snippsat Wrote: Because it's better and easier to use than urllib in all parts, e.g. you get the correct encoding back and security is up to date.

Thank you! That makes sense! Let me try it!

RE: html parser - tjnichols - Mar-17-2018

Hey snippsat - I tried your code. Here is what I got...

    import request
    from bs4 import BeautifulSoup

    url = 'http://www.pythonscraping.com/pages/page1.html'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'html.parser')
    print(soup.find('title').text)
    print(soup.find('h1').text)

The error...
    SyntaxError: multiple statements found while compiling a single statement

I understand things may be different, as in more secure etc.; what I need to understand is why I am having the issues with what I've done. I appreciate your help, and I would like to understand your way of doing things like you've shown above. Can you give me a link on the "import requests" so I can read up on that? Also, can you point me to where I can find more information on the "urllib" you talked about? Thank you! Tonya

RE: html parser - tjnichols - Mar-17-2018

My apologies snippsat! I should have looked at the links sooner. I appreciate your help! Thank you!
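For comparison, and as a sketch added here rather than part of the thread: the standard-library route the book uses looks roughly like this. With urllib you get raw bytes back and have to pick the text encoding yourself, which is one of the steps requests automates. The helper names below are hypothetical:

```python
from urllib.request import urlopen


def decode_body(raw, charset=None):
    """urllib hands back raw bytes; decoding them is your job
    (requests does this step for you automatically)."""
    return raw.decode(charset or "utf-8")


def fetch_with_urllib(url):
    """Rough standard-library equivalent of requests.get(url).text."""
    with urlopen(url) as response:
        # The server may declare a charset in its Content-Type header.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

This is the manual bookkeeping (encoding detection, byte decoding) that the earlier requests-based snippets avoid.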