Thread: How to web scrape this? (/thread-33795.html)
Forum: Python Forum (https://python-forum.io), General Coding Help (https://python-forum.io/forum-8.html)

How to web scrape this? - Pedroski55 - May-27-2021

This is not really a Python question, sorry, but I don't know where to ask.

I was interested in recent posts here about web scraping. I often look up examples of Python on geeksforgeeks.org; they have simple, clear examples. I am reading about the difference between propositional logic and predicate logic, and geeksforgeeks.org has a short webpage about the difference between these two. So I thought, "I'll webscrape it and save the text!", just as practice.

But there is no .html or .php, just:

Quote: https://www.geeksforgeeks.org/difference-between-propositional-logic-and-predicate-logic/#:~:text=Difference%20between%20Propositional%20Logic%20and%20Predicate%20Logic:%20,scope%20%20...%20%203%20more%20rows%20

Can this be webscraped? What language is this? What kind of webpage is this?

RE: How to web scrape this? - Larz60+ - May-27-2021

If you just want to save the page (from Firefox, assume something similar in your browser):

RE: How to web scrape this? - Pedroski55 - May-28-2021

Thanks, but what I'm really wondering is: what is this webpage with no document in the form of a_webpage.html or a_webpage.php? What is this in place of an html document??

#:~:text=Difference%20between%20Propositional%20Logic%20and%20Predicate%20Logic:%20,scope%20%20...%20%203%20more%20rows%20

RE: How to web scrape this? - snippsat - May-28-2021

(May-27-2021, 10:24 PM)Pedroski55 Wrote: So I thought, "I'll webscrape it and save the text!", just as practice.

Do you see .html or .php often? It's not common to have them in a URL. Filename extensions do not matter on the web: the web server maps the requested path to whatever it serves (an .html file, for example), and the browser talks to a name server (DNS) to translate the server name into an address. Read more about this.

So scraping works the same way, as it's just a normal URL.

    import requests
    from bs4 import BeautifulSoup

    # Plain URL, no file extension needed
    url = 'https://www.geeksforgeeks.org/difference-between-propositional-logic-and-predicate-logic/'
    response = requests.get(url)
    # The 'lxml' parser requires the lxml package to be installed
    soup = BeautifulSoup(response.content, 'lxml')
    # Page title
    print(soup.select_one('div.title').text)
    # First list item of the article, selector copied from the browser
    print(soup.select_one('#post-564612 > div.text > ol:nth-child(4) > li:nth-child(1)').text)
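
On the `#:~:text=...` part asked about above: everything after `#` in a URL is the fragment. Browsers keep the fragment to themselves and do not send it to the server, and a fragment starting with `:~:text=` is, as far as I know, a "text fragment" that asks the browser to scroll to and highlight that text. That would explain why the scraping code uses the URL without it. A minimal sketch of splitting the two parts with the standard library (the fragment is shortened here for readability):

```python
from urllib.parse import urldefrag

# URL from the question, with the fragment shortened for this example
original = (
    "https://www.geeksforgeeks.org/"
    "difference-between-propositional-logic-and-predicate-logic/"
    "#:~:text=Difference%20between%20Propositional%20Logic"
)

# urldefrag splits a URL into the part that is requested and the fragment
page_url, fragment = urldefrag(original)

print(page_url)   # the part that is actually requested from the server
print(fragment)   # the part that only the browser uses
```

The `page_url` value is what you would pass to `requests.get()`.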

RE: How to web scrape this? - Pedroski55 - May-28-2021

Thanks, I tried it, worked great. (Don't understand the:

Quote: print(soup.select_one('#post-564612 > div.text > ol:nth-child(4) > li:nth-child(1)').text)

part, but I will look it up!)

I thought all web documents had .html or .php as a basis. My mistake! This thread has https://python-forum.io/thread-33795.html as its basis.

RE: How to web scrape this? - Larz60+ - May-28-2021

Snippsat (the one who answered your question) has two simple tutorials that will answer your questions. See:
web scraping part 1
web scraping part 2

RE: How to web scrape this? - snippsat - May-28-2021

(May-28-2021, 07:07 AM)Pedroski55 Wrote: Thanks I tried it, worked great. (Don't understand the:)

It's a CSS selector. You can copy it from the browser when in dev tools (F12): right-click the tag you want, then Copy ➡ Copy selector. In BS there are two ways to call the selector: .select() or .select_one().
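
To make the selector less mysterious, here is a small sketch on a made-up HTML snippet (the markup, the `post-1` id and the text are invented for illustration and are not taken from the geeksforgeeks page). It shows that `.select()` returns a list of all matches, `.select_one()` returns the first match or `None`, and `>` means "direct child" while `:nth-child(n)` means "the n-th tag inside its parent".

```python
from bs4 import BeautifulSoup

# Invented markup with the same shape as the selector in the thread
HTML = """
<div id="post-1">
  <div class="text">
    <p>Intro paragraph</p>
    <ol>
      <li>First point</li>
      <li>Second point</li>
    </ol>
  </div>
</div>
"""

# html.parser ships with Python, so no extra parser package is needed here
soup = BeautifulSoup(HTML, "html.parser")

# .select() returns a list with every matching tag
all_items = [li.get_text() for li in soup.select("#post-1 li")]

# .select_one() returns only the first match.
# The <ol> is the 2nd tag inside div.text, the wanted <li> is the 1st inside <ol>.
first_item = soup.select_one("#post-1 > div.text > ol:nth-child(2) > li:nth-child(1)")

# .select_one() returns None when nothing matches, so check before using .text
missing = soup.select_one("div.no-such-class")

print(all_items)
print(first_item.get_text())
print(missing)
```

Because `.select_one()` can return `None`, calling `.text` on the result directly raises an `AttributeError` if the page layout changes and the copied selector no longer matches.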

RE: How to web scrape this? - nilamo - May-28-2021

(May-28-2021, 07:07 AM)Pedroski55 Wrote: I thought all web documents had .html or .php as a basis.

File extensions on the web are mostly imaginary. Web servers send a Content-Type header along with the document's contents, so browsers know what to do with it (parse it as html, display it as an image, save it as a pdf, etc).
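
A rough way to see this for yourself without touching a real site: the sketch below starts a throwaway local server (the handler class and the paths are made up for this example) that answers every path with HTML, then asks it for a path with no extension and for one ending in .jpg. In both cases the Content-Type header, not the URL, says what the response is.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class AlwaysHtmlHandler(BaseHTTPRequestHandler):
    """Toy handler: answers every path with the same HTML page."""

    def do_GET(self):
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        # This header is what tells the client how to treat the body
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_content_type(path):
    """Start a local server, request `path`, return the Content-Type header."""
    server = HTTPServer(("127.0.0.1", 0), AlwaysHtmlHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        content_type = response.getheader("Content-Type")
        response.read()
        conn.close()
        return content_type
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_content_type("/some-article/"))
    print(fetch_content_type("/picture.jpg"))
```

With requests, the same header is available as `response.headers['Content-Type']` after `requests.get(url)`.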