Python Forum
What's a good practice project for learning BeautifulSoup4, which has a real use case




What's a good practice project for learning BeautifulSoup4, which has a real use case - league55 - Jan-27-2018

I want to get good at using BS4 in my scripts and want to do it by working on scripts that have some kind of real use. And I want to start simple.


RE: What's a good practice project for learning BeautifulSoup4, which has a real use case - wavic - Jan-27-2018

Choose what you want to get from a website and start coding. Start with something simple: get the webpage title, then the header, after that all the paragraphs, then all the links. Without a goal, it's hard to learn, and just reading the documentation won't make it stick. If you have questions or run into obstacles, feel free to share them here. We are willing to help.
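A minimal sketch of those first steps, assuming BeautifulSoup4 is installed (pip install beautifulsoup4). The HTML below is a made-up sample page so the script runs without any network access; for a real site you would download the page first and pass its text to BeautifulSoup instead.

```python
from bs4 import BeautifulSoup

# Made-up sample page (an assumption for illustration, not a real site).
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Main header</h1>
    <p>First paragraph.</p>
    <p>Second paragraph with a <a href="https://python-forum.io">link</a>.</p>
    <a href="/about.html">About</a>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")

# Step 1: the page title.
title = soup.title.string

# Step 2: the first top-level header.
header = soup.find("h1").get_text().strip()

# Step 3: the text of every paragraph.
paragraphs = [p.get_text().strip() for p in soup.find_all("p")]

# Step 4: the target of every link that actually has an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title)
print(header)
print(paragraphs)
print(links)
```

Each step uses only find, find_all and get_text, which is probably enough to cover most simple pages before moving on to CSS selectors.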

For example, my first web scraping script gathered the email addresses of a bunch of people from a website, because there was no document containing that info.
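Something along these lines could work for that kind of task. This is only a rough sketch, not the original script: the function name extract_emails, the sample HTML and the simplified email regex are all assumptions made for illustration.

```python
import re

from bs4 import BeautifulSoup

# Deliberately simple pattern (an assumption); real-world addresses can be messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return a sorted list of unique email addresses found in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    found = set()

    # Addresses given as mailto: links.
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any "?subject=..." query part.
            found.add(href[len("mailto:"):].split("?")[0])

    # Addresses written out in the visible text.
    found.update(EMAIL_RE.findall(soup.get_text(" ")))
    return sorted(found)


# Made-up sample markup, only for demonstration.
sample = """
<ul>
  <li><a href="mailto:alice@example.com?subject=Hi">Alice</a></li>
  <li>Bob: bob@example.org</li>
</ul>
"""

emails = extract_emails(sample)
print(emails)
```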

There are easy-to-follow tutorials here:
https://python-forum.io/Thread-Web-Scraping-part-1
https://python-forum.io/Thread-Web-scraping-part-2

You may start with them. Thanks to @snippsat.


RE: What's a good practice project for learning BeautifulSoup4, which has a real use case - league55 - Jan-27-2018

I just got a project idea inadvertently.

I found out that I can download my entire Google account history, or just parts of it. The usage history comes as HTML files inside subfolders.

These HTML files display just fine in a browser, but it's impossible to do anything useful with the data that way.

I looked at one of the files in a text editor and the HTML is HIDEOUS. I want to develop an automated way to extract the useful data.

But first I'll have a look at the tutorials you linked to see what in them can help me.
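As a possible starting point, here is a rough sketch that walks a folder tree, parses every .html file and pulls out the visible text lines. The real layout and markup of the exported files is not shown in this thread, so the folder name, file name and sample HTML below are made up, and the helper names text_lines_from_html and scan_export are hypothetical.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def text_lines_from_html(path):
    """Return the non-empty visible text lines of one HTML file."""
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    soup = BeautifulSoup(html, "html.parser")

    # Script and style content is not useful data, so remove it first.
    for tag in soup(["script", "style"]):
        tag.decompose()

    # One line per text fragment, with surrounding whitespace stripped.
    return [line for line in soup.get_text("\n", strip=True).splitlines() if line]


def scan_export(root):
    """Map each .html file under root (relative path) to its text lines."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): text_lines_from_html(p)
        for p in sorted(root.rglob("*.html"))
    }


# Demo on a throwaway folder with a made-up file, standing in for the real export.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "Search"
    sub.mkdir()
    (sub / "activity.html").write_text(
        "<html><head><style>div{margin:0}</style></head><body>"
        "<div><div>Searched for python</div><div>Jan 27, 2018</div></div>"
        "</body></html>",
        encoding="utf-8",
    )
    result = scan_export(tmp)

print(result)
```

Once the plain text lines are available, the next step would be to look at the actual tags and classes in the real files and replace get_text with more targeted find_all calls.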