Python Forum
Basic Syntax/HTML Scrape Questions
#1
Hello,

I imagine this has been answered somewhere else; however I've searched and watched numerous tutorials and still haven't found an answer, so figured I'd just ask.

I'll try to explain this simply. I need to do some HTML scraping. It's for a poker website, so I'm hesitant to connect directly to the site (and may not be able to). Instead, I've used Chrome's Inspect tool to copy the HTML of a popup table from the site, and I'm looking to write a function that grabs and prints the names of all the players at the table. I've noticed they are all in div tags marked "player-name", so I'm thinking that will make it easier.
So the question is...
I have the HTML copied into one file in PyCharm.
I have a Python file open in PyCharm.
I would like a Python function which finds and prints all the text in the HTML inside the "player-name" div tags,
so that I have a list, line by line, of all the player names at the table.

I imagine there are different ways to go about this, so I would first appreciate a function which can read the HTML from a separate file and print this text for me, and then possibly some alternative solutions.

Thank you very much for any effort, much appreciated. I'm very much a novice at this, so I'll be happy to clarify anything I can if necessary.
#2
Use the browser's inspector. Find the place on the page where the player names you want are listed, then right-click it (Firefox -> Inspect Element, Chrome -> Inspect). This lets you navigate around the HTML and makes it easier to find the right tags to look for. It is much quicker and more reliable than reading through the entire HTML file. It also lets you compare tags and see which one is selected, which shows you exactly what you need to parse.

A div tag doesn't have a name, so I am assuming "player-name" is actually a class name? If so, you could search for div tags with the class "player-name".

soup.find_all('div', {'class':'player-name'}) would do that, but it most likely won't be exactly that, since each website may need a slightly different approach depending on its HTML structure. We would have to see the HTML to give a better answer.
#3
Ok, so even more basic question. How do I import Beautiful Soup into PyCharm?

I'm getting this error right now.

NameError: name 'soup' is not defined
#4
soup is just the object created from the BeautifulSoup() class, so it has to be created before you can call find_all on it. You should read the docs.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
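To make the NameError concrete, here is a minimal sketch. The HTML string is a made-up stand-in for the copied table (the real markup hasn't been posted), and 'html.parser' is used only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML copied from the browser.
html = """
<div class="table">
  <div class="player-name">Player 1 Name</div>
  <div class="player-name">Player 2 Name</div>
  <div class="stack">1500</div>
</div>
"""

# 'soup' only exists once this line has run; using it earlier raises NameError.
soup = BeautifulSoup(html, "html.parser")

# Two equivalent ways to select <div> tags whose class is "player-name".
by_dict = soup.find_all("div", {"class": "player-name"})
by_keyword = soup.find_all("div", class_="player-name")

names = [tag.get_text() for tag in by_dict]
print(names)
```

The keyword is spelled class_ with a trailing underscore because class is a reserved word in Python.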
#5
(Sep-06-2018, 11:00 AM)metulburr Wrote: Ok, so even more basic question. How do I import Beautiful Soup into PyCharm?
Can also look at Web-Scraping part-1

You should make sure that everything works from the command line.
Let's say you use Python 3.6/3.7 on Windows, with pip installed.
Test that python and pip work from the command line.
C:\
λ python -V
Python 3.7.0
 
C:\
λ pip -V
pip 10.0.1 from c:\python37\lib\site-packages\pip (python 3.7)
Now select this version when configuring the Python Interpreter in PyCharm.

PyCharm has its own interface for installing packages with pip,
but you should get comfortable using the command line (cmd, or better, cmder).

When pip points to 3.7, pip install beautifulsoup4 will install to 3.7.
If the Python 3.7 interpreter is chosen in PyCharm, then BeautifulSoup will work there.
You can also look at the VS Code tutorial here.
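A quick way to check that the interpreter PyCharm runs is the one pip installed into is to ask Python itself. This is only a sketch; describe_environment is a made-up helper name, and it uses nothing beyond the standard library.

```python
import importlib.util
import sys


def describe_environment(package="bs4"):
    """Return the running interpreter's path and whether `package` is importable."""
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        "package": package,
        "installed": spec is not None,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Run it once from the command line and once from PyCharm. If the interpreter paths differ, or installed is False in PyCharm, the package went into a different Python than the one PyCharm is using.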
#6
Wanted to say thank you to everyone. As of this morning I've got suitable code working.


Here it is....

from bs4 import BeautifulSoup

with open('apple.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

for players in soup.find_all('div', class_='player-name'):
    hope = players.text
    print(hope)

As I go through the learning process I want to chronicle some of the issues I've had. (I do apologize; to give everyone an idea, I downloaded PyCharm on Monday and hadn't written or really seen a single line of Python, or really any code, until Tuesday.)
1. With this code I had a pretty bad hang-up trying to use Python rather than the command line to pip install things. Made a ton of progress once I could pip install bs4 and lxml.
2. Issue with the syntax for opening a file in Python. I'm still a little foggy on how to navigate the path to various files; currently I'm using Shift+Right Click to open PowerShell in the folder where 'apple.html' is located. Would love to be able to simply open a command prompt from anywhere and have it refer specifically to this file.
3. Issue with how to pull only the text from inside the div class 'player-name'. The .text part eluded me for what seemed like quite a while. Still one issue here: currently the list prints like so:
'Player 1 Name

Player 2 Name

Player 3 Name'
Would like to have it print like this:
'Player 1 Name
Player 2 Name
Player 3 Name'

4. I know this one is stupid, but I NEED to remember the ':' at the end of the lines starting with 'with' or 'for' (or 'if' or 'elif'). Was stumped there for a little while.
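
On the blank lines in point 3: a likely cause is that the text inside each div carries its own line breaks, and print() then adds one more. A minimal sketch of one fix, assuming markup shaped like the made-up sample below (the real table HTML isn't shown), is to let get_text strip the whitespace:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the whitespace around each name is an assumption
# about what the copied HTML looks like.
html = """
<div class="player-name">
  Player 1 Name
</div>
<div class="player-name">
  Player 2 Name
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# strip=True trims leading/trailing whitespace from each piece of text;
# separator=" " keeps words apart if a name is split across nested tags.
names = [
    div.get_text(separator=" ", strip=True)
    for div in soup.find_all("div", class_="player-name")
]

print("\n".join(names))
```

Calling .strip() on players.text should give the same result for simple cases like this one.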

Don't want this to be a novel, but my next goal for this code is to automate further, so that I don't have to save the HTML into a file by manually right-clicking in the browser and using Inspect.
So I'm looking to add lines before what I currently have that will lock onto the active browser, open the browser's Inspect tool, save the HTML to a default file, and then execute the current code.
Later I'd like to add lines below the current code to place the player list into a specified column in an OpenOffice spreadsheet.
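
For the spreadsheet goal, one simple route (not the same as writing into an already open sheet) is to save the names as a one-column CSV file, which a spreadsheet program can open. A rough sketch using only the standard library; write_names_column, the "Player" header and the players.csv file name are all made up for illustration:

```python
import csv
import tempfile
from pathlib import Path


def write_names_column(names, csv_path, header="Player"):
    """Write one name per row under a single header cell."""
    # newline="" stops the csv module from adding blank rows on Windows.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        for name in names:
            writer.writerow([name])
    return Path(csv_path)


# Written to the temp folder here only so the example runs anywhere.
target = Path(tempfile.gettempdir()) / "players.csv"
out = write_names_column(["Player 1 Name", "Player 2 Name"], target)
print(out.read_text(encoding="utf-8"))
```

Putting the names into a specific column of an existing sheet would need a different approach; this only covers the simplest case.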

Today doing more tutorials, will update on progress.

***In reference to point 2 I am currently getting this error message when I run my code in a PyCharm window
FileNotFoundError: [Errno 2] No such file or directory: 'apple.html'
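
A relative path like 'apple.html' is looked up in the current working directory, which in PyCharm may not be the folder the HTML file sits in. One way around that, sketched here with a made-up helper name and assuming apple.html is saved next to the script, is to build the path from the script's own location:

```python
from pathlib import Path


def resolve_beside_script(filename, script_path):
    """Return an absolute path to `filename` in the same folder as the script."""
    return Path(script_path).resolve().parent / filename


# In a normal script __file__ is the script's own path; the fallback name
# "scraper.py" is hypothetical and only used where __file__ is not defined.
script = globals().get("__file__", "scraper.py")

html_path = resolve_beside_script("apple.html", script)
print("Working directory:", Path.cwd())
print("Looking for:", html_path)
print("Exists:", html_path.exists())
```

open(html_path) then works no matter which folder the script is launched from. Printing Path.cwd() also shows where Python was looking when the error appeared.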