Python Forum
Parsing bs4 Resultset
#1
I'm having trouble understanding the intricacies of BeautifulSoup. I did a find for a specific 'select' tag using 'find(id=...)'. The returned result was the correct 'select' along with its options. Now I'm stuck on how to extract data from that result set. I want to parse out the value and text for each option, but I can't find a method for doing that. Do I have to use string functions to brute-force the extraction, or are there bs4 methods that simplify it? TIA.
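For context on the terminology: `find()` returns a single `Tag` (or `None`), while `find_all()` returns a `ResultSet`, which is a list of `Tag` objects. A minimal sketch, assuming bs4 is installed and using a trimmed copy of the `<select>` posted in this thread with the built-in `'html.parser'`:

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Trimmed copy of the <select> from this thread (only three options kept)
html = '''<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="01">01:00 AM</option>
</select>'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns a single Tag (or None if nothing matches)
select = soup.find(id='TimeOfCallDropDownList')
print(type(select).__name__)  # Tag

# find_all() returns a ResultSet, a list subclass holding Tag objects
options = select.find_all('option')
print(type(options).__name__, len(options))  # ResultSet 3

# Index or loop over a ResultSet; Tag methods called on it raise AttributeError
print(options[1].get_text())  # 12:00 AM
try:
    options.get_text()
except AttributeError:
    print('ResultSet has no get_text(); iterate over it instead')
```

The same rule explains the common "ResultSet object has no attribute ..." error: call Tag methods on the elements, not on the list.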
#2
Post a sample of the HTML and what you want to parse out; then it's easier to give advice.
#3
(Nov-08-2021, 04:54 PM) snippsat Wrote: Post a sample of the HTML and what you want to parse out; then it's easier to give advice.

Thanks for the reply.

<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4"><option selected="selected" value="">Hour</option><option value="00">12:00 AM</option><option value="01">01:00 AM</option><option value="02">02:00 AM</option><option value="03">03:00 AM</option><option value="04">04:00 AM</option><option value="05">05:00 AM</option><option value="06">06:00 AM</option><option value="07">07:00 AM</option><option value="08">08:00 AM</option><option value="09">09:00 AM</option><option value="10">10:00 AM</option><option value="11">11:00 AM</option><option value="12">12:00 PM</option><option value="13">01:00 PM</option><option value="14">02:00 PM</option><option value="15">03:00 PM</option><option value="16">04:00 PM</option><option value="17">05:00 PM</option><option value="18">06:00 PM</option><option value="19">07:00 PM</option><option value="20">08:00 PM</option><option value="21">09:00 PM</option><option value="22">10:00 PM</option><option value="23">11:00 PM</option></select>

I need to parse out the option values and text.
#4
I think I figured it out unless someone has a better idea. I converted the resultset to a string and ran it through BeautifulSoup again. Now I can 'find_all' options and process the result.
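Re-parsing the string works, but it isn't needed: the `Tag` returned by `find(id=...)` supports `find_all()` itself, and each `<option>` Tag exposes its attributes via `option['value']` and its text via `get_text()`. A minimal sketch, assuming a trimmed copy of the HTML from post #3 and the built-in `'html.parser'`:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <select> posted in #3
html = '''<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="01">01:00 AM</option>
<option value="13">01:00 PM</option>
</select>'''

soup = BeautifulSoup(html, 'html.parser')
select = soup.find(id='TimeOfCallDropDownList')

# Search the Tag directly: no str() round trip and no second BeautifulSoup call
pairs = []
for option in select.find_all('option'):
    # option['value'] reads the attribute; get_text(strip=True) trims whitespace
    pairs.append((option['value'], option.get_text(strip=True)))

for value, text in pairs:
    print(f'{value!r} -> {text}')
```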
#5
(Nov-08-2021, 07:42 PM) gw1500se Wrote: I think I figured it out unless someone has a better idea. I converted the resultset to a string and ran it through BeautifulSoup again.
You should not convert it to a string; just pass the HTML to BS and it will convert it to Unicode.
Here is an example of one way to do it.
from bs4 import BeautifulSoup

html = '''\
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
  <option selected="selected" value="">Hour</option>
  <option value="00">12:00 AM</option>
  <option value="01">01:00 AM</option>
  <option value="02">02:00 AM</option>
  <option value="03">03:00 AM</option>
  <option value="04">04:00 AM</option>
  <option value="05">05:00 AM</option>
  <option value="06">06:00 AM</option>
  <option value="07">07:00 AM</option>
  <option value="08">08:00 AM</option>
  <option value="09">09:00 AM</option>
  <option value="10">10:00 AM</option>
  <option value="11">11:00 AM</option>
  <option value="12">12:00 PM</option>
  <option value="13">01:00 PM</option>
  <option value="14">02:00 PM</option>
  <option value="15">03:00 PM</option>
  <option value="16">04:00 PM</option>
  <option value="17">05:00 PM</option>
  <option value="18">06:00 PM</option>
  <option value="19">07:00 PM</option>
  <option value="20">08:00 PM</option>
  <option value="21">09:00 PM</option>
  <option value="22">10:00 PM</option>
  <option value="23">11:00 PM</option>
</select>'''

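soup = BeautifulSoup(html, 'lxml')  # needs lxml installed; 'html.parser' is the built-in alternative
# Every <option> here has a value attribute; [1:] skips the "Hour" placeholder
op_values = soup.select('[value]')
for val in op_values[1:]:
    print(f"{val.get('value')} --> {val.text}")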
Output:
00 --> 12:00 AM
01 --> 01:00 AM
02 --> 02:00 AM
03 --> 03:00 AM
04 --> 04:00 AM
05 --> 05:00 AM
06 --> 06:00 AM
07 --> 07:00 AM
08 --> 08:00 AM
09 --> 09:00 AM
10 --> 10:00 AM
11 --> 11:00 AM
12 --> 12:00 PM
13 --> 01:00 PM
14 --> 02:00 PM
15 --> 03:00 PM
16 --> 04:00 PM
17 --> 05:00 PM
18 --> 06:00 PM
19 --> 07:00 PM
20 --> 08:00 PM
21 --> 09:00 PM
22 --> 10:00 PM
23 --> 11:00 PM
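One caveat with `soup.select('[value]')`: it matches any element that has a `value` attribute, so on a full page it can also pick up elements such as `<input>`, and the `[1:]` slice relies on the placeholder being first. Below is a sketch of a variant that scopes the CSS selector to the `<select>` by its id, skips the placeholder by its empty value, and collects the pairs into a dict. The surrounding `<form>` and the hidden `<input>` are hypothetical, added only to show the difference, and `'html.parser'` is used so lxml isn't required:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: the <select> from this thread (trimmed) plus an
# unrelated hidden <input> that also carries a value attribute
html = '''<form>
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="12">12:00 PM</option>
<option value="23">11:00 PM</option>
</select>
<input type="hidden" name="token" value="abc123">
</form>'''

soup = BeautifulSoup(html, 'html.parser')

# Only <option> children of this particular <select>; the <input> is ignored
options = soup.select('#TimeOfCallDropDownList > option')

# Skip the placeholder by its empty value instead of by position
hours = {opt['value']: opt.get_text(strip=True) for opt in options if opt.get('value')}
print(hours)  # {'00': '12:00 AM', '12': '12:00 PM', '23': '11:00 PM'}
```

Filtering on the value keeps working if the placeholder is missing or moved, at the cost of also dropping any real option whose value happens to be empty.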