I'm having trouble understanding the intricacies of BeautifulSoup. I did a find for a specific 'select' tag using 'find(id=...)'. The returned result was the correct 'select' along with its options. Now I'm stuck on how to extract data from that result. I want to parse out the value and text of each option, but I can't find a method for doing that. Do I have to use string functions to brute-force the extraction, or are there bs4 methods that simplify it? TIA.
Post a sample of the HTML and what you want to parse out; then it is easier to give advice.
(Nov-08-2021, 04:54 PM)snippsat Wrote: Post a sample of the HTML and what you want to parse out; then it is easier to give advice.
Thanks for the reply.
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4"><option selected="selected" value="">Hour</option><option value="00">12:00 AM</option><option value="01">01:00 AM</option><option value="02">02:00 AM</option><option value="03">03:00 AM</option><option value="04">04:00 AM</option><option value="05">05:00 AM</option><option value="06">06:00 AM</option><option value="07">07:00 AM</option><option value="08">08:00 AM</option><option value="09">09:00 AM</option><option value="10">10:00 AM</option><option value="11">11:00 AM</option><option value="12">12:00 PM</option><option value="13">01:00 PM</option><option value="14">02:00 PM</option><option value="15">03:00 PM</option><option value="16">04:00 PM</option><option value="17">05:00 PM</option><option value="18">06:00 PM</option><option value="19">07:00 PM</option><option value="20">08:00 PM</option><option value="21">09:00 PM</option><option value="22">10:00 PM</option><option value="23">11:00 PM</option></select>
I need to parse out the option values and text.
I think I figured it out unless someone has a better idea. I converted the resultset to a string and ran it through BeautifulSoup again. Now I can 'find_all' options and process the result.
(Nov-08-2021, 07:42 PM)gw1500se Wrote: I think I figured it out unless someone has a better idea. I converted the resultset to a string and ran it through BeautifulSoup again.
You should not convert it to a string. Just pass the HTML to BeautifulSoup, which converts it to Unicode for you.
Here is an example of one way to do it.
from bs4 import BeautifulSoup
html = '''\
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="01">01:00 AM</option>
<option value="02">02:00 AM</option>
<option value="03">03:00 AM</option>
<option value="04">04:00 AM</option>
<option value="05">05:00 AM</option>
<option value="06">06:00 AM</option>
<option value="07">07:00 AM</option>
<option value="08">08:00 AM</option>
<option value="09">09:00 AM</option>
<option value="10">10:00 AM</option>
<option value="11">11:00 AM</option>
<option value="12">12:00 PM</option>
<option value="13">01:00 PM</option>
<option value="14">02:00 PM</option>
<option value="15">03:00 PM</option>
<option value="16">04:00 PM</option>
<option value="17">05:00 PM</option>
<option value="18">06:00 PM</option>
<option value="19">07:00 PM</option>
<option value="20">08:00 PM</option>
<option value="21">09:00 PM</option>
<option value="22">10:00 PM</option>
<option value="23">11:00 PM</option>
</select>'''
soup = BeautifulSoup(html, 'lxml')
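# Select every element that has a value attribute (here, the option tags)
op_values = soup.select('[value]')
# Skip the first option, the "Hour" placeholder with an empty value
for val in op_values[1:]:
    print(f"{val.attrs.get('value')} --> {val.text}")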
Output:
00 --> 12:00 AM
01 --> 01:00 AM
02 --> 02:00 AM
03 --> 03:00 AM
04 --> 04:00 AM
05 --> 05:00 AM
06 --> 06:00 AM
07 --> 07:00 AM
08 --> 08:00 AM
09 --> 09:00 AM
10 --> 10:00 AM
11 --> 11:00 AM
12 --> 12:00 PM
13 --> 01:00 PM
14 --> 02:00 PM
15 --> 03:00 PM
16 --> 04:00 PM
17 --> 05:00 PM
18 --> 06:00 PM
19 --> 07:00 PM
20 --> 08:00 PM
21 --> 09:00 PM
22 --> 10:00 PM
23 --> 11:00 PM
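If you already have the 'select' from 'find(id=...)', another option is to search inside that result directly, since what 'find' returns is a Tag with its own 'find_all'. The following is only a sketch under a few assumptions: the HTML is shortened to a few options for illustration, the built-in 'html.parser' is swapped in for 'lxml' so nothing extra needs installing, and the names select_tag and options are made up for this example.

```python
from bs4 import BeautifulSoup

# Shortened sample of the select posted above (assumption: a few options are enough to illustrate)
html = '''\
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="01">01:00 AM</option>
<option value="13">01:00 PM</option>
</select>'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns a Tag, which can be searched again without converting it to a string
select_tag = soup.find(id='TimeOfCallDropDownList')

# Collect (value, text) pairs, skipping options whose value is empty (the "Hour" placeholder)
options = [
    (opt['value'], opt.get_text(strip=True))
    for opt in select_tag.find_all('option')
    if opt.get('value')
]

for value, text in options:
    print(f"{value} --> {text}")
```

Filtering on the empty value instead of slicing with [1:] means the placeholder is skipped wherever it appears in the list, and searching from select_tag keeps the results limited to this one drop-down if the page has other elements with a value attribute.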