Parsing bs4 Resultset
Python Forum (https://python-forum.io), Web Scraping & Web Development, thread /thread-35482.html
Parsing bs4 Resultset - gw1500se - Nov-08-2021

I'm having trouble understanding the intricacies of BeautifulSoup. I did a find for a specific 'select' tag using 'find(id=...)'. The returned result was the correct 'select' along with its options. Now I'm stuck on how to extract data from that result set. I want to parse out the value and text for each option, but I can't find a method for doing that. Do I have to use string functions to brute-force the extraction, or are there bs4 methods that simplify it? TIA.

RE: Parsing bs4 Resultset - snippsat - Nov-08-2021

Post a sample of the HTML and what you want to parse out; then it is easier to give advice.

RE: Parsing bs4 Resultset - gw1500se - Nov-08-2021

(Nov-08-2021, 04:54 PM)snippsat Wrote: Post a sample of the HTML and what you want to parse out; then it is easier to give advice.

Thanks for the reply.

<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="01">01:00 AM</option>
<option value="02">02:00 AM</option>
<option value="03">03:00 AM</option>
<option value="04">04:00 AM</option>
<option value="05">05:00 AM</option>
<option value="06">06:00 AM</option>
<option value="07">07:00 AM</option>
<option value="08">08:00 AM</option>
<option value="09">09:00 AM</option>
<option value="10">10:00 AM</option>
<option value="11">11:00 AM</option>
<option value="12">12:00 PM</option>
<option value="13">01:00 PM</option>
<option value="14">02:00 PM</option>
<option value="15">03:00 PM</option>
<option value="16">04:00 PM</option>
<option value="17">05:00 PM</option>
<option value="18">06:00 PM</option>
<option value="19">07:00 PM</option>
<option value="20">08:00 PM</option>
<option value="21">09:00 PM</option>
<option value="22">10:00 PM</option>
<option value="23">11:00 PM</option>
</select>

I need to parse out the option values and text.
RE: Parsing bs4 Resultset - gw1500se - Nov-08-2021

I think I figured it out, unless someone has a better idea. I converted the resultset to a string and ran it through BeautifulSoup again. Now I can 'find_all' options and process the result.

RE: Parsing bs4 Resultset - snippsat - Nov-09-2021

(Nov-08-2021, 07:42 PM)gw1500se Wrote: I think I figured it out, unless someone has a better idea. I converted the resultset to a string and ran it through BeautifulSoup again.

You should not convert it to a string; just pass the HTML to BeautifulSoup, which converts it to Unicode. Here is an example of one way to do it.

from bs4 import BeautifulSoup

html = '''\
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
<option selected="selected" value="">Hour</option>
<option value="00">12:00 AM</option>
<option value="01">01:00 AM</option>
<option value="02">02:00 AM</option>
<option value="03">03:00 AM</option>
<option value="04">04:00 AM</option>
<option value="05">05:00 AM</option>
<option value="06">06:00 AM</option>
<option value="07">07:00 AM</option>
<option value="08">08:00 AM</option>
<option value="09">09:00 AM</option>
<option value="10">10:00 AM</option>
<option value="11">11:00 AM</option>
<option value="12">12:00 PM</option>
<option value="13">01:00 PM</option>
<option value="14">02:00 PM</option>
<option value="15">03:00 PM</option>
<option value="16">04:00 PM</option>
<option value="17">05:00 PM</option>
<option value="18">06:00 PM</option>
<option value="19">07:00 PM</option>
<option value="20">08:00 PM</option>
<option value="21">09:00 PM</option>
<option value="22">10:00 PM</option>
<option value="23">11:00 PM</option>
</select>'''

# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(html, 'lxml')

# Select every tag that carries a 'value' attribute (here, the options).
op_values = soup.select('[value]')

# Skip the first match, the "Hour" placeholder with an empty value.
for val in op_values[1:]:
    print(f"{val.attrs.get('value')} --> {val.text}")
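
As a follow-up to the exchange above, here is a minimal sketch of why the string round-trip is unnecessary: 'find(id=...)' returns a single Tag rather than a ResultSet, and a Tag can be searched with 'find_all' directly. The markup is a shortened copy of the sample from the thread. The names 'select_tag' and 'options' are illustrative only, and 'html.parser' is used here just to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup from the thread.
html = """
<select id="TimeOfCallDropDownList" name="TimeOfCallDropDownList" tabindex="4">
  <option selected="selected" value="">Hour</option>
  <option value="00">12:00 AM</option>
  <option value="01">01:00 AM</option>
  <option value="23">11:00 PM</option>
</select>
"""

# 'html.parser' ships with Python; 'lxml' also works if it is installed.
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None if nothing matches), not a ResultSet.
select_tag = soup.find(id="TimeOfCallDropDownList")

# Search inside that Tag directly and build a value -> text mapping.
# Options with an empty value (the "Hour" placeholder) are skipped.
options = {
    opt["value"]: opt.get_text(strip=True)
    for opt in select_tag.find_all("option")
    if opt.get("value")
}

for value, label in options.items():
    print(f"{value} --> {label}")
```

Filtering on the empty value instead of slicing with '[1:]' is a design choice in this sketch: it does not depend on the placeholder being the first option.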