web scraping extract particular Div section
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-26714.html

web scraping extract particular Div section - AjayBachu - May-11-2020

In my HTML I have several div sections, and more than one of them has the same class name.

    <div class="_2GiuhO">Specifications</div>
    <div>
    <div class="_3Rrcbo V39ti-">
    <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu">
    .
    .
    <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu">
    ..

As shown above, <div class="_2RngUh"> is repeated. I used Beautiful Soup's soup.find(class_="_2RngUh"), but it always gives the first occurrence. I want to get a specific occurrence based on its child's name (General, Processor And Memory Features). How can I do this?

RE: web scraping extract particular Div section - Larz60+ - May-11-2020

You need:

    results = soup.find('div', {'class': '_2RngUh'})

Also, place your HTML in python tags. Even though it's not Python, it will maintain indentation.

RE: web scraping extract particular Div section - AjayBachu - May-11-2020

Thanks for your reply.

    results = soup.find('div', {'class': '_2RngUh'})

Even this gives only the first occurrence of the class. I want to fetch the 2nd or 3rd occurrence based on the child name (General, Processor And Memory Features).

RE: web scraping extract particular Div section - Larz60+ - May-11-2020

Change find to find_all and select the wanted item. Suppose it's the third item:

    results = soup.find_all('div', {'class': '_2RngUh'})
    desired_result = results[2]

RE: web scraping extract particular Div section - AjayBachu - May-12-2020

Thank you so much, I can get it based on index. But can we get the index based on its child tag (General, Processor And Memory Features...)?

    <div class="_2GiuhO">Specifications</div>
    <div>
    <div class="_3Rrcbo V39ti-">
    <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu">
    .
    .
    <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu">

RE: web scraping extract particular Div section - snippsat - May-12-2020

(May-12-2020, 09:02 AM)AjayBachu Wrote: But can we get index based on its child tag General, Processor And Memory Features...?

    from bs4 import BeautifulSoup

    html = '''\
    <div class="_2GiuhO">Specifications</div>
    <div>
    <div class="_3Rrcbo V39ti-">
    <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu">
    <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu">'''

    soup = BeautifulSoup(html, 'lxml')
    tags = soup.find_all(class_="_2RngUh")

    >>> t = tags[1]
    >>> t
    <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu"></table></div>
    >>> t.findChild()
    <div class="_2lzn0o">Processor And Memory Features</div>
    >>> t.findChild().text
    'Processor And Memory Features'

So this is an example of how you can test things out. There are many functions/methods; you can use dir() to show them all. A good editor or REPL will also show you these options through autocomplete.

    >>> dir(t)
    ['HTML_FORMATTERS', 'XML_FORMATTERS', '__bool__', '__call__', '__class__', '__contains__', '__copy__',
    '__delattr__', '__delitem__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
    '__getattr__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__',
    '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
    '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__',
    '__unicode__', '__weakref__', '_all_strings', '_find_all', '_find_one', '_formatter_for_name', '_is_xml',
    '_lastRecursiveChild', '_last_descendant', '_should_pretty_print', 'append', 'attrs',
    'can_be_empty_element', 'childGenerator', 'children', 'clear', 'contents', 'decode', 'decode_contents',
    'decompose', 'descendants', 'encode', 'encode_contents', 'extend', 'extract', 'fetchNextSiblings',
    'fetchParents', 'fetchPrevious', 'fetchPreviousSiblings', 'find', 'findAll', 'findAllNext',
    'findAllPrevious', 'findChild', 'findChildren', 'findNext', 'findNextSibling', 'findNextSiblings',
    'findParent', 'findParents', 'findPrevious', 'findPreviousSibling', 'findPreviousSiblings', 'find_all',
    'find_all_next', 'find_all_previous', 'find_next', 'find_next_sibling', 'find_next_siblings',
    'find_parent', 'find_parents', 'find_previous', 'find_previous_sibling', 'find_previous_siblings',
    'format_string', 'get', 'getText', 'get_attribute_list', 'get_text', 'has_attr', 'has_key', 'hidden',
    'index', 'insert', 'insert_after', 'insert_before', 'isSelfClosing', 'is_empty_element', 'known_xml',
    'name', 'namespace', 'next', 'nextGenerator', 'nextSibling', 'nextSiblingGenerator', 'next_element',
    'next_elements', 'next_sibling', 'next_siblings', 'parent', 'parentGenerator', 'parents', 'parserClass',
    'parser_class', 'prefix', 'preserve_whitespace_tags', 'prettify', 'previous', 'previousGenerator',
    'previousSibling', 'previousSiblingGenerator', 'previous_element', 'previous_elements',
    'previous_sibling', 'previous_siblings', 'recursiveChildGenerator', 'renderContents', 'replaceWith',
    'replaceWithChildren', 'replace_with', 'replace_with_children', 'select', 'select_one', 'setup',
    'string', 'strings', 'stripped_strings', 'text', 'unwrap', 'wrap']

So, for example, find_next() would also work:

    >>> t.find_next()
    <div class="_2lzn0o">Processor And Memory Features</div>
    >>> t.find_next().text
    'Processor And Memory Features'

RE: web scraping extract particular Div section - snippsat - May-12-2020

Larz60+ Wrote: you need:

You don't need that, @Larz60+. I do not use the dictionary call anymore, because you can copy the class name directly from the source code and just add class_ to make it work. Example:

    from bs4 import BeautifulSoup

    html = '<div class="cities">London</div>'
    soup = BeautifulSoup(html, 'lxml')

Usage:

    # Only add _
    >>> tag = soup.find(class_="cities")
    >>> tag.text
    'London'
    >>> # The dictionary call needs more rewriting of the original attribute and, as written here, also names the div tag
    >>> tag = soup.find('div', {'class': 'cities'})
    >>> tag.text
    'London'

RE: web scraping extract particular Div section - AjayBachu - May-12-2020

Thank you so much. I will use this.
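
The question in this thread was how to pick a section by its heading text (General, Processor And Memory Features) rather than by a hard-coded index. One possible way, as a minimal sketch: loop over the find_all() results and compare the text of each section's heading div. The assumptions here are not from the thread: find_section is a hypothetical helper name, closing tags were added to the HTML so the two sections are siblings (the posts elided the table contents), and the built-in 'html.parser' is used instead of 'lxml' so no extra parser is required.

```python
from bs4 import BeautifulSoup

# Same structure as in the thread, with closing tags added (assumption:
# the elided table contents are left empty here).
html = '''\
<div class="_2GiuhO">Specifications</div>
<div>
  <div class="_3Rrcbo V39ti-">
    <div class="_2RngUh">
      <div class="_2lzn0o">General</div>
      <table class="_3ENrHu"></table>
    </div>
    <div class="_2RngUh">
      <div class="_2lzn0o">Processor And Memory Features</div>
      <table class="_3ENrHu"></table>
    </div>
  </div>
</div>'''

# 'html.parser' ships with Python; the thread itself used 'lxml'.
soup = BeautifulSoup(html, 'html.parser')


def find_section(soup, title):
    """Hypothetical helper: return the _2RngUh div whose heading div
    (class _2lzn0o) has exactly this text, or None if there is none."""
    for section in soup.find_all('div', class_='_2RngUh'):
        heading = section.find('div', class_='_2lzn0o')
        if heading is not None and heading.get_text(strip=True) == title:
            return section
    return None


# Get the section directly by its heading text.
section = find_section(soup, 'Processor And Memory Features')
print(section.find('div', class_='_2lzn0o').text)  # prints: Processor And Memory Features

# Or, if the index itself is wanted, build a list of heading texts first.
sections = soup.find_all('div', class_='_2RngUh')
titles = [s.find('div', class_='_2lzn0o').get_text(strip=True) for s in sections]
index = titles.index('Processor And Memory Features')  # 1
```

With the section in hand, section.find('table') gives the table that belongs to that heading. Class names like _2RngUh look auto-generated and may change on the live site, so matching on the visible heading text is likely to be the more stable part of the lookup.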