web scraping extract particular Div section

web scraping extract particular Div section - AjayBachu - May-11-2020

My HTML has several div sections, and many of them share the same class name.

<div class="_2GiuhO">Specifications</div>
<div>
<div class="_3Rrcbo V39ti-">
<div class="_2RngUh">
<div class="_2lzn0o">General</div>
<table class="_3ENrHu">
.
.
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu">
..
In the above, <div class="_2RngUh"> is repeated.

I used Beautiful Soup's soup.find(class_="_2RngUh"), but it always returns the first occurrence.
I want to get a specific occurrence based on its child's name (General, Processor And Memory Features). How can I do this?

RE: web scraping extract particular Div section - Larz60+ - May-11-2020

you need:

results = soup.find('div', {'class': '_2RngUh'})

also, place your html in python tags. Even though it's not python, it will maintain indentation.

RE: web scraping extract particular Div section - AjayBachu - May-11-2020

Thanks for your reply,

results = soup.find('div', {'class': '_2RngUh'})
Even this returns only the first occurrence of the class.

But I want to fetch the 2nd or 3rd occurrence based on the child name (General, Processor And Memory Features).

RE: web scraping extract particular Div section - Larz60+ - May-11-2020

Change find to find_all and select the item you want.
Suppose it's the third item:

results = soup.find_all('div', {'class': '_2RngUh'})
desired_result = results[2]
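
If the position is not known in advance, one possible approach is to derive the index from the heading text instead of hard-coding it. The sketch below is only an illustration: the HTML is a closed, simplified version of the markup from the first post, and html.parser is used here as an assumption so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Simplified, well-formed version of the markup from the question (assumption).
html = """
<div class="_3Rrcbo V39ti-">
  <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu"></table>
  </div>
  <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu"></table>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", {"class": "_2RngUh"})

# Collect the heading text of every block, in document order.
headings = [
    r.find("div", {"class": "_2lzn0o"}).get_text(strip=True)
    for r in results
]

# Look up the index by heading text, then pick the block at that index.
index = headings.index("Processor And Memory Features")
desired_result = results[index]
print(index, headings)
```

Note that list.index raises ValueError when the heading is not present, so real code would probably want to handle that case.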

RE: web scraping extract particular Div section - AjayBachu - May-12-2020

Thank you so much, I can get it based on index.
But can we get index based on its child tag General, Processor And Memory Features...?

<div class="_2GiuhO">Specifications</div>
<div>
<div class="_3Rrcbo V39ti-">
<div class="_2RngUh">
<div class="_2lzn0o">General</div>
<table class="_3ENrHu">
.
.
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu">

RE: web scraping extract particular Div section - snippsat - May-12-2020

(May-12-2020, 09:02 AM)AjayBachu Wrote: But can we get index based on its child tag General, Processor And Memory Features...?

from bs4 import BeautifulSoup

html = '''\
<div class="_2GiuhO">Specifications</div>
<div>
<div class="_3Rrcbo V39ti-">
<div class="_2RngUh">
<div class="_2lzn0o">General</div>
<table class="_3ENrHu">
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu">'''

soup = BeautifulSoup(html, 'lxml')

tags = soup.find_all(class_="_2RngUh")
>>> t = tags[1]
>>> t
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu"></table></div>

>>> t.findChild()
<div class="_2lzn0o">Processor And Memory Features</div>
>>> t.findChild().text
'Processor And Memory Features'

This is an example of how you can test things out interactively.
A tag has many methods and attributes; use dir() to list them all.
A good editor or REPL will also show these options through autocomplete.

>>> dir(t)
['HTML_FORMATTERS',
 'XML_FORMATTERS',
 '__bool__',
 '__call__',
 '__class__',
 '__contains__',
 '__copy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__unicode__',
 '__weakref__',
 '_all_strings',
 '_find_all',
 '_find_one',
 '_formatter_for_name',
 '_is_xml',
 '_lastRecursiveChild',
 '_last_descendant',
 '_should_pretty_print',
 'append',
 'attrs',
 'can_be_empty_element',
 'childGenerator',
 'children',
 'clear',
 'contents',
 'decode',
 'decode_contents',
 'decompose',
 'descendants',
 'encode',
 'encode_contents',
 'extend',
 'extract',
 'fetchNextSiblings',
 'fetchParents',
 'fetchPrevious',
 'fetchPreviousSiblings',
 'find',
 'findAll',
 'findAllNext',
 'findAllPrevious',
 'findChild',
 'findChildren',
 'findNext',
 'findNextSibling',
 'findNextSiblings',
 'findParent',
 'findParents',
 'findPrevious',
 'findPreviousSibling',
 'findPreviousSiblings',
 'find_all',
 'find_all_next',
 'find_all_previous',
 'find_next',
 'find_next_sibling',
 'find_next_siblings',
 'find_parent',
 'find_parents',
 'find_previous',
 'find_previous_sibling',
 'find_previous_siblings',
 'format_string',
 'get',
 'getText',
 'get_attribute_list',
 'get_text',
 'has_attr',
 'has_key',
 'hidden',
 'index',
 'insert',
 'insert_after',
 'insert_before',
 'isSelfClosing',
 'is_empty_element',
 'known_xml',
 'name',
 'namespace',
 'next',
 'nextGenerator',
 'nextSibling',
 'nextSiblingGenerator',
 'next_element',
 'next_elements',
 'next_sibling',
 'next_siblings',
 'parent',
 'parentGenerator',
 'parents',
 'parserClass',
 'parser_class',
 'prefix',
 'preserve_whitespace_tags',
 'prettify',
 'previous',
 'previousGenerator',
 'previousSibling',
 'previousSiblingGenerator',
 'previous_element',
 'previous_elements',
 'previous_sibling',
 'previous_siblings',
 'recursiveChildGenerator',
 'renderContents',
 'replaceWith',
 'replaceWithChildren',
 'replace_with',
 'replace_with_children',
 'select',
 'select_one',
 'setup',
 'string',
 'strings',
 'stripped_strings',
 'text',
 'unwrap',
 'wrap']

So, for example, find_next() would also work:

>>> t.find_next()
<div class="_2lzn0o">Processor And Memory Features</div>
>>> t.find_next().text
'Processor And Memory Features'
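
Putting the pieces above together, here is a minimal sketch of picking a block by its heading text rather than by index. The helper name find_section is hypothetical, the HTML is a closed, simplified version of the markup from the question with made-up placeholder cell text, and html.parser is assumed instead of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Simplified, well-formed markup; the table cell text is a made-up placeholder.
html = """
<div class="_3Rrcbo V39ti-">
  <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu"><tr><td>placeholder row A</td></tr></table>
  </div>
  <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu"><tr><td>placeholder row B</td></tr></table>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def find_section(soup, heading):
    """Return the first _2RngUh block whose heading div matches, else None."""
    for block in soup.find_all("div", class_="_2RngUh"):
        title = block.find("div", class_="_2lzn0o")
        if title is not None and title.get_text(strip=True) == heading:
            return block
    return None


section = find_section(soup, "Processor And Memory Features")
# section.table is the first <table> inside the matched block.
print(section.table.get_text(strip=True))
```

Comparing with get_text(strip=True) keeps the match from failing on stray whitespace around the heading; an exact string match is a design choice here, and a case-insensitive comparison would work just as well.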

RE: web scraping extract particular Div section - snippsat - May-12-2020

Larz60+ Wrote:you need:
results = soup.find('div', {'class': '_2RngUh'})

You don't need that, @Larz60+; I don't use the dictionary form anymore.
You can copy the class name directly from the page source and just add class_ to make it work.
Example:

from bs4 import BeautifulSoup

html = '<div class="cities">London</div>'
soup = BeautifulSoup(html, 'lxml')

Usage:

# Only add _
>>> tag = soup.find(class_="cities")
>>> tag.text
'London'

>>> # The dictionary form requires more rewriting of the original markup and also passes the div tag name
>>> tag = soup.find('div', {'class': 'cities'})
>>> tag.text
'London'
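
For comparison, a third option not mentioned above is a CSS selector through select_one. The short sketch below runs all three lookups on the same one-line HTML; html.parser is an assumption made here so the example has no lxml dependency.

```python
from bs4 import BeautifulSoup

html = '<div class="cities">London</div>'
soup = BeautifulSoup(html, "html.parser")

# Keyword form: copy the class name and add the trailing underscore.
by_keyword = soup.find(class_="cities")

# Dictionary form: tag name plus an attribute dictionary.
by_dict = soup.find("div", {"class": "cities"})

# CSS selector form: the same syntax used in stylesheets.
by_css = soup.select_one("div.cities")

print(by_keyword.text, by_dict.text, by_css.text)
```

All three return the same tag here, so the choice is mostly a matter of taste and of how the class name was copied from the page source.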

RE: web scraping extract particular Div section - AjayBachu - May-12-2020

Thank you so much.. I will use this.