Python Forum
Extracting Elements From A Website List
Hi all,

I'm practicing some web scraping and have run into an obstacle that has me stuck.

I'm trying to write code that goes through several pages of the same website and extracts certain text from a list that appears on each page. The challenge is that the list can have a different number of elements from page to page, so I'm not sure how to handle the case where an element is missing.

Let me demo this to make it clearer...

Example HTML from one page of the website:

<ul class="nb-type-md nb-list-undecorated undefined">
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>Blue</span></li>
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>Designed in China</span></li>
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>http://www.mysupersite.com</span></li>
</ul> 
And here's an example of another page in the same website:

<ul class="nb-type-md nb-list-undecorated undefined">
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>Green</span></li>
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>Designed in England</span></li>
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>Shadow Chrome Painted</span></li>
	<li class=""><span><div class="nb-icon-small nb-inline-block nb-text-gray-200 nb-mr-2xs nb-align-middle"></div>http://www.mydifferentsite.com</span></li>
</ul>
As you can see, the first page has 3 items, whilst the second page has 4.

So if, for example, I'm trying to extract the URL from these two pages (i.e. http://www.mysupersite.com and http://www.mydifferentsite.com), how would I go about doing that?

My latest trial:
    for wa in lists.find_all('li'):
        if wa[3] is KeyError:
            wa[2]
        else:
            wa[3]
I get:
Error:
Traceback (most recent call last):
  File "C:/Users/testscrape.py", line 28, in <module>
    if wa[3] is KeyError:
  File "C:\Users\lib\site-packages\bs4\element.py", line 1406, in __getitem__
    return self.attrs[key]
KeyError: 3
I thought an if statement would work here, something like: if wa[3] doesn't exist, use wa[2], otherwise use wa[3]. But I don't know how to translate that into code.

Could someone please explain how to handle these sorts of optional indexes?

Thanks a lot.
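
One possible way to approach this, offered as a minimal sketch rather than a definitive answer. The traceback shows `wa[3]` going through `__getitem__` into `self.attrs[key]`, so indexing a single `<li>` tag looks up an HTML attribute named 3 rather than the fourth list item, and `is KeyError` compares a value with the exception class instead of catching the exception. Collecting the text of all `<li>` items into a plain Python list first sidesteps both problems. The helper names `li_texts`, `last_item` and `first_url` are hypothetical, and the sketch assumes the URL is either the last entry or the only entry starting with `http://` or `https://`, as in the two sample pages.

```python
def li_texts(html):
    """Return the stripped text of every <li> in the first <ul> of html."""
    # bs4 is third-party; imported here so the helpers below work without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    ul = soup.find("ul")
    if ul is None:
        return []
    return [li.get_text(strip=True) for li in ul.find_all("li")]


def last_item(texts):
    """Last entry regardless of list length, or None for an empty list."""
    return texts[-1] if texts else None


def first_url(texts):
    """First entry that looks like a URL, or None if there is none."""
    return next((t for t in texts if t.startswith(("http://", "https://"))), None)


# Texts as li_texts() would return them for the two sample pages.
page_one = ["Blue", "Designed in China", "http://www.mysupersite.com"]
page_two = ["Green", "Designed in England", "Shadow Chrome Painted",
            "http://www.mydifferentsite.com"]

for texts in (page_one, page_two):
    print(last_item(texts), first_url(texts))
```

A negative index such as `texts[-1]` is enough when the URL is always last. Matching on the `http` prefix is the safer choice if the position can vary, and it returns `None` instead of raising when a page has no URL at all.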


Messages In This Thread
Extracting Elements From A Website List - by knight2000 - Jul-19-2021, 08:08 AM
