Python Forum
Need Tip On Cleaning My BS4 Scraped Data
#1
Hey guys, I'm having an issue cleaning and refining some scraped data. Here's a sample:

[<span data-class="timestamp">12h</span>, <span data-class="timestamp">12h</span>, <span data-class="timestamp">4d</span>, <span data-class="timestamp">2d</span>, <span data-class="timestamp">5d</span>, <span data-class="timestamp">19 Jan</span>, <span data-class="timestamp">18 Jan</span>, <span data-class="timestamp">18 Jan</span>, <span data-class="timestamp">19 Jan</span>, <span data-class="timestamp">19 Jan</span>, <span data-class="timestamp">5d</span>, <span data-class="timestamp">18 Jan</span>]
This is how I'm scraping it:

js_test5 = soup.find_all('span', {'data-class': 'timestamp'})
For some reason, it saves the data as a list.
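For reference, a list is what find_all() is meant to return. In BeautifulSoup it gives back a ResultSet, which is a subclass of Python's list, with one Tag object per matching element. Here's a minimal sketch to check that. It assumes a small hypothetical HTML snippet built from three of the sample timestamps (the real page isn't shown in the post) and the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical HTML rebuilt from three of the sample timestamps;
# the actual page being scraped is not shown in the post.
html = """
<div>
  <span data-class="timestamp">12h</span>
  <span data-class="timestamp">4d</span>
  <span data-class="timestamp">19 Jan</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# "data-class" contains a hyphen, so it has to go in an attrs dict, not a keyword argument.
js_test5 = soup.find_all("span", {"data-class": "timestamp"})

# find_all() returns a ResultSet, which is a subclass of list...
print(type(js_test5).__name__)     # ResultSet
print(isinstance(js_test5, list))  # True

# ...and every item in it is a Tag object, not a plain string.
print(isinstance(js_test5[0], Tag))  # True
print(len(js_test5))                 # 3
```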

I want my output to look like this: 12h, 12h, 4d, 2d, 5d, 19 Jan, 18 Jan, 18 Jan, etc.

I tried to use .text to pull all this data out, but it only gives me one result ("12h"). I can do [4].text and it outputs "5d", which is confusing, because each span is supposed to be in quotes for it to be a separate item, right?
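The [4].text behaviour makes sense once you notice that .text is an attribute of a single Tag, not of the list. The printed list just shows each Tag's HTML markup, which is why the items have no quotes around them even though they are separate elements. A sketch, again using hypothetical markup (here the first five sample timestamps, one span each):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first five timestamps from the sample, one span each.
html = (
    '<span data-class="timestamp">12h</span>'
    '<span data-class="timestamp">12h</span>'
    '<span data-class="timestamp">4d</span>'
    '<span data-class="timestamp">2d</span>'
    '<span data-class="timestamp">5d</span>'
)
soup = BeautifulSoup(html, "html.parser")
js_test5 = soup.find_all("span", {"data-class": "timestamp"})

# Printing the list shows each Tag as its HTML markup. The items are Tag
# objects, not strings, so Python does not wrap them in quotes.
print(js_test5)

# .text belongs to one Tag at a time: index into the list first, then read it.
print(js_test5[0].text)  # 12h
print(js_test5[4].text)  # 5d

# The ResultSet itself has no .text attribute, so this raises AttributeError.
try:
    js_test5.text
except AttributeError as err:
    print("AttributeError:", err)
```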

Do I need to run a loop to pull all the results out? Or maybe my method of scraping can be improved? What's the best way for me to solve this?
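Looping over the ResultSet is the usual way to get every string out. Below is a minimal sketch, assuming hypothetical markup that holds the twelve timestamps from the sample. A list comprehension calls get_text() on each Tag, and ", ".join() turns the resulting strings into one line. The select() call with a CSS attribute selector is included only as an equivalent alternative to find_all(), not as a required change to the scraping method.

```python
from bs4 import BeautifulSoup

# Hypothetical page markup: the twelve sample timestamps, each in its own span.
timestamps = ["12h", "12h", "4d", "2d", "5d", "19 Jan", "18 Jan",
              "18 Jan", "19 Jan", "19 Jan", "5d", "18 Jan"]
html = "".join(f'<span data-class="timestamp">{t}</span>' for t in timestamps)

soup = BeautifulSoup(html, "html.parser")

# The list comprehension is the loop: call get_text() on every Tag in the ResultSet.
spans = soup.find_all("span", {"data-class": "timestamp"})
texts = [span.get_text(strip=True) for span in spans]

# A CSS attribute selector finds the same elements, if you prefer select().
texts_css = [span.get_text(strip=True)
             for span in soup.select('span[data-class="timestamp"]')]

# Join the strings with ", " to get the single line of output described above.
output = ", ".join(texts)
print(output)
# 12h, 12h, 4d, 2d, 5d, 19 Jan, 18 Jan, 18 Jan, 19 Jan, 19 Jan, 5d, 18 Jan
```

Passing strip=True to get_text() also trims surrounding whitespace, which can help if the real page has spaces or newlines inside the spans.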


Messages In This Thread
Need Tip On Cleaning My BS4 Scraped Data - by digitalmatic7 - Jan-29-2018, 05:08 PM
