Python Forum

Full Version: Split a long string into other strings with no delimiters/characters
So I am web scraping an eBay sales history web-page using BeautifulSoup. Here's the webpage:
[Image: 1W1GRfj.png]

An example of the results I get is:

Output:
9***o( 62) Size: 30W: 30cm x 50cm (12" x 20")£22.99112-Aug-19 14:03:10 BST f***e( 1419) Size: 10W: 15cm x 25cm (6" x 10")£15.99111-Aug-19 10:03:05 BST
Depending on the item ID I put in, it will be slightly different. For example, another result:

Output:
8***t( 291) Fluval External Filter: Fluval 307 External Filter£129.99127-Jul-19 14:02:54 BST _***2( 1401) Fluval External Filter: Fluval 407 External Filter£177.99126-Jul-19 23:54:21 BST
I would like to split these strings so it's like this:

Output:
"9***o( 62)", "Size: 30W: 30cm x 50cm (12" x 20")", "£22.99", "1", "12-Aug-19", "14:03:10 BST"
or for the second example:
Output:
"_***2( 1401)", "Fluval External Filter: Fluval 407 External Filter", "£177.99", "1", "26-Jul-19", "23:54:21 BST"
I have no idea where to start here, but I would imagine splitting the price part 2 characters after the "." maybe? Any help here would be greatly appreciated as I am lost!
Well, you want everything up to and including the first ')', then everything from there to the first pound sign. Then from the first pound sign to two digits after the decimal point, then one character, then split the rest at the space. You can find where those characters are with the index method of the string, and then just use slicing to split it apart.
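The steps in this reply can be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: the function name split_record is made up for illustration, and it assumes the text holds exactly one record, the buyer part contains no ')' before its closing one, the price always has two decimal places, and the quantity is a single digit.

```python
def split_record(text):
    """Split one sales-history record into six fields using index() and slicing.

    Hypothetical helper. Assumes one record per string, a two-decimal price,
    and a single-digit quantity.
    """
    close = text.index(')')          # end of the buyer part, e.g. "9***o( 62)"
    pound = text.index('£', close)   # start of the price
    dot = text.index('.', pound)     # decimal point inside the price

    buyer = text[:close + 1]
    variation = text[close + 1:pound].strip()
    price = text[pound:dot + 3]      # '.' plus the two digits after it
    quantity = text[dot + 3]         # assumed to be exactly one character
    date, time = text[dot + 4:].split(' ', 1)
    return [buyer, variation, price, quantity, date, time]


record = '9***o( 62) Size: 30W: 30cm x 50cm (12" x 20")£22.99112-Aug-19 14:03:10 BST'
print(split_record(record))
# ['9***o( 62)', 'Size: 30W: 30cm x 50cm (12" x 20")', '£22.99', '1', '12-Aug-19', '14:03:10 BST']
```

If the quantity can be more than one digit, the single-character slice would not be enough and the record could not be split reliably by position alone.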
Was that whole string between the same tags on the original web page?
(Nov-15-2019, 02:17 PM)baquerik Wrote: Was that whole string between the same tags on the original web page?

It's all one string between the same tags, yes.

I think I might be able to figure this out - but my question is, can I split a string 2 characters after another character? For example, can I split a string 2 characters after a . ?
dot = text.index('.')
# A slice end is exclusive, so dot + 3 keeps the '.' and the two characters after it.
two_after = text[:dot + 3]
tail = text[dot + 3:]
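To make the index arithmetic explicit, here is a small sketch of a general helper. The name split_after and its parameters are hypothetical, not part of any library. Because the end of a slice is exclusive, the cut position is the marker's index, plus one for the marker itself, plus the number of characters to keep after it.

```python
def split_after(text, marker, count):
    """Split text so the first part ends `count` characters after the first `marker`.

    Hypothetical helper. Raises ValueError if `marker` is not in `text`.
    """
    cut = text.index(marker) + 1 + count  # +1 steps past the marker itself
    return text[:cut], text[cut:]


head, tail = split_after('£22.99112-Aug-19', '.', 2)
print(head)  # £22.99
print(tail)  # 112-Aug-19
```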