Python Forum

Full Version: Split string between two different delimiters, with exceptions
I couldn't really think of a good title for this, so, sorry!

Say I have some data like so:
:1:123456:2:name:42:3:30:4::5:somerandomdata:9:8
The important data is held between groups like :1:, each with a different number. I need to split the string at those groups so I can access the data. But because of a few quirks in the data, explained below, it's not as easy as just splitting.
In the above data I have tried to include all the edge cases:
- The numbers of the groups are not in any specific order. Generally they are ascending but some will be out of order
- The group numbers are 1 or 2 characters long
- Some groups will be empty, like :4::5:
- One group holds a time, which has a colon in it: :42:3:30:4: (the time being 3:30). This will trip up most existing split functions.
In the data above, all actual data between the groups is hard coded. In the PHP file this comes from, most of that data is from many variables which explain what that data is. However some of the data is just constant values like 1 and I don't know what they do. In the data above I made :9:8 the "unknown constant".

My current approach has a class full of values like this:
class DataValues:
    _ID = 1
    NAME = 2
    TIME = 42
    SOMETHING = 4
    DATA = 5
    CONSTANT = 9
The variables are declared in the same order as they appear in the data. To split I do this:
s = []
# Collect the values of the class attributes, skipping dunder names
_vars = [getattr(DataValues, v) for v in vars(DataValues) if not v.startswith("__")]
for i in range(len(_vars) - 1):
    split = re.search("(:{}:)(.*)(:{}:)".format(_vars[i], _vars[i+1]), self.data)
    split = "" if split is None else split.group(2)
    s.append(split)
This works for the most part. The one issue is that the regex fails to split a string such as :2:Sidestep:3:VXBkYXRlOiBhZGRlZCB (it should have split between 2 and 3).
The code also relies on the class listing every single value in the exact order of the data. If anything changed in the data, it would break. It also forces me to keep these unknown "constant" values around, which makes the code messy because I have to create variables like UNKNOWN_1 = 8.

Could anyone help me out on this?
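One possible direction, sketched under the assumption that the keys always arrive in a known order (kept here in a plain list rather than in class attributes): build a single pattern from the key list with lazy groups, so a value such as 3:30 stays intact and an empty value comes back as an empty string. The name split_by_keys is made up for illustration, and the sketch would still misparse a value that happens to contain the next delimiter (for example a literal :4: inside the time field).

```python
import re

def split_by_keys(data, keys):
    """Return {key: value} for `keys`, which must appear in `data` in this order."""
    # Builds ":1:(.*?):2:(.*?)...:9:(.*)"; every group except the last is lazy
    pattern = "".join(":{}:(.*?)".format(k) for k in keys[:-1])
    pattern += ":{}:(.*)".format(keys[-1])
    match = re.fullmatch(pattern, data, flags=re.DOTALL)
    if match is None:
        raise ValueError("data does not match the expected key order")
    return dict(zip(keys, match.groups()))

sample = ":1:123456:2:name:42:3:30:4::5:somerandomdata:9:8"
result = split_by_keys(sample, [1, 2, 42, 4, 5, 9])
print(result[42])  # the time field, colon included
```

Because the whole string is matched at once, a key that is missing or out of order raises an error instead of silently producing an empty value.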
I have never seen split used like this, and I'm surprised it works at all.

Are you parsing the class or the raw data?

I would start with something simple like:
foo = rawdata.split(':')
And join or delete the list slices into appropriate vars.
Here it splits as:
['', '1', '123456', '2', 'name', '42', '3', '30', '4', '', '5', 'somerandomdata', '9', '8']
Time can be assembled as
time = foo[6] + ':' + foo[7], for example (foo[5] is the group number 42 and foo[8] is the next group number, 4).
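The split-and-pair idea could look roughly like this. pairs_to_dict is a hypothetical helper; it assumes keys and values strictly alternate and that no value contains a colon. When the field count comes out odd (which is what happens with the 3:30 time) it raises instead of guessing. A value with an even number of stray colons would still slip through unnoticed.

```python
def pairs_to_dict(raw):
    """Split 'key:value:key:value...' into a dict of strings."""
    # Drop a single leading colon so the first token is a key, not ''
    if raw.startswith(":"):
        raw = raw[1:]
    parts = raw.split(":")
    if len(parts) % 2:
        raise ValueError("odd number of fields; a value probably contains ':'")
    # Even positions are keys, odd positions are values
    return dict(zip(parts[0::2], parts[1::2]))

fields = pairs_to_dict(":1:123456:2:name:4::5:somerandomdata:9:8")
print(fields["2"], repr(fields["4"]))
```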
(Aug-24-2020, 08:05 AM)millpond Wrote: I have never seen split used like this. And surprised it works at all.

I am parsing the raw data from a string.
I tried your method at first (and it is what other people seem to suggest). It would work, but there is a flaw: the actual data has about 35 values, which would mean having foo[0] all the way up to foo[35], and that just makes the code very messy.
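One way to avoid both the long run of numeric indexes and placeholder variables for unknown constants, as a rough sketch: keep names only for the keys that are understood and let everything else keep its number. The Key enum and with_names are illustrative names, and the parsed dict below is assumed to come from whatever splitting step is used.

```python
from enum import IntEnum

class Key(IntEnum):
    # Names are illustrative; the numbers come from the sample data
    ID = 1
    NAME = 2
    TIME = 42
    SOMETHING = 4
    DATA = 5

def with_names(fields):
    """Replace known numeric keys with their names; leave unknown keys as-is."""
    named = {}
    for key, value in fields.items():
        try:
            label = Key(int(key)).name
        except ValueError:  # not a number, or not a member of Key
            label = str(key)
        named[label] = value
    return named

parsed = {"1": "123456", "2": "name", "42": "3:30", "4": "", "5": "somerandomdata", "9": "8"}
record = with_names(parsed)
print(record["NAME"], record["TIME"], record["9"])
```

Lookups then read as record["NAME"] rather than foo[4], and an unknown constant such as 9 needs no variable of its own.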

When I mentioned PHP, here's what I was talking about:

$response = "1:".$result["levelID"].":2:".$result["levelName"].":3:".$desc.":4:".$levelstring.":5:".$result["levelVersion"].":6:".$result["userID"].":8:10:9:".$result["starDifficulty"].":10:".$result["downloads"].":11:1:12:".$result["audioTrack"].":13:".$result["gameVersion"].":14:".$result["likes"].":17:".$result["starDemon"].":43:".$result["starDemonDiff"].":25:".$result["starAuto"].":18:".$result["starStars"].":19:".$result["starFeatured"].":42:".$result["starEpic"].":45:".$result["objects"].":15:".$result["levelLength"].":30:".$result["original"].":31:1:28:".$uploadDate. ":29:".$updateDate. ":35:".$result["songID"].":36:".$result["extraString"].":37:".$result["coins"].":38:".$result["starCoins"].":39:".$result["requestedStars"].":46:1:47:2:48:1:40:".$result["isLDM"].":27:$xorPass"
It's just a very long string of data.