Dec-09-2018, 09:27 AM
(Dec-05-2018, 12:32 PM)metulburr Wrote: You could do it with regex and then just replace all double spaces with a single space. Or you can make a function to find all colons, and then remove 3 characters before and 2 characters afterwards.
The transcript could contain double spaces or colons that are not part of a timestamp, though, so that might be a bit risky.
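One way to lower that risk is to match the whole timestamp shape instead of hunting for colons or double spaces. This is only a minimal sketch: the pattern assumes every timestamp is exactly two digits, a colon, two digits and one trailing space, and the names `TIMESTAMP`, `strip_timestamps` and the sample text are made up for illustration.

```python
import re

# Assumed shape of a timestamp: two digits, a colon, two digits, then one space.
# The \b keeps the pattern from matching in the middle of a longer run of digits.
TIMESTAMP = re.compile(r'\b\d{2}:\d{2} ')

def strip_timestamps(text):
    """Remove mm:ss timestamps and leave every other colon or space alone."""
    return TIMESTAMP.sub('', text)

# Made-up sample containing a double space and a colon that are not timestamps.
sample = "00:05 Hello there.  01:12 Speaker: welcome back."
print(strip_timestamps(sample))
```

Because the trailing space is removed together with the timestamp, no extra double spaces are created, so the separate "replace double spaces" step should not be needed under these assumptions.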
(Dec-05-2018, 02:49 PM)DeaD_EyE Wrote: Problem: 99:99 is also a valid match.
A better pattern:
timestamp = r'[012][0123456789]:[012345][0123456789] '
timestamp = r'[0-2]\d:[0-5]\d '  # short form
But this also allows values in timestamps like 25:00, which is an invalid time.
You can check whether each timestamp is valid and, if it is, remove it.
The question is, do you need that?
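If that check is wanted, one possible way to do it is to pass a function to `re.sub` so each candidate is inspected before it is removed. This is a sketch, not necessarily what DeaD_EyE had in mind: `CANDIDATE`, `is_valid`, `remove_valid_timestamps` and the `max_minutes` parameter are hypothetical names, and the validity rule is an assumption that should be adjusted to the data.

```python
import re

# Loose candidate pattern: any two digits, a colon, any two digits, one space.
CANDIDATE = re.compile(r'\b(\d{2}):(\d{2}) ')

def is_valid(minutes, seconds, max_minutes=99):
    """Assumed rule: seconds must be 00-59, minutes must not exceed max_minutes."""
    return 0 <= minutes <= max_minutes and 0 <= seconds <= 59

def remove_valid_timestamps(text, max_minutes=99):
    def replace(match):
        minutes, seconds = int(match.group(1)), int(match.group(2))
        # Drop the match only when it passes the check; otherwise keep it as is.
        return '' if is_valid(minutes, seconds, max_minutes) else match.group(0)
    return CANDIDATE.sub(replace, text)

print(remove_valid_timestamps("25:00 first line 99:99 second line"))
```

With the default `max_minutes=99`, 25:00 is treated as a timestamp and removed, while 99:99 is left in the text. Passing `max_minutes=23` would give the stricter hh:mm style reading.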
I tried both of your solutions and they both worked. I had a very quick check through the timestamps with some searching, and 78:01 seems to be the highest value. There are lots of values where the seconds value is '00'. The format is not hh:mm:ss but mm:ss, so a value like 25:00 is okay.
I'm not sure whether there are values like 25:60, but I would need to check, as you stated.
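That check could be scripted rather than done by searching by hand. The sketch below assumes the transcript is already loaded into a string; `summarize_timestamps` is a made-up helper and the sample text is invented for illustration.

```python
import re

def summarize_timestamps(text):
    """Return (highest minutes value, timestamps whose seconds exceed 59)."""
    pairs = re.findall(r'\b(\d{2}):(\d{2})\b', text)
    # default=None covers the case where no timestamp is found at all.
    highest_minutes = max((int(m) for m, _ in pairs), default=None)
    suspicious = [f'{m}:{s}' for m, s in pairs if int(s) > 59]
    return highest_minutes, suspicious

# Invented sample; in practice this would be the transcript text.
sample = "00:05 intro 25:60 odd one 78:01 near the end"
print(summarize_timestamps(sample))
```

An empty second value would suggest that restricting the seconds to `[0-5]\d` is safe for this data.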
(Dec-05-2018, 03:24 PM)buran Wrote: I guess timestamps are min:sec from start, not time like hh:mm, but it's up to OP to confirm that. More broadly, it raises the question of what the possible values are, e.g. is it possible to have mmm:ss from start.
Yes, the format is min:sec from start, and the highest value is 78:01, so only two digits for the minutes. I guess this is a case of modifying the code to suit the data.
Thanks for those replies. :)