Python Forum
Removing timestamps from transcriptions
#6
(Dec-05-2018, 12:32 PM)metulburr Wrote: You could do it with a regex and then just replace all double spaces with a single space. Or you could write a function that finds all colons and then removes the 3 characters before each one and the 2 characters after it.
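Here is a minimal sketch of the regex-then-collapse idea. It assumes every timestamp is two digits, a colon and two digits, separated from the words by single spaces. The thread never shows a raw transcript line, so the sample string and the function name `strip_timestamps_regex` are hypothetical:

```python
import re

def strip_timestamps_regex(text: str) -> str:
    # Remove every "dd:dd" token (two digits, colon, two digits).
    without = re.sub(r'\d\d:\d\d', '', text)
    # A timestamp removed from between two words leaves two spaces behind,
    # so collapse any run of spaces into one and trim the ends.
    return re.sub(r' {2,}', ' ', without).strip()

sample = "00:00 Hello there 00:05 how are you 01:12 today"
print(strip_timestamps_regex(sample))  # Hello there how are you today
```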

It's possible that the transcript contains double spaces or colons that are not part of a timestamp, though, so that might be a bit risky.
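The risk is easy to see with the colon-scanning approach. The following sketch uses hypothetical names, `strip_timestamps_colons` and its `check_digits` flag. For each colon it removes the three characters before it (a space plus two digits, or fewer at the start of the text) and the two after it. With the optional check, a colon only counts as a timestamp when two digits sit on each side of it:

```python
def strip_timestamps_colons(text: str, check_digits: bool = False) -> str:
    out = text
    # Scan right to left so indices of earlier colons stay valid after slicing.
    i = out.rfind(':')
    while i != -1:
        start = max(i - 3, 0)
        before = out[max(i - 2, 0):i]
        after = out[i + 1:i + 3]
        is_stamp = (len(before) == 2 and before.isdigit()
                    and len(after) == 2 and after.isdigit())
        if not check_digits or is_stamp:
            # Drop the 3 characters before the colon, the colon, and 2 after.
            out = out[:start] + out[i + 3:]
            i = out.rfind(':', 0, start)
        else:
            # Not a timestamp: leave it and look for the next colon to the left.
            i = out.rfind(':', 0, i)
    return out.strip()

text = "00:00 Note: this 00:05 is it"
print(strip_timestamps_colons(text))                     # Nhis is it
print(strip_timestamps_colons(text, check_digits=True))  # Note: this is it
```

The unchecked version mangles "Note: this" into "Nhis". A digit check does not help with spoken times such as "at 10:30", which look exactly like timestamps.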

(Dec-05-2018, 02:49 PM)DeaD_EyE Wrote: Problem: 99:99 is also a valid match.

A better pattern:
timestamp = r'[012][0123456789]:[012345][0123456789] '
timestamp = r'[0-2]\d:[0-5]\d ' # short form
But this also allows values in timestamps like 25:00, which is an invalid time.
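The following is a quick comparison. It assumes the earlier pattern was a plain `\d\d:\d\d` match, because that pattern is not quoted in this post. The naive pattern accepts 99:99 and 12:60. The short form above rejects both but still accepts 25:00:

```python
import re

# Assumption: the pattern being criticised was "two digits, colon, two digits".
naive = r'\d\d:\d\d '
better = r'[0-2]\d:[0-5]\d '   # short form from the post, trailing space included

for candidate in ["12:34 ", "25:00 ", "99:99 ", "12:60 "]:
    print(repr(candidate),
          bool(re.fullmatch(naive, candidate)),
          bool(re.fullmatch(better, candidate)))
# '12:34 ' True True
# '25:00 ' True True
# '99:99 ' True False
# '12:60 ' True False
```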

You can check whether each timestamp is valid and, if it is, remove it.
The question is, do you need that?
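If you do want validation, one option is to pass a function to `re.sub` so that each match is checked before it is dropped. This sketch assumes DeaD_EyE's reading of the stamps as hh:mm times of day (00:00 to 23:59). The name `remove_valid_timestamps` is hypothetical, and the limits would need changing for mm:ss data:

```python
import re

STAMP = re.compile(r'(\d\d):(\d\d) ')

def remove_valid_timestamps(text: str) -> str:
    """Remove only stamps that are valid times of day; leave look-alikes."""
    def replace(match):
        hours, minutes = int(match.group(1)), int(match.group(2))
        if hours < 24 and minutes < 60:
            return ''              # valid: drop it together with its trailing space
        return match.group(0)      # invalid: keep the original text unchanged
    return STAMP.sub(replace, text)

print(remove_valid_timestamps("23:59 last call 25:00 odd 12:30 lunch"))
# last call 25:00 odd lunch
```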

I tried both of your solutions and they both worked. I had a very quick check through the timestamps with some searching, and it seems 78:01 is the highest value. There are lots of values where the seconds value is '00'. The format is not hh:mm:ss but mm:ss, so it seems a value like 25:00 is okay.

I'm not sure whether there are values like 25:60, but I would need to check, as you stated.
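You could script that check instead of searching by hand. The hypothetical helper `audit_timestamps` assumes the mm:ss format described above. It collects every match and reports the largest stamp, along with any stamp whose seconds field is 60 or more:

```python
import re

def audit_timestamps(text: str):
    # (minutes, seconds) tuples compare chronologically, so max() gives the latest stamp.
    stamps = [(int(m), int(s)) for m, s in re.findall(r'(\d+):(\d\d)', text)]
    largest = max(stamps, default=None)
    bad_seconds = [f"{m:02d}:{s:02d}" for m, s in stamps if s >= 60]
    return largest, bad_seconds

largest, bad = audit_timestamps("00:00 intro 25:00 part two 78:01 end 25:60 typo?")
print(largest, bad)  # (78, 1) ['25:60']
```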

(Dec-05-2018, 03:24 PM)buran Wrote: I guess the timestamps are min:sec from the start, not a time of day like hh:mm, but it's up to the OP to confirm that. More broadly, it raises the question of what the possible values are, e.g. is it possible to have mmm:ss from the start?

Yes, the format is min:sec from the start, and the highest value is 78:01, so there are only two digits for the minutes. I guess this is a case of modifying the code to suit the data.
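The short form `[0-2]\d:[0-5]\d` quoted earlier would not match 78:01, because it restricts the first minute digit to 0-2. The following sketch is tailored to this data. It assumes exactly two minute digits and seconds from 00 to 59. If mmm:ss stamps ever appear, you could widen `\d{2}` to `\d{2,3}`:

```python
import re

# Two-digit minutes and seconds 00-59; \b stops it matching inside
# longer digit runs such as "123:45".
STAMP = re.compile(r'\b\d{2}:[0-5]\d\b')

def strip_timestamps(text: str) -> str:
    without = STAMP.sub('', text)
    return re.sub(r' {2,}', ' ', without).strip()

print(re.fullmatch(r'[0-2]\d:[0-5]\d', '78:01'))  # None: first digit must be 0-2
print(strip_timestamps("00:00 Welcome back 78:01 and that's all"))
# Welcome back and that's all
```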

Thanks for those replies. :)


Messages In This Thread
RE: Removing timestamps from transcriptions - by jehoshua - Dec-09-2018, 09:27 AM
