Python Forum
Removing timestamps from transcriptions
#4
import re


timestamp = re.compile(r'\d{2}:\d{2} ') # <- the white space is a part of the timestamp
text = '''So from 12:23 that very moment you actually 12:25 actually continued the form of a choice, 12:28 that you kept going to the point of 12:30 no return, where you actually became who 12:33 you are now. 12:35 So you actually took part in 12:38 who you are now.'''

filtered_text = timestamp.sub('', text)
print(filtered_text)
Problem: this pattern also matches invalid times such as 99:99.

A better pattern:
timestamp = re.compile(r'[012][0123456789]:[012345][0123456789] ')
timestamp = re.compile(r'[0-2]\d:[0-5]\d ') # short form
But this still accepts hours from 24 to 29, for example 25:00, which is not a valid time.
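One way to close that gap, as a sketch: split the hour part into two alternatives so that only 00 to 23 are accepted. The name strict_timestamp and the sample string are made up for illustration, and this assumes the timestamps are HH:MM clock times followed by a single space, as in the text above.

```python
import re

# Hours 00-23: either 0x/1x or 20-23. Minutes 00-59.
# The trailing space is kept as part of the match, as in the original pattern.
strict_timestamp = re.compile(r'(?:[01]\d|2[0-3]):[0-5]\d ')

# Hypothetical sample containing one invalid time (25:00)
sample = 'start 12:23 middle 25:00 end 23:59 done'
cleaned = strict_timestamp.sub('', sample)
print(cleaned)
```

Here 12:23 and 23:59 are removed, while 25:00 is left in the text because it does not match.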

You could also check each matched timestamp and remove it only if it is a valid time.
The question is: do you really need that?
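If you do need it, a minimal sketch of that check could pass a function to sub() and let datetime.strptime decide whether the match is a real time. The helper name drop_if_valid and the sample string are hypothetical, and this again assumes HH:MM clock times.

```python
import re
from datetime import datetime

# Loose pattern from the first example: any two digits, colon, two digits, space
loose_timestamp = re.compile(r'\d{2}:\d{2} ')

def drop_if_valid(match):
    """Return '' for a valid HH:MM time, otherwise keep the matched text."""
    try:
        datetime.strptime(match.group().strip(), '%H:%M')
    except ValueError:
        return match.group()  # not a real time, leave it in the text
    return ''

# Hypothetical sample with two invalid times (99:99 and 25:00)
sample = 'one 12:23 two 99:99 three 25:00 four'
result = loose_timestamp.sub(drop_if_valid, sample)
print(result)
```

Only 12:23 is removed here. This is slower than a stricter regex, but the validity rule lives in one place and is easy to change.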


Messages In This Thread
RE: Removing timestamps from transcriptions - by DeaD_EyE - Dec-05-2018, 02:49 PM

