Python Forum

regex to extract only yy or yyyy
Dear Python/Regex Experts,

I have two regex patterns that I use in Python that need a little improvement.

1. #m/d/yy month in Digits e.g. 1/2/98
pattern1 = r'(\d{1}/\d{1}/\d{2})'

I need an extra condition that after those final yy digits, there should be no other digits coming.
If they do, it is covered by a different pattern or not actually a date.

2. #yyyy e.g. 1984
pattern2 = r'(\d{4})'

For the second pattern, I need to make sure that the year stands alone and has no more digits before or after.

I would really appreciate any help.
You should post what you have tried (ideally as working code), along with a test example showing the input and the wanted output.
The text you work with could be a mess, or it could be more structured.
(Jul-11-2018, 11:29 AM) metalray wrote: For the second pattern, I need to make sure that the year stands alone and has no more digits before or after.
>>> import re
>>> 
>>> s = '1980 100 18000 2000 112 2018'
>>> re.findall(r'(?<!\d)\d{4}(?!\d)', s)
['1980', '2000', '2018']

# only two-digit numbers
>>> s = '19 100 18000 20 1234 1 55'
>>> re.findall(r'(?<!\d)\d{2}(?!\d)', s)
['19', '20', '55']
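
The same lookaround idea should carry over to the first pattern (m/d/yy). Below is a minimal sketch, assuming a made-up sample string; the leading (?<!\d) check is my addition and goes beyond what the question asked for (it only asked for the trailing check), but without it '11/2/98' would still yield '1/2/98'.

```python
import re

# Hypothetical sample text, not taken from the original post.
s = '1/2/98 3/4/1998 11/2/98 5/6/07'

# (?<!\d) rejects a digit right before the month,
# (?!\d) rejects a digit right after the two year digits.
pattern1 = r'(?<!\d)\d/\d/\d{2}(?!\d)'

matches = re.findall(pattern1, s)
print(matches)  # ['1/2/98', '5/6/07']
```

Here '3/4/1998' is skipped because more digits follow the first two year digits, so it can be left to a separate m/d/yyyy pattern.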
{1} is an entirely redundant quantifier: \d and \d{1} are equivalent, so why would you want to add extra symbols? It is just wasteful.

{,1} is another case - but again, ? does the same, only in one symbol instead of four.

For two digits, \d\d is shorter than \d{2}, so I would still go for the former (but that is a matter of taste).

Regular expressions are complex enough without adding redundancies.
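
To check these equivalences yourself, here is a small sketch. The sample string and the variable names are made up for illustration; only the re.findall calls come from the standard library.

```python
import re

# Hypothetical sample text for comparing the pattern spellings.
s = '1/2/98 7 42 2018'

# \d{1} versus \d, and \d{2} versus \d\d
with_braces = re.findall(r'\d{1}/\d{1}/\d{2}', s)
without_braces = re.findall(r'\d/\d/\d\d', s)
print(with_braces == without_braces)  # True

# {,1} versus ? (both mean "zero or one")
optional_braces = re.findall(r'(?<!\d)\d{,1}\d(?!\d)', s)
optional_qmark = re.findall(r'(?<!\d)\d?\d(?!\d)', s)
print(optional_braces == optional_qmark)  # True
```

Both spellings return the same matches, so the choice between them is purely about readability.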