Jun-07-2020, 10:11 PM
From all I've read, these two approaches should produce the same result, given a regex pattern with two groups. I'd like to know whether I'm using re.sub() incorrectly or whether I've found a bug.

```python
match = re.search(pattern, text)
result1 = match.group(1) + match.group(2)       # join groups 1 and 2
result2 = re.sub(pattern, r"\g<1>\g<2>", text)  # replace with groups 1 and 2
```
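A self-contained version of that comparison, run on one of the sample paths from this post, might look like the sketch below. The variable names `pattern` and `text` are placeholders chosen for illustration.

```python
import re

# Pattern from the post: group 1 = name, group 2 = extension
pattern = r"([\w\s]+)\s\w{32}(\.md|\.csv)$"
text = "Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf.md"

match = re.search(pattern, text)
result1 = match.group(1) + match.group(2)       # only the captured groups
result2 = re.sub(pattern, r"\g<1>\g<2>", text)  # matched span replaced, rest of text kept

print(result1)  # Bodywork.md
print(result2)  # Projects bf587944624a417c83475fdb67c176ba/Bodywork.md
```

re.sub() returns the whole input string with only the matched span substituted, so any text before the start of the match is carried through unchanged.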
Can you think of any reason re.sub() would pull in a bunch of garbage that isn't in either of the groups? Given a statement like

```python
import re
re.sub(regexpattern, r"\g<1>\g<2>", SourceText)
```

For instance, this is a line of source text:

Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf.md

where Bodywork and .md are identified as groups 1 and 2. re.sub() should put them together as Bodywork.md, but it doesn't! As a sanity check, I've used match.groups() from the same library.
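Beyond match.groups(), it can help to look at group 0 and the match span, since that is the part of the string re.sub() actually replaces. A minimal sketch on the same sample line (the names `prefix` and `whole` are illustrative):

```python
import re

pattern = r"([\w\s]+)\s\w{32}(\.md|\.csv)$"
text = "Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf.md"

match = re.search(pattern, text)
start, end = match.span()  # indices of the whole match (group 0)
whole = match.group(0)     # the text re.sub() would replace
prefix = text[:start]      # text before the match; re.sub() leaves it untouched

print(repr(prefix))
print(repr(whole))
print(match.groups())
```

Because `[\w\s]+` cannot match `/`, the earliest position where the pattern succeeds is just after the last slash, so `prefix` holds the directory part.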
I've put together some sample code with some text to search, based on a conversion I'm trying to do for a small project.
Here's the output first. Thanks for looking!

Output:

```text
index: 1
Source : Projects bf587944624a417c83475fdb67c176ba.md
Groups : ('Projects', '.md')
Result1: Projects.md
Result2: Projects.md

index: 3
Source : Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf.md
Groups : ('Bodywork', '.md')
Result1: Bodywork.md
Result2: Projects bf587944624a417c83475fdb67c176ba/Bodywork.md

index: 5
Source : Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise 4871ab1851074a1cb7aebe0851669345.csv
Groups : ('Home Exercise', '.csv')
Result1: Home Exercise.csv
Result2: Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise.csv
```
```python
import re

paths = [
    'Projects bf587944624a417c83475fdb67c176ba/',
    'Projects bf587944624a417c83475fdb67c176ba.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise 4871ab1851074a1cb7aebe0851669345/',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise 4871ab1851074a1cb7aebe0851669345.csv',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise 4871ab1851074a1cb7aebe0851669345/Abs da0050d8459345419d1a16062273cfac.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise 4871ab1851074a1cb7aebe0851669345/Core 82039eb85d5d46bc99e8504427d203c4.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Micronutrient Smoothie 21e2b0c0922d46f387c8b353a17ff734.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self Bodywork 0045821b69f445678e07d49b5c80b9d0/',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self Bodywork 0045821b69f445678e07d49b5c80b9d0.csv',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self Bodywork 0045821b69f445678e07d49b5c80b9d0/Cuboid physical therapy ff8d7937722a4af6aa2ce1ce8c45672b.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self Bodywork 0045821b69f445678e07d49b5c80b9d0/Extending hamstrings faeba9f5302340f1945b898c6291aa86.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self Bodywork 0045821b69f445678e07d49b5c80b9d0/Knee care 8265d491502a49b0abf2922d9e7764e3.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self Bodywork 0045821b69f445678e07d49b5c80b9d0/Shoulder therapy massage motion 49e3a56cbbfc4733a0ddda272c504912.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self care weekly splits 861d60286a1e48dbb7ed7556d4214622/',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self care weekly splits 861d60286a1e48dbb7ed7556d4214622.md',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Self care weekly splits 861d60286a1e48dbb7ed7556d4214622/Self Bodywork 5c1104d3456a4106872eee9dc531e182.csv',
    'Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Workout weekly splits 4b9d808f79544f5489ef063f1048109a.md',
    'Projects bf587944624a417c83475fdb67c176ba/PROJECTS TEMPLATE a6292c48f0d343c9a2913c0adf97bbf2.md',
    'Routine 60ee969daa894c4d9abdb0d58166f5d4/',
    'Routine 60ee969daa894c4d9abdb0d58166f5d4.csv',
    'Routine 60ee969daa894c4d9abdb0d58166f5d4/Evening Routine 29e2c5282db04c76a59d0053eb9e85ee.md',
    'Routine 60ee969daa894c4d9abdb0d58166f5d4/Morning Routine 4570adb138b7412a8bbe948746585924.md',
    'Routine 60ee969daa894c4d9abdb0d58166f5d4/Physical Activity 8b8ba3700a194ba7ad6330802ecccdf5.md',
]

# Regex capture groups 1 (name) and 2 (extension); raw string avoids invalid-escape warnings
filenamepattern = r"([\w\s]+)\s\w{32}(\.md|\.csv)$"

# Create an indexed list of new filenames
index = []
fname1 = []
fname2 = []

for i, path in enumerate(paths):
    match = re.search(filenamepattern, path)  # Search
    if match:
        index.append(i)  # Save index for path changes
        fname1.append(match.group(1) + match.group(2))  # Replace 1 using match.group()
        fname2.append(re.sub(filenamepattern, r"\g<1>\g<2>", path))  # Search & replace 2 using re.sub()
        if len(index) <= 3:  # Print a few for comparison
            print("index:", index[-1])
            print("Source : " + path)
            print("Groups :", match.groups())
            print("Result1: " + fname1[-1])
            print("Result2: " + fname2[-1])
            print()
```

I've put up the regex with the same sample data at regexr dot com. I don't think I'm allowed to add links here as a new member, but if you want to modify it and see results right away, just add /568jc to the end of the URL. I'm not affiliated in any way; it's just a cool website!
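For comparison, here is a sketch of two ways to get only the file name, assuming that is the intended result. The `(?:.*/)?` prefix in `full_pattern` is a hypothetical variant of the pattern above that consumes everything up to the last `/`, so the match covers the whole string and re.sub() replaces all of it. Match.expand() is the standard-library way to apply the same `\g<1>\g<2>` template to the groups alone. The names `full_pattern`, `samples`, and `results` are illustrative.

```python
import re

# Original pattern from the post
original_pattern = r"([\w\s]+)\s\w{32}(\.md|\.csv)$"
# Hypothetical variant: optionally consume the directory part so the match spans the whole path
full_pattern = r"^(?:.*/)?([\w\s]+)\s\w{32}(\.md|\.csv)$"

samples = [
    "Projects bf587944624a417c83475fdb67c176ba.md",
    "Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf.md",
    "Projects bf587944624a417c83475fdb67c176ba/Bodywork 731fe478ea6048e1ac0df8c7f7ed95bf/Home Exercise 4871ab1851074a1cb7aebe0851669345.csv",
]

results = []
for path in samples:
    via_sub = re.sub(full_pattern, r"\g<1>\g<2>", path)  # whole string is the match
    # Alternative: keep the original pattern and expand the template from the groups only
    via_expand = re.search(original_pattern, path).expand(r"\g<1>\g<2>")
    results.append((via_sub, via_expand))
    print(via_sub, "|", via_expand)
```

Either way, both approaches agree with Result1 for these sample paths.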