Match substring using regex

Pavel_47 · Jul-17-2022, 12:36 PM

Hello,

Here is a case that doesn't work:

import re
video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'

realisation = re.search(r'Réalisation :\n(.*?)\n', video_info).group(1)
scenario = re.search(r'Scénario :\n(.*?) :\n', video_info).group(1)
print(realisation)
print(scenario)

In this example, realisation is OK but scenario fails.
Any suggestions?
Thanks.

Pavel_47 · Jul-17-2022, 03:25 PM

I've just tried testing the regex on
https://regex101.com/

With this (albeit non-ideal) expression I can select what I want:
[Image: Screenshot-from-2022-07-17-17-23-37.png]

Surprisingly, if I use the same expression in Python with re.search, it doesn't work.
Any comments?

***snippsat*** · Jul-17-2022, 06:48 PM

Like this.

import re

video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'
realisation = re.search(r'Réalisation :\n(.*?)\n', video_info).group(1)
scenario = re.search(r"Scénario :\n(.*?)\n", video_info).group(1)
print(realisation)
print(scenario)

Output:
Pierre Lazarus
Emmanuelle Moreau

As this data comes from your last thread, you can also parse the elements directly, so there is no need for regex.

>>> content.find_element(By.CSS_SELECTOR, 'div.css-m3r1o3 > div.css-vhqfin > p:nth-child(1)').text
'Réalisation :'
>>> content.find_element(By.CSS_SELECTOR, 'div.css-m3r1o3 > div.css-vhqfin > ul:nth-child(2)').text
'Pierre Lazarus'

Pavel_47 · Jul-17-2022, 10:45 PM

Quote: scenario = re.search(r"Scénario :\n(.*?)\n", video_info).group(1)

No. What I'm looking for is to get Emmanuelle Moreau, Noémie Parreaux ... i.e. everything located between "Scénario :" and the next template, which looks like "something :".

bowlofred · Jul-18-2022, 07:16 AM

I think your regex101 test is succeeding because you pasted the Python string literal in as the example text. The pasted text contains the two literal characters \ and n, whereas in the Python string \n is a single newline character. Instead, you should print the string (so it appears as multiple lines) and then paste those individual lines into the form.
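To make the difference concrete, here is a small sketch. The pattern below is only a stand-in, since I can't see the exact expression in your screenshot; the names real and pasted are just for illustration.

```python
import re

# Hypothetical stand-in pattern (the one in the screenshot is not known).
pattern = r'Scénario :(.*?)Production'

# What Python sees: \n in a normal string literal is one newline character.
real = 'Scénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\n'

# What ends up in the web form when the literal is pasted:
# a backslash followed by the letter n, all on a single line.
pasted = r'Scénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\n'

print(len('\n'), len(r'\n'))         # 1 2
print(re.search(pattern, real))      # None: the dot cannot cross a newline
print(re.search(pattern, pasted))    # a match: there is no newline to cross
```

So the same expression can succeed on the pasted one-line text and fail on the real multi-line string.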

The dot does not (normally) match a newline, so the (.*?) can't capture the newlines separating the names you want. You can turn that on by setting the DOTALL flag in the regex, either by passing re.DOTALL or with the inline (?s).

You'll still have the problem of separating out the next template from the names, but at least you can get the capture to happen.

import re
video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'


realisation = re.search(r'Réalisation :\n(.*?)\n', video_info).group(1)
scenario = re.search(r'(?s)Scénario :\n(.*?) :\n', video_info).group(1)
print(realisation)
print(scenario)

Output:
Pierre Lazarus
Emmanuelle Moreau
Noémie Parreaux
Production

Pavel_47 · Jul-18-2022, 07:40 AM

Thanks,
I've tried your suggestion in Python. It works ... although it doesn't match the original request, i.e. scenario should return only Emmanuelle Moreau, Noémie Parreaux (without Production).
BTW, the 2nd regex doesn't work in regex101.
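One possible way to leave the next header out is a lookahead that stops the lazy capture just before the next line ending in " :" (or at the end of the text). This is only a sketch: get_section is a made-up helper name, and it assumes every header line ends with " :" and that no name line does.

```python
import re

video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'


def get_section(text, header):
    """Return the lines listed under 'header :' (hypothetical helper)."""
    # DOTALL lets . cross newlines; MULTILINE lets ^ match at each line start.
    # The lookahead stops before the next 'something :' line or at the end.
    pattern = rf'^{re.escape(header)} :\n(.*?)(?=\n[^\n]* :\n|\Z)'
    match = re.search(pattern, text, flags=re.DOTALL | re.MULTILINE)
    return match.group(1).splitlines() if match else []


print(', '.join(get_section(video_info, 'Scénario')))
# Emmanuelle Moreau, Noémie Parreaux
```

Returning an empty list for a missing header also avoids the AttributeError raised when .group(1) is called on None.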

Pavel_47 · (This post was last modified: Jul-18-2022, 07:47 AM by Pavel_47.)

BTW, here is a way to solve the problem using ordinary Python stuff (sure, I'm aware that it is probably far from optimal):

video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'
video_info_list = video_info.split('\n')

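list_keys = []
list_values = []
new_key = False  # True right after a header line, until its first value is stored
for item in video_info_list:
    if item.endswith(' :'):
        # Header line such as 'Scénario :'; drop the trailing ' :'
        list_keys.append(item[:-2])
        new_key = True
    elif new_key:
        # First value under the current header
        list_values.append(item)
        new_key = False
    elif list_values:
        # Further values are joined onto the previous one
        list_values[-1] = list_values[-1] + ', ' + item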

video_dict = dict(zip(list_keys, list_values))
for k, v in video_dict.items():
    print(f"{k:<20}{v}")
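For comparison, the same dictionary could probably be built with a single re.split on the header lines. This is a sketch under the same assumption that header lines end with " :"; the names parts and sections are just for illustration.

```python
import re

video_info = 'Réalisation :\nPierre Lazarus\nScénario :\nEmmanuelle Moreau\nNoémie Parreaux\nProduction :\nFilmakademie Baden-Württemberg\nSWR\nARTE\nProducteur/-trice :\nGiacomo Vernetti Prot\nJennifer Miola\nImage :\nHovig Hagopian\nMontage :\nMathieu Pluquet\nMusique :\nLouis-Ronan Choisy'

# Splitting on a pattern with a capturing group keeps the captured header
# names in the result: ['', header1, body1, header2, body2, ...]
parts = re.split(r'(?m)^(.+) :\n', video_info)

# Pair every header with the list of lines that follow it.
sections = {
    header: body.splitlines()
    for header, body in zip(parts[1::2], parts[2::2])
}

for k, v in sections.items():
    print(f"{k:<20}{', '.join(v)}")
```

Keeping the values as lists rather than joined strings makes it easy to pick out individual names later.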
