Re.search misses string end

CaptainCsaba · (This post was last modified: Apr-01-2019, 02:13 PM by CaptainCsaba.)

Hi!

I have a strange problem. I have a PDF that I convert into a string using PDFminer. I remove every "\n" (because they sometimes appear in places that break the code), then search for a substring and modify it a bit to get what I need. It works perfectly in most cases, but not in every one, and I don't know why. This is the code:

text1line = str(text).replace("\n", "")
designation1 = str(re.search('in the name of(.*)Voting', text1line))
designation0 = re.sub('<re.*name of ', '', designation1)
designation = str(designation0).split("Limited")[1]

This is a part of "text1line": "registered in the name of XYZ Limited OK04.Voting rights are"
What I need is just "OK04". This is what designation1 is in this example:

<re.Match object; span=(551, 608), match='in the name of State Street Nominees Limited OK0>

For some reason, in some cases it misses the last character (and also the ".Vot" part, but that one is not needed), and I have absolutely no idea why. The PDFs can vary a bit in format, and I have not yet figured out which formats go wrong.

What is the problem?

heiner55 · May-24-2019, 10:39 AM

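#!/usr/bin/python3
import re

text1line    = "registered in the name of XYZ Limited OK04.Voting rights are"
# Capture the text between "Limited" and "Voting" (each unescaped "." matches any single character)
designation0 = re.search(r'Limited.(.*).Voting', text1line)
# Read the captured group directly instead of converting the Match object to a string
print(designation0.group(1))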
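
A slightly more defensive variant of the same idea, offered only as a sketch: re.search returns None when nothing matches, so calling .group(1) directly would raise an error on PDFs whose text is laid out differently. The helper name extract_designation and the stricter pattern are assumptions made for illustration; they do not come from the original posts.

```python
import re


def extract_designation(text):
    """Return the token between 'Limited' and '.Voting', or None if absent."""
    # \s+ allows one or more spaces, (\S+?) captures a run of non-space
    # characters, and \. matches a literal dot instead of any character.
    match = re.search(r"Limited\s+(\S+?)\.Voting", text)
    if match is None:
        # No match: report that explicitly rather than failing on .group()
        return None
    return match.group(1)


found = extract_designation(
    "registered in the name of XYZ Limited OK04.Voting rights are"
)
missing = extract_designation("no designation in this text")
print(found)
print(missing)
```

Returning None lets the caller decide what to do with PDFs that do not follow the expected wording, for example by logging them for manual review.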

ichabod801 · May-24-2019, 12:50 PM

What heiner55 is showing you is that the string representation of an re Match object does not contain the full text that was found; it is only a shortened summary meant for display. Keep in mind that re searches can be very complex and can have multiple groups matching different parts of the string, so a simple string is not necessarily going to be able to represent what was matched. Use the Match object's group() method to get the matched text itself.
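
To make the difference concrete, here is a minimal sketch comparing str() of a Match object with its group() methods. The sample text is a hypothetical line modelled on the example in the first post, and the final split/strip step is just one possible way to isolate the designation.

```python
import re

# Hypothetical sample text, modelled on the example in the first post
text1line = ("registered in the name of State Street Nominees Limited "
             "OK04.Voting rights are")

match = re.search(r"in the name of (.*)Voting", text1line)

# str(match) is a debugging summary; a long match is cut short in it
summary = str(match)

# group(0) is the complete match, group(1) the first parenthesised group
whole = match.group(0)
captured = match.group(1)

# Keep what follows "Limited", then drop the leading space and trailing dot
designation = captured.split("Limited")[1].strip(" .")

print(summary)
print(whole)
print(designation)
```

Working from group(1) avoids parsing the summary string at all, so the result does not depend on how long the matched text happens to be.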

heiner55 · (This post was last modified: May-25-2019, 01:46 PM by heiner55.)

Thanks for your explanation.
