Python Forum
Re.search misses string end
#1
Hi!

I have a strange problem. I have a PDF that I convert into a string using PDFMiner. I replace every "\n" with nothing (because they sometimes appear in places that break up the text I am looking for). I then search for a substring and modify it a bit to get what I need. It works perfectly in most cases, but for some reason not in every situation, and I don't know why. This is the code:

text1line = str(text).replace("\n", "")
designation1 = str(re.search('in the name of(.*)Voting', text1line))
designation0 = re.sub('<re.*name of ', '', designation1)
designation = str(designation0).split("Limited")[1]
This is a part of "text1line": "registered in the name of XYZ Limited OK04.Voting rights are"
What I need is just "OK04". This is what designation1 is in this example:

<re.Match object; span=(551, 608), match='in the name of State Street Nominees Limited OK0>

For some reason, in some cases it misses the last character, and I have absolutely no idea why (it also misses the ".Vot" part, but that one is not needed). The PDFs can vary a bit in format, and I have not yet figured out which formats go wrong.

What is the problem?
#2
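#!/usr/bin/python3
import re

text1line = "registered in the name of XYZ Limited OK04.Voting rights are"
# Capture the text between "Limited " and ".Voting"; the literal dot is escaped.
designation0 = re.search(r'Limited\s+(.*?)\.Voting', text1line)
# group(1) returns the full captured text, unlike str() of the Match object.
print(designation0.group(1))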
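
Since the PDFs vary in format, the pattern may sometimes not match at all. In that case re.search returns None, and calling .group(1) on it raises an AttributeError. Below is a rough sketch of a guard against that. The helper name extract_designation is made up for illustration, and the pattern assumes the designation contains no whitespace, which may not hold for every PDF.

```python
import re

def extract_designation(text):
    """Return the code between 'Limited' and '.Voting', or None if absent.

    Hypothetical helper; assumes the code itself contains no whitespace.
    """
    match = re.search(r'Limited\s+(\S+?)\.Voting', text)
    # re.search returns None when nothing matches, so check before group().
    return match.group(1) if match else None

sample = "registered in the name of XYZ Limited OK04.Voting rights are"
print(extract_designation(sample))
print(extract_designation("text without the expected phrase"))
```

Returning None lets the caller decide what to do with PDFs that do not fit the expected format, rather than crashing partway through a batch.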
#3
What heiner55 is showing you is that the string representation of an re Match object does not contain the full text that was found. Keep in mind that re searches can be very complex and can have multiple groups matching different parts of the string, so a simple string is not necessarily going to be able to represent what was matched. Use the Match object's group() method to get the matched text instead.
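
To make this concrete, here is a minimal sketch comparing str() of a Match object with its group() method. The sample text is made up for illustration. The abbreviation of long matches is how CPython prints Match objects for debugging, so it should not be relied on for extracting data.

```python
import re

# Hypothetical sample text, long enough that the debugging repr gets cut off.
text1line = ("registered in the name of State Street Nominees Limited "
             "OK04.Voting rights are")

match = re.search(r'in the name of(.*)Voting', text1line)

as_string = str(match)       # abbreviated representation meant for debugging
full_match = match.group(0)  # the complete matched text
captured = match.group(1)    # the text of the first capture group

print(as_string)
print(full_match)
print(captured)
```

Here full_match and captured hold the whole text, while as_string is cut short, which is why splitting or substituting on str(match) loses the end of the string.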
Craig "Ichabod" O'Brien - xenomind.com
#4
Thanks for your explanation.