Python Forum
Regex ignoring letter 'e'
#1
Hi

I am using the following code. Forgive the mess and the murder of Python conventions; it's my first project. I am trying to format a bank statement.

import re

raw_string = " Inward Clg Cheque               25,319.00-         04NOV        13,164,022.62CR" # This is the first line of a record
list_of_data = []


"""A regular expression for starting line of record."""
record_start_identifier = re.compile(r'''
    (\s\w+)(\s\w+)?(\s\w+)?(\s\w+)?(\s\w+)?(\s\w+)?(\s\w+)?(\s\w+)?(\s\w+)?(\s\w+)?  # Groups 1-10 of starting text with a leading space - up to a max of 10 words
    \s+                                                                              # Spaces before start of transaction amount
    (\d+),?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?    # Groups 11-20 of transaction amount
    \.                                                                               # Decimal point dot
    (\d+)                                                                            # Group 21 of numbers after decimal point
    (-?)                                                                             # Group 22. If string is - it is a negative number, else it is positive
    \s+                                                                              # Space after end of transaction amount
    (\d{2}\w{3})                                                                     # Group 23 for date without year.
    \s+                                                                              # Space before start of cumulative balance
    (\d+),?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?,?(\d*)?    # Groups 24-33 of balance amount
    \.                                                                               # Decimal point dot
    (\d+)                                                                            # Group 34 of numbers after decimal point
    ((CR)|(DR))                                                                      # Group 35 for debit or credit indicator

	''', re.VERBOSE)

def group_first_line(line):
	"""Breaks record start line into groups."""

	data_groups = record_start_identifier.search(line)
	return data_groups

def convert_record_start_to_list(match_object, empty_list_to_update):
	"""Convert the group created by group_first_line() into a list."""

	first_word = match_object.group(1)
	first_word = first_word.lstrip()

	full_transaction_type = f"{first_word}{match_object.group(2)}{match_object.group(3)}{match_object.group(4)}{match_object.group(5)}{match_object.group(6)}{match_object.group(7)}{match_object.group(8)}{match_object.group(9)}{match_object.group(10)}"
	full_transaction_type = full_transaction_type.strip('None')

	empty_list_to_update.append(full_transaction_type)

data_groups = group_first_line(raw_string)
convert_record_start_to_list(data_groups, list_of_data)

print(list_of_data)
print result: ['Inward Clg Chequ']

Why is it skipping the letter 'e' in Cheque? If I substitute some other letter for the 'e' in raw_string, it gets printed.

And please also let me know how to shorten the full_transaction_type code that converts groups to a string.

Thanks
#2
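Line 38 of your code, full_transaction_type = full_transaction_type.strip('None'), is the problem. That doesn't strip the word 'None'; it strips any of the characters 'N', 'o', 'n', or 'e' from both ends of the string. The unmatched groups are formatted into the f-string as 'None', so the string ends in 'ChequeNoneNone...'. The strip removes the trailing 'None's and then carries on through the final 'e' of 'Cheque', stopping only at the 'u'.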
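Here is a small sketch that reproduces the effect in isolation. It assumes the f-string produced three matched words followed by seven unmatched groups rendered as 'None', which is what the sample line gives; the variable names joined and stripped are just for illustration.

```python
# str.strip(chars) treats its argument as a set of characters to remove
# from both ends, not as a literal prefix or suffix.
joined = "Inward Clg Cheque" + "None" * 7  # groups 4-10 did not match

stripped = joined.strip("None")
print(stripped)  # Inward Clg Chequ

# With a final letter that is not in the set {'N', 'o', 'n', 'e'},
# stripping stops earlier and the word survives.
other = ("Inward Clg Chequx" + "None" * 7).strip("None")
print(other)  # Inward Clg Chequx
```

The leading 'I' of 'Inward' is not in the set, so nothing is removed from the left-hand side.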
#3
Thanks! I am going to change some code to avoid using strip(). Will see if it works.
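One possible way to avoid strip() and shorten the string-building line at the same time is to join only the groups that actually matched. This is a sketch, not the only approach: words_pattern is a cut-down stand-in for the full pattern in post #1 (it keeps just the ten word groups), and join_word_groups is a hypothetical helper name.

```python
import re

# Simplified stand-in for the pattern in post #1: one required word group,
# nine optional ones, then the spaces and first digit of the amount.
words_pattern = re.compile(r"(\s\w+)" + r"(\s\w+)?" * 9 + r"\s+\d")

raw_string = " Inward Clg Cheque               25,319.00-         04NOV        13,164,022.62CR"
match = words_pattern.search(raw_string)


def join_word_groups(match_object, first=1, last=10):
    """Concatenate groups first..last, skipping groups that did not match."""
    parts = (match_object.group(i) for i in range(first, last + 1))
    # Unmatched optional groups are None, so filter them out before joining.
    return "".join(p for p in parts if p is not None).lstrip()


transaction_type = join_word_groups(match)
print(transaction_type)  # Inward Clg Cheque
```

Because the None values are dropped before the string is built, there is nothing left to strip except the leading space of the first word.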