May-03-2023, 08:18 AM
Oh, right. Lookbehind patterns in Python's re module must be fixed-width, so you can't put an optional piece in them. In that case I'd just use a regular search and capture the number.
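To illustrate the fixed-width restriction, here is a minimal sketch. The lookbehind patterns are made up for this demonstration and are not taken from the earlier posts. The first one contains an optional separator, so its width is not fixed and `re` refuses to compile it.

```python
import re

error_message = None
try:
    # The optional [;:]? makes the lookbehind 6 or 7 characters wide.
    re.compile(r"(?<=weight[;:]?)[\d,.]+")
except re.error as exc:
    error_message = str(exc)
    print("rejected:", error_message)

# A fixed-width lookbehind compiles, but it only handles the "weight:" form.
fixed = re.compile(r"(?<=weight:)[\d,.]+")
print(fixed.search("weight:18.520,00kgs").group())  # 18.520,00
```

A capture group avoids the problem because the optional part sits in the ordinary pattern rather than in the lookbehind.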
import re

text = """WEIGHT: 18. 520, 0 0 KGS
WEIGHT: 18. 583, 000 KGS
WEIGHT 6000. 0000 KG
WEIGHT: 17. 624, 00 KGS
WEIGHT: 17. 046, 00 KGS
WEIGHT; 16. 235, 86 KGS
WEIGHT; 13. 672
WEIGHT: 29. 631, 000
WEIGHT: 218768. 000 KGS
WEIGHT: 15 MT
WEIGHT; 14. 834, 32 KGS
WEIGHT; 11. 311, 08 KG"""

for entry in text.splitlines():
    # Drop all spaces and lowercase so one pattern covers every variant.
    entry = entry.replace(" ", "").lower()
    print(entry, end=" => ")
    # Raw string so \d reaches the regex engine without an escape warning.
    # The separator after "weight" is optional; the number is captured.
    if m := re.search(r'weight[;:]?([\d,.]+)kg', entry):
        print(m.group(1))
    else:
        print("NO MATCH")
Output:
weight:18.520,00kgs => 18.520,00
weight:18.583,000kgs => 18.583,000
weight6000.0000kg => 6000.0000
weight:17.624,00kgs => 17.624,00
weight:17.046,00kgs => 17.046,00
weight;16.235,86kgs => 16.235,86
weight;13.672 => NO MATCH
weight:29.631,000 => NO MATCH
weight:218768.000kgs => 218768.000
weight:15mt => NO MATCH
weight;14.834,32kgs => 14.834,32
weight;11.311,08kg => 11.311,08
The inconsistent separators are more annoying. If you had something like 18,000, how would you be sure which way to interpret it?
Assuming you want a period as the decimal separator, you could check whether a comma occurs later than a period in the string. If it does, swap the two characters. If you only have one of the two, I don't have a good suggestion.
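The comma-after-period check could be sketched roughly like this. The function name `normalize_number` is hypothetical, and returning None for the ambiguous single-separator case is an assumption of this sketch, not a recommendation for how to resolve it.

```python
from typing import Optional


def normalize_number(raw: str) -> Optional[str]:
    """Return raw with '.' as the decimal separator, or None if ambiguous."""
    last_comma = raw.rfind(",")
    last_period = raw.rfind(".")
    if last_comma == -1 and last_period == -1:
        return raw  # no separators at all, nothing to decide
    if last_comma == -1 or last_period == -1:
        return None  # only one kind of separator: cannot tell which it is
    if last_comma > last_period:
        # Comma comes last, so it is the decimal mark: swap ',' and '.'.
        return raw.translate(str.maketrans(",.", ".,"))
    return raw  # period already comes last


print(normalize_number("18.520,00"))  # 18,520.00
print(normalize_number("1,234.56"))   # 1,234.56
print(normalize_number("6000.0000"))  # None
```

After the swap, removing the remaining commas (for example with `replace(",", "")`) would give a string that `float()` accepts.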