Hi all,
So I have this code below that filters the `Weight` column in a data frame, and I want to remove the `:` and `;` in front of the number:
```python
import re

# Rows whose whitespace-stripped "Weight" value contains the word "weight"
mask = df["Weight"].str.replace(r"\s", "", regex=True).str.contains("Weight", case=False, na=False)

for entry in df.loc[mask, "Weight"].sample(10, random_state=2):
    print(re.findall(r"(?<=weight).*?(?=kg)", re.sub(r"\s", "", entry).lower()))
```
This prints:

```
[':16.696,00']
[';16.981,44', ';13.672,10', ';16.981,44', ';16.981,44', ';16.235,86']
[':17.046,00']
[':18.345,00']
[':17.624,00']
[':17,063.00']
['6000.0000']
[':18.583,000']
[':18.520,00']
[';16.981,44']
```

So far I have tried adding `[;:]` to the regular expression, like so:
```python
mask = df["Weight"].str.replace(r"\s", "", regex=True).str.contains("Weight", case=False, na=False)

for entry in df.loc[mask, "Weight"].sample(10, random_state=2):
    print(re.findall(r"(?<=weight[:;]).*?(?=kg)", re.sub(r"\s", "", entry).lower()))
```

But it returns this:
```
['16.696,00']
['16.981,44', '13.672,10', '16.981,44', '16.981,44', '16.235,86']
['17.046,00']
['18.345,00']
['17.624,00']
['17,063.00']
[]
['18.583,000']
['18.520,00']
['16.981,44']
```

Do you see how, if the entry has no `:` or `;`, it gets dropped entirely (`6000.0000` turns into `[]`)? How do I prevent this? I have also tried `[:;?]` and `[:;*?]`.
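A minimal sketch of one possible fix, not tested against the real data: the strings in `samples` are hypothetical (the raw `Weight` values aren't shown above), and `WEIGHT_RE` / `extract_weights` are illustrative names. Two things seem to get in the way. First, Python's built-in `re` only allows fixed-width lookbehind, so `(?<=weight[:;]?)` is rejected with `re.error`. Second, inside a character class `?` and `*` are literal characters, so `[:;?]` means "exactly one of `:`, `;` or `?`" and still requires a character to be present. Matching `weight[:;]?` normally and capturing just the number in a group avoids both problems:

```python
import re

# An optional [:;] inside the lookbehind is rejected: the built-in re module
# only supports fixed-width lookbehind (the third-party "regex" module differs).
try:
    re.compile(r"(?<=weight[:;]?).*?(?=kg)")
except re.error as exc:
    print("re.error:", exc)

# Match "weight" normally, then an optional ":" or ";", and capture only the number.
# Because the pattern has exactly one group, findall() returns just the group's text.
WEIGHT_RE = re.compile(r"weight[:;]?(.*?)(?=kg)")


def extract_weights(entry: str) -> list[str]:
    """Strip all whitespace, lowercase, and return each value between 'weight' and 'kg'."""
    return WEIGHT_RE.findall(re.sub(r"\s", "", entry).lower())


# Hypothetical entries, assumed to resemble the real "Weight" column values.
samples = [
    "Weight: 16.696,00 kg",
    "Weight; 16.981,44 kg Weight; 13.672,10 kg",
    "Weight 6000.0000 kg",
]
for entry in samples:
    print(extract_weights(entry))
```

With these samples the loop prints `['16.696,00']`, `['16.981,44', '13.672,10']` and `['6000.0000']`, so the entry without a separator is kept. In the loop from the question, `extract_weights(entry)` would take the place of the `re.findall(...)` call.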
Also, if anyone has an idea of how to make the comma and period separators consistent (most values look like `16.696,00`, but some look like `17,063.00`), that would be an added bonus.
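One heuristic sketch for the separators, assuming (this is not confirmed by the data) that whichever of `.` or `,` appears last in a value is the decimal mark and the other one only groups thousands. `normalize_number` is a hypothetical helper name. A value with a single separator, such as `17,063` or `6.000`, is always read as a decimal, which may be wrong for some rows:

```python
def normalize_number(text: str) -> float:
    """Convert '16.696,00' or '17,063.00' style strings to a float.

    Heuristic assumption: the separator that appears LAST is the decimal mark,
    and every occurrence of the other one is a thousands separator. A lone
    separator is therefore always treated as the decimal mark.
    """
    last_dot = text.rfind(".")
    last_comma = text.rfind(",")
    if last_comma > last_dot:
        # European style: '.' groups thousands, ',' is the decimal mark.
        text = text.replace(".", "").replace(",", ".")
    else:
        # US style (or no comma at all): ',' groups thousands, '.' is the decimal mark.
        text = text.replace(",", "")
    return float(text)


# Values taken from the output shown in the question.
for raw in ["16.696,00", "17,063.00", "6000.0000", "18.583,000"]:
    print(raw, "->", normalize_number(raw))
```

This prints `16696.0`, `17063.0`, `6000.0` and `18583.0` for the four values. Converting to `float` sidesteps the formatting question entirely; if a consistent string format is needed afterwards, it can be produced from the float.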
Thank you, regular expressions are hard!


