Python Forum
Pulling Specific Words/Numbers from a String
#1
Hi There,

I know this may seem like a silly question, since there are probably thousands of tutorials, but I just can't quite figure it out.

I have a DataFrame with a column (columnA) whose dtype is object, but each value is just a string of text like this:

'columnA': ['680 PACKAGES FOR SOAP NET WEIGHT: 17, 000. 00 KGS']


I want to extract NET WEIGHT: 17, 000. 00 KGS

This is what I've tried thus far:

df['Net Weight'] = df['columnA'].str.extract(r'NET WEIGHT: (\d+ KGS)')
df['Net Weight'] = df['columnA'].str.extract(r'NET WEIGHT:? (\d+ KGS)')
df['Net Weight'] = df['columnA'].str.extract(r'NET\sWEIGHT:\s?(\d+\.?\d*\sKGS)')

df['Net Weight'] = df['columnA'].apply(lambda x: re.search(r'NET\sWEIGHT:\s([\d,]+\.\d+\sKGS)', x).group(1) if re.search(r'NET\sWEIGHT:\s([\d,]+\.\d+\sKGS)', x) else None)
Nothing works; the new column still contains only NaN values. I've been reading the documentation for re and the pandas.Series.str methods, and I can't find anything that suits my needs.


Also, the label sometimes comes in other forms, such as N.W.:, Net Weight;, Net Weight, or Net WT:, and the unit at the end varies too (KGS, KG, etc.).

I'm really not sure how to explore this further.
#2
Is net weight always the first number following the colon? Your number, "17, 000. 00" contains extra spaces. Are they really there, or is that a typo?

This works if net weight starts with the first digit after the colon and goes to the end of the string.
df["Net Weight"] = df["columnA"].str.extract(r":.*?(\d.*)")
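
A minimal sketch of why the earlier attempts come back as NaN while this pattern matches, using only the re module on the sample string from the question. str.extract takes the first match per row, much as re.search does here. The name strict_pattern is a hypothetical alternative and assumes the label is always exactly "NET WEIGHT:" and the unit is always "KGS".

```python
import re

sample = "680 PACKAGES FOR SOAP NET WEIGHT: 17, 000. 00 KGS"

# One of the attempts from the question: [\d,]+ stops at the space after
# "17,", so the required "." never lines up and nothing matches.
failed_attempt = re.search(r"NET\sWEIGHT:\s([\d,]+\.\d+\sKGS)", sample)
print(failed_attempt)  # None

# Pattern from the reply: skip lazily to the first digit after the colon,
# then capture everything through the end of the string.
reply_pattern = r":.*?(\d.*)"
reply_match = re.search(reply_pattern, sample).group(1)
print(reply_match)  # 17, 000. 00 KGS

# Hypothetical stricter variant: after the label, allow digits, commas,
# periods and whitespace, and stop at the unit.
strict_pattern = r"NET WEIGHT:\s*(\d[\d,.\s]*KGS)"
strict_match = re.search(strict_pattern, sample).group(1)
print(strict_match)  # 17, 000. 00 KGS
```

The stricter variant should behave the same on this sample, but it stops at KGS instead of swallowing whatever text follows the number.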
#3
(May-01-2023, 07:00 PM)deanhystad Wrote: Is net weight always the first number following the colon? Your number, "17, 000. 00" contains extra spaces. Are they really there, or is that a typo?

This works if net weight starts with the first digit after the colon and goes to the end of the string.
df["Net Weight"] = df["columnA"].str.extract(r":.*?(\d.*)")

Ah, yes, this is one of my issues: sometimes there are spaces, and the formatting is messy and varies from case to case.

The value above, with the spaces, is correct.

Thanks for your help!
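
For the label and unit variations mentioned in the first post, one possible approach is a single case-insensitive pattern plus a small cleanup step. This is only a sketch: NET_WEIGHT_RE and extract_net_weight_kg are hypothetical names, and it assumes commas are thousands separators, the period is the decimal point, and the unit is always KG or KGS.

```python
import re

# Hypothetical pattern covering the variants listed in the question:
# "NET WEIGHT:", "N.W.:", "Net Weight;", "Net Weight", "Net WT:".
NET_WEIGHT_RE = re.compile(
    r"\b(?:NET\s*(?:WEIGHT|WT)|N\.?\s*W\.?)"  # label variants
    r"\s*[:;]?\s*"                            # optional ":" or ";" separator
    r"(\d[\d,.\s]*?)"                         # number, possibly with stray spaces
    r"\s*KGS?\b",                             # unit: KG or KGS
    re.IGNORECASE,
)


def extract_net_weight_kg(text):
    """Return the net weight as a float, or None if no weight is found."""
    match = NET_WEIGHT_RE.search(text)
    if match is None:
        return None
    # Assumption: commas are thousands separators and spaces are noise,
    # so "17, 000. 00" becomes "17000.00".
    cleaned = re.sub(r"[,\s]", "", match.group(1))
    try:
        return float(cleaned)
    except ValueError:
        # e.g. more than one period left after cleanup
        return None


print(extract_net_weight_kg("680 PACKAGES FOR SOAP NET WEIGHT: 17, 000. 00 KGS"))  # 17000.0
print(extract_net_weight_kg("N.W.: 500 KG"))  # 500.0
print(extract_net_weight_kg("GROSS WEIGHT: 10 KGS"))  # None
```

If every value in the column is a string, something like df["Net Weight"] = df["columnA"].apply(extract_net_weight_kg) should fill the new column with floats, leaving None where no net weight is found.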