Pulling Specific Words/Numbers from a String

**bigpapa** · May-01-2023, 06:43 PM

Hi There,

I know this may seem like a silly question because there are probably thousands of tutorials, but I just can't quite figure it out.

I have a DataFrame with a column (columnA) whose dtype is object, but each value is just a string of text like this:

'columnA': ['680 PACKAGES FOR SOAP NET WEIGHT: 17, 000. 00 KGS']

I want to extract NET WEIGHT: 17, 000. 00 KGS

This is what I've tried thus far:

import re

df['Net Weight'] = df['columnA'].str.extract(r'NET WEIGHT: (\d+ KGS)')
df['Net Weight'] = df['columnA'].str.extract(r'NET WEIGHT:? (\d+ KGS)')
df['Net Weight'] = df['columnA'].str.extract(r'NET\sWEIGHT:\s?(\d+\.?\d*\sKGS)')

df['Net Weight'] = df['columnA'].apply(lambda x: re.search(r'NET\sWEIGHT:\s([\d,]+\.\d+\sKGS)', x).group(1) if re.search(r'NET\sWEIGHT:\s([\d,]+\.\d+\sKGS)', x) else None)

Nothing works. It still shows NaN values. I've been reading the docs for re and pandas.Series.str, and I can't find anything that suits my needs.

Also, sometimes Net Weight comes in other forms, like N.W.:, Net Weight;, Net Weight, or Net WT:, and the endings vary, like KGS or KG.

I'm really not sure how to explore this further.

**deanhystad** · (This post was last modified: May-01-2023, 07:00 PM by deanhystad.)

Is net weight always the first number following the colon? Your number, "17, 000. 00" contains extra spaces. Are they really there, or is that a typo?

This works if net weight starts with the first digit after the colon and goes to the end of the string.

df["Net Weight"] = df["columnA"].str.extract(r":.*?(\d.*)")
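A minimal sketch of that pattern applied to the sample row from the question, assuming pandas is installed. The second pattern is not from the thread: it is one possible tighter variant (the name `tight` is hypothetical) that anchors on the literal "NET WEIGHT:" label and tolerates spaces inside the number. The earlier `\d+ KGS` attempts likely returned NaN because the digits in "17, 000. 00" are broken up by commas, a period, and spaces.

```python
import pandas as pd

df = pd.DataFrame(
    {"columnA": ["680 PACKAGES FOR SOAP NET WEIGHT: 17, 000. 00 KGS"]}
)

# Pattern from the reply: skip lazily from the first colon to the first
# digit, then capture everything up to the end of the string.
df["Net Weight"] = df["columnA"].str.extract(r":.*?(\d.*)", expand=False)

# Hypothetical tighter variant: require the "NET WEIGHT:" label and a
# trailing "KGS", allowing digits, commas, periods and spaces in between.
tight = r"(NET WEIGHT:\s*\d[\d,. ]*?\s*KGS)"
df["Net Weight Full"] = df["columnA"].str.extract(tight, expand=False)

print(df["Net Weight"].iloc[0])       # 17, 000. 00 KGS
print(df["Net Weight Full"].iloc[0])  # NET WEIGHT: 17, 000. 00 KGS
```

Passing `expand=False` makes `str.extract` return a Series for a single capture group, which keeps the column assignment explicit.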

**bigpapa** · (This post was last modified: May-01-2023, 07:22 PM by bigpapa.)

(May-01-2023, 07:00 PM)deanhystad Wrote: Is net weight always the first number following the colon? Your number, "17, 000. 00" contains extra spaces. Are they really there, or is that a typo?

This works if net weight starts with the first digit after the colon and goes to the end of the string.

df["Net Weight"] = df["columnA"].str.extract(r":.*?(\d.*)")

Ah, yes, this is one of my issues: sometimes there are spaces, and the formatting is messy and varies from case to case.

The number above, with the spaces, is correct.

Thanks for your help!
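For the varying labels (N.W.:, Net Weight;, Net Weight, Net WT:) and unit endings (KGS, KG) mentioned earlier in the thread, one possible approach is a single case-insensitive pattern plus a cleanup step. This is only a sketch using the standard `re` module. The names `NET_WEIGHT_RE` and `parse_net_weight` are hypothetical, and it assumes "," is a thousands separator and "." the decimal point, which the thread does not confirm.

```python
import re

# Labels and units are limited to the variants listed in the thread;
# anything beyond those is not covered.
NET_WEIGHT_RE = re.compile(
    r"""
    (?:NET\s*(?:WEIGHT|WT)|N\.\s*W\.?)   # NET WEIGHT, NET WT, or N.W.
    \s*[:;]?\s*                          # optional ":" or ";" separator
    (\d[\d,.\s]*?)                       # number, stray spaces allowed
    \s*(KGS?)\b                          # unit: KG or KGS
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_net_weight(text):
    """Return (weight_as_float, unit), or None if nothing usable is found."""
    match = NET_WEIGHT_RE.search(text)
    if match is None:
        return None
    raw_number, unit = match.groups()
    # Assumption: commas and spaces are only visual separators.
    cleaned = re.sub(r"[,\s]", "", raw_number)
    try:
        return float(cleaned), unit.upper()
    except ValueError:
        # For example "1.2.3", which is not a valid number after cleanup.
        return None


print(parse_net_weight("680 PACKAGES FOR SOAP NET WEIGHT: 17, 000. 00 KGS"))
print(parse_net_weight("N.W.: 250 KG"))
print(parse_net_weight("Net WT: 1,200.5 kgs"))
```

With pandas, the same function could be applied per row with something like `df["columnA"].map(parse_net_weight)`. Rows without a recognizable label come back as `None`.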
