Python Forum
Changing a string value to a numerical value using Python code and a lambda function
#1
I need to convert a column that indicates machine status (normal, broken or recovering) to a numeric representation. This seems easy enough, but I want to do it in one line of Python 3 code if possible. It would be something like this:

sensor_data['label'] = sensor_data['machine_status'].map(lambda label: 0 if label == 'NORMAL' else 1)

I found this online, and I want to use it because it is only one line.

The pump has two states in the final version: normal or broken; broken includes recovering because the pump is still not functional when recovering.

I believe that this one line of python code can do it for all 220320 values in the column.

My question is, is it on the right track? Is there an even easier way to do it?

Any help appreciated.

Respectfully,

LZ
#2
Personally, I would use index() and a look-up table like this:
sensor_data = {'machine_status': 'BROKEN'} 
sensor_data ['label'] = 'NBR'.index (sensor_data ['machine_status'][0])
print (sensor_data)
Output:
{'machine_status': 'BROKEN', 'label': 1}
#3
I don't like the index idea. It is relatively slow and doesn't handle the case where the status name is not in the list. A better solution is to use a dictionary which is both faster and handles unexpected status names better.
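
To illustrate the point about unexpected status names, here is a minimal sketch in plain Python. The helper names label_by_index and label_by_dict are made up for this example; the states dictionary is the one used in the tests further down. list.index raises ValueError when the value is missing, while dict.get returns a default.

```python
states = {"NORMAL": 0, "BROKEN": 1, "RECOVERING": 2}
names = list(states)  # ["NORMAL", "BROKEN", "RECOVERING"]


def label_by_index(status):
    # list.index scans the list from the start and raises ValueError
    # when the status is not present.
    return names.index(status)


def label_by_dict(status, default=None):
    # dict.get is a hash lookup and returns `default` for unknown keys.
    return states.get(status, default)


print(label_by_index("BROKEN"), label_by_dict("BROKEN"))

try:
    label_by_index("")  # an unexpected, empty status
except ValueError as exc:
    print("index failed:", exc)

print(label_by_dict(""))  # falls back to the default instead of raising
```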

I tried using map with your if statement and with a dictionary. The dictionary is slightly faster. I also tried using apply instead of map, and they are about the same.

The only way I could figure out to vectorize the substitution is using replace().

Here are my tests. Printed times are how long it took to create a new column of 20,000 values. I had to make special accommodations to prevent the index method from crashing:
import pandas as pd
import numpy as np
from random import choice
from time import time

states = {"NORMAL": 0, "BROKEN": 1, "RECOVERING": 2}
keys = list(states.keys()) + [""]  # Add an invalid state

df = pd.DataFrame({"State": [choice(keys) for _ in range(20000)]})

start = time()
df["if"] = df["State"].map(
    lambda x: 0
    if x == "NORMAL"
    else 1
    if x == "BROKEN"
    else 2
    if x == "RECOVERING"
    else np.nan
)
print("if map", time() - start)

start = time()
df["dict"] = df["State"].map(states)
print("dict map", time() - start)

start = time()
df["if apply"] = df["State"].apply(
    lambda x: 0
    if x == "NORMAL"
    else 1
    if x == "BROKEN"
    else 2
    if x == "RECOVERING"
    else np.nan
)
print("if apply", time() - start)

start = time()
df["index map"] = df["State"].map(lambda x: keys.index(x))
print("index map", time() - start)

start = time()
df["replace"] = df["State"].replace("NORMAL", 0)
df["replace"] = df["replace"].replace("BROKEN", 1)
df["replace"] = df["replace"].replace("RECOVERING", 2)
print("replace", time() - start)

print(df[:10])
Output:
if map 0.0060176849365234375
dict map 0.0010302066802978516
if apply 0.007014036178588867
index map 0.005976438522338867
replace 0.003970146179199219
        State   if  dict  if apply  index map replace
0  RECOVERING  2.0   2.0       2.0          2       2
1              NaN   NaN       NaN          3
2              NaN   NaN       NaN          3
3  RECOVERING  2.0   2.0       2.0          2       2
4              NaN   NaN       NaN          3
5      NORMAL  0.0   0.0       0.0          0       0
6      NORMAL  0.0   0.0       0.0          0       0
7  RECOVERING  2.0   2.0       2.0          2       2
8      BROKEN  1.0   1.0       1.0          1       1
9      BROKEN  1.0   1.0       1.0          1       1
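
A single time() measurement of a few milliseconds can be noisy, so the comparison may be easier to trust when each variant is run several times. The following is only a sketch using the standard timeit module; the function names with_lambda and with_dict, the fixed seed and the repeat counts are assumptions made for this example, and the numbers will differ from machine to machine.

```python
import timeit
from random import choice, seed

import pandas as pd

seed(0)  # fixed seed so the sample column is reproducible
states = {"NORMAL": 0, "BROKEN": 1, "RECOVERING": 2}
df = pd.DataFrame({"State": [choice(list(states)) for _ in range(20000)]})


def with_lambda():
    # Chained conditional expression, called once per element
    return df["State"].map(
        lambda x: 0 if x == "NORMAL" else 1 if x == "BROKEN" else 2
    )


def with_dict():
    # Dictionary lookup handled inside Series.map
    return df["State"].map(states)


for func in (with_lambda, with_dict):
    # Best of 3 repeats of 10 calls each, reported as seconds per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best:.6f} s per call")
```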
#4
I see and understand these ideas. But what is wrong with my one-line proposal?

It seems simple and fast. It uses a lambda function, about which I know little, hence the post. Will it do all the entries in the column? There are 220320 of them.

Please understand that there are two states: 1 and 0.

While there are three states listed in the machine status column, I consider broken and recovering to be one state and normal to be the other.


Respectfully,

LZ
#5
This is one line.
df["if"] = df["State"].map(
    lambda x: 0
    if x == "NORMAL"
    else 1
    if x == "BROKEN"
    else 2
    if x == "RECOVERING"
    else np.nan
)
I thought I read in your initial post that the pump had 3 states, normal, broken, recovering, but that was a mistake on my part. BashBedlam made the same mistake so that's who I'm going to blame. Is there a guarantee that the status will always be either NORMAL or BROKEN? I allow for there being no state or an unexpected state.
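
If unexpected states are a concern in the two-state case, one option is a dictionary that folds RECOVERING into the broken class and leaves anything else unmapped. This is a sketch on a made-up four-row sample; the "UNKNOWN" value and the names status_to_label and unexpected are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical sample; "UNKNOWN" stands in for an unexpected status.
sensor_data = pd.DataFrame(
    {"machine_status": ["NORMAL", "BROKEN", "RECOVERING", "UNKNOWN"]}
)

# RECOVERING is treated as broken, as described in post #1.
status_to_label = {"NORMAL": 0, "BROKEN": 1, "RECOVERING": 1}

# Values missing from the dictionary become NaN instead of silently becoming 1.
sensor_data["label"] = sensor_data["machine_status"].map(status_to_label)

# Rows with an unrecognized status can then be inspected separately.
unexpected = sensor_data[sensor_data["label"].isna()]
print(unexpected)
```

Because of the NaN, the label column ends up as floats here; if every status is known, the column contains only integers.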

If you don't like lambdas, use functions.
def status_to_number(status):
    if status == "NORMAL":
        return 0
    return 1

sensor_data['label'] = sensor_data['machine_status'].map(status_to_number)
Lambda expressions are just a way of writing unnamed functions (with a few additional limitations).
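
As a small illustration of that equivalence, the named function and the lambda below give the same results. The variable name status_to_number_lambda is only there so the two can be compared side by side.

```python
def status_to_number(status):
    """Return 0 for NORMAL and 1 for any other status."""
    return 0 if status == "NORMAL" else 1


# The same logic as an unnamed function, bound to a name only for comparison.
status_to_number_lambda = lambda status: 0 if status == "NORMAL" else 1

statuses = ["NORMAL", "BROKEN", "RECOVERING"]
print([status_to_number(s) for s in statuses])         # [0, 1, 1]
print([status_to_number_lambda(s) for s in statuses])  # [0, 1, 1]
```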
#6
(Jul-05-2022, 06:06 PM)Led_Zeppelin Wrote: It seems simple and fast. It uses a lambda function, about which I know little, hence the post. Will it do all the entries in the column? There are 220320 of them.
It's just that the lambda and map are unnecessary. You could just do this:
sensor_data ['label'] = 0 if sensor_data ['machine_status'] == 'NORMAL' else 1
Also, if you are doing 220320 of them you will need to put your one-liner in some kind of loop.
#7
I think you are missing that df is a dataframe. This does not work because df["State"] is a series, not an element of a list or value in a dictionary.
import pandas as pd
df = pd.DataFrame({"State": ["NORMAL", "BROKEN"]})
df["label"] = 0 if df["State"] == "NORMAL" else 1
Error:
Traceback (most recent call last):
  File "...", line 3, in <module>
    df["label"] = 0 if df["State"] == "NORMAL" else 1
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
"Put your one-liner in some kind of loop" is essentially what map() and apply() are doing.
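
Since comparing a Series with a string returns a boolean Series rather than a single True or False, that result can be used directly for the two-state case. A minimal sketch, assuming a small sample in place of the real sensor_data; the name is_faulty is made up for this example:

```python
import pandas as pd

sensor_data = pd.DataFrame(
    {"machine_status": ["NORMAL", "BROKEN", "RECOVERING", "NORMAL"]}
)

# Comparing a Series with a scalar is elementwise and returns a boolean Series.
is_faulty = sensor_data["machine_status"] != "NORMAL"

# Casting the booleans to int gives 0 for NORMAL and 1 for everything else.
sensor_data["label"] = is_faulty.astype(int)

print(sensor_data)
```

This is still one line once the intermediate variable is folded in, and it needs neither map nor a lambda. Note that anything other than "NORMAL", including missing values, ends up as 1.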