Python Forum
remove b due to conversion in PyQ
#1
Hi Team,

I am trying to get rid of the b that appears in the column.

Data type of my output:
df.info()
Output:
Date: Dtype: object
Col 2: Dtype: object
Col 3: Dtype: object
Col 4: Dtype: int16
I want to remove the [b''] wrapper from the output, for example [b'yxyz'] in Col 2 - 4. I tried the following:

df.replace('b', '')
df.replace(r'[^a-z]','',regex=True)
Still no luck. Can you help me?
Yoriz wrote Jul-15-2021, 08:24 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Reply
#2
Can you provide an example of the unwanted "b"?
Reply
#3
The column I am extracting the output from appears to hold byte strings.

In other words, when I output only that column, I see the following:

[b'234xyz678kkkkkk']

All I want is to get rid of the [b''] (the square brackets, the b, and the single quotes) in my dataframe.
Reply
#4
There is no 'b' in the df. That is why you cannot remove it. The 'b' is added when the value is displayed. It provides information about the data type. In your example the 'b' is telling you that '234xyz678kkkkkk' is a byte string or bytes. You can convert this to a str by decoding the bytes to unicode characters.
x = b'1234'
print(x, x.decode('utf8'))
Output:
b'1234' 1234
The square brackets tell you that this is a list. You need to index the list to get a value, or unpack the list to get all the values.
x = [1, 2, 3]
print(x, x[2], *x)
Output:
[1, 2, 3] 3 1 2 3
You need to remember that the way a value is displayed when you print it or view it in your IDE is only somebody's idea of how best to represent that value. Bytes do not have a 'b' and lists do not have brackets. These are only visual cues added when the object is converted to a string for display purposes.
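Putting the two points together for the value shown earlier in the thread, a minimal sketch might look like this. It assumes the cell really is a one-element Python list holding UTF-8 encoded bytes; the variable names are only for illustration.

```python
# Hypothetical cell value, copied from the example earlier in the thread.
cell = [b'234xyz678kkkkkk']

raw = cell[0]                # index the list to get the bytes object
text = raw.decode('utf-8')   # decode the bytes to a str

print(repr(cell))  # the list, displayed with brackets and the b prefix
print(text)        # 234xyz678kkkkkk
```

Nothing is "removed" here. The list and the bytes object are simply replaced by the str they contain.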
Reply
#5
I see the problem here. I was getting AttributeError: recarray has no attribute decode.

My output variable is a list of numpy arrays. To narrow down the specific problem, I tried:

x = b'thef45th'
print(x.decode('utf8'))
The output came out as expected: thef45th
Even when I access an element of my array by index position, I still end up with the same error.
Could you please help me get past the decode error for a list of numpy arrays?
Reply
#6
I do not understand what you are asking. You need to provide more context. Some code would be best.
Reply
#7
This is my Qtable/Qlist info on datatypes:

Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Col1    5 non-null      object
 1   Col2    5 non-null      object
 2   Col3    5 non-null      object
 3   Col3    5 non-null      object
 4   Col4    5 non-null      int16
dtypes: int16(1), object(4)
I thought using a lambda would help decode only the object columns that hold bytes, since one of my columns is an int.

df = df.apply(lambda x: x.decode() if isinstance(x, bytes) else x)
df.to_csv(r'PATH/Out.csv', index=False, header=True)
Right now, my CSV output is:

Col1,Col2,Col3,Col3,Col4
[''],[b'abcdfe'],[b'ABC'],[b''],0
[''],[b'hijkl'],[b'LMNDE'],[b''],0
[''],[b'mno'],[b'YUTER'],[b''],0
I want to convert the bytes columns to normal columns and then write the result to CSV.

I am expecting the CSV output to be:

Col1,Col2,Col3,Col3,Col4
,abcdfe,ABC,,0
,hijkl,LMNDE,,0
,mno,YUTER,,0
Reply
#8
You are looking at the wrong thing. In your code the 'x' in the lambda is a column (Series). A Series is not a bytes object, so nothing is replaced. I think you want to do something like this:
for col in df:
    df[col] = df[col].apply(lambda x: x.decode() if isinstance(x, bytes) else x)
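One caveat: the CSV output posted earlier shows cells such as [b'abcdfe'], which suggests each cell may be a one-element list or array rather than a bare bytes object. If so, the isinstance(x, bytes) check would still not match. The following is only a sketch under that assumption; decode_cell is a hypothetical helper, not part of pandas, and the sample rows are copied from the posted output.

```python
def decode_cell(value, encoding="utf-8"):
    """Return a str for bytes, unwrapping a one-element container first."""
    # NumPy arrays and scalars expose tolist(); turn them into plain Python objects.
    if hasattr(value, "tolist"):
        value = value.tolist()
    # Unwrap one-element lists or tuples such as [b'abc'] or [''].
    if isinstance(value, (list, tuple)) and len(value) == 1:
        value = value[0]
    # Decode bytes; leave every other type (str, int, ...) unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical rows shaped like the CSV output shown in the thread.
rows = [
    [[''], [b'abcdfe'], [b'ABC'], [b''], 0],
    [[''], [b'hijkl'], [b'LMNDE'], [b''], 0],
]
cleaned = [[decode_cell(v) for v in row] for row in rows]
print(cleaned[0])  # ['', 'abcdfe', 'ABC', '', 0]
```

With a real DataFrame, the same helper could presumably be applied per column with df[col] = df[col].map(decode_cell) before calling to_csv.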
Reply
#9
Thanks, I tried as suggested. Now when I print the output:
print(df[col])
it displays only the int column, Col4, along with the index:

0 0
1 0
2 0
3 0
4 0

Col 1 to 3 are missing from my CSV, and I am not sure what I am missing here.

I also tried the following, but the result is still the same:

df[col].astype(str, errors='raise')
Reply
#10
df[col] is a Series, essentially 1 column from your dataframe. Why would printing df[col] print other columns?
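A likely reason only Col4 shows up: once a for loop over the columns has finished, the loop variable still holds the last column name. The sketch below illustrates this with a plain dict as a hypothetical stand-in for the DataFrame; the column names and values are made up for the illustration.

```python
# Hypothetical stand-in for a DataFrame: column name -> list of values.
table = {
    "Col1": ["", "", ""],
    "Col2": ["abcdfe", "hijkl", "mno"],
    "Col4": [0, 0, 0],
}

for col in table:
    pass  # per-column work would go here

# After the loop, col still refers to the last key that was visited.
print(col)         # Col4
print(table[col])  # [0, 0, 0]
```

Printing the whole table (or the whole DataFrame) rather than table[col] would show every column.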

It is difficult to help you because I have no idea what you are trying to do. Why do you care about the b''? If you don't want the b'' is there something you could do to prevent putting bytes in the dataframe? How are the bytes getting in the dataframe? I'm done here until you provide enough information that someone other than you can understand your questions.
Reply

