Python Forum
Imported csv, column text converted to number but not summing
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Imported csv, column text converted to number but not summing
#1
Hi

Obviously did this wrong.

Situation:
CSV text file importing into pandas
Problem, the source has numbers with ., rather than ,. so its been imported as text.
Applied a replace (for . to nothing) and a second replace (comma to dot) so now looks like a number.
Used the .astype to try to push to float.
df.update(df[c].str.replace('.','').str.replace(',','.').astype(float))
variables
- c is from a list (at the moment only one column but may have more in the future)
- df is the dataframe
- bfc is the variable of the class

Coding is in a class (trying something out)
bfc.dfDemands.groupby(by='source').sum()
only returns the source column and a column set to False.

The column that is meant to be a number is not returning.

Any pointers as to what I can look at to solve this or a different route to import... welcomed
Reply
#2
Hi, could you upload the file?
Reply
#3
nope. It has business data in it.

import pandas as pd

df = pd.DataFrame([['a','1,00'],['b','1.000,20'],['c','123,45']],columns=['source','rev'])

r=['rev']
for c in r:
    df.update(df[c].str.replace('.','').str.replace(',','.').astype(float))
df.groupby(by='source').sum()
Was going to use this, but it works fine on this data...

I have a lot more columns, but its strange that it only affects this one file with the string number reversed comma/dot.
Reply
#4
You should use pd.read_csv for this task. This function has many parameters which define the behavior of parsing data and they know the problem with comma or dot as decimal sign. Also the thousands sep can and should be specified. You should read this document: https://pandas.pydata.org/pandas-docs/st...d_csv.html

from textwrap import dedent
from io import StringIO


import pandas as pd


data = """
    source;rev
    a;1,00
    b;1.000,20
    c;123,45
    """

csv_data = StringIO(dedent(data.strip()))
# csv_data is a file-like object
# just for testing

df = pd.read_csv(csv_data, decimal=',', thousands='.', sep=';')
print(df['rev'])
Output:
0 1.00 1 1000.20 2 123.45 Name: rev, dtype: float64
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#5
Hi thanks

Nice suggestion, I did indeed end up using it. As it was in a process which received different files it had to change for that one file. Works fine :)

Thank you
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Parsing and summing time deltas (duration) onto bar + pie charts using pandas - - DRY Drone4four 2 581 Feb-10-2024, 06:04 PM
Last Post: Drone4four
  Make unique id in vectorized way based on text data column with similarity scoring ill8 0 889 Dec-12-2022, 03:22 AM
Last Post: ill8
  Pandas: summing columns conditional on the column labels ddd2332 0 2,113 Sep-10-2020, 05:58 PM
Last Post: ddd2332
  Combine a number into integer column and preserv format as number zinho 2 1,929 Dec-23-2019, 05:02 PM
Last Post: zinho
  Recognize text from number plate Sheheryar 4 3,816 Aug-27-2019, 04:00 AM
Last Post: Sheheryar
  Custom timeinterval converted to hourly values using Pandas? SinPy 1 2,785 Jun-07-2019, 05:06 AM
Last Post: heiner55
  Can't store pandas converted json dataframe into mongoDB mahmoud899 1 4,230 Dec-12-2018, 07:45 PM
Last Post: nilamo
  Text to column pandas ms5573 0 2,261 Aug-25-2018, 08:18 PM
Last Post: ms5573
  summing rows/columns more quickly jon0852 3 2,890 Feb-12-2018, 02:24 AM
Last Post: ka06059
  sklearn imported but not recognized kerberg 6 16,437 Jun-18-2017, 12:32 PM
Last Post: snippsat

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020