Python Forum

Full Version: Dealing with Exponential data
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2
Having csv file with the value as below
2.22971E+15,2.20408E+15
while reading from csv able to read the data as
'2229708157109627', '2204081406354342'
code:
import pandas as pd
with open('mypath\mycsv.csv', encoding="utf8") as csv_file:
    df = pd.read_csv(csv_file,index_col=None, header=0) 
but actually
2229708157109620, 2204081406354340

Why the values are changing here
from 2229708157109620 to 2229708157109627
from 2204081406354340 to 2204081406354342
?
how do you get significant digits after 71 or 08?

>>> 2.22971E+15
2229710000000000.0
>>> 2.20408E+15
2204080000000000.0
>>> d = Decimal('2.20408E+15')
>>> d
Decimal('2.20408E+15')
>>> int(d)
2204080000000000
Save the below records as csv file
2229708157109627
2204081406354342
and try reading them from pandas or with csv.
you could see output as
2229708157109620
2204081406354340
(May-29-2019, 10:25 AM)parthi1705 Wrote: [ -> ]Save the below records as csv file
2229708157109627
2204081406354342
(May-29-2019, 10:07 AM)parthi1705 Wrote: [ -> ]Having csv file with the value as below
2.22971E+15,2.20408E+15


So it's
Output:
2229708157109627,2204081406354342
not as stated in original post
Output:
2.22971E+15,2.20408E+15
(May-29-2019, 10:25 AM)parthi1705 Wrote: [ -> ]try reading them from pandas or with csv.

Here it is with csv module:

import csv

with open('spam.csv') as f:
    rdr = csv.reader(f)
    for line in rdr:
        foo, bar = line
        print(foo)
        print(int(foo))
        print(bar)
        print(int(bar))
Output:
2229708157109627 2229708157109627 2204081406354342 2204081406354342 >>>
what is in the file, that's what you get
The values must divided by 1e15?
You can use decimal.Decimal and change the precision, but Decimal is slow.

import decimal


def convert(integers, divisor='1e15'):
    with decimal.localcontext() as ctx:
        ctx.prec = 20
        for value in integers:
            yield decimal.Decimal(value) / decimal.Decimal(divisor)


list(convert(['2229708157109627', '2204081406354342']))
Output:
[Decimal('2.229708157109627'), Decimal('2.204081406354342')]
Numpy has a float128 datatype, but 64 seems to be enough to represent the value.
Are you using a 32 bit version of Python?
With numpy I got following (Python 3.7.3 x64):

In [34]: import numpy as np                                                                                                                   

In [35]: np.float128(num)                                                                                                                     
Out[35]: 2229708157109627.0

In [36]: np.float64(num)                                                                                                                      
Out[36]: 2229708157109627.0

In [37]: np.float32(num)                                                                                                                      
Out[37]: 2229708100000000.0

In [38]: np.float128(num) / 1e15                                                                                                              
Out[38]: 2.229708157109627

In [39]: np.float64(num) / 1e15                                                                                                               
Out[39]: 2.229708157109627

In [40]: np.float32(num) / 1e15                                                                                                               
Out[40]: 2.22970811252736

In [41]: num / 1e15                                                                                                                           
Out[41]: 2.229708157109627

In [42]: num                                                                                                                                  
Out[42]: 2229708157109627
(May-29-2019, 10:40 AM)buran Wrote: [ -> ]
(May-29-2019, 10:25 AM)parthi1705 Wrote: [ -> ]Save the below records as csv file
2229708157109627
2204081406354342
(May-29-2019, 10:07 AM)parthi1705 Wrote: [ -> ]Having csv file with the value as below
2.22971E+15,2.20408E+15


So it's
Output:
2229708157109627,2204081406354342
not as stated in original post
Output:
2.22971E+15,2.20408E+15

Buran,
If you open csv file you would probably see the data as 2.22971E+15,2.20408E+15 unless you have the mouse pointer over the cell.

Actual csv file data goes like below.

2229708157109620
2204081406354340
892984837048418000
2204081406354340
843482762412573
1667516366684810

>>>
>>> import pandas as pd
>>> with open('D:\mycsv.csv', encoding="utf8") as csv_file:
        df = pd.read_csv(csv_file,index_col=None, header=0) 

>>> df['ID'].head(6)
0      2229708157109627
1      2204081406354342
2    892984837048418304
3      2204081406354342
4       843482762412573
5      1667516366684818
Name: ID, dtype: int64
Save those records and see what it looks like...
Original records are from database to my csv file. one more wonder is when i save back them again to csv file able to see original records
>>> newdf= df['ID'].head(10)
>>> newdf.to_csv('D:\spam.csv', index = None, header=True, encoding='utf-8')
but why while displaying it shows differently ?? Huh Huh
Have attached screenshot for reference.
Installed version of Python 3.7.0
That is not python/pandas problem.
The problem is that you view your csv file in some spreadsheet software (Excel, OpenOffice, etc., most probably Excel)
That is why you see scientific (exponential) notation. In addition, it's the excel that truncates the significant digits up to 15, i.e. the length stays the same, but significant digits are the first 15.
See for yourself - on the left is the actual csv file and on the right side is the same file open in excel
[attachment=640]

To see what you really have in the file, open your csv file as it comes from the database in a notepad or notepadd++ or some other simple text editor. You will see it's what pandas show as dataframe values

As an advice and to avoid such confusions in the future, when refer to data/content in csv/txt files, always tell what you see in notepad/notepad++ and not in what you see being displayed in a spreadsheet.
It is not the issue in the way we view the data with csv file.. it the way we read the actual data from csv file.

Can you try reading the Csv file has below data
2229708157109620
2204081406354340
892984837048418000
and can show what you get in python or pandas or csv or with any other module.
Do you open the csv file in notepad and see the data in it as I requested in my previous post? Yes or No? Note - open the file as exported from DB, not opened and saved in excel.
I demonstrated that the issue is with the excel.
Pages: 1 2