Python Forum

I have a Parquet file with 4 columns. It looks something like this:

TYPE | ID | SRNO | AMT

D | 123456 | 1 | 100.00
D | 123457 | 2 | 200.00
D | 123459 | 3 | 500.00
D | 123458 | 4 | 1000.00

The schema for this file is:

dataframe.printSchema()

Output:
|-- TYPE: string (nullable = true)
|-- ID: integer (nullable = true)
|-- SRNO: integer (nullable = true)
|-- AMT: decimal(15,2) (nullable = true)

NOTE: When I read this file in pandas, the dtype of the decimal column changes and it is represented as object.

pandas_dataframe.dtypes

Output:
TYPE    object
ID      int32
SRNO    int32
AMT     object

I get the error below when I try to run a SQL query on the DataFrame.

ps.sqldf("select * from pandas_dataframe")

Error:
Traceback (most recent call last):
 File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1229, in _execute_context
   cursor, statement, parameters, context
 File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 577, in do_executemany
   cursor.executemany(statement, parameters)
sqlite3.InterfaceError: Error binding parameter 3 - probably unsupported type.
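
The last line of the traceback seems to be reproducible without pandasql. Here is a minimal sketch, assuming plain sqlite3 and a hypothetical table t with the same four columns (the table name and the hand-built row are illustrations, not taken from the real file). In the Python 3.6 message above the parameter index appears to be zero-based, so parameter 3 would be the fourth value, AMT. Newer Python versions may word the message differently, so the sketch catches the common base class sqlite3.Error.

```python
import sqlite3
from decimal import Decimal

# Hypothetical in-memory table mirroring the four columns of the file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (TYPE TEXT, ID INTEGER, SRNO INTEGER, AMT TEXT)")

# One row shaped like the data above; AMT is a decimal.Decimal,
# which is what the object-dtype column holds after reading decimal(15,2).
row = ("D", 123456, 1, Decimal("100.00"))

bind_failed = False
try:
    conn.execute("INSERT INTO t VALUES (?, ?, ?, ?)", row)
except sqlite3.Error as exc:
    # sqlite3 has no built-in binding for Decimal, so the insert is rejected.
    bind_failed = True
    print(type(exc).__name__, exc)
```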

FYI: I tried casting the Decimal field to String and to Double, and both work fine.
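
For reference, the cast on the pandas side can be sketched as follows. This assumes a small hand-built DataFrame as a stand-in for the one read from the Parquet file; the names as_double and as_string are only illustrative.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for the DataFrame read from the Parquet file.
pandas_dataframe = pd.DataFrame({
    "TYPE": ["D", "D"],
    "ID": [123456, 123457],
    "SRNO": [1, 2],
    "AMT": [Decimal("100.00"), Decimal("200.00")],
})

# The Decimal values are stored as generic Python objects.
amt_dtype_before = pandas_dataframe["AMT"].dtype

# Cast AMT to a type SQLite can bind: float ("Double") or str ("String").
as_double = pandas_dataframe.assign(AMT=pandas_dataframe["AMT"].astype(float))
as_string = pandas_dataframe.assign(AMT=pandas_dataframe["AMT"].astype(str))

print(amt_dtype_before, as_double["AMT"].dtype)
```

Note that the float cast gives up exact decimal arithmetic, while the string cast keeps the digits but makes AMT a text column in SQL.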

Does this mean pandasql cannot handle a file that has Decimal data types? Is there a better programmatic alternative for handling this than typecasting explicitly?
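
One possible alternative to casting each column by hand, offered as an assumption and not a verified fix: the standard library's sqlite3.register_adapter tells sqlite3 how to bind a Python type. The sketch below uses plain sqlite3 with a hypothetical table t. It has not been checked end to end through pandasql, although the traceback suggests pandasql ends up in the same sqlite3 module via SQLAlchemy.

```python
import sqlite3
from decimal import Decimal

# Register a process-wide adapter: bind every Decimal as its string form.
# (Using float instead of str would store a REAL but lose exactness.)
sqlite3.register_adapter(Decimal, str)

# Hypothetical in-memory table; AMT is TEXT so the digits are kept as written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (TYPE TEXT, ID INTEGER, SRNO INTEGER, AMT TEXT)")

# The same kind of row that fails to bind when no adapter is registered.
conn.execute(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    ("D", 123456, 1, Decimal("100.00")),
)

stored = conn.execute("SELECT * FROM t").fetchone()
print(stored)
```

The adapter is global to the Python process, so it would apply to every sqlite3 connection opened afterwards, not only to this table.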

Thanks in Advance

geethchi