Issue with reading CSV file

nnsatpute · (This post was last modified: Dec-10-2018, 12:38 PM by Gribouillis.)

Hello,

I am creating CSV file in unix with help of sed command.
File is getting created correctly as below:

Output:Column_1 Nunber(10,1)
Column_1 Decimal(8,2)
Column_1 Varchar

I am trying to read this file row by row in python as below

with open (datafile) as csvfile:
        reader = csv.reader(csvfile,dialect='excel')
        row = next(reader)
            for row in reader:
            sql = sql + (" ".join(row)) +  (",") + "\n"
        
sql= sql[:-2]

when i am trying to put this row in sql file the comma in datatype is missing like below

Output:Column_1 Nunber(10 1)
Column_1 Decimal(8 2)

Is there any reason for this ?

**Gribouillis** · (This post was last modified: Dec-10-2018, 12:46 PM by Gribouillis.)

Line 4 should be indented at the same level as line 3 ('for' below 'row'). Configure your editor to insert 4 spaces when you hit the tab key. Python code is best indented with 4 spaces.

Can you print the rows read by for row in reader so that we can see what python reads, or post a csv file?

nnsatpute · Dec-10-2018, 01:08 PM

Below is the code.
I tried with print row and observations are as below
1. When file is created on windows and save as csv output is as expected i.e
[Column_1 Varchar(20)]
[Column_2 Number(10,2)]
[Column_3 Decimal(4,1)]
and it reads the row correctly.

2. When same file is taken to Unix it is read as below
['Column_1', 'Varchar(20)']
['Column_2', 'Number(10', '2)']
['Column_3', 'Decimal(4', '1)']

There seems the problem for , (Comma )it is treating it as separate string.

import csv,os,sys

if len(sys.argv)<2:
         print ("\nUsage: csv2tbl.py path/datafile.csv (0,1,2,3 = column name format):")
         print ("\nFormat: 0 = TitleCasedWords")
         print ("        1 = Titlecased_Words_Underscored")
         print ("        2 = lowercase_words_underscored")
         print ("        3 = Words_underscored_only (leave case as in source)")
         sys.exit()
else:
         if len(sys.argv)==3:
                  dummy,schemaname, datafile, = sys.argv
                  namefmt = '0'
         else: dummy, datafile, namefmt = sys.argv


#outfile = os.path.basename(datafile)
filename = os.path.basename(datafile).split('.')[0]
outfile = os.path.dirname(datafile)  + filename + '.sql'

tblname = schemaname + '.' + filename


partition_param_1 = 'ingestion_year  int '
partition_param_2 = 'ingestion_month  int'
partition_param_3 = 'ingestion_day int'
partition_string = partition_param_1 + ',' + partition_param_2 + ',' + partition_param_3

row_format='org.apache.hadoop.hive.serde2.avro.AveroSerDe'
stored_as=''
output_format=''
location='/HADOOP/RAW/' + schemaname + '/' + tblname + '/GOOD'
table_properties= '/HADOOP/RAW/' + schemaname + '/' + tblname + '/GOOD'

    


sql = 'CREATE EXTERNAL TABLE %s\n(' % (tblname)
# Create list of column [names],[widths]
with open (datafile) as csvfile:
        reader = csv.reader(csvfile,dialect='excel')
        row = next(reader)
        for row in reader:
            print(row)
            sql = sql + (" ".join(row)) +  (",") + "\n"
        
sql= sql[:-2]

sql = sql + ') \n Partition By (' + partition_string +')'
sql = sql + ' \n ROW FORMAT SERDE (' + row_format +')'
sql = sql + ' \n STORED AS (' + stored_as +')'
sql = sql + ' \n OUTPUT FORMAT (' + stored_as +')'
sql = sql + ' \n LOCATION (' + location +')'
sql = sql + ' \n TABLE PROPERTIES (' + table_properties +')'


with  open(outfile,'w') as sqlfile:
    sqlfile.write(sql)

sqlfile.close

print ('%s created.' % (outfile))

**Gribouillis** · Dec-10-2018, 03:19 PM

You may need to adapt the csv reader's dialect. The question is what does the csv file contain exactly? Can you post a few lines of this file ?

woooee · (This post was last modified: Dec-10-2018, 04:36 PM by woooee.)

And print the sql variable. It looks like you are missing one closing paren. Note also that sqlfile is not an SQL file. If you can post a sample csv
(not necessary to use actual data), and say what you want to do, i.e. do you want to create and SQL table, and if so, what SQL engine are you using.

nnsatpute · Dec-11-2018, 04:36 AM

The file has below contents,

Column_1 Varchar(20)
Column_2 Number(10,2)
Column_3 Decimal(4,1)

**Gribouillis** · Dec-11-2018, 06:01 AM

If the file is tab-separated, you can try dialect=csv.excel_tab in the call to csv.reader(). If the values are merely separated by space characters, you can define your own dialect

class MyDialect(csv.Dialect):
    delimiter = ' '
    doublequote = True
    lineterminator = '\r\n'
    quotechar = '"'
    quoting = csv.MINIMAL
    skipinitialspace = False

Then use dialect=MyDialect in csv.reader() until it reads ['Column_2', 'Number(10, 2)']

nnsatpute · Dec-11-2018, 06:23 AM

Thank you For your reply.

I am creating the CSV file in unix with help of describer command (Hive Describe).
Unfortunately I can't paste exact file.
But when i am opening same file in Excel it is comma separated with complete row in one line of excel.

**Gribouillis** · (This post was last modified: Dec-11-2018, 06:35 AM by Gribouillis.)

nnsatpute Wrote:I can't paste exact file.

You can do this

with open(datafile) as csvfile:
    s = csvfile.read(4096)
    print(repr(s))

Then paste the output in the forum.

nnsatpute Wrote:But when i am opening same file in Excel it is comma separated

If the file is comma separated, there is a problem with Number(10,2) because the comma must not be interpreted as a separating comma. Normally csv files avoid this by quoting "Number(10,2)". You need to know which quoting strategy is used and declare this in the dialect.

nnsatpute · Dec-19-2018, 07:44 AM

with open(datafile) as csvfile:
    s = csvfile.read(4096)
    print(repr(s))

The output of above code looks like
column1,varchar\n column2,decimal(10,2)\n column3,number(4,3)

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	problems with reading csv file.	MassiJames	3	647	Nov-16-2023, 03:41 PM Last Post: snippsat
	Reading a file name fron a folder on my desktop	Fiona	4	922	Aug-23-2023, 11:11 AM Last Post: Axel_Erfurt
	Reading data from excel file –> process it >>then write to another excel output file	Jennifer_Jone	0	1,113	Mar-14-2023, 07:59 PM Last Post: Jennifer_Jone
	Reading a file	JonWayn	3	1,105	Dec-30-2022, 10:18 AM Last Post: ibreeden
	Reading Specific Rows In a CSV File	finndude	3	992	Dec-13-2022, 03:19 PM Last Post: finndude
	Excel file reading problem	max70990	1	900	Dec-11-2022, 07:00 PM Last Post: deanhystad
	Replace columns indexes reading a XSLX file	Larry1888	2	996	Nov-18-2022, 10:16 PM Last Post: Pedroski55
	Failing reading a file and cannot exit it...	tester_V	8	1,838	Aug-19-2022, 10:27 PM Last Post: tester_V
	I have an issue with Netmiko Error reading SSH protocol banner	omarhegazy	2	3,578	May-16-2022, 06:05 PM Last Post: omarhegazy
	Reading .csv file	doug2019	4	1,710	Apr-29-2022, 09:55 PM Last Post: deanhystad

Issue with reading CSV file

User Panel Messages

Announcements