Issue with reading CSV file

nnsatpute · Dec-10-2018, 01:08 PM

Below is the code.
I printed each row and observed the following:
1. When the file is created on Windows and saved as CSV, the output is as expected, i.e.
[Column_1 Varchar(20)]
[Column_2 Number(10,2)]
[Column_3 Decimal(4,1)]
and it reads the row correctly.

2. When the same file is taken to Unix, it is read as below:
['Column_1', 'Varchar(20)']
['Column_2', 'Number(10', '2)']
['Column_3', 'Decimal(4', '1)']

The problem seems to be the comma (,) inside the type: it is treated as a field separator, so the value is split into separate strings.
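To illustrate the behaviour described above, here is a minimal sketch using made-up sample lines (not the actual file). It assumes the file is comma-delimited with the column name in the first field, and shows that csv.reader only keeps an embedded comma when the field is quoted. The helper name parse is hypothetical.

```python
import csv
import io

# Hypothetical sample lines: the same row without and with quotes around the type
unquoted = "Column_2,Number(10,2)\n"
quoted = 'Column_2,"Number(10,2)"\n'

def parse(text):
    """Parse CSV text with the same 'excel' dialect used in the script."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect="excel"))

print(parse(unquoted))  # [['Column_2', 'Number(10', '2)']]
print(parse(quoted))    # [['Column_2', 'Number(10,2)']]
```

If the file read on Unix no longer has quotes around values such as Number(10,2), that alone would explain the extra split.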

import csv
import os
import sys

# schemaname is required further down, so both accepted forms take it
if len(sys.argv) not in (3, 4):
    print("\nUsage: csv2tbl.py schemaname path/datafile.csv (0,1,2,3 = column name format):")
    print("\nFormat: 0 = TitleCasedWords")
    print("        1 = Titlecased_Words_Underscored")
    print("        2 = lowercase_words_underscored")
    print("        3 = Words_underscored_only (leave case as in source)")
    sys.exit()
elif len(sys.argv) == 3:
    dummy, schemaname, datafile = sys.argv
    namefmt = '0'
else:
    dummy, schemaname, datafile, namefmt = sys.argv


#outfile = os.path.basename(datafile)
filename = os.path.basename(datafile).split('.')[0]
# os.path.join inserts the path separator that plain concatenation omitted
outfile = os.path.join(os.path.dirname(datafile), filename + '.sql')

tblname = schemaname + '.' + filename


partition_param_1 = 'ingestion_year  int '
partition_param_2 = 'ingestion_month  int'
partition_param_3 = 'ingestion_day int'
partition_string = partition_param_1 + ',' + partition_param_2 + ',' + partition_param_3

row_format='org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored_as=''
output_format=''
location='/HADOOP/RAW/' + schemaname + '/' + tblname + '/GOOD'
table_properties= '/HADOOP/RAW/' + schemaname + '/' + tblname + '/GOOD'

    


sql = 'CREATE EXTERNAL TABLE %s\n(' % (tblname)
# Build one "name type" column definition per CSV row
with open(datafile, newline='') as csvfile:
    reader = csv.reader(csvfile, dialect='excel')
    next(reader)  # skip the header row
    for row in reader:
        print(row)
        sql = sql + " ".join(row) + ",\n"

# Drop the trailing ",\n" left after the last column
sql = sql[:-2]

sql = sql + ') \n Partition By (' + partition_string +')'
sql = sql + ' \n ROW FORMAT SERDE (' + row_format +')'
sql = sql + ' \n STORED AS (' + stored_as +')'
sql = sql + ' \n OUTPUT FORMAT (' + output_format +')'
sql = sql + ' \n LOCATION (' + location +')'
sql = sql + ' \n TABLE PROPERTIES (' + table_properties +')'


# The with block closes the file automatically
with open(outfile, 'w') as sqlfile:
    sqlfile.write(sql)

print ('%s created.' % (outfile))
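One possible workaround, sketched here under the assumption that every row has the column name in the first field and that everything after it belongs to the type: re-join the extra fields with a comma instead of a space. The function name column_definition and the sample data are hypothetical.

```python
import csv
import io

def column_definition(row):
    """Treat the first field as the column name and re-join the rest as the type."""
    name, *type_parts = row
    return name.strip() + " " + ",".join(part.strip() for part in type_parts)

# Hypothetical sample matching the rows shown in the question
sample = "Column_1,Varchar(20)\nColumn_2,Number(10,2)\nColumn_3,Decimal(4,1)\n"
definitions = [column_definition(r) for r in csv.reader(io.StringIO(sample)) if r]
print(definitions)
# ['Column_1 Varchar(20)', 'Column_2 Number(10,2)', 'Column_3 Decimal(4,1)']
```

In the script, this would replace " ".join(row) inside the loop. Quoting the type field in the source file would avoid the problem at its origin.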
