Python Forum
Thread Rating:
  • 1 Vote(s) - 2 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Issue with reading CSV file
#1
Hello,

I am creating CSV file in unix with help of sed command.
File is getting created correctly as below:
Output:
Column_1 Nunber(10,1) Column_1 Decimal(8,2) Column_1 Varchar
I am trying to read this file row by row in python as below
with open (datafile) as csvfile:
        reader = csv.reader(csvfile,dialect='excel')
        row = next(reader)
            for row in reader:
            sql = sql + (" ".join(row)) +  (",") + "\n"
        
sql= sql[:-2]
when i am trying to put this row in sql file the comma in datatype is missing like below
Output:
Column_1 Nunber(10 1) Column_1 Decimal(8 2)
Is there any reason for this ?
Reply
#2
Line 4 should be indented at the same level as line 3 ('for' below 'row'). Configure your editor to insert 4 spaces when you hit the tab key. Python code is best indented with 4 spaces.

Can you print the rows read by for row in reader so that we can see what python reads, or post a csv file?
Reply
#3
Below is the code.
I tried with print row and observations are as below
1. When file is created on windows and save as csv output is as expected i.e
[Column_1 Varchar(20)]
[Column_2 Number(10,2)]
[Column_3 Decimal(4,1)]
and it reads the row correctly.

2. When same file is taken to Unix it is read as below
['Column_1', 'Varchar(20)']
['Column_2', 'Number(10', '2)']
['Column_3', 'Decimal(4', '1)']

There seems the problem for , (Comma )it is treating it as separate string.

import csv,os,sys

if len(sys.argv)<2:
         print ("\nUsage: csv2tbl.py path/datafile.csv (0,1,2,3 = column name format):")
         print ("\nFormat: 0 = TitleCasedWords")
         print ("        1 = Titlecased_Words_Underscored")
         print ("        2 = lowercase_words_underscored")
         print ("        3 = Words_underscored_only (leave case as in source)")
         sys.exit()
else:
         if len(sys.argv)==3:
                  dummy,schemaname, datafile, = sys.argv
                  namefmt = '0'
         else: dummy, datafile, namefmt = sys.argv


#outfile = os.path.basename(datafile)
filename = os.path.basename(datafile).split('.')[0]
outfile = os.path.dirname(datafile)  + filename + '.sql'

tblname = schemaname + '.' + filename


partition_param_1 = 'ingestion_year  int '
partition_param_2 = 'ingestion_month  int'
partition_param_3 = 'ingestion_day int'
partition_string = partition_param_1 + ',' + partition_param_2 + ',' + partition_param_3

row_format='org.apache.hadoop.hive.serde2.avro.AveroSerDe'
stored_as=''
output_format=''
location='/HADOOP/RAW/' + schemaname + '/' + tblname + '/GOOD'
table_properties= '/HADOOP/RAW/' + schemaname + '/' + tblname + '/GOOD'

    


sql = 'CREATE EXTERNAL TABLE %s\n(' % (tblname)
# Create list of column [names],[widths]
with open (datafile) as csvfile:
        reader = csv.reader(csvfile,dialect='excel')
        row = next(reader)
        for row in reader:
            print(row)
            sql = sql + (" ".join(row)) +  (",") + "\n"
        
sql= sql[:-2]

sql = sql + ') \n Partition By (' + partition_string +')'
sql = sql + ' \n ROW FORMAT SERDE (' + row_format +')'
sql = sql + ' \n STORED AS (' + stored_as +')'
sql = sql + ' \n OUTPUT FORMAT (' + stored_as +')'
sql = sql + ' \n LOCATION (' + location +')'
sql = sql + ' \n TABLE PROPERTIES (' + table_properties +')'


with  open(outfile,'w') as sqlfile:
    sqlfile.write(sql)

sqlfile.close

print ('%s created.' % (outfile))
Reply
#4
You may need to adapt the csv reader's dialect. The question is what does the csv file contain exactly? Can you post a few lines of this file ?
Reply
#5
And print the sql variable. It looks like you are missing one closing paren. Note also that sqlfile is not an SQL file. If you can post a sample csv
(not necessary to use actual data), and say what you want to do, i.e. do you want to create and SQL table, and if so, what SQL engine are you using.
Reply
#6
The file has below contents,

Column_1 Varchar(20)
Column_2 Number(10,2)
Column_3 Decimal(4,1)
Reply
#7
If the file is tab-separated, you can try dialect=csv.excel_tab in the call to csv.reader(). If the values are merely separated by space characters, you can define your own dialect
class MyDialect(csv.Dialect):
    delimiter = ' '
    doublequote = True
    lineterminator = '\r\n'
    quotechar = '"'
    quoting = csv.MINIMAL
    skipinitialspace = False
Then use dialect=MyDialect in csv.reader() until it reads ['Column_2', 'Number(10, 2)']
Reply
#8
Thank you For your reply.

I am creating the CSV file in unix with help of describer command (Hive Describe).
Unfortunately I can't paste exact file.
But when i am opening same file in Excel it is comma separated with complete row in one line of excel.
Reply
#9
nnsatpute Wrote:I can't paste exact file.
You can do this
with open(datafile) as csvfile:
    s = csvfile.read(4096)
    print(repr(s))
Then paste the output in the forum.

nnsatpute Wrote:But when i am opening same file in Excel it is comma separated
If the file is comma separated, there is a problem with Number(10,2) because the comma must not be interpreted as a separating comma. Normally csv files avoid this by quoting "Number(10,2)". You need to know which quoting strategy is used and declare this in the dialect.
Reply
#10
with open(datafile) as csvfile:
    s = csvfile.read(4096)
    print(repr(s))
The output of above code looks like
column1,varchar\n column2,decimal(10,2)\n column3,number(4,3)
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Sad problems with reading csv file. MassiJames 3 647 Nov-16-2023, 03:41 PM
Last Post: snippsat
  Reading a file name fron a folder on my desktop Fiona 4 922 Aug-23-2023, 11:11 AM
Last Post: Axel_Erfurt
  Reading data from excel file –> process it >>then write to another excel output file Jennifer_Jone 0 1,113 Mar-14-2023, 07:59 PM
Last Post: Jennifer_Jone
  Reading a file JonWayn 3 1,105 Dec-30-2022, 10:18 AM
Last Post: ibreeden
  Reading Specific Rows In a CSV File finndude 3 992 Dec-13-2022, 03:19 PM
Last Post: finndude
  Excel file reading problem max70990 1 900 Dec-11-2022, 07:00 PM
Last Post: deanhystad
  Replace columns indexes reading a XSLX file Larry1888 2 996 Nov-18-2022, 10:16 PM
Last Post: Pedroski55
  Failing reading a file and cannot exit it... tester_V 8 1,838 Aug-19-2022, 10:27 PM
Last Post: tester_V
  I have an issue with Netmiko Error reading SSH protocol banner omarhegazy 2 3,578 May-16-2022, 06:05 PM
Last Post: omarhegazy
  Reading .csv file doug2019 4 1,710 Apr-29-2022, 09:55 PM
Last Post: deanhystad

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020