Posts: 5
Threads: 1
Joined: Sep 2018
Sep-14-2018, 02:53 PM
(This post was last modified: Sep-14-2018, 03:19 PM by Larz60+.)
Hi there,
I am new to Python (SAS background).
I have a requirement to create a pipe-delimited file from Netezza tables. I tried the following approach, but nothing is being created. The tables are pretty huge (some have around 2 billion rows).
Please can you point out what I am doing wrong here?
import pyodbc
import csv
import sys
dbconn = <db connection>  # works fine
dbcursor = dbconn.cursor()  # works fine
dbcursor.execute("select column1, column2 from schema.table")
result = dbcursor.fetchall()
c = csv.writer(open("C:\location\output.dat","w+"), delimiter='|')
c.writerow(result)
c.close()
dbconn.close()
Posts: 5
Threads: 1
Joined: Sep 2018
The code runs for about 20 minutes and the file has 0 rows.
There is nothing printed to the screen (I am running it in IDLE, Python 3.6).
Posts: 12,047
Threads: 487
Joined: Sep 2016
I don't know much at all about Netezza - can you point to some docs?
There may be a better way to access it than ODBC.
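For what it's worth, Netezza is commonly reached from Python through its ODBC driver via pyodbc. A connection-string sketch is below - the driver name, host, and credentials are all placeholders, and the exact DRIVER value must match whatever Netezza ODBC driver is registered on the client machine:

```python
# import pyodbc  # third-party: pip install pyodbc

# Every value below is a placeholder - check your ODBC data source
# administrator for the real driver name and server details.
conn_str = (
    "DRIVER={NetezzaSQL};"
    "SERVER=nzhost;"
    "PORT=5480;"       # 5480 is the usual Netezza port
    "DATABASE=mydb;"
    "UID=user;"
    "PWD=password;"
)

# dbconn = pyodbc.connect(conn_str)  # uncomment once the driver is installed
```

Alternatively, a DSN configured in the ODBC administrator can be used instead of a full connection string ("DSN=NZSQL;UID=...;PWD=...").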
Posts: 566
Threads: 10
Joined: Apr 2017
Sep-15-2018, 02:23 PM
(This post was last modified: Sep-15-2018, 02:23 PM by volcano63.)
The file path is the source of your problem - C:\location\output.dat . Backslash is an escape symbol, giving a special meaning to the following character, e.g. \n .
Either prefix the path with r to make it a raw string - r'C:\location\output.dat' - or use double backslashes. Replacing backslashes with forward slashes will work too.
PS: IDLE is one of the worst Python environments; I would recommend switching to Spyder or Jupyter.
Test everything in a Python shell (iPython, Azure Notebook, etc.) - Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
- Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
- You posted a claim that something you did not test works? Be prepared to eat your hat.
Posts: 5
Threads: 1
Joined: Sep 2018
(Sep-15-2018, 02:23 PM)volcano63 Wrote: The file path is the source of your problem - C:\location\output.dat . Backslash is an escape symbol, giving a special meaning to the following character, e.g. \n .
Either prefix the path with r to make it a raw string - r'C:\location\output.dat' - or use double backslashes. Replacing backslashes with forward slashes will work too.
PS: IDLE is one of the worst Python environments; I would recommend switching to Spyder or Jupyter.
Thanks much for the solution. The irony is, my code works just fine for smaller tables (like 100,000 rows). The issue (if it is an issue at all) crops up as the table size increases.
I will try the r prefix in my code and see if it gives me any result set.
Posts: 5
Threads: 1
Joined: Sep 2018
Sep-17-2018, 01:28 PM
(This post was last modified: Sep-17-2018, 01:28 PM by anubhav2020.)
I tried the r prefix for a raw string and it did not make any difference. So I changed my code as follows.
dbconn = "db connection"
dbcursor = dbconn.cursor()
dbcursor.execute("select column1, column2 from schema.table limit 5000000")
result = dbcursor.fetchall()
c = csv.writer(open(r'C:\location\output.dat', "w"), delimiter='|')
for row in result:
    c.writerow(row)
The code works fine for tables up to 5,000,000 rows; beyond that, I see the following error. The table I am working with has 21 million rows and I need them all. I also failed to insert the header row.
Error: ========= RESTART: C:\Users\bkumarm\Desktop\2018\Python\nzunload.py =========
Traceback (most recent call last):
File "C:\Users\bkumarm\Desktop\2018\Python\nzunload.py", line 11, in <module>
result=cur.fetchall()
MemoryError
Posts: 566
Threads: 10
Joined: Apr 2017
(Sep-17-2018, 01:28 PM)anubhav2020 Wrote: I tried the 'r' for raw string and did not make any difference. So I changed my code as follows.
It looks like the handling of backslashes in strings has changed lately - I was not aware of that. It only breaks when the backslash and the following character form a defined escape sequence, e.g.
\n - line feed
\t - tab
\a - bell (beep)
\b - backspace
(for the full list, see the table in the wiki)
Nevertheless, while that was not your problem, backslashes in strings may cause unintended behaviour, so be careful with them. Pay attention to how the r prefix or a double backslash changes the string's behaviour:
Output: In [111]: print('C:\location\noutput.dat', 'C:\location\\noutput.dat')
C:\location
output.dat C:\location\noutput.dat
In [112]: print('C:\location\toutput.dat', 'C:\location\\toutput.dat')
C:\location output.dat C:\location\toutput.dat
In [113]: print('C:\location\aoutput.dat', r'C:\location\aoutput.dat')
C:\locationoutput.dat C:\location\aoutput.dat
In [114]: print('C:\location\boutput.dat', r'C:\location\boutput.dat')
C:\locatiooutput.dat C:\location\boutput.dat
In [115]: print('C:\location\x25output.dat', r'C:\location\x25output.dat')
C:\location%output.dat C:\location\x25output.dat
Posts: 5
Threads: 1
Joined: Sep 2018
Sep-19-2018, 05:12 AM
(This post was last modified: Sep-19-2018, 05:12 AM by anubhav2020.)
(Sep-17-2018, 03:19 PM)volcano63 Wrote: Nevertheless, while that was not your problem, backslashes in string may cause an unintended behaviour, so - be careful with those. [...]
After multiple attempts, it finally worked.
But the issue with large files running out of memory still persists. Any suggestions as to what can be changed in the code?
Posts: 566
Threads: 10
Joined: Apr 2017
Why don't you use fetchmany instead of fetchall?
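A chunked sketch along those lines - streaming batches with fetchmany instead of materialising everything with fetchall, and taking the header row from the cursor metadata. The function name and batch size here are just illustrative:

```python
import csv

def export_pipe_delimited(cursor, path, batch_size=100000):
    # Stream a query result to a pipe-delimited file with fetchmany,
    # so the whole table never has to fit in memory (unlike fetchall).
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        # Header row: the first element of each cursor.description
        # tuple is the column name.
        writer.writerow([col[0] for col in cursor.description])
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
```

With pyodbc this would be called as export_pipe_delimited(dbcursor, r'C:\location\output.dat') after dbcursor.execute(...); the batch size trades memory use against round trips to the server.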
Posts: 1,035
Threads: 16
Joined: Dec 2016
I have a DB Viewer on GitHub.
I use it to export tables from a DB to CSV (comma-separated) or TSV (tab-separated).
It is made with PyQt5.