Python Forum

Creating a delimited file from DB Table
Hi there,

I am new to Python (SAS background).
I have a requirement to create a pipe-delimited file from Netezza tables. I tried the following approach, but nothing is being created. The tables are pretty huge (around 2 billion rows in some).

Can you please point out what I am doing wrong here?

import pyodbc
import csv
import sys

dbconn = <db connection>  # works fine

dbcursor = dbconn.cursor()  # works fine

dbcursor.execute("select column1, column2 from schema.table")
result = dbcursor.fetchall()
c = csv.writer(open("C:\location\output.dat", "w+"), delimiter='|')
c.writerow(result)
c.close()
dbconn.close()
The code runs for about 20 minutes and the file has 0 rows.
Nothing is printed to the screen (I am running it in IDLE with Python 3.6).
I don't know much at all about Netezza; can you point to some docs?
There may be a better way to access it than ODBC.
The file path is the source of your problem - C:\location\output.dat. Backslash is an escape symbol, giving a special meaning to the following character, e.g. \n.

Either prefix the path with r to make it a raw string - r'C:\location\output.dat' - or use double backslashes. Replacing the backslashes with forward slashes will work too.
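For illustration, here are the three options side by side (a minimal sketch; 'location' is a placeholder directory):

# Three ways to write the same Windows path without escape-sequence surprises
path_raw = r'C:\location\output.dat'    # raw string: backslashes stay literal
path_dbl = 'C:\\location\\output.dat'   # doubled backslashes
path_fwd = 'C:/location/output.dat'     # forward slashes also work on Windows
print(path_raw == path_dbl)                     # True
print(path_fwd.replace('/', '\\') == path_raw)  # True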

PS IDLE is one of the worst Python environments; I would recommend switching to Spyder or Jupyter.
(Sep-15-2018, 02:23 PM)volcano63 Wrote: The file path is the source of your problem - C:\location\output.dat. Backslash is an escape symbol [...]

Thanks much for the solution. The irony is, my code works just fine for smaller tables (around 100,000 rows). The issue (if it is an issue at all) crops up as the table size increases.

I will try the 'r' in my code and see if it gives me any results.
I tried the 'r' for raw string and it did not make any difference. So I changed my code as follows.

dbconn = "db connection" 
 
dbcursor = connection.cursor()
 
dbcursor.execute("select column1, column2 from schema.table limit 5000000")
result=dbcursor .fetchall()
c = csv.writer(open(r'C:\location\output.dat',"w"), delimiter='|')
for row in result:
	c.writerow(row)
The code works fine for tables of up to 5,000,000 rows; beyond that, I see the following error. The table I am working with has 21 million rows and I need them all. I also failed to insert the header row.


Error:
========= RESTART: C:\Users\bkumarm\Desktop\2018\Python\nzunload.py =========
Traceback (most recent call last):
  File "C:\Users\bkumarm\Desktop\2018\Python\nzunload.py", line 11, in <module>
    result=cur.fetchall()
MemoryError
(Sep-17-2018, 01:28 PM)anubhav2020 Wrote: I tried the 'r' for raw string and it did not make any difference. So I changed my code as follows.

Looks like the handling of backslashes in strings has changed lately - I was not aware of that. It only breaks when the backslash and the following character form a defined escape sequence, e.g.
  • \n - line feed
  • \t - tabulation
  • \a - beep
  • \b - backspace
(for an extended list, see the table in the Wiki)

Nevertheless, while that was not your problem, backslashes in strings may cause unintended behaviour, so be careful with those. Pay attention to how the r prefix or a double backslash changes the string's behaviour:
Output:
In [111]: print('C:\location\noutput.dat', 'C:\location\\noutput.dat')
C:\location
output.dat C:\location\noutput.dat

In [112]: print('C:\location\toutput.dat', 'C:\location\\toutput.dat')
C:\location    output.dat C:\location\toutput.dat

In [113]: print('C:\location\aoutput.dat', r'C:\location\aoutput.dat')
C:\locationoutput.dat C:\location\aoutput.dat

In [114]: print('C:\location\boutput.dat', r'C:\location\boutput.dat')
C:\locatiooutput.dat C:\location\boutput.dat

In [115]: print('C:\location\x25output.dat', r'C:\location\x25output.dat')
C:\location%output.dat C:\location\x25output.dat
(Sep-17-2018, 03:19 PM)volcano63 Wrote: Nevertheless, while that was not your problem, backslashes in strings may cause unintended behaviour, so be careful with those. [...]

After multiple attempts, it finally worked.
But the issue with large tables running out of memory still persists. Any suggestions as to what can be changed in the code?
Why don't you use fetchmany instead of fetchall?
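A minimal sketch of that approach, assuming the same pyodbc setup as in the earlier posts (the connection string, batch size, and output path are placeholders); it also writes the header row from the cursor metadata, which the earlier code was missing:

import csv
import pyodbc

dbconn = pyodbc.connect("<connection string>")  # placeholder, as in the posts above
dbcursor = dbconn.cursor()
dbcursor.execute("select column1, column2 from schema.table")

with open(r'C:\location\output.dat', 'w', newline='') as f:
    c = csv.writer(f, delimiter='|')
    # header row built from the cursor's column metadata
    c.writerow(col[0] for col in dbcursor.description)
    while True:
        rows = dbcursor.fetchmany(100000)  # batch size is tunable
        if not rows:
            break
        c.writerows(rows)

dbcursor.close()
dbconn.close()

This keeps at most one batch in memory at a time, so a 21-million-row table never has to fit in RAM.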
I have a DB Viewer on GitHub.
I use it to export tables from a DB to CSV (comma-delimited) or TSV (tab-delimited).

It is made with PyQt5.