Posts: 5
Threads: 1
Joined: Sep 2018
Sep-14-2018, 02:53 PM
(This post was last modified: Sep-14-2018, 03:19 PM by Larz60+.)
Hi there,
I am new to Python (SAS background).
I have a requirement to create a pipe-delimited file from Netezza tables. I tried the following approach, but nothing is being created. The tables are pretty huge (some have around 2 billion rows).
Please can you point out what I am doing wrong here?
import pyodbc
import csv
import sys
dbconn = <db connection>  # works fine
dbcursor = dbconn.cursor()  # works fine
dbcursor.execute("select column1, column2 from schema.table")
result = dbcursor.fetchall()
c = csv.writer(open("C:\location\output.dat","w+"), delimiter='|')
c.writerow(result)
c.close()
dbconn.close()
Posts: 5
Threads: 1
Joined: Sep 2018
The code runs for about 20 minutes and the file has 0 rows.
There is nothing printed to the screen (I am running it in IDLE, Python 3.6).
Posts: 12,047
Threads: 487
Joined: Sep 2016
I don't know much at all about Netezza - can you point to some docs?
There may be a better way to access it than ODBC.
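For what it's worth, Netezza is commonly reached from Python through its ODBC driver via pyodbc. A connection-string sketch is below - the driver name, host, and credentials are all placeholders, and the exact DRIVER value must match whatever Netezza ODBC driver is registered on the client machine:

```python
# import pyodbc  # third-party: pip install pyodbc

# Every value below is a placeholder - check your ODBC data source
# administrator for the real driver name and server details.
conn_str = (
    "DRIVER={NetezzaSQL};"
    "SERVER=nzhost;"
    "PORT=5480;"       # 5480 is the usual Netezza port
    "DATABASE=mydb;"
    "UID=user;"
    "PWD=password;"
)

# dbconn = pyodbc.connect(conn_str)  # uncomment once the driver is installed
```

Alternatively, a DSN configured in the ODBC administrator can be used instead of a full connection string ("DSN=NZSQL;UID=...;PWD=...").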
Posts: 566
Threads: 10
Joined: Apr 2017
Sep-15-2018, 02:23 PM
(This post was last modified: Sep-15-2018, 02:23 PM by volcano63.)
The file path is the source of your problem - C:\location\output.dat . Backslash is an escape symbol, giving a special meaning to the following character, e.g. \n .
Either prefix the path with r to make it a raw string - r'C:\location\output.dat' - or use double backslashes. Replacing backslashes with forward slashes will work too.
PS: IDLE is one of the worst Python environments; I would recommend switching to Spyder or Jupyter.
Test everything in a Python shell (iPython, Azure Notebook, etc.) - Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
- Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
- You posted a claim that something you did not test works? Be prepared to eat your hat.
Posts: 5
Threads: 1
Joined: Sep 2018
(Sep-15-2018, 02:23 PM)volcano63 Wrote: The file path is the source of your problem - C:\location\output.dat . Backslash is an escape symbol, giving a special meaning to the following character, e.g. \n .
Either prefix the path with r to make it a raw string - r'C:\location\output.dat' - or use double backslashes. Replacing backslashes with forward slashes will work too.
PS: IDLE is one of the worst Python environments; I would recommend switching to Spyder or Jupyter.
Thanks much for the solution. The irony is, my code works just fine for smaller tables (like 100,000 rows). The issue (if it is an issue at all) crops up as the table size increases.
I will try the r prefix in my code and see if it gives me any result set.
Posts: 5
Threads: 1
Joined: Sep 2018
Sep-17-2018, 01:28 PM
(This post was last modified: Sep-17-2018, 01:28 PM by anubhav2020.)
I tried the r prefix for a raw string and it did not make any difference. So I changed my code as follows.
dbconn = "db connection"
dbcursor = dbconn.cursor()
dbcursor.execute("select column1, column2 from schema.table limit 5000000")
result = dbcursor.fetchall()
c = csv.writer(open(r'C:\location\output.dat', "w"), delimiter='|')
for row in result:
    c.writerow(row)
The code works fine for tables up to 5,000,000 rows; beyond that, I see the following error. The table I am working with has 21 million rows and I need them all. I also failed to insert the header row.
Error: ========= RESTART: C:\Users\bkumarm\Desktop\2018\Python\nzunload.py =========
Traceback (most recent call last):
File "C:\Users\bkumarm\Desktop\2018\Python\nzunload.py", line 11, in <module>
result=cur.fetchall()
MemoryError
Posts: 566
Threads: 10
Joined: Apr 2017
(Sep-17-2018, 01:28 PM)anubhav2020 Wrote: I tried the 'r' for raw string and did not make any difference. So I changed my code as follows.
It looks like the handling of backslashes in strings has changed lately - I was not aware of that. It only breaks when the backslash and the following character form a defined escape sequence, e.g.
\n - line feed
\t - tab
\a - bell (beep)
\b - backspace
(for the full list, see the table in the wiki)
Nevertheless, while that was not your problem, backslashes in strings may cause unintended behaviour, so be careful with them. Pay attention to how the r prefix or a double backslash changes the string's behaviour:
Output: In [111]: print('C:\location\noutput.dat', 'C:\location\\noutput.dat')
C:\location
output.dat C:\location\noutput.dat
In [112]: print('C:\location\toutput.dat', 'C:\location\\toutput.dat')
C:\location output.dat C:\location\toutput.dat
In [113]: print('C:\location\aoutput.dat', r'C:\location\aoutput.dat')
C:\locationoutput.dat C:\location\aoutput.dat
In [114]: print('C:\location\boutput.dat', r'C:\location\boutput.dat')
C:\locatiooutput.dat C:\location\boutput.dat
In [115]: print('C:\location\x25output.dat', r'C:\location\x25output.dat')
C:\location%output.dat C:\location\x25output.dat
Posts: 5
Threads: 1
Joined: Sep 2018
Sep-19-2018, 05:12 AM
(This post was last modified: Sep-19-2018, 05:12 AM by anubhav2020.)
(Sep-17-2018, 03:19 PM)volcano63 Wrote: Nevertheless, while that was not your problem, backslashes in string may cause an unintended behaviour, so - be careful with those. [...]
After multiple attempts, it finally worked.
But the issue with large files running out of memory still persists. Any suggestions as to what can be changed in the code?
Posts: 566
Threads: 10
Joined: Apr 2017
Why don't you use fetchmany instead of fetchall?
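A chunked sketch along those lines - streaming batches with fetchmany instead of materialising everything with fetchall, and taking the header row from the cursor metadata. The function name and batch size here are just illustrative:

```python
import csv

def export_pipe_delimited(cursor, path, batch_size=100000):
    # Stream a query result to a pipe-delimited file with fetchmany,
    # so the whole table never has to fit in memory (unlike fetchall).
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        # Header row: the first element of each cursor.description
        # tuple is the column name.
        writer.writerow([col[0] for col in cursor.description])
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
```

With pyodbc this would be called as export_pipe_delimited(dbcursor, r'C:\location\output.dat') after dbcursor.execute(...); the batch size trades memory use against round trips to the server.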
Posts: 1,035
Threads: 16
Joined: Dec 2016
I have a DB Viewer on GitHub.
I use it to export tables from a DB to CSV (comma-separated) or TSV (tab-separated).
It is made with PyQt5.