Python Forum
Creating a delimited file from DB Table
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Creating a delimited file from DB Table
#1
Hi there,

I am new to python (SAS background).
I have this requirement to create a PIPE delimited file from Netezza tables. I tried the following approach. However, nothing is being created. The tables are pretty huge in size (like 2 billion rows in some).

Please can you point out what am I doing wrong here.

import pyodbc
import csv
import sys

dbconn = <db connection > /* works fine */

dbcursor = connection.cursor() /* works fine */

dbcursor.execute("select column1, column2 from schema.table")
result=dbcursor .fetchall()
c = csv.writer(open("C:\location\output.dat","w+"), delimiter='|')
c.writerow(result)
c.close()
dbconn.close()
Reply
#2
The code runs for about 20 minutes and the file has 0 rows.
There is nothing printed to the screen (i am running it using Python IDLE (v3.6)
Reply
#3
Don't know much at all about Netezza, can you point to some docs?
There may be a better way to access other than ODBC
Reply
#4
The file path is the source of your problem - C:\location\output.dat. Backslash is an escape symbol, giving a special meaning to to following character, e.g. \n.

Either prefix the path with r to make it a raw string - r'C:\location\output.dat', or use double backslash. Replacing backslash with a slash will work too.

PS IDLE is one of the worst Python environments, I would recommend switching to Spyder or Jupyter
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#5
(Sep-15-2018, 02:23 PM)volcano63 Wrote: The file path is the source of your problem - C:\location\output.dat. Backslash is an escape symbol, giving a special meaning to to following character, e.g. \n.

Either prefix the path with r to make it a raw string - r'C:\location\output.dat', or use double backslash. Replacing backslash with a slash will work too.

PS IDLE is one of the worst Python environments, I would recommend switching to Spyder or Jupyter

Thanks much for the solution. The irony is, my code works for smaller tables just fine (like 100000 rows). The issue (if at all it is an issue) crops up as the table size increases.

I will try the 'r' in my code and see if it gives my any result set.
Reply
#6
I tried the 'r' for raw string and did not make any difference. So I changed my code as follows.

dbconn = "db connection" 
 
dbcursor = connection.cursor()
 
dbcursor.execute("select column1, column2 from schema.table limit 5000000")
result=dbcursor .fetchall()
c = csv.writer(open(r'C:\location\output.dat',"w"), delimiter='|')
for row in result:
	c.writerow(row)
The code works fine for tables upto 5,000,000 rows, beyond that, i see the following error. The table that i am working has 21 million rows and i need them all. I also failed to insert the header row.


Error:
========= RESTART: C:\Users\bkumarm\Desktop\2018\Python\nzunload.py ========= Traceback (most recent call last): File "C:\Users\bkumarm\Desktop\2018\Python\nzunload.py", line 11, in <module> result=cur.fetchall() MemoryError
Reply
#7
(Sep-17-2018, 01:28 PM)anubhav2020 Wrote: I tried the 'r' for raw string and did not make any difference. So I changed my code as follows.
[/python]

Looks like handling of backslash in strings has changed lately - I was not aware of that. It only breaks when backslash with the following character creates a defined escape sequence, e.g.
  • \n - line feed
  • \t - tabulation
  • \a - beep
  • \b - backspace
(for extended list - see table in Wiki)

Nevertheless, while that was not your problem, backslashes in string may cause an unintended behaviour, so - be careful with those. Pay attention how r prefix or double backslash change the string behaviour
Output:
In [111]: print('C:\location\noutput.dat', 'C:\location\\noutput.dat') C:\location output.dat C:\location\noutput.dat In [112]: print('C:\location\toutput.dat', 'C:\location\\toutput.dat') C:\location output.dat C:\location\toutput.dat In [113]: print('C:\location\aoutput.dat', r'C:\location\aoutput.dat') C:\locationoutput.dat C:\location\aoutput.dat In [114]: print('C:\location\boutput.dat', r'C:\location\boutput.dat') C:\locatiooutput.dat C:\location\boutput.dat In [115]: print('C:\location\x25output.dat', r'C:\location\x25output.dat') C:\location%output.dat C:\location\x25output.dat
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#8
(Sep-17-2018, 03:19 PM)volcano63 Wrote:
(Sep-17-2018, 01:28 PM)anubhav2020 Wrote: I tried the 'r' for raw string and did not make any difference. So I changed my code as follows.
[/python]

Looks like handling of backslash in strings has changed lately - I was not aware of that. It only breaks when backslash with the following character creates a defined escape sequence, e.g.
  • \n - line feed
  • \t - tabulation
  • \a - beep
  • \b - backspace
(for extended list - see table in Wiki)

Nevertheless, while that was not your problem, backslashes in string may cause an unintended behaviour, so - be careful with those. Pay attention how r prefix or double backslash change the string behaviour
Output:
In [111]: print('C:\location\noutput.dat', 'C:\location\\noutput.dat') C:\location output.dat C:\location\noutput.dat In [112]: print('C:\location\toutput.dat', 'C:\location\\toutput.dat') C:\location output.dat C:\location\toutput.dat In [113]: print('C:\location\aoutput.dat', r'C:\location\aoutput.dat') C:\locationoutput.dat C:\location\aoutput.dat In [114]: print('C:\location\boutput.dat', r'C:\location\boutput.dat') C:\locatiooutput.dat C:\location\boutput.dat In [115]: print('C:\location\x25output.dat', r'C:\location\x25output.dat') C:\location%output.dat C:\location\x25output.dat

After multiple attempts, it finally worked.
But the issue with large files running out of memory still persists. Any suggestions as to what cn be changed in the code!!
Reply
#9
why don't you use fetchmany instead of fetchall?
Test everything in a Python shell (iPython, Azure Notebook, etc.)
  • Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
  • Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
  • You posted a claim that something you did not test works? Be prepared to eat your hat.
Reply
#10
I have a DB Viewer at github
I use it to export Tables from DB to CSV (comma) or TSV (tab)

made with PyQt5
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Tab Delimited Strings? johnywhy 7 488 Jan-13-2024, 10:34 PM
Last Post: sgrey
  Help creating shell scrip for python file marciokoko 10 1,258 Sep-16-2023, 09:46 PM
Last Post: snippsat
  Creating csv files from Excel file azizrasul 40 5,329 Nov-03-2022, 08:33 PM
Last Post: azizrasul
  Use module docx to get text from a file with a table Pedroski55 8 5,808 Aug-30-2022, 10:52 PM
Last Post: Pedroski55
  Converted Pipe Delimited text file to CSV file atomxkai 4 6,843 Feb-11-2022, 12:38 AM
Last Post: atomxkai
  Creating file with images BobSmoss 1 1,350 Jan-08-2022, 08:46 PM
Last Post: snippsat
  dataframe write to tab delimited differs from Excel koh 0 1,959 Aug-01-2021, 02:46 AM
Last Post: koh
  Yahoo_fin, Pandas: how to convert data table structure in csv file detlefschmitt 14 7,559 Feb-15-2021, 12:58 PM
Last Post: detlefschmitt
  write mariadb table rows query to each file? shams 1 1,842 Feb-02-2021, 04:10 PM
Last Post: buran
  Creating csv file from another file pisike 0 1,570 Nov-24-2020, 02:02 PM
Last Post: pisike

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020