Python Forum
type change of dbshelve key between Python 2 and 3
#1
I have an old database-style program that I am attempting to port from Python 2 to Python 3. The program uses dbshelve to store about ten strings under each key. The first string is identical to the key, which makes it easier to match printed data to the entry it came from.
The problem I am having is that the keys are strings in Python 2 but bytes values in Python 3. This breaks some existing code, and may cause problems if I add new entries after the port is complete.

The python 2 version uses
from bsddb import dbshelve
while the python 3 version uses
from bsddb3 import dbshelve
since I believe the bsddb3 package is expected to be fully compatible with the older bsddb package.

This is the output in python2:
Output:
>>> dbase = dbshelve.open(dbpath)
>>> keylist = dbase.keys()
>>> discdata=dbase[keylist[0]]
>>> discdata["id"]
'0x2044301'
>>> keylist[0]
'0x2044301'
That "id" field is a hash of an optical disc, and the key for the entry is a string with the same value as the "id" field.

This is the output in python3:
Output:
>>> keylist[0]
b'0x2044301'
>>> discdata["id"]
'0x2044301'
Note the 'b' prepended to the beginning of the keylist[0] value.
Python2:
Output:
>>> type(keylist[0])
<type 'str'>
>>> type(discdata["id"])
<type 'str'>
Python3:
Output:
>>> type(keylist[0])
<class 'bytes'>
>>> type(discdata["id"])
<class 'str'>
The keylist[0] "bytes" value and the discdata["id"] "str" value print with the same characters, so I think I should be able to convert between them, but it is surprising that keylist[0] no longer returns a string, as it did under Python 2.
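A small illustration of why the two values behave differently despite printing alike: in Python 3, bytes and str never compare equal, so code that compared the key to the "id" field silently gets False. The literals below are copied from the REPL output above and stand in for the real database values.

```python
# Stand-ins for the values shown in the Python 3 REPL output above.
key = b'0x2044301'      # what dbase.keys() returns under bsddb3
disc_id = '0x2044301'   # what discdata["id"] returns

# bytes and str are distinct types in Python 3 and never compare equal.
print(key == disc_id)                    # False

# Decoding the key (or encoding the id) puts both sides in the same type.
print(key.decode('utf-8') == disc_id)    # True
print(disc_id.encode('utf-8') == key)    # True
```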

I do not get any errors or warnings when importing from bsddb3, but the bsddb3 web page indicates it is deprecated, and should be replaced by the berkeleydb package instead. In berkeleydb the keys are indeed bytes rather than strings:
"Take note that upgrading to berkeleydb is easy but not transparent. Notably, keys and values are bytes in berkeleydb lib, while in ‘bsddb3’ they are strings. You would only need to change your code to add or remove type encoding/decoding, depending of how you use the library. The process should be simple, nevertheless."
That note is from this page:
https://pypi.org/project/bsddb3/

To me, that indicates the keys should still be interpreted as strings when using the bsddb3 package.
However, when I check whether a key exists using a str key, the dbshelve from bsddb3 raises an error that clearly shows it does not expect a str key, contradicting the documentation:
Output:
>>> dbase.has_key(keystr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Bytes or Integer object expected for key, str found
My first guess is that I should just accept that the documentation is not correct, or that the Fedora 33 package I am using has misconfigured something at build time. The distribution package is python3-bsddb3-6.2.9-1.fc33.x86_64.
Does that seem reasonable? Just use encode() and decode() to go between UTF-8 strings and byte values when dealing with the key values?
The hash value used as the key starts out as an integer, so working with bytes is actually a more natural approach, as long as no values fail to convert correctly.
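The sample key b'0x2044301' looks like the output of Python's hex() applied to the integer hash. Assuming that is how the existing keys were generated (the thread does not confirm it), a pair of hypothetical helpers can go between the integer hash and the bytes key directly, with no str step needed for lookups:

```python
def hash_to_key(disc_hash: int) -> bytes:
    # Hypothetical helper: hex() gives lowercase digits, a '0x' prefix and no
    # zero padding, which matches the sample key b'0x2044301'.
    return hex(disc_hash).encode('ascii')


def key_to_hash(key: bytes) -> int:
    # Hypothetical helper: int() accepts bytes as well as str when a base is
    # given, and base 16 tolerates the '0x' prefix.
    return int(key, 16)


print(hash_to_key(0x2044301))                   # b'0x2044301'
print(key_to_hash(b'0x2044301') == 0x2044301)   # True
```

If the stored keys used different formatting (uppercase digits or zero padding, for example), hash_to_key would need to match it exactly, since b'0x02044301' and b'0x2044301' are different keys.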

Since I am porting the code anyway, I would be open to advice to scrap bsddb entirely and write a helper program to convert the existing db to a different engine, such as sqlite, a JSON dump, or some other option. Any advice, or any known resources with good advice on choosing a small db engine?
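One way the sqlite option could look, as a minimal sketch using only the standard library: each entry is stored as one row with the key as a text primary key and the record serialized as JSON. The function name migrate_to_sqlite and the table name discs are hypothetical, and it assumes each record is a dict of strings (JSON-serializable). The records argument could be built from the open shelf with the calls shown in the REPL session above, e.g. `((k, dbase[k]) for k in dbase.keys())`; here a plain dict stands in for the shelf.

```python
import json
import sqlite3


def migrate_to_sqlite(records, conn):
    """Copy (key, record_dict) pairs into a key/JSON-document table.

    Hypothetical helper: bytes keys are decoded to str, records are stored
    as JSON text.
    """
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "CREATE TABLE IF NOT EXISTS discs (id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO discs (id, data) VALUES (?, ?)",
            (
                (key.decode('utf-8') if isinstance(key, bytes) else key,
                 json.dumps(value))
                for key, value in records
            ),
        )


# Usage: an in-memory database and a dict standing in for the old shelf.
old = {b'0x2044301': {'id': '0x2044301'}}
conn = sqlite3.connect(':memory:')
migrate_to_sqlite(old.items(), conn)
row = conn.execute("SELECT data FROM discs WHERE id = ?", ('0x2044301',)).fetchone()
print(json.loads(row[0]))
```

Storing whole records as JSON keeps the migration trivial; splitting the ten or so strings into real columns would make them queryable, at the cost of defining a schema up front.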
#2
If the package is not maintained and its documentation was carried over from when it was Python 2 only, then this makes some sense. Under Python 2, str and bytes were the same type, and str could hold any byte string. Now str is expected to hold Unicode characters, which might not be how the DB was stored, so returning bytes seems the natural way for this to port to Python 3.
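To illustrate the point about stored bytes not necessarily being text: a byte string that is not valid UTF-8 cannot be decoded with the default codec, while Latin-1 maps every byte 0-255 to one code point and therefore always round-trips. The byte string here is a made-up example; hex keys like b'0x2044301' are pure ASCII, so plain .decode() is safe for them.

```python
raw = b'caf\xe9'  # made-up example: 'café' encoded as Latin-1, not valid UTF-8

try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not valid UTF-8:', exc)

# Latin-1 assigns a code point to every byte value, so decoding never fails
# and encoding the result reproduces the original bytes exactly.
text = raw.decode('latin-1')
print(text)                            # café
print(text.encode('latin-1') == raw)   # True
```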
#3
(Feb-01-2021, 11:49 PM) ccaudle Wrote: Does that seem reasonable? Just use encode() and decode() to go between UTF-8 strings and byte values when dealing with the key values?
Yes, that's fine. In Python 3, everything that comes back will be bytes if no encoding is specified.
>>> s = b'0x2044301'
>>> type(s)
<class 'bytes'>
>>> s = s.decode() # Same as s.decode('utf-8')
>>> s
'0x2044301'
>>> type(s)
<class 'str'> 
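If decoding and encoding at every call site gets tedious, the conversion can live in one place. Below is a hypothetical wrapper (StrKeyShelf is not part of bsddb3) that accepts str keys, encodes them before touching the underlying store, and yields str keys when iterating, much like the old Python 2 behaviour. It assumes the store supports mapping-style access with bytes keys (indexing, assignment, deletion, `in`, keys(), len()); it is exercised here only with a plain dict standing in for the dbshelve object, and I have not checked that DBShelf supports every one of these operations.

```python
from collections.abc import MutableMapping


class StrKeyShelf(MutableMapping):
    """Hypothetical adapter: str keys outside, bytes keys in the wrapped store."""

    def __init__(self, store, encoding='utf-8'):
        self._store = store
        self._encoding = encoding

    def _enc(self, key):
        # Encode str keys; pass bytes (or integer) keys through unchanged.
        return key.encode(self._encoding) if isinstance(key, str) else key

    def __getitem__(self, key):
        return self._store[self._enc(key)]

    def __setitem__(self, key, value):
        self._store[self._enc(key)] = value

    def __delitem__(self, key):
        del self._store[self._enc(key)]

    def __contains__(self, key):
        return self._enc(key) in self._store

    def __iter__(self):
        # Yield keys as str, decoded from the stored bytes keys.
        for key in self._store.keys():
            yield key.decode(self._encoding)

    def __len__(self):
        return len(self._store)


# Usage with a dict standing in for the opened shelf.
backing = {b'0x2044301': {'id': '0x2044301'}}
shelf = StrKeyShelf(backing)
print('0x2044301' in shelf)        # True
print(list(shelf))                 # ['0x2044301']
print(shelf['0x2044301']['id'])    # 0x2044301
```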
Any advice, or any known resources that have good advice on choosing a small db engine?
TinyDB, dataset
My test of dataset.
#4
(Feb-02-2021, 12:40 AM) snippsat Wrote: Yes, that's fine. In Python 3, everything that comes back will be bytes if no encoding is specified.

Thanks for the confirmation, I was able to understand what is going on and get the old database file loading with Python 3.9.
Quote: TinyDB, dataset
My test of dataset.

Thanks for those, I was not familiar with either previously.