Python Forum

problem converting string data file to dictionary
Pages: 1 2 3
I am new to forums and a newbie in programming. I am having a problem converting a text file that contains a dictionary on each line. I have been partially successful in creating dictionaries, but some entries are problematic. The problem I am facing is elaborated in the following link.

Any help is appreciated.

https://hastebin.com/caqirutugo.sql

If further information is needed, I can try posting the program and a shorter text file with data.

Thanks
In response to moderator:

The program:
''' read firstnames file and convert to dictionaries
'''
FILENAME = "tstdic.txt"

infile = open(FILENAME, "r", encoding = "utf-8")

firstnames = []

for line in infile:

	# Prints original line content
	print(line)
	
	# First eval(); prints type of output and output
	d1 = eval(line)
	print(type(d1), "\n ",  d1)
	
	# Second eval(); prints type of output and output
	d2 = eval(d1)
	print(type(d2), "\n ", d2)
Output:
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"

<class 'str'>
  {'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'dict'>
  {'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}"

<class 'str'>
  {'voornaam': M'Hamed, 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}
Traceback (most recent call last):
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 31, in <module>
    start(fakepyfile,mainpyfile)
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 30, in start
    exec(open(mainpyfile).read(), __main__.__dict__)
  File "<string>", line 19, in <module>
  File "<string>", line 1
    {'voornaam': M'Hamed, 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}
    ^
SyntaxError: invalid syntax

[Program finished]
Don't know how to upload the text file from my tab:

contents of the file used here above:

tstdic.txt
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}"
"{'voornaam': ""D'Angelo"", 'geslacht': 'M', 't8306': '46', 'n8589': '3', 'n9094': '5', 'n9599': '9', 'n0004': '24', 'p8589': '6', 'p9094': '9', 'p9599': '17', 'p0004': '46'}"
"{'voornaam': '"M'hamed"', 'geslacht': 'M', 't8306': '103', 'n8589': '36', 'n9094': '14', 'n9599': '7', 'n0004': '8', 'p8589': '67', 'p9094': '24', 'p9599': '13', 'p0004': '15'}"
This looks like a poorly constructed JSON file. Or maybe a failed attempt to write a CSV file with csv.DictWriter.
Where did it come from? Can you make changes to the format at the source of the file?
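One guess at how lines like these could arise (not confirmed by the original poster): each dictionary was converted to a string and written as a single CSV field. The sketch below uses shortened, hypothetical records and reproduces both the outer double quotes and the doubled quotes around a name containing an apostrophe.

```python
import csv
import io

# Shortened hypothetical records; the real file has more keys per line.
records = [
    {'voornaam': 'Thomas', 'geslacht': 'M'},
    {'voornaam': "M'Hamed", 'geslacht': 'M'},
]

buffer = io.StringIO()
writer = csv.writer(buffer)
for record in records:
    # str(record) contains commas, so the csv module wraps the field in double
    # quotes and doubles any double quote already inside it.
    writer.writerow([str(record)])

written_lines = buffer.getvalue().splitlines()
for written in written_lines:
    print(written)
```

If the file was produced this way, fixing the writer at the source would be simpler than repairing each line afterwards.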
Do not use eval; see: https://nedbatchelder.com/blog/201206/ev...erous.html
It's not necessary.
The format of your data is acceptable as JSON,
so after reading each line (line 9 in your script) you can convert it directly to a dictionary with json.loads(line).
First strip the linefeed and any surrounding whitespace, and add an import statement for json at the start.

import json

...

for line in infile:
    line = line.strip()
    ldict = json.loads(line)
    ...
ldict is now a dictionary
@Larz60+: no, it's not a valid JSON object (which would be converted to a Python dict), even at line level, because of the single quotes and because of the double quotes enclosing each line. json.loads will produce a string:

line = '''"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"'''
print(line)

import json

json_data = json.loads(line)
print(json_data)
print(type(json_data))
Output:
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'str'>
If the line in the file looked like this:
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
then each line can be converted to dict with json, because it's a JSON object at line level

line = '{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}'
print(line)

import json

json_data = json.loads(line)
print(json_data)
print(type(json_data))
print(json_data['voornaam'])
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'dict'>
Thomas
But of course it's better to create a valid JSON file or CSV file (whichever the initial intention was).
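If the file can be regenerated, writing one JSON object per line avoids the quoting problem entirely. Below is a minimal sketch with shortened, hypothetical records; dumps_lines and loads_lines are illustrative helper names, not part of the json module.

```python
import json

# Hypothetical shortened records standing in for the real data.
records = [
    {"voornaam": "Thomas", "geslacht": "M", "t8306": "26794"},
    {"voornaam": "M'Hamed", "geslacht": "M", "t8306": "46"},
]


def dumps_lines(items):
    """Serialize each dict as one JSON object per line."""
    return "\n".join(json.dumps(item, ensure_ascii=False) for item in items)


def loads_lines(text):
    """Parse text that holds one JSON object per line, skipping blank lines."""
    return [json.loads(row) for row in text.splitlines() if row.strip()]


text = dumps_lines(records)
restored = loads_lines(text)
print(text)
print(restored == records)
```

An apostrophe needs no special handling here, because JSON strings are always delimited by double quotes.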
OK, sorry, I jumped the gun.
My first test was:
import os
import json
import sys

os.chdir(os.path.abspath(os.path.dirname(__file__)))
FILENAME = "tstdic.txt"
 
infile = open(FILENAME, "r", encoding = "utf-8")

firstnames = []
 
for line in infile:
    line = line.strip()
    ldict = json.loads(line)
    print(ldict)
    sys.exit(0)
which works (first line only).
Had I not exited, it would have blown up on the second line.
My post was written before I read yours (posted without reloading).
I usually work with massive amounts of data, which has led to the bad habit of terminating the process early.

Thus the erroneous reply.
Larz, it will work just fine for a line like the first one. The problem is that json.loads(line) will produce a str, not a dict. Test with print(type(ldict)).
Actually, my first thought was exactly the same as yours; then I noticed the quotes...
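A quick check of both kinds of line, using shortened stand-ins for the first two lines of the data file (the shortening is an assumption for brevity). A line wrapped in plain double quotes decodes to a str, while a line containing doubled quotes is rejected by the decoder.

```python
import json

# Shortened stand-ins for the first two lines of the data file (assumption).
plain = '''"{'voornaam': 'Thomas', 'geslacht': 'M'}"'''
doubled = '''"{'voornaam': ""M'Hamed"", 'geslacht': 'M'}"'''

# The outer double quotes make the whole line a JSON string, not a JSON object.
plain_result = json.loads(plain)
print(type(plain_result))

# The doubled quotes end the JSON string early, so the decoder raises an error.
try:
    json.loads(doubled)
    doubled_error = None
except json.JSONDecodeError as exc:
    doubled_error = exc
print(type(doubled_error))
```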
@buran & Larz60+:

Thank you both for your input. One thing that is definitely clear is to avoid the use of eval(). I had read about the dangers but did not know of an alternative. I am not familiar with JSON files; I will definitely look into it.
Using the suggestion, I was able to output tstdic.txt (and most of the original data file) as:
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
format as suggested by buran, using:
line = line.strip().replace("'", '"').replace('"{', "{").replace('}"', "}")
and as expected,
d1 = json.loads(line)
then created dictionary without using eval().

The problem with converting "voornaam" values containing an apostrophe, however, still remains. I can't seem to find a way to reformat these entries to conform to JSON or eval(). For that reason, maybe I will keep the topic open.

Thanks again.
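For the entries with an apostrophe, one possible approach (a sketch, not tested against the full data file) is to undo the quoting instead of replacing every single quote: strip the outer double quotes, turn each doubled "" back into a single ", and hand the result to ast.literal_eval, which only accepts Python literals and does not execute code. parse_line is a hypothetical helper name and the sample lines are shortened. Lines that are still not valid literals, such as the '"M'hamed"' entry, come back as None so they can be inspected by hand.

```python
import ast


def parse_line(line):
    """Return the dict encoded in one line of the file, or None if it cannot be parsed."""
    text = line.strip()
    # Drop the double quotes that wrap the whole line.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    # Undo the quote doubling: ""M'Hamed"" becomes "M'Hamed".
    text = text.replace('""', '"')
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None


# Shortened stand-ins for lines 1, 2 and 4 of the data file (assumption).
samples = [
    '''"{'voornaam': 'Thomas', 'geslacht': 'M'}"''',
    '''"{'voornaam': ""M'Hamed"", 'geslacht': 'M'}"''',
    '''"{'voornaam': '"M'hamed"', 'geslacht': 'M'}"''',
]
parsed = [parse_line(sample) for sample in samples]
for result in parsed:
    print(result)
```

Collecting the None results in a separate list makes it easy to see how many lines of the original file still need manual repair.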
So, you cannot change how the file is created in the first place?
Working with malformed JSON can be done with demjson.
I had never used it before, but I tried it on the malformed string and it works.

Output:
In [22]: demjson.decode(a)
Out[22]: {'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
a was the first malformed json-string.
BTW: Sounds like Dutch