Posts: 14
Threads: 1
Joined: Mar 2020
Mar-07-2020, 02:16 AM
(This post was last modified: Mar-07-2020, 03:57 AM by Larz60+.)
I am new to forums and a newbie in programming. I am having a problem converting a text file that contains a dictionary on each line. I have been partially successful in creating dictionaries, but some entries are problematic. The problem I am facing is described
in the following link.
Any help is appreciated.
https://hastebin.com/caqirutugo.sql
If further information is needed, I can try posting the program and a shorter text file with data.
Thanks
Posts: 14
Threads: 1
Joined: Mar 2020
In response to moderator:
The program:
''' read firstnames file and convert to dictionaries
'''
FILENAME = "tstdic.txt"
infile = open(FILENAME, "r", encoding = "utf-8")
firstnames = []
for line in infile:
    # Prints original line content
    print(line)
    # First eval(); prints type of output and output
    d1 = eval(line)
    print(type(d1), "\n ", d1)
    # Second eval(); prints type of output and output
    d2 = eval(d1)
    print(type(d2), "\n ", d2)
Output:
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
<class 'str'>
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'dict'>
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}"
<class 'str'>
{'voornaam': M'Hamed, 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}
Traceback (most recent call last):
File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 31, in <module>
start(fakepyfile,mainpyfile)
File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 30, in start
exec(open(mainpyfile).read(), __main__.__dict__)
File "<string>", line 19, in <module>
File "<string>", line 1
{'voornaam': M'Hamed, 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}
^
SyntaxError: invalid syntax
[Program finished]
I don't know how to upload the text file from my tab, so here are the contents of the file used above:
tstdic.txt
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}"
"{'voornaam': ""D'Angelo"", 'geslacht': 'M', 't8306': '46', 'n8589': '3', 'n9094': '5', 'n9599': '9', 'n0004': '24', 'p8589': '6', 'p9094': '9', 'p9599': '17', 'p0004': '46'}"
"{'voornaam': '"M'hamed"', 'geslacht': 'M', 't8306': '103', 'n8589': '36', 'n9094': '14', 'n9599': '7', 'n0004': '8', 'p8589': '67', 'p9094': '24', 'p9599': '13', 'p0004': '15'}"
Posts: 8,151
Threads: 160
Joined: Sep 2016
Mar-07-2020, 10:58 AM
(This post was last modified: Mar-07-2020, 10:59 AM by buran.)
This looks like a poorly constructed JSON file, or maybe a failed attempt to write a CSV file with csv.DictWriter.
Where did it come from? Can you make changes to the format at the source of the file?
Posts: 12,022
Threads: 484
Joined: Sep 2016
Do not use eval; see: https://nedbatchelder.com/blog/201206/ev...erous.html
It's not necessary.
The format of your data is acceptable as JSON,
so after reading line 9 in your script, you can convert directly to a dictionary by using json.loads(line).
But first strip linefeeds and any surrounding whitespace, and add an import statement for json at the start:
import json
...
for line in infile:
    line = line.strip()
    ldict = json.loads(line)
    # ... ldict is now a dictionary
Posts: 8,151
Threads: 160
Joined: Sep 2016
Mar-07-2020, 11:20 AM
(This post was last modified: Mar-07-2020, 11:20 AM by buran.)
@Larz60+: no, it's not a valid JSON object (which would be converted to a Python dict), even at line level, because of the single quotes and because of the double quotes enclosing each line. It will produce a string:
line = '''"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"'''
print(line)
import json
json_data = json.loads(line)
print(json_data)
print(type(json_data))
Output:
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'str'>
If the line in the file looked like this:
Output: {"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
then each line can be converted to dict with json, because it's a JSON object at line level
line = '{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}'
print(line)
import json
json_data = json.loads(line)
print(json_data)
print(type(json_data))
print(json_data['voornaam'])
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'dict'>
Thomas
But of course it's better to create a valid JSON or CSV file (whatever the initial intention was).
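If the source of the file can be changed, one option is to write one JSON object per line with json.dumps and read each line back with json.loads. This is only a minimal sketch, assuming the records already exist as Python dicts. The sample records are shortened and the in-memory buffer stands in for the real file; both are made up for illustration.

```python
import io
import json

# Hypothetical, shortened sample records; real data would come from the original source.
records = [
    {"voornaam": "Thomas", "geslacht": "M", "t8306": "26794"},
    {"voornaam": "M'Hamed", "geslacht": "M", "t8306": "46"},
]

# Write one JSON object per line. StringIO stands in for a real file here.
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read the lines back; each non-empty line parses directly into a dict.
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]

print(type(loaded[0]))
print(loaded[1]["voornaam"])
```

Because json.dumps escapes whatever needs escaping, names containing an apostrophe need no special handling in this format.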
Posts: 12,022
Threads: 484
Joined: Sep 2016
OK, Sorry, I jumped the gun,
My first test was:
import os
import json
import sys
os.chdir(os.path.abspath(os.path.dirname(__file__)))
FILENAME = "tstdic.txt"
infile = open(FILENAME, "r", encoding = "utf-8")
firstnames = []
for line in infile:
    line = line.strip()
    ldict = json.loads(line)
    print(ldict)
    sys.exit(0)
which works (first line only).
Had I not exited, it would have blown up on the second line.
My post was written before I read yours (posted without reloading).
I usually work with massive amounts of data, which has led to the bad habit of terminating the process early.
Thus the erroneous reply.
Posts: 8,151
Threads: 160
Joined: Sep 2016
Mar-07-2020, 04:30 PM
(This post was last modified: Mar-07-2020, 04:31 PM by buran.)
Larz, it will work just fine for every line. The problem is that json.loads(line) will produce a str, not a dict. Test with print(type(ldict)).
Actually, my first thought was exactly the same as yours; then I noticed the quotes...
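Both behaviours can be checked quickly. The sketch below uses shortened copies of the first two lines of tstdic.txt (shortened only for readability). For the first line json.loads returns a str. For the line with doubled double quotes it raises json.JSONDecodeError, since the JSON string ends at the first of the doubled quotes and the rest counts as extra data, so it may be worth testing every line rather than only the first.

```python
import json

# Shortened copies of the first two lines of the sample file.
line1 = r'''"{'voornaam': 'Thomas', 'geslacht': 'M'}"'''
line2 = r'''"{'voornaam': ""M'Hamed"", 'geslacht': 'M'}"'''

# The first line is a valid JSON *string*, so the result is a str, not a dict.
result = json.loads(line1)
print(type(result))

# The second line is not valid JSON at all because of the doubled quotes.
try:
    json.loads(line2)
    line2_ok = True
except json.JSONDecodeError as exc:
    line2_ok = False
    print("line 2 failed:", exc)
```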
Posts: 14
Threads: 1
Joined: Mar 2020
@ buran & Larz60+:
Thank you both for your input. One thing that is definitely clear is to avoid using eval(). I had read about the dangers but did not know of an alternative. I am not familiar with JSON files, but will definitely look into them.
Using the suggestion, I was able to output tstdic.txt (and most of the original data file) as:
Output: {"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
format as suggested by buran, using:
line = line.strip().replace("'", '"').replace('"{', "{").replace('}"', "}")
and, as expected, d1 = json.loads(line) then created a dictionary without using eval().
The problem with converting "voornaam" values containing an apostrophe, however, still remains. I can't seem to find a way to reformat these entries to conform to json or eval. For that reason, maybe I will keep the topic open.
Thanks again.
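The doubled double quotes in the problem lines look like CSV-style escaping. One possible approach, sketched here under that assumption, is to let the csv module undo the outer quoting and then parse the result with ast.literal_eval, which accepts only Python literals and does not execute code. The helper name parse_line is made up for this example, and the sample lines are shortened copies of lines from tstdic.txt.

```python
import ast
import csv

def parse_line(line):
    """Return the dict stored in one CSV-quoted line, or None if it cannot be parsed."""
    try:
        row = next(csv.reader([line.strip()]))
    except (csv.Error, StopIteration):
        return None
    # A well-formed line is a single quoted field holding the dict literal.
    if len(row) != 1:
        return None
    try:
        value = ast.literal_eval(row[0])
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None

# Shortened copies of sample lines (assumption: the real lines follow the same pattern).
samples = [
    r'''"{'voornaam': 'Thomas', 'geslacht': 'M'}"''',
    r'''"{'voornaam': ""M'Hamed"", 'geslacht': 'M'}"''',
    r'''"{'voornaam': '"M'hamed"', 'geslacht': 'M'}"''',
]
parsed = [parse_line(sample) for sample in samples]
for item in parsed:
    print(item)
```

Lines that do not fit the pattern, such as the last sample, come back as None so they can be collected and inspected by hand instead of stopping the whole run.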
Posts: 8,151
Threads: 160
Joined: Sep 2016
So, you cannot change how the file is created in the first place?
Posts: 2,121
Threads: 10
Joined: May 2017
Mar-08-2020, 12:20 PM
(This post was last modified: Mar-08-2020, 12:20 PM by DeaD_EyE.)
Malformed JSON can be handled with demjson.
I had never used it before, but I tried it on the malformed string and it works.
Output: In [22]: demjson.decode(a)
Out[22]:
{'voornaam': 'Thomas',
'geslacht': 'M',
't8306': '26794',
'n8589': '4856',
'n9094': '6559',
'n9599': '6412',
'n0004': '5897',
'p8589': '8972',
'p9094': '11424',
'p9599': '11760',
'p0004': '11324'}
a was the first malformed json-string.
BTW: Sounds like Dutch