Python Forum
Problem converting string data file to dictionary
#1
I am new to forums and a newbie in programming. I am having a problem converting a text file that contains a dictionary on each line. I have been partially successful in creating dictionaries, but some entries are problematic. The problem I am facing is explained in the following link.

Any help is appreciated.

https://hastebin.com/caqirutugo.sql

If further information is needed, I can try posting the program and a shorter text file with data.

Thanks
#2
In response to moderator:

The program:
''' read firstnames file and convert to dictionaries
'''
FILENAME = "tstdic.txt"

infile = open(FILENAME, "r", encoding = "utf-8")

firstnames = []

for line in infile:

	# Prints original line content
	print(line)
	
	# First eval(); prints type of output and output
	d1 = eval(line)
	print(type(d1), "\n ",  d1)
	
	# Second eval(); prints type of output and output
	d2 = eval(d1)
	print(type(d2), "\n ", d2)
Output:
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
<class 'str'>
  {'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'dict'>
  {'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}"
<class 'str'>
  {'voornaam': M'Hamed, 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}
Traceback (most recent call last):
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 31, in <module>
    start(fakepyfile,mainpyfile)
  File "/data/user/0/ru.iiec.pydroid3/files/accomp_files/iiec_run/iiec_run.py", line 30, in start
    exec(open(mainpyfile).read(), __main__.__dict__)
  File "<string>", line 19, in <module>
  File "<string>", line 1
    {'voornaam': M'Hamed, 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}
                  ^
SyntaxError: invalid syntax

[Program finished]
I don't know how to upload the text file from my tablet, so here are the contents of the file used above:

testdic.txt
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46', 'n8589': '21', 'n9094': '8', 'n9599': '4', 'n0004': '0', 'p8589': '39', 'p9094': '14', 'p9599': '7', 'p0004': '0'}"
"{'voornaam': ""D'Angelo"", 'geslacht': 'M', 't8306': '46', 'n8589': '3', 'n9094': '5', 'n9599': '9', 'n0004': '24', 'p8589': '6', 'p9094': '9', 'p9599': '17', 'p0004': '46'}"
"{'voornaam': '"M'hamed"', 'geslacht': 'M', 't8306': '103', 'n8589': '36', 'n9094': '14', 'n9599': '7', 'n0004': '8', 'p8589': '67', 'p9094': '24', 'p9599': '13', 'p0004': '15'}"
#3
This looks like a poorly constructed JSON file, or maybe a failed attempt to write a CSV file with csv.DictReader.
Where did it come from? Can you make changes to the format at the source of the file?
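If the file was indeed produced by a CSV writer, the doubled quotes in ""M'Hamed"" look like CSV quote escaping wrapped around Python's repr() of a dict: repr() switches to double quotes for a string that contains an apostrophe, and csv.writer doubles every double quote inside a quoted field by default. Under that assumption (not confirmed in the thread), a sketch is to let csv.reader undo the CSV quoting and then parse each cell with ast.literal_eval, which accepts only Python literals and is a safe substitute for eval(). The helper name read_dict_lines is hypothetical, and the sample lines are shortened from the post. The fourth line of the posted file ('"M'hamed"') does not follow the same quoting pattern and would probably still need fixing by hand.

```python
import ast
import csv
import io


def read_dict_lines(lines):
    """Parse lines that each hold one CSV-quoted Python dict literal.

    Assumption: every line is a single CSV field containing repr(dict),
    so csv.reader strips the outer quotes and un-doubles the inner ones.
    """
    records = []
    for row in csv.reader(lines):
        if not row:  # csv.reader yields [] for blank lines
            continue
        # row[0] is now e.g. {'voornaam': "M'Hamed", 'geslacht': 'M', ...}
        records.append(ast.literal_eval(row[0]))
    return records


# Sample lines shortened from the post; the apostrophe names are the tricky cases.
SAMPLE = '''\
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794'}"
"{'voornaam': ""M'Hamed"", 'geslacht': 'M', 't8306': '46'}"
"{'voornaam': ""D'Angelo"", 'geslacht': 'M', 't8306': '46'}"
'''

for record in read_dict_lines(io.StringIO(SAMPLE)):
    print(record["voornaam"], record["geslacht"])
```

For the real file, the same function would be called with an open file object, e.g. open("tstdic.txt", newline="", encoding="utf-8"), which is what the csv module documentation recommends for reading.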
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#4
Do not use eval; see: https://nedbatchelder.com/blog/201206/ev...erous.html
It's not necessary.
The format of your data is acceptable as JSON,
therefore after reading line 9 in your script, you can convert each line directly to a dictionary with json.loads(line),
but first strip the linefeed and any whitespace, and add an import statement for json at the start.

import json

...

for line in infile:
    line = line.strip()
    ldict = json.loads(line)
    ...
ldict is now a dictionary
#5
@Larz60+: no, it's not a valid JSON object (which would be converted to a Python dict), even at line level, because of the single quotes and because of the double quotes enclosing each line. It will produce a string.

line = '''"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"'''
print(line)

import json

json_data = json.loads(line)
print(json_data)
print(type(json_data))
Output:
"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}"
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'str'>
If the line in the file looked like this:
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
then each line could be converted to a dict with json, because it is a JSON object at line level.

line = '{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}'
print(line)

import json

json_data = json.loads(line)
print(json_data)
print(type(json_data))
print(json_data['voornaam'])
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
<class 'dict'>
Thomas
But of course it's better to create a valid JSON file or CSV file (whatever the initial intention was).
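If the file can be regenerated at the source, one option is to write one JSON object per line with json.dumps and read it back with json.loads. Inside a JSON string an apostrophe is an ordinary character (json.dumps escapes double quotes itself), so names like M'Hamed round-trip without special handling. This is a minimal sketch assuming the records start out as Python dicts; the file name names.jsonl and both helper functions are hypothetical.

```python
import json
import os
import tempfile


def write_json_lines(path, records):
    """Write each dict as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII names readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_json_lines(path):
    """Read the file back, one dict per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"voornaam": "Thomas", "geslacht": "M", "t8306": "26794"},
    {"voornaam": "M'Hamed", "geslacht": "M", "t8306": "46"},
]

path = os.path.join(tempfile.gettempdir(), "names.jsonl")  # hypothetical file name
write_json_lines(path, records)
print(read_json_lines(path))
```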
#6
OK, sorry, I jumped the gun.
My first test was:
import os
import json
import sys

os.chdir(os.path.abspath(os.path.dirname(__file__)))
FILENAME = "tstdic.txt"
 
infile = open(FILENAME, "r", encoding = "utf-8")

firstnames = []
 
for line in infile:
    line = line.strip()
    ldict = json.loads(line)
    print(ldict)
    sys.exit(0)
which works (first line only).
Had I not exited, it would have blown up on the second line.
I posted before reading your reply (I posted without reloading the page).
I usually work with massively large amounts of data, which has led to the bad habit of terminating the process early.

Hence the erroneous reply.
#7
Larz, it will work just fine for every line. The problem is that json.loads(line) will produce a str, not a dict. Test with print(type(ldict)).
Actually, my first thought was exactly the same as yours; then I noticed the quotes...
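A quick way to see what json.loads actually returns is to feed it the sample lines one at a time (shortened from the post here). In this sketch the first line decodes to a str, while the line with doubled quotes around M'Hamed raises json.JSONDecodeError ("Extra data"), because the decoder treats "{'voornaam': " as a complete JSON string and everything after it as leftover text.

```python
import json

lines = [
    '''"{'voornaam': 'Thomas', 'geslacht': 'M'}"''',
    '''"{'voornaam': ""M'Hamed"", 'geslacht': 'M'}"''',
]

for line in lines:
    try:
        value = json.loads(line.strip())
        # A JSON string at top level comes back as a Python str, not a dict
        print(type(value).__name__, value)
    except json.JSONDecodeError as exc:
        print("JSONDecodeError:", exc.msg)
```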
#8
@buran & Larz60+:

Thank you both for your input. One thing that is definitely clear is to avoid using eval(). I had read about the dangers but did not know of an alternative. I'm not familiar with JSON files; I will definitely look into them.
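For reference, the standard-library alternative to eval() for this kind of text is ast.literal_eval, which only evaluates Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and raises ValueError for anything else, such as a function call. This minimal sketch mirrors the two eval() calls in the original program on the first sample line (shortened). It is not a complete fix on its own: applied twice to the M'Hamed line it would still fail on the second call, because the outer layer of quoting on that line is not Python's.

```python
import ast

line = '''"{'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794'}"'''

# First call: the outer double quotes make the whole line a string literal -> str
inner = ast.literal_eval(line)
print(type(inner))

# Second call: the inner text is a dict literal -> dict
record = ast.literal_eval(inner)
print(type(record), record["voornaam"])

# Unlike eval(), anything that is not a plain literal is refused
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```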
Using the suggestion, I was able to output tstdic.txt (and most of the original data file) as:
Output:
{"voornaam": "Thomas", "geslacht": "M", "t8306": "26794", "n8589": "4856", "n9094": "6559", "n9599": "6412", "n0004": "5897", "p8589": "8972", "p9094": "11424", "p9599": "11760", "p0004": "11324"}
in the format suggested by buran, using:
line = line.strip().replace("'", '"').replace('"{', "{").replace('}"', "}")
and, as expected,
d1 = json.loads(line)
then created a dictionary without using eval().

The problem with converting "voornaam" values that contain an apostrophe, however, still remains. I can't find a way to reformat these entries so that they work with json or eval. For that reason, I will keep the topic open.
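Applying the same replace() chain to the M'Hamed line (shortened) shows why those entries keep failing: the apostrophe inside the name is turned into a double quote as well, so the JSON decoder reads an empty string "" as the value and then stumbles over the stray M.

```python
import json

line = '''"{'voornaam': ""M'Hamed"", 'geslacht': 'M'}"'''

# The same replacement chain as in the post above
converted = line.strip().replace("'", '"').replace('"{', "{").replace('}"', "}")
print(converted)  # {"voornaam": ""M"Hamed"", "geslacht": "M"}

try:
    json.loads(converted)
except json.JSONDecodeError as exc:
    # The decoder expects ',' or '}' after the empty string "" but finds M
    print("JSONDecodeError:", exc.msg)
```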

Thanks again.
#9
So, you cannot change how the file is created in the first place?
#10
Working with malformed JSON could be done with demjson.
I have never used it, but I tried it on the malformed string and it works.

Output:
In [22]: demjson.decode(a)
Out[22]: {'voornaam': 'Thomas', 'geslacht': 'M', 't8306': '26794', 'n8589': '4856', 'n9094': '6559', 'n9599': '6412', 'n0004': '5897', 'p8589': '8972', 'p9094': '11424', 'p9599': '11760', 'p0004': '11324'}
Here, a was the first malformed JSON string.
BTW: Sounds like Dutch
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!

