Python Forum
Using Python to connect to an XML ?
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Using Python to connect to an XML ?
#1
Can I use Python to connect to an XML, and then based on the data and certain rules, create a spreadsheet similar to OpenOffice calc ? Or it doesn't have to specifically be a spreadsheet, but simply a GUI output containing the resultant data. The data in XML format is from a KMyMoney database. Can also export to an SQL format.

Have found a few basic guides, at https://www.pythonforbeginners.com/basic...-in-python and https://www.geeksforgeeks.org/reading-an...in-python/ , so obviously possible. The XML file is 10 Mb; I assume all of the file is read into memory. I only need to open it for read purposes.

As there is other data besides the KMyMoney , which needs to be modified from time to time, not sure if the XML format will do. That said, it is possible to merge two XML files together with Python - https://pypi.org/project/xmlmerge/
Reply
#2
I understand that KMyMoney can save its data in a SQLite 3 database. You could then export the database tables to spreadsheets by various means, this is a standard task.

Also, XML is not something you connect to. It is only a general data format that you can read and write. You can read an XML file using modules lxml or beautifulsoup.
jehoshua likes this post
Reply
#3
(Jun-27-2023, 07:25 AM)Gribouillis Wrote: I understand that KMyMoney can save its data in a SQLite 3 database. You could then export the database tables to spreadsheets by various means, this is a standard task.

Thanks for pointing out that feature of KMyMoney; I had only looked at exporting and 'save as' functions. That seems the best option for this, as there is additional data to the KMM data.

Thanks for the tips on reading XML files with lxml or beautifulsoup.
Reply
#4
I followed the guide at https://stackabuse.com/parsing-xml-with-...in-python/ , some test code

from bs4 import BeautifulSoup

with open('xmlfile', 'r') as f:
	file = f.read()

# 'xml' is the parser used. For html files, which BeautifulSoup is typically used for, it would be 'html.parser'.
soup = BeautifulSoup(file, 'xml')

payees = soup.find_all('PAYEE')
for payee in payees:
    print(payee.text)
but no output. It seems to be out putting a lot of blank lines though ? Sample data is of the format:

<PAYEE matchingenabled="0" email="" name="Transfers - inter account" reference="" id="P000001">
<ADDRESS street="" telephone="" state="" city="" postcode=""/>
</PAYEE>
Reply
#5
Now changed to

from bs4 import BeautifulSoup

with open('testxml.xml', 'r') as f:
	file = f.read()

# 'xml' is the parser used. For html files, which BeautifulSoup is typically used for, it would be 'html.parser'.
soup = BeautifulSoup(file, 'xml')
print(soup.get_text())          # produces blank o/p

for payee in soup.find_all('PAYEE'):
    print(payee.get('Cash'))    # all lines have 'None'
Either no o/p or the words 'None'. There is at least one dataset that should match

Quote: <PAYEE matchingenabled="0" email="" name="Cash" reference="" id="P000031">
<ADDRESS street="" telephone="" state="" city="" postcode=""/>
</PAYEE>
Reply
#6
(Jul-04-2023, 05:03 AM)jehoshua Wrote: Either no o/p or the words 'None'. There is at least one dataset that should match
They are only attributes is in PAYEE tag,and no text.
Demo on how to parse those.
from bs4 import BeautifulSoup

with open('testxml.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')
payee = soup.find('PAYEE')
# All attributes
>>> payee.attrs
{'email': '',
 'id': 'P000031',
 'matchingenabled': '0',
 'name': 'Cash',
 'reference': ''}

# Get a singel one 
>>> payee['name']
'Cash'

# Or
>>> payee.get('name')
'Cash'
jehoshua likes this post
Reply
#7
Thanks, I had to install BeautifulSoup4 even though I had BeautifulSoup installed. That works fine

from bs4 import BeautifulSoup

with open('testxml.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')
payee = soup.find('PAYEE')

print(payee.attrs)      ## All attributes
print(payee['name'])    # Get a single one
print(payee.get('name'))    # Or - get a single one

address = soup.find('ADDRESS')
print(address.attrs)
python3 soup2.py

Output:
{'matchingenabled': '0', 'email': '', 'name': 'Transfers - inter account', 'reference': '', 'id': 'P000001'} Transfers - inter account Transfers - inter account {'street': '', 'telephone': '', 'county': '', 'city': '', 'zipcode': ''}
I will need to read up on how to loop through the array "payee" and print all or filter some. Also to learn how to reference what I call a data subset. For example "ADDRESS" is a data subset (child) of both PAYEE and INSTITUTION (data 'parents'). Quite a bit of learning and reading up.

When I used to code with other programming languages, I found the learning curve not as steep if I can use a 'debug' to inspect arrays and variables, etc. I see Python has pdb.

I see there are some GUI ones also - https://stackoverflow.com/posts/4929267/revisions
Reply
#8
The data is of the format

>> <PAYEE matchingenabled="0" email="" name="Transfers - inter account" reference="" id="P000001">
<ADDRESS street="" telephone="" state="" city="" postcode=""/>
</PAYEE>

and this code

#!/usr/bin/python

from bs4 import BeautifulSoup

with open('testxml.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')

for tag in soup.find_all('PAYEE'):

    print(f'{tag.name}: {tag.get_text}')
displays all the payees, but it also outputs these 2 lines

>>PAYEE: <bound method PageElement.get_text of <PAYEE id="P000242"/>>
PAYEE: <bound method PageElement.get_text of <PAYEE id="P000344"/>>

at the end. This is caused by this type of data

>><REPORT investments="0" group="Transactions" type="querytable 1.15" rowtype="category" querycolumns="number,payee,tag,account" name="Transactions by Category (Customized)" comment="Custom Report" convertcurrency="1" loans="0" showcolumntotals="1" detail="all" includestransfers="0" skipZero="0" hidetransactions="0" favorite="0" datelock="userdefined" tax="0" id="R000023">
<PAYEE id="P000242"/>
<CATEGORY id="A000536"/>
<DATES to="2020-08-05" from="2001-01-01"/>
</REPORT>

so the code works, but includes PAYEE data I don't want. That PAYEE data is a child of another parent. Need to only parse the PAYEE data at a parent level. Have tried 'recursive=no', and different code to indicate exclude child and siblings, but only include parent, but it all gave syntax errors as I attempted to adjust the code to suit.

I simply need the 'find_all' for PAYEE , but only at a parent level. Wink
Reply
#9
I needed to specify 'parent'. On a find_all() it wouldn't work, but on a find() it worked

#!/usr/bin/python

from bs4 import BeautifulSoup

with open('testxml.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')

for tag in soup.find('PAYEE').parent:

    print(tag)

print('finished')
Reply
#10
Now needing to display the data from the XML to a GUI, the example at https://github.com/PySimpleGUI/PySimpleG...1624781486 will be of a great help. Of all the GUI's for Python, PySimpleGUI is by far the easiest to use. Much less code to work with and a ton of free demos at https://github.com/PySimpleGUI/PySimpleG...moPrograms
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  connect sql by python using txt. file dawid294 2 445 Jan-12-2024, 08:54 PM
Last Post: deanhystad
  How to Connect to PostgreSQL Through Jump Server and SSH Tunnel using Python? nishans 1 1,018 Jan-02-2024, 10:37 AM
Last Post: khanzain
  python connect to mssql wailoonho 7 1,616 Dec-07-2023, 02:06 AM
Last Post: wailoonho
  Trying to make a bot to connect on discord with Selenium Python johnsmith43 2 52,092 Mar-21-2022, 02:56 PM
Last Post: Cloudytechnical
  How to connect Mysql databse with python and VSCode IDE madhusoodhananER 1 8,738 Oct-31-2019, 10:15 AM
Last Post: Larz60+
  Connect a Teradata DB to Python OscarBoots 10 8,889 Jan-31-2019, 10:23 PM
Last Post: OscarBoots
  Python connect to Sybase IQ FORTITUDE 1 4,794 Jan-24-2019, 02:14 AM
Last Post: micseydel
  How to connect Atom and Python? Jack_Sparrow 1 3,758 May-01-2018, 10:53 AM
Last Post: Larz60+
  connect to remote database via python script sunstar20 5 5,366 Apr-23-2018, 11:05 AM
Last Post: Gribouillis
  Can anyone Please help on fixing this python code to connect and save things on sqlit Roscoes 2 2,921 Mar-06-2018, 04:48 AM
Last Post: Roscoes

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020