Python Forum
xml element inside text node
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
xml element inside text node
#1
Hi,

I try to get this xml code…

<root>
<action>The guy enters the room and <bold>Slam!</bold> shuts the door.</action>
</root>

built from a python program using xml library (not lxml).

What I absolutely CANT figure out is how inserting the <bold> node INSIDE a text of the parent <action> node. I succeed to insert it but it falls at the end of the parent <action> node, not in its right place in the middle of the sentence.

Any idea ?

Alexandre
Reply
#2
If using internal XML library, be aware of the warning:

Quote:Warning

The XML modules are not secure against erroneous or maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities.

with that said, also be aware of:
Quote:These external packages are recommended for any code that parses untrusted XML data.

defusedxml is a pure Python package with modified subclasses of all stdlib XML parsers that prevent any potentially malicious operation. The package also ships with example exploits and extended documentation on more XML exploits like xpath injection

As to your question, do you have a URL we can visit to look at the file structure? You don't show enough of it to answer properly.
Reply
#3
Thanks,

actually, I'm just looking to achieve what I explained, a fonction that can insert a subelement in a middle of a text node.

I'm not concerned, I think, with malicious activities, as this code is not to be used by anyone but me. It won't be online, either.

To be more precise, I try to transform some markdown like this:

The guy enters the room and **Slam!** shuts the door.

into an xml file that would be this:

<root><action>The guy enters the room and <bold>Slam!</bold> shuts the door.</action></root>

Of course, it's not the real thing. The real thing is a complete 150 page script that I have to transform in xml via python and its xml.etree library. I succeed to every point of it and the result is almost done. I'm just stuck with this very peculiar problem: the node intricate into the text value of another one.
Reply
#4
Understood.

it could also be other tags like <em> or <span>
suggest looking at: https://github.com/trentm/python-markdown2
and seeing how they handle this problem.
Reply
#5
I think that, with XML, I’m totally free with the node names. This is not the case with HTML, as the tags are pre-defined.
Reply
#6
(Oct-21-2018, 01:37 PM)SgtPembry Wrote: built from a python program using xml library (not lxml).
Why not lxml?
lxml has E-factory for making this easier.
Example:
import lxml.etree
import lxml.builder

E = lxml.builder.ElementMaker()
ROOT = E.root
ACTION = E.action
FIELD1 = E.bold
FIELD2 = E.bold
the_doc = ROOT(
    ACTION(
        FIELD1("some value1", name="test"),
        FIELD2("some value2", name="car")
    )
)

print(lxml.etree.tostring(the_doc, encoding=str, pretty_print=True))
Output:
<root> <action> <bold name="test">some value1</bold> <bold name="car">some value2</bold> </action> </root>
Reply
#7
Thanks!

But I don't understand very well. Could that be possible to generate the xml I put as example? Could it insert a bold node inside an action element's text ?

In your example, there is no text in action.
Reply
#8
While looking for a solution… I think there might be something around .tail. I don't really understand the .tail principle and an answer could be there.

No?

OK. This works.

import xml.etree.ElementTree as ET
import re

action = ET.Element('act')
action.text = 'A guy enters the room and **Slam!** shuts the door.'

look = re.search('\*\*.+\*\*',action.text)

if look != None:
	t1 = action.text[0:look.span()[0]]
	t2 = action.text[look.span()[0]+2:look.span()[1]-2]
	t3 = action.text[look.span()[1]:]
	action.text = t1
	bold = ET.SubElement(action,'bold')
	bold.text = t2
	bold.tail = t3

print(ET.tostring(action))
If someone finds something to do better…
Reply
#9
SgtPembry Wrote:Could that be possible to generate the xml I put as example? Could it insert a bold node inside an action element's text ?
Like this,test it out with examples given and Doc.
import lxml.etree
import lxml.builder

E = lxml.builder.ElementMaker()
ROOT = E.root
ACTION = E.action
BOLD = E.bold
the_doc = ROOT(
        ACTION('The guy enters the room and', BOLD('Slam'), 'shuts the door.'),     
    )
print(lxml.etree.tostring(the_doc, encoding=str, pretty_print=True))
Output:
<root> <action>The guy enters the room and<bold>Slam</bold>shuts the door.</action> </root>
Reply
#10
Oh yeah… That's better.

lxml builder seems something I should work on.

Thank you very much.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Unable to locate element no such element gahhon 6 4,370 Feb-18-2019, 02:09 PM
Last Post: gahhon
  how to enter text in text box inside a program? Astronometria 0 2,376 Mar-10-2018, 11:54 PM
Last Post: Astronometria
  Change single element in 2D list changes every 1D element AceScottie 9 11,947 Nov-13-2017, 07:05 PM
Last Post: Larz60+

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020