Python Forum
safe text to html
#1
i am looking for code (command, module, function, etc.) that takes a string with one or more lines of text and converts it to simple html that will display the given text in the page where it is inserted. it needs to be "safe", so that if it is given html instead of plain text, the html source is displayed as text rather than rendered.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Looks like there are a few different ways to do it: https://wiki.python.org/moin/EscapingHtml

The cgi module has an escape function, which by default only operates on angle brackets and ampersands (note that cgi.escape was deprecated in Python 3.2 and removed in 3.8).
The html module was added in 3.2, and contains an escape function (but I don't think you like modules that new).
The xml.sax.saxutils module also contains an escape function, and html is basically xml anyway, so that should work for this.

From that, we can easily roll our own:
>>> def build_page(text):
...   from xml.sax.saxutils import escape
...   text = escape(text)
...   return "<html><body>{0}</body></html>".format(text)
...
>>> build_page("Hello world!  The following > should be escaped, but probably not this: (').  <bold>Not bolded</bold>")
"<html><body>Hello world!  The following &gt; should be escaped, but probably not this: (').  &lt;bold&gt;Not bolded&lt;/bold&gt;</body></html>"
>>>
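Since the question mentions text with more than one line, here is a minimal sketch of the same idea using html.escape from the standard library (Python 3.2+). The helper name text_to_html and the choice of <br> for line breaks are assumptions for illustration, not something from the linked wiki page; wrapping the escaped text in <pre> would be another reasonable option.

```python
import html

def text_to_html(text):
    """Return an HTML fragment that displays `text` literally.

    Hypothetical helper: the name and the <br> strategy are illustrative.
    """
    # html.escape converts &, < and > (and, by default, both quote characters).
    escaped = html.escape(text)
    # Keep the original line breaks visible by joining the lines with <br>.
    return "<br>\n".join(escaped.splitlines())

fragment = text_to_html("1 < 2 & 3 > 2\n<b>not bold</b>")
print(fragment)
```

The result is a fragment rather than a full page, so it can be dropped into whatever page it is inserted in.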
#3
(Jul-05-2017, 02:32 PM)nilamo Wrote: The html module was added in 3.2, and contains an escape function (but I don't think you like modules that new).

So I looked at the source.  The html module's escape is identical to the cgi module's escape, except that it defaults to escaping quote characters.  I suppose you could just copy-paste this function if you wanted it :p

https://github.com/python/cpython/blob/m...t__.py#L12 Wrote:
def escape(s, quote=True):
   """
   Replace special characters "&", "<" and ">" to HTML-safe sequences.
   If the optional flag quote is true (the default), the quotation mark
   characters, both double quote (") and single quote (') characters are also
   translated.
   """
   s = s.replace("&", "&amp;") # Must be done first!
   s = s.replace("<", "&lt;")
   s = s.replace(">", "&gt;")
   if quote:
       s = s.replace('"', "&quot;")
       s = s.replace('\'', "&#x27;")
   return s
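A short sketch of what the quote flag changes, using the standard library's html.escape. The sample strings and variable names are made up for illustration; the attribute example is only meant to show why escaping quotes matters once the text ends up inside a quoted attribute value.

```python
import html

value = 'say "hi" & <wave>'

# Default (quote=True): quote characters are escaped as well.
with_quotes = html.escape(value)
# quote=False: only &, < and > are converted.
without_quotes = html.escape(value, quote=False)

# Inside an attribute, an unescaped double quote would end the value early.
hostile = '" onmouseover="alert(1)'
attribute = '<input value="{}">'.format(html.escape(hostile))

print(with_quotes)
print(without_quotes)
print(attribute)
```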
#4
Jinja2 is great.
>>> import jinja2

>>> html = '<html><body>hello world</body></html>'
>>> j = jinja2.escape(html)
>>> j
Markup('&lt;html&gt;&lt;body&gt;hello world&lt;/body&gt;&lt;/html&gt;')

>>> str(jinja2.escape(j))
'&lt;html&gt;&lt;body&gt;hello world&lt;/body&gt;&lt;/html&gt;'
>>> j.unescape()
'<html><body>hello world</body></html>'
So, to show a little of its power, here are a couple of ways to use a filter.
The server is sending the value <b>make</b>:
<div class="html">
 {% filter escape %}
    <div>This is how you {{ value }} a div</div>
  {% endfilter %} 
    <div>This is how you {{ value }} a div</div>  
</div>
Output:
<div>This is how you <b>make</b> a div</div> This is how you make a div
Now passing the value through the escaping filter |e; there are several other filters, e.g. |safe.
<div class="html">
 {% filter escape %}
    <div>This is how you {{ value|e }} a div</div>
  {% endfilter %} 
    <div>This is how you {{ value|e }} a div</div>  
</div>
Output:
<div>This is how you &lt;b&gt;make&lt;/b&gt; a div</div> This is how you <b>make</b> a div
#5
txt2html
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#6
(Jul-05-2017, 04:50 PM)snippsat Wrote: Jinja2 is great.
I'll second jinja. It's exceptionally fast (not the fastest, but fast enough for almost all purposes), has pretty good syntax (but that's subjective, maybe you prefer how Mako looks), and supports inheritance (though any decent templating engine does).

That's if you need a template, though. If your goal is simply to wrap some text, it's a little bit overkill.
#7
i'm writing little wsgi scripts and one on the agenda is to display the environment.  some output in my first version was all mangled.  then i remembered that the values need to be escaped for html.  i'll try again.
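For the WSGI case, a minimal sketch of an app that lists its environment with every key and value escaped might look like this. The app name show_environ and the table layout are assumptions for illustration; only html.escape and wsgiref from the standard library are used, and the app is exercised directly instead of through a real server.

```python
import html
from wsgiref.util import setup_testing_defaults

def show_environ(environ, start_response):
    """Hypothetical WSGI app that renders the environment as an HTML table."""
    rows = []
    for key in sorted(environ):
        # repr() of some values contains < and >, e.g. "<_io.BytesIO object ...>",
        # which is the kind of value that gets mangled when left unescaped.
        rows.append("<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(key)), html.escape(repr(environ[key]))))
    body = "<html><body><table>\n{}\n</table></body></html>".format("\n".join(rows))
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]

# Call the app without a server, using a stand-in start_response.
environ = {}
setup_testing_defaults(environ)
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status

page = b"".join(show_environ(environ, fake_start_response)).decode("utf-8")
print(page)
```

To try it in a browser, the same function could be passed to wsgiref.simple_server.make_server.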
#8
here is a snippet of what i ended up coding:
   s = s.replace('&','&amp;')
   s = s.replace('"','&quot;')
   s = s.replace('<','&lt;')
   s = s.replace('>','&gt')
   for n in list(range(32))+[39,92]:
       s = s.replace(chr(n),'&#'+hex(n)[1:]+';')
#9
(Jul-06-2017, 02:48 AM)Skaperen Wrote: here is a snippet of what i ended up coding:
The problem is that it is not correct ;)
If you want to challenge yourself to write this, no problem.
You can use the tool I show here to test the code;
all that's needed for the code below is pip install Flask.
>>> from flask import Markup                               
                                                            
>>> s = '<html><body>hello world</body></html>'              
>>> s = s.replace('&','&amp;')                               
... s = s.replace('"','&quot;')                              
... s = s.replace('<','&lt;')                                
... s = s.replace('>','&gt')                                 
... for n in list(range(32))+[39,92]:                        
...     s = s.replace(chr(n),'&#'+hex(n)[1:]+';')            
                                                            
>>> s                                                        
'&lt;html&gt&lt;body&gthello world&lt;/body&gt&lt;/html&gt'  
>>> test = Markup(s) 
>>> test
Markup('&lt;html&gt&lt;body&gthello world&lt;/body&gt&lt;/html&gt')                                 
>>> test.unescape()                                          
'<html&gt<body&gthello world</body&gt</html&gt' 
test.unescape() should be '<html><body>hello world</body></html>'
Jinja2 is battle-proven and well tested;
companies like Mozilla use it, and of course everyone who uses Flask.
>>> import jinja2

>>> s = '<html><body>hello world</body></html>'
>>> test = jinja2.escape(s)
>>> test
Markup('&lt;html&gt;&lt;body&gt;hello world&lt;/body&gt;&lt;/html&gt;')
>>> test.unescape()
'<html><body>hello world</body></html>'
#10
(Jul-06-2017, 02:48 AM)Skaperen Wrote: s = s.replace('>','&gt')

HTML entities all end with a semicolon. :)
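Putting that fix into the snippet from post #8, one possible corrected version is sketched below. The function name escape_text and the sample string are made up for illustration. The sketch also drops the loop over control characters and the backslash, which is a deliberate change from the original snippet: html.escape leaves those characters alone, so the result can be compared directly against the standard library.

```python
import html

def escape_text(s):
    """Hand-rolled escaping with every entity terminated by a semicolon.

    Hypothetical helper based on the snippet in post #8.
    """
    s = s.replace('&', '&amp;')   # must be done first
    s = s.replace('"', '&quot;')
    s = s.replace('<', '&lt;')
    s = s.replace('>', '&gt;')    # the original was missing this semicolon
    s = s.replace("'", '&#x27;')
    return s

sample = '<a href="x">Tom & Jerry\'s</a>'
escaped = escape_text(sample)

print(escaped)
print(escaped == html.escape(sample))    # same result as the stdlib function
print(html.unescape(escaped) == sample)  # round trip restores the input
```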