Oct-03-2018, 01:16 AM
I have a project that I (mostly) completed in VBA, but I don't think VBA is great for larger data sets. I'm thinking that maybe I should use Python for the 'engine' and keep the VBA side just for the UI and distribution. However, as seems to be the case with everything I try in Python, I just can't make it work. Maybe if I can get past the first step, I'll be able to move forward on my own. So if any of you can assist, I'd certainly appreciate it.
For the first step, all I'm trying to do is import two Word documents and remove the HTML/XML tags. I've tried https://www.tutorialspoint.com/python/py...cument.htm but can't get past PIP INSTALL DOCX. I've tried Beautiful Soup 4 but get errors like " looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup." Then "'"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)" and on and on it goes. I've been through at least half a dozen "this is easy and should work" approaches.
All I want to do is open two files and remove the HTML/XML. Surely there has to be something out there that I can just plug the file paths and names into and see the results?
Thanks for any assistance.
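For reference, here is a minimal sketch of that first step using only the Python standard library, so nothing needs to be installed with pip. It assumes the documents are .docx files (a .docx is a zip archive whose body text lives in word/document.xml) and not the older binary .doc format. The function name docx_to_text is made up for illustration. As a side note, the package the tutorial relies on is, as far as I know, installed with "pip install python-docx" rather than "pip install docx", and the Beautiful Soup warning typically appears when a file name string is passed in instead of the file's contents.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace that WordprocessingML uses for paragraph (w:p) and text (w:t) tags.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per paragraph.

    `path` may be a file path or an open binary file-like object.
    Hypothetical helper for illustration, not part of any library.
    """
    # A .docx is a zip archive; the main body is stored as XML inside it.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs of one paragraph, dropping every tag.
        runs = [node.text for node in para.iter(W_NS + "t") if node.text]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

To try it, call something like print(docx_to_text(r"C:\some\folder\first.docx")) once per document; that path is only a placeholder. Text in headers, footers, and footnotes is stored in other parts of the archive, so this sketch only covers the main body.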