Python Forum
Parsing Oasis Open Document format.
#1
I want to write a parser for the OASIS OpenDocument v1.2 specification that extracts only the tags and their explanations. I am parsing the tags correctly, but I can't parse the links that belong to each tag. Maybe you can suggest another way to do this project. I would be grateful.

Here is a link to my code, though I know it has missing parts:

https://i.stack.imgur.com/qJjxo.png

And here is the link to the document:
http://docs.oasis-open.org/office/v1.2/o..._253892949
#2
Please don't post images of code. Copy and paste the code inside python tags.
Please use proper tags when posting code, tracebacks, output, etc.
See BBcode help for more info.
#3
import requests
from bs4 import BeautifulSoup

def main():
    # Fetch the page with requests.
    req = requests.get('http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1415340_253892949')
    soup = BeautifulSoup(req.content, "lxml")
    # Example of a link I want to capture:
    # '<a href="#__RefHeading__1419338_253892949">19.905 xhtml:about</a>'

    containers = soup.find_all(['tr', 'td'])

    filename = "basliklar.txt"
    headers = "baslik, link\n"

    # Extract each heading and the data that corresponds to it.
    # There is no data corresponding to the tag!! tag = container.nextSibling.text
    with open(filename, "w", encoding="utf-8") as f:
        f.write(headers)
        for container in containers:
            if container.next_sibling is None:
                baslik = container.text
                f.write(baslik + "\n")
            else:
                # Note: this collects every link on the page, not only this container's.
                links = [link.get('href') for link in soup.find_all('a')]
                print(links)

if __name__ == "__main__":
    main()
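
One possible way to keep each heading together with its own link is to walk the anchors themselves instead of the `tr`/`td` containers. The sketch below is not from the thread: it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages, and the names `HeadingLinkParser` and `extract_heading_links` are hypothetical. It assumes the entries of interest look like the `#__RefHeading__...` anchor quoted in the code comment above.

```python
from html.parser import HTMLParser


class HeadingLinkParser(HTMLParser):
    """Collect (title, href) pairs for anchors whose href starts with a prefix."""

    def __init__(self, prefix="#__RefHeading__"):
        super().__init__()
        self.prefix = prefix   # assumed href prefix, taken from the example anchor
        self.pairs = []        # finished (title, href) tuples
        self._href = None      # href of the matching anchor currently open, if any
        self._text = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            title = " ".join("".join(self._text).split())
            self.pairs.append((title, self._href))
            self._href = None


def extract_heading_links(html):
    """Return a list of (title, href) pairs found in the given HTML string."""
    parser = HeadingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Small inline sample modelled on the anchor quoted in the post.
sample = ('<table><tr><td>'
          '<a href="#__RefHeading__1419338_253892949">19.905 xhtml:about</a>'
          '</td></tr></table>')
for title, href in extract_heading_links(sample):
    print(title + ", " + href)
```

With BeautifulSoup, roughly the same selection could be made with `soup.find_all('a', href=re.compile(r'^#__RefHeading__'))`, then reading `a.get_text()` and `a.get('href')` from each result. Whether every tag in the real document follows this anchor pattern is an assumption that would need checking against the page.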