Python Forum

Full Version: Parsing Oasis Open Document format.
I want to write a parser for the OASIS OpenDocument v1.2 specification that extracts only the tags and their explanations. I can parse the tags correctly, but I can't parse the links that belong to each tag. Maybe you can suggest another way to approach this project. I would be grateful.

Here is a link to my code; I know it has missing parts:

https://i.stack.imgur.com/qJjxo.png

And here is the link to the document:
http://docs.oasis-open.org/office/v1.2/o..._253892949
Please don't post images of code. Copy and paste it inside python tags.
Please use proper tags when posting code, tracebacks, output, etc.
See the BBCode help for more info.
from bs4 import BeautifulSoup
import requests

def main():
    # Fetch the page text with requests
    req = requests.get('http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1415340_253892949')
    soup = BeautifulSoup(req.content,"lxml")
    # Example of a target anchor: '<a href="#__RefHeading__1419338_253892949">19.905 xhtml:about</a>'

    containers = soup.find_all(['tr','td'])

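    filename = "basliklar.txt"
    # Open the output file in a with-block so it is closed even if an error occurs
    with open(filename, "w", encoding="utf-8") as f:
        headers = "baslik, link\n"
        f.write(headers)

        # Fetch each heading (baslik) and the data that belongs to it.
        # No data corresponds to the tag!! tag = container.nextSibling.text
        for container in containers:
            if container.next_sibling is None:
                baslik = container.text
                f.write(baslik + "\n")
            else:
                # Collect only the links inside this container, not every link on the page
                links = [link.get('href') for link in container.find_all('a')]
                print(links)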

if __name__ == "__main__":
    main()
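
One possible way to get the link that belongs to each tag is to read the heading text and the href from the same <a> element, rather than collecting headings and links in separate passes. The sketch below is only an illustration of that idea, not a finished parser: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and the names AnchorCollector and extract_links, as well as the second row of the sample table, are made up for the example. Only the first anchor is taken from the comment in the code above.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.pairs = []      # finished (text, href) tuples
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace
            text = " ".join("".join(self._text).split())
            self.pairs.append((text, self._href))
            self._href = None


def extract_links(html):
    """Return a list of (heading text, href) tuples found in html."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical sample input; the second row is invented for illustration.
sample = """
<table>
  <tr><td><a href="#__RefHeading__1419338_253892949">19.905 xhtml:about</a></td></tr>
  <tr><td><a href="#example-anchor">19.906 example:entry</a></td></tr>
</table>
"""

for baslik, link in extract_links(sample):
    print(f"{baslik}, {link}")
```

If you would rather stay with BeautifulSoup, the same pairing should be possible by looping over soup.find_all('a', href=True) and reading a.get_text() and a['href'] from each element, so that the text and the link always come from the same tag. Whether that matches the layout of the real specification page is something you would need to check against the document itself.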