Python Forum
Sitemap.xml and pull URLs and get response code
#1
What I want to do is enter a site's sitemap (for example, the one listed in its robots.txt), pull out all the URLs it contains, get the response code for each URL, and save the results to a CSV file.

I can do this up to a point, in a different way, but I can't get the response code for each URL individually or save the results to CSV.

Also, sitemaps generated by plugins such as All in One SEO are structured differently from manually created sitemaps.
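One structural difference that may be behind this (an assumption, not checked against any particular plugin) is that a plugin can serve a sitemap index, with a `<sitemapindex>` root whose `<loc>` entries point to child sitemaps, while a hand-written file is usually a plain `<urlset>` of page URLs. A minimal sketch for telling the two apart with the standard library; `parse_sitemap` and `local_name` are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_data):
    """Return (kind, locs) where kind is 'index' or 'urlset'.

    Only <loc> elements that are direct children of a <url> or <sitemap>
    entry are collected, so nested extension tags (e.g. image entries)
    are ignored.
    """
    root = ET.fromstring(xml_data)
    kind = "index" if local_name(root.tag) == "sitemapindex" else "urlset"
    locs = []
    for entry in root:
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return kind, locs
```

If `kind` comes back as `"index"`, each returned location would itself need to be downloaded and passed through `parse_sitemap` again to reach the actual page URLs.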


The code I use:

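import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://site.com/postsitemap.xml'
page = requests.get(url)
print('Sitemap response code: %s' % page.status_code)

# Parse the sitemap; html.parser is enough to find the <url>, <loc> and <lastmod> tags
sitemap_index = BeautifulSoup(page.content, 'html.parser')
print('Created %s object' % type(sitemap_index))
urls = [element.text.strip() for element in sitemap_index.find_all('loc')]

# Collect loc/lastmod pairs from each <url> entry (lastmod is optional)
data = []
for entry in sitemap_index.find_all('url'):
    loc = entry.find('loc')
    lastmod = entry.find('lastmod')
    if loc is not None:
        data.append([loc.text.strip(), lastmod.text.strip() if lastmod else ''])
print("Sitemap URL count:", len(data))
df = pd.DataFrame(data, columns=["links", "lastmod"])

# Append the sorted URLs to sitemap.txt; the with block flushes and closes the file
with open("sitemap.txt", "a+") as d:
    for link in sorted(urls):
        d.write(link + "\n")

# Print the file back (no explicit close needed inside a with block)
with open("sitemap.txt") as f:
    print(f.read())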
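For the part I'm stuck on (a response code per URL, saved to CSV), a rough sketch of one possible approach is below. It assumes the `urls` list from the code above; `fetch_status`, `check_urls`, `save_csv` and the file name `sitemap_status.csv` are made-up names for illustration. It uses HEAD requests, which some servers may refuse, so swapping in `requests.get` may be needed.

```python
import csv


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or the exception name on failure."""
    import requests  # imported lazily so the CSV helpers work without it

    try:
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        return response.status_code
    except requests.RequestException as exc:
        return type(exc).__name__


def check_urls(urls, fetch=fetch_status):
    """Pair every URL with whatever fetch(url) returns for it."""
    return [(url, fetch(url)) for url in urls]


def save_csv(rows, path="sitemap_status.csv"):
    """Write (url, status) rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "status"])
        writer.writerows(rows)
```

Usage would be something like `save_csv(check_urls(urls))`. The `fetch` parameter is only there so the status lookup can be replaced, for example with a fake function when trying the CSV part offline.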