Python Forum
Scraping from multiple URLS to print in a single line.
#1
Hi there,

Please forgive me if I have trouble explaining myself; I'm quite new to Python.

Basically, I've been tasked with scraping some information from an Atlassian site and printing out the plans in a project, along with the repositories, variables and stages of each plan. I've managed to do this, but each segment is printed on its own line because I'm pulling data from 4 different URLs one at a time.

The output looks like this:
"Plans"
"Repos"
"Variables"
"Stages"

I've been asked to make it print like this:
"Plans","Repos","Variables","Stages"

My code is below, thank you.


import requests
from bs4 import BeautifulSoup

params = {
    'X-Atlassian-Token': 'no-check',
    'Accept':'application/json',
    'Content-Type':'application/x-www-form-urlencoded'
}

# List of Plans in a project.
r = requests.get(url='https://xxxx/project/viewProject.action?projectKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for td in soup.findAll('td'):
    if td.get_attribute_list(key='class')[0] == 'build':
        print(td.text)

# List of Repos in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/editChainRepository.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for h3 in soup.findAll('h3'):
    if h3.get_attribute_list(key='class')[0] == 'item-title':
        print(h3.text)

# List of Variables in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/configureChainVariables.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for td in soup.findAll('td'):
    if td.get_attribute_list(key='class')[0] == 'variable-key':
        print(td.text)

# List of Stages in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/defaultStages.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for span in soup.findAll('span'):
    if span.get_attribute_list(key='class')[0] == 'stage-name':
        print('\t'+span.text)
#2
One way to do this is to create an empty string before scraping:
combined_text = ''
then append each new piece of text to it as each site is scraped:
combined_text = f"{combined_text} {new_word}"
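As a minimal sketch of that idea: new_word stands for whatever text the current page yields (for example td.text or h3.text in the script above). The scraped_values list below is a hypothetical stand-in for the four scraped results, not real data from the site.

```python
# Hypothetical stand-ins for the text scraped from each of the four pages.
scraped_values = ["Plans", "Repos", "Variables", "Stages"]

combined_text = ''
for new_word in scraped_values:
    # Keep what has been collected so far and append the newest piece.
    combined_text = f"{combined_text} {new_word}"

# Drop the leading space left over from the first append.
combined_text = combined_text.strip()
print(combined_text)
```

The important part is that combined_text appears on both sides of the assignment; assigning only the new text would overwrite what was collected earlier.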
#3
Thank you Larz60+

Do you mean something like the code below? I'm unsure what to enter for "new_word".

Thank you
combined_text = ''

# List of Plans in a project.
r = requests.get(url='https://xxxx/project/viewProject.action?projectKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for td in soup.findAll('td'):
    if td.get_attribute_list(key='class')[0] == 'build':
        print(td.text)
        combined_text = f"{td.text}"

# List of Repos in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/editChainRepository.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for h3 in soup.findAll('h3'):
    if h3.get_attribute_list(key='class')[0] == 'item-title':
        print(h3.text)
        combined_text = f"{td.text} {h3.text}"
#4
I put each request into its own function (named plans, repos, variables and stages) and tried the following; however, it still prints out the same.

combine_text = f'{plans()}{repos()}{variables()}{stages()}'
print(combine_text)
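A likely cause, assuming the four functions still call print() internally (their bodies aren't shown, so this is a guess): each function prints its own line when it is called and then returns None, so the f-string has nothing useful to combine. The sketch below uses hard-coded placeholder return values in place of the real scraping, to show the difference between printing and returning.

```python
def plans_printing():
    # Mirrors the assumed original pattern: prints, and implicitly returns None.
    print("Plans")


def plans():
    # Placeholder: the real function would request the page and
    # return the collected td.text values instead of printing them.
    return "Plans"


def repos():
    return "Repos"  # placeholder value


def variables():
    return "Variables"  # placeholder value


def stages():
    return "Stages"  # placeholder value


# Collect the returned values, wrap each in quotes, and join with commas.
values = [plans(), repos(), variables(), stages()]
combine_text = ",".join(f'"{value}"' for value in values)
print(combine_text)
```

With functions that return their text, the only print() call is the final one, so everything ends up on a single line.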
#5
Another option is to append the values to a list and print that list:

>>> lst = ["Plans","Repos","Variables","Stages"]                     
>>> print(*lst, sep=', ')                                            
Plans, Repos, Variables, Stages
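To get the exact quoted, comma-separated format asked for in the first post, the list items can be wrapped in quotes before joining. The sketch below shows two ways of doing that with the same example list; using the csv module is a suggestion on my part, not something proposed earlier in the thread.

```python
import csv
import io

lst = ["Plans", "Repos", "Variables", "Stages"]

# Option 1: quote each item by hand and join with commas.
line = ",".join(f'"{item}"' for item in lst)
print(line)

# Option 2: let the csv module do the quoting; QUOTE_ALL quotes every field
# and also escapes any quotes or commas inside the scraped text.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(lst)
csv_line = buffer.getvalue().rstrip("\n")
print(csv_line)
```

Option 2 is probably the safer choice if the scraped names might themselves contain commas or double quotes.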