Python Forum

Scraping from multiple URLS to print in a single line.
Hi there,

Please forgive me if I have trouble explaining myself; I'm quite new to Python.

Basically, I've been tasked with scraping some information from an Atlassian site that lists the plans in a project and the repositories, variables and stages of that plan. I've managed to do this, but each segment is printed on its own line because I'm pulling data from 4 different URLs, one at a time.

The print looks like this:
"Plans"
"Repos"
"Variables"
"Stages"

I've been asked to make it print like this:
"Plans","Repos","Variables","Stages"

My code is below, thank you.


import requests
from bs4 import BeautifulSoup

params = {
    'X-Atlassian-Token': 'no-check',
    'Accept':'application/json',
    'Content-Type':'application/x-www-form-urlencoded'
}

# List of Plans in a project.
r = requests.get(url='https://xxxx/project/viewProject.action?projectKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for td in soup.findAll('td'):
    if td.get_attribute_list(key='class')[0] == 'build':
        print(td.text)

# List of Repos in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/editChainRepository.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for h3 in soup.findAll('h3'):
    if h3.get_attribute_list(key='class')[0] == 'item-title':
        print(h3.text)

# List of Variables in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/configureChainVariables.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for td in soup.findAll('td'):
    if td.get_attribute_list(key='class')[0] == 'variable-key':
        print(td.text)

# List of Stages in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/defaultStages.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for span in soup.findAll('span'):
    if span.get_attribute_list(key='class')[0] == 'stage-name':
        print('\t'+span.text)
One way to do this is to create an empty string before scraping:
combined_text = ''
Then append each new word after each site is scraped:
combined_text = f"{combined_text} {new_word}"
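A minimal sketch of that accumulation pattern, assuming the scraped values are already available as plain strings. The scraped_words list is a hypothetical stand-in for the text pulled from each page; in the script above, new_word would be td.text, h3.text or span.text.

```python
# Hypothetical stand-in for the text scraped from each of the four pages.
scraped_words = ["Plans", "Repos", "Variables", "Stages"]

combined_text = ''
for new_word in scraped_words:
    # Keep everything collected so far and add the new word to the end.
    combined_text = f"{combined_text} {new_word}"

# The first pass leaves a leading space, so strip it before printing.
combined_text = combined_text.strip()
print(combined_text)
```

The key point is that combined_text appears on both sides of the assignment, so earlier values are kept rather than overwritten.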
Thank you Larz60+

Do you mean something like the code below? I'm unsure what to enter for "new_word".

Thank you
combined_text = ''

# List of Plans in a project.
r = requests.get(url='https://xxxx/project/viewProject.action?projectKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for td in soup.findAll('td'):
    if td.get_attribute_list(key='class')[0] == 'build':
        print(td.text)
        combined_text = f"{td.text}"

# List of Repos in a plan.
r = requests.get(url='https://xxxx/chain/admin/config/editChainRepository.action?buildKey=xxxx', params=params, auth=('xxxx', 'xxxx'), verify=False)
soup = BeautifulSoup(r.text, 'html.parser')
for h3 in soup.findAll('h3'):
    if h3.get_attribute_list(key='class')[0] == 'item-title':
        print(h3.text)
        combined_text = f"{td.text} {h3.text}"
I put each request into a function (named plans, repos, variables and stages) and tried the following; however, it still prints the same output.

combine_text = f'{plans()}{repos()}{variables()}{stages()}'
print(combine_text)
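The function bodies are not shown, so this is only a guess: if each function still calls print() inside its loop and returns nothing, the f-string has no text to combine. A rough sketch of the alternative is below, where each function returns a list of strings and the caller joins them. The hard-coded return values and the combine helper are hypothetical; in the real script each function would run its requests/BeautifulSoup loop and append the matching text to a list.

```python
# Hypothetical stand-ins for the four scraping functions. Each one
# returns its results instead of printing them.
def plans():
    return ["Plan A", "Plan B"]

def repos():
    return ["repo-one"]

def variables():
    return ["var.key"]

def stages():
    return ["Build", "Deploy"]

def combine(*groups):
    """Join the items of each group with spaces, then the groups with commas."""
    return ",".join(" ".join(group) for group in groups)

combined_text = combine(plans(), repos(), variables(), stages())
print(combined_text)
```

Because nothing is printed until the end, the whole result comes out on a single line.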
Another option is to append the values to a list and print that list:

>>> lst = ["Plans","Repos","Variables","Stages"]
>>> print(*lst, sep=', ')
Plans, Repos, Variables, Stages
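If the double quotes and the absence of spaces in the requested format are meant literally (an assumption based on the example in the first post), the list can be joined with the quotes added explicitly, or handed to the standard csv module. A small sketch, reusing the lst name from above:

```python
import csv
import io

lst = ["Plans", "Repos", "Variables", "Stages"]

# Option 1: wrap each value in double quotes and join with commas.
line = ",".join(f'"{item}"' for item in lst)
print(line)

# Option 2: let the csv module do the quoting, which also escapes any
# double quotes that appear inside a value.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(lst)
csv_line = buffer.getvalue().rstrip("\n")
print(csv_line)
```

The csv route is probably the safer choice if the scraped names can themselves contain commas or quotes.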