Python Forum
struggling with loop/webscrape
#1
Hi guys,

I made a variable:

page = input("Paste link:         ") + "&LISTpg="
and then a loop:

for i in range(1, (int(max_pages)+1)):
    page = page + str(i)
    r = requests.get(page, headers=headers, params=params)
    content = (r.text)
    soup = BeautifulSoup(content, 'html.parser')
My question is: is it possible to append the page number so that the URLs would be

www.example.com&LISTpg=1
www.example.com&LISTpg=2
www.example.com&LISTpg=3

instead of
www.example.com&LISTpg=1
www.example.com&LISTpg=12
www.example.com&LISTpg=123
www.example.com&LISTpg=1234
#2
Maybe something along these lines:

>>> page = 'https://mysecretsite.com'
>>> for i in range(1, 4):
...     print(f'{page}#secret_anchor{i}')
... 
https://mysecretsite.com#secret_anchor1
https://mysecretsite.com#secret_anchor2
https://mysecretsite.com#secret_anchor3
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#3
Thanks, but it doesn't work :(

What am I doing wrong?

for i in range(1, (int(max_pages)+1)):
    page = f'{page}#&LISTpg={i}'
    r = requests.get(page, headers=headers, params=params)
    content = (r.text)
    soup = BeautifulSoup(content, 'html.parser')
I get page + &LISTpg

"&LISTpg" is added to the end of the string each time :|
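A small sketch that reproduces the symptom without making any requests. The base URL is a hypothetical stand-in for the pasted link, and `seen` is just an illustrative list. It shows two things: reassigning page inside the loop makes the string grow on every pass, and everything after '#' is parsed as the URL fragment rather than the query string, so a server would not see it as a LISTpg parameter.

```python
from urllib.parse import urlsplit

# Hypothetical base URL standing in for the pasted link.
base = "https://www.example.com/list?cat=5"

page = base
seen = []
for i in range(1, 4):
    # Same pattern as the loop above: page is overwritten each time,
    # so the next iteration starts from the already-extended string.
    page = f"{page}#&LISTpg={i}"
    seen.append(page)
    print(page)

# The part after '#' ends up in the fragment, not in the query string.
parts = urlsplit(seen[0])
print("query:   ", parts.query)
print("fragment:", parts.fragment)
```

Printing the URL on each pass like this is usually the quickest way to see where the extra text comes from.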
#4
(Sep-26-2019, 02:28 PM)zarize Wrote: My question is... is it possible to somehow add value so it would be

www.example.com&LISTpg=1
www.example.com&LISTpg=2
www.example.com&LISTpg=3

instead of
www.example.com&LISTpg=1
www.example.com&LISTpg=12
www.example.com&LISTpg=123
www.example.com&LISTpg=1234

Your initial question was about getting the numbering right, and LISTpg= was part of the expected and desired result.

'Doesn't work' is not an accurate description of the situation. It does work.

If you need something else, please rephrase your question.
#5
Thanks guys, I experimented a bit and figured out what was wrong in my code :)

My code kept adding "LISTpg" because I was assigning the new URL back to page on every iteration.
As you can see, I was using 'page' for both the base link and the built URL, which is why it wasn't working as intended:
page = 'https://mysecretsite.com'
for i in range(1, 4):
    page = f'{page}&LISTpg={i}'
This is the correct version :P
page = 'https://mysecretsite.com'
for i in range(1, 4):
    page2 = f'{page}&LISTpg={i}'
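For anyone finding this later, here is a slightly more defensive sketch of the same idea using only the standard library. The helper name `page_url`, the `param` default, and the example base URL are my own assumptions, not something from the site being scraped. It sets the LISTpg query parameter instead of appending text, so calling it on a URL that already has LISTpg replaces the value rather than stacking another copy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page_number, param="LISTpg"):
    """Return base_url with the query parameter `param` set to page_number."""
    parts = urlsplit(base_url)
    # Parse the existing query string into a dict (keeps the other parameters).
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[param] = str(page_number)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical base URL; the base itself is never modified inside the loop.
base = "https://www.example.com/list?cat=5"
urls = [page_url(base, i) for i in range(1, 4)]
for url in urls:
    print(url)
```

Each URL in urls can then be passed to requests.get in the loop. Note that the dict step assumes no query parameter is repeated in the pasted link.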