Python Forum
Unable to pull number using BeautifulSoup and Regex
#1
Hello,

I'm new to Python, and I'm trying to pull a number out of a web page using BeautifulSoup.

Here is my code:

from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re

url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)

print(whatsapp.group())
I'm getting this error:

Error:
Traceback (most recent call last):
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 199, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
I'm trying to get the WhatsApp number. How can I do that?

Can anyone help?
#2
from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re
 
url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
# Changed: print the type of the object that was being passed to re.search
print(type(whatsapp_script))
Output:
<class 'bs4.element.Tag'>
You are trying to use a regex on a bs4.element.Tag, but re.search only accepts a string or bytes-like object.
You'll need to convert it to a string first.
Try:

# Convert whatsapp_script to a string for the search
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', str(whatsapp_script))
I think that will work for you.
Output:
{"type":"whatsapp","value":"+971566809258","link":"whatsapp:\/\/send?phone=+971566809258&text=Hello,%0aI would like to get more information about this property you posted on propertyfinder.ae:%0a%0aReference: AP31055%0aType: Apartment%0aPrice: 145000 AED Yearly%0aLocation: Delphine Tower%0a%0aLink: https:\/\/m.propertyfinder.ae\/en\/rent\/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html"}
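One caveat: all_scripts[6] depends on the page always having the same number of script tags in the same order. Here is a rough sketch of a version that checks every script tag instead. The SAMPLE_HTML fragment and the find_whatsapp helper are made up for illustration (I'm not hitting the real site here), and I'm using the built-in 'html.parser' so the example doesn't need lxml.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for the real listing page.
SAMPLE_HTML = """
<html><body>
<script>var unrelated = 1;</script>
<script>var contacts = [{"type":"whatsapp","value":"+971566809258","link":"whatsapp://send"}];</script>
</body></html>
"""

# Same idea as the pattern in the thread, with the braces escaped for clarity.
PATTERN = re.compile(r'\{"type":"whatsapp","value":"([^"]+)"[^}]+\}')


def find_whatsapp(html):
    """Return the first WhatsApp number found in any <script> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # script is a bs4.element.Tag, so convert it to str before searching.
        match = PATTERN.search(str(script))
        if match:
            return match.group(1)  # group(1) is just the captured number
    return None


number = find_whatsapp(SAMPLE_HTML)
print(number)
```

Returning None when nothing matches also avoids calling .group() on a failed search, which would raise an AttributeError.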
As you move forward in Python, see what you can learn from your traceback.
Error:
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
TypeError: expected string or bytes-like object
These are the important lines from this traceback:
The first tells you the file and line number.
The second tells you the expression that didn't work.
The third tells you why it didn't work.

Based on these, it was clear to me that you were passing in the wrong data type. I did a quick check to see what type you were passing in, and was able to find and correct the error by converting it to a string.
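If you want to see the same error without touching the network, here is a minimal sketch. FakeTag is a made-up class that only stands in for "some object that is not a string"; it is not part of bs4. The exact wording of the message can vary a little between Python versions, but it starts the same way.

```python
import re


class FakeTag:
    """Hypothetical stand-in for any non-string object, e.g. a bs4 Tag."""

    def __str__(self):
        return '{"type":"whatsapp","value":"+971566809258","link":"x"}'


pattern = r'\{"type":"whatsapp","value":"([^"]+)"[^}]+\}'
obj = FakeTag()

try:
    re.search(pattern, obj)  # wrong type: re.search raises TypeError
except TypeError as exc:
    error_text = str(exc)
    print(type(obj).__name__, "->", error_text)

match = re.search(pattern, str(obj))  # converted to str first: works
print(match.group(1))
```

Printing type(obj) right before the failing call is the same quick check I did above.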
#3
Yes, that fixed it for me. I used .group()
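For anyone finding this later: .group() with no argument returns the whole matched text, while .group(1) returns only the first capture group, which here is the number itself. A small sketch, assuming a shortened sample string in place of the real page source:

```python
import json
import re

# Shortened, hypothetical sample of the script text (not the full page source).
script_text = (
    r'var data = [{"type":"whatsapp","value":"+971566809258",'
    r'"link":"whatsapp:\/\/send?phone=+971566809258"}];'
)

match = re.search(r'\{"type":"whatsapp","value":"([^"]+)"[^}]+\}', script_text)

whole = match.group()     # the entire matched JSON object, as text
number = match.group(1)   # only the captured phone number

# The matched text is valid JSON here, so it can also be parsed into a dict.
record = json.loads(whole)

print(number)
print(record["link"])
```

Parsing with json.loads only works when the match is a complete JSON object, so treat that part as optional.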