Python Forum

Unable to pull number using BeautifulSoup and Regex
Hello,

I'm new to Python and I'm trying to pull a phone number from a web page using BeautifulSoup.

Here is my code:

from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re

url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)

print(whatsapp.group())
I'm getting this error:

Error:
Traceback (most recent call last):
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/re.py", line 199, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
I'm trying to get the WhatsApp number. How can I do that?

Can anyone here help?
from textwrap import shorten
from bs4 import BeautifulSoup
import json
import requests
import re
 
url = 'https://m.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'lxml')
all_scripts = soup.find_all('script')
whatsapp_script = all_scripts[6]
## Added this line to check what type whatsapp_script is
print(type(whatsapp_script))
Output:
<class 'bs4.element.Tag'>
You are trying to run a regex on a bs4.element.Tag object, but re.search() only accepts a string (or bytes-like object) to search in.
You'll need to convert it to a string first.
Try:

## I cast the whatsapp_script to a string for the search
whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', str(whatsapp_script))
I think that will work for you.
Output:
{"type":"whatsapp","value":"+971566809258","link":"whatsapp:\/\/send?phone=+971566809258&text=Hello,%0aI would like to get more information about this property you posted on propertyfinder.ae:%0a%0aReference: AP31055%0aType: Apartment%0aPrice: 145000 AED Yearly%0aLocation: Delphine Tower%0a%0aLink: https:\/\/m.propertyfinder.ae\/en\/rent\/apartment-for-rent-dubai-dubai-marina-marina-promenade-delphine-tower-7276805.html"}
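The match above contains the whole JSON object, not just the number. A minimal sketch of two ways to get only the number, assuming the script text looks like the output shown: take capture group 1 (the `([^"]+)` part of the pattern), or parse the matched object with `json.loads`. The `script_text` value below is a hypothetical, shortened stand-in for `str(whatsapp_script)` with a placeholder number, not the real page content.

```python
import json
import re

# Hypothetical, shortened stand-in for str(whatsapp_script); the number is a placeholder.
script_text = 'var data = [{"type":"whatsapp","value":"+971500000000","link":"whatsapp:\\/\\/send?phone=+971500000000"}];'

# Raw string so backslashes in the pattern are passed to re unchanged.
pattern = r'{"type":"whatsapp","value":"([^"]+)"[^}]+}'
match = re.search(pattern, script_text)

if match:
    # group() (same as group(0)) is the entire matched JSON object.
    print(match.group())
    # group(1) is only what the ([^"]+) capture matched: the number itself.
    print(match.group(1))  # prints +971500000000
    # The whole match is valid JSON, so it can also be parsed into a dict.
    data = json.loads(match.group())
    print(data["value"])  # prints +971500000000
```

Parsing with `json.loads` also decodes the escaped `\/` sequences in the `link` field, which the regex alone leaves as-is.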
As you move forward in Python, see what you can learn from your tracebacks.
Error:
  File "/Users/evilslab/Documents/Websites/www.futurepoint.dev.cc/dobuyme/python/fetchFinder.py", line 12, in <module>
    whatsapp = re.search('{"type":"whatsapp","value":"([^"]+)"[^}]+}', whatsapp_script)
TypeError: expected string or bytes-like object
These are the important lines from this traceback:
The first tells you the file and line number.
The second tells you the expression that didn't work.
The third tells you why it didn't work.

Based on these, it was clear to me that you were passing in the wrong data type. I did a quick check of the type you were passing in, and was able to fix the error with a simple cast to str.
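The same diagnosis can be reproduced without fetching the page. This is a minimal sketch that assumes a hypothetical `FakeTag` class as a stand-in for `bs4.element.Tag`: like a real Tag, it is not a `str`, but calling `str()` on it returns its HTML. Passing it straight to `re.search` raises the same `TypeError`, and casting it with `str()` first fixes it.

```python
import re


class FakeTag:
    """Hypothetical stand-in for bs4.element.Tag: not a str, but str() gives its HTML."""

    def __init__(self, html):
        self._html = html

    def __str__(self):
        return self._html


PATTERN = r'{"type":"whatsapp","value":"([^"]+)"[^}]+}'

# Placeholder script content, not the real page.
tag = FakeTag('<script>{"type":"whatsapp","value":"+971500000000","link":"x"}</script>')
print(type(tag).__name__)  # the "quick check" of the type being passed in

caught = None
try:
    re.search(PATTERN, tag)  # same mistake as in the question
except TypeError as exc:
    caught = exc  # keep a reference; exc itself is cleared after the except block
    print("TypeError:", exc)

# The fix from the reply: cast to str before searching.
match = re.search(PATTERN, str(tag))
print(match.group(1))  # prints +971500000000
```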
Yes, that fixed it for me. I used .group().