Python Forum
Regex not finding all unicode characters
#1
I'm trying to parse a flight tracker web page, but when I try to get the values from the 'Course' column, I'm only getting ←, →, ↓ and ↑. It's not reading the others, such as ↗ and ↙.

import urllib3
from bs4 import BeautifulSoup
import regex

url = 'https://flightaware.com/live/flight/ETH3626/history/20210705/1400Z/HAAB/VHHH/tracklog'
req = urllib3.PoolManager()
res = req.request('GET', url)
soup = BeautifulSoup(res.data, 'lxml')
# All right-aligned cells in the track log (speed, course, ...)
contents = soup.find_all('td', attrs={'align': 'right'})

for content in contents:
    content = str(content)

    # Speed cell: strip the surrounding <td> tags to keep just the number
    kts = content.replace('<td align="right">', '').replace('</td>', '')

    # Course cell: arrow, one whitespace char, digits, one non-word char (the degree sign)
    course = regex.search(r'<td align="right"><span>[\u2190-\u2199]\s\d+\W</span></td>', content)

    # course regex only finds up, down, left, right; not U+2196-U+2198
    print(course)
#2
Any suggestions? I'm not sure why my regex is not finding all of the course values:
course = regex.search(r'<td align="right"><span>[\u2190-\u2199]\s\d+\W</span></td>', content)
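For what it's worth, the character class itself does not exclude the diagonals: `[\u2190-\u2199]` spans all ten arrows from U+2190 (←) to U+2199 (↙), so ↖ ↗ ↘ ↙ are inside the range. A quick check with the standard `re` module (assumed to behave like `regex` for this pattern) shows this, and also sketches matching the cell's text instead of the raw `<td>`/`<span>` markup, which is less brittle. `parse_course` is a hypothetical helper and the sample text is made up:

```python
import re

# The ten arrows U+2190..U+2199: ← ↑ → ↓ ↔ ↕ ↖ ↗ ↘ ↙
ARROWS = [chr(cp) for cp in range(0x2190, 0x219A)]

# Arrow, optional whitespace, digits, optional degree sign (U+00B0).
# Matching the cell *text* avoids depending on the exact HTML markup.
COURSE_RE = re.compile(r"([\u2190-\u2199])\s*(\d+)\u00b0?")


def parse_course(text):
    """Return (arrow, degrees) from text such as '↙ 226°', or None."""
    m = COURSE_RE.search(text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


if __name__ == "__main__":
    # Every arrow in the range, diagonals included, matches the class.
    for arrow in ARROWS:
        print(hex(ord(arrow)), arrow, bool(re.fullmatch(r"[\u2190-\u2199]", arrow)))
    print(parse_course("\u2199 226\u00b0"))  # ('↙', 226)
```

If every arrow matches here but not on the live page, the difference is in what the page text actually contains, not in the range.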
#3
You don't need regex; you can do it like this, and unescape the HTML to get the Unicode characters.
import html

import requests
from bs4 import BeautifulSoup

url = 'https://flightaware.com/live/flight/ETH3626/history/20210705/1400Z/HAAB/VHHH/tracklog'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
# The track log is the table with id="tracklogTable"
table = soup.select_one('#tracklogTable')
# 4th cell (the Course column) of the 5th row
cell = table.select_one('tr:nth-child(5) > td:nth-child(4)')
# The cell text may still contain HTML character references; unescape them
print(html.unescape(cell.text))
Output:
↙ 226°
This will not display correctly in every editor or shell. Usually you don't want the Unicode arrow anyway, as it has no meaning beyond display in a browser.
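The `html.unescape` call matters if the cell text carries the arrows as HTML character references (for example `&#8599;`) rather than as literal characters; a regex over such text cannot match `[\u2190-\u2199]` until it is unescaped. Whether FlightAware serves some arrows this way is an assumption, not something checked here. A minimal sketch with made-up cell texts, using the standard `re` module; `extract_course` is a hypothetical helper name:

```python
import html
import re

COURSE_RE = re.compile(r"([\u2190-\u2199])\s*(\d+)")


def extract_course(cell_text):
    """Unescape HTML character references, then pull out (arrow, degrees).

    Returns None when no arrow-plus-number pair is found.
    """
    m = COURSE_RE.search(html.unescape(cell_text))
    return (m.group(1), int(m.group(2))) if m else None


if __name__ == "__main__":
    # Hypothetical cell texts: a literal arrow, a decimal reference
    # (&#8599; is U+2197) and a hex reference (&#x2199; is U+2199).
    for raw in ["\u2192 90\u00b0", "&#8599; 45&deg;", "&#x2199; 226&#176;"]:
        # raw match (before unescaping) vs. result after unescaping
        print(repr(raw), bool(COURSE_RE.search(raw)), extract_course(raw))
```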
You can also read and parse the whole table with Pandas and open it in a Jupyter Notebook to get the same display.
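To show what "parse the whole table" involves without extra dependencies, here is a stdlib-only sketch using `html.parser.HTMLParser`; this is a swapped-in technique, not what Pandas does internally. The table id `tracklogTable` comes from the selector in the answer above; the `TableRows` class and the sample HTML are hypothetical, and nested tables are not handled:

```python
import html
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect cell text row by row from the <table> whose id is table_id.

    Nested tables are not handled; this is a sketch, not a general parser.
    """

    def __init__(self, table_id):
        # convert_charrefs=True decodes references such as &#8601; and &deg;
        super().__init__(convert_charrefs=True)
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []
        self._cell = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th"):
            self.in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.in_table:
            self.in_table = False
        elif self.in_cell and tag in ("td", "th"):
            self.in_cell = False
            # A second unescape covers double-escaped text like &amp;#8601;
            text = html.unescape("".join(self._cell)).strip()
            if self.rows:
                self.rows[-1].append(text)

    def handle_data(self, data):
        if self.in_cell:
            self._cell.append(data)


# Hypothetical, heavily simplified stand-in for the real track log markup.
SAMPLE = """<table id="tracklogTable">
<tr><th>Time</th><th>Course</th></tr>
<tr><td>14:19</td><td align="right"><span>&#8601; 226&deg;</span></td></tr>
</table>"""

if __name__ == "__main__":
    parser = TableRows("tracklogTable")
    parser.feed(SAMPLE)
    parser.close()
    for row in parser.rows:
        print(row)
```

With Pandas, the rough equivalent is `pandas.read_html` with `attrs={'id': 'tracklogTable'}`, which returns a list of DataFrames and needs lxml or BeautifulSoup plus html5lib installed.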
#4
Thank you. I'll look into Pandas.