Python Forum
Extracting numbers
#1
Hello, I am wondering if anyone can help me with my computing code. I have a text file and need to extract the UK phone numbers from it, making sure that each number has a prefix of +44 followed by ten digits.
So far I've got:

import re  # Import the regex module

uk_numbers = []  # The list where we will store the UK phone numbers
pattern = (?)  # not sure what goes here
try:
    with open("phone_log.txt", "rt") as in_file:
        for linenum, line in enumerate(in_file):
            if pattern.search(line) is not None:
                uk_numbers.append((linenum, line.rstrip("\n")))
    for linenum, line in uk_numbers:
        print("Line ", linenum, ": ", line, sep="")
except FileNotFoundError:
    print("Input file not found")

I'm currently unsure what should go in my pattern line.
I was thinking maybe
pattern = re.compile('\"tel\:[\(\)\-0-10\ ]{1,}\"')
but I'm not sure whether this would ensure that the phone numbers have a prefix of +44 followed by ten digits.
Any help would be much appreciated, thanks.
Ronnie
Reply
#2
Maybe this will help:

#!/usr/bin/python3
import re

pattern = re.compile(r'''
    tel:           # tel:
    \s*            # maybe some spaces
    \+44           # +44
    \s*            # maybe some spaces
    \d{10}         # 10 digits
    ''', re.X)
uk_numbers = []

with open('tel.txt') as in_file:
    for linenum, line in enumerate(in_file):
        if pattern.search(line) is not None:
            uk_numbers.append((linenum, line.rstrip('\n')))

for linenum, line in uk_numbers:
    print("Line", linenum, ":", line)
tel.txt:
   "tel:+44 1234567890"
tel:    +44         1234567890
  tel:             +441234567890
 tel:+441234567890
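
One caveat with the pattern above: ten digits are also found at the start of a longer run of digits, so an eleven-digit number would still be reported. A minimal sketch, assuming such numbers should be rejected, adds a negative lookahead (?!\d). The name STRICT_PATTERN and the sample strings are made up for illustration.

```python
import re

# Same idea as the pattern above, plus (?!\d) so that an 11th digit
# directly after the ten digits makes the match fail.
STRICT_PATTERN = re.compile(r'''
    tel:        # literal "tel:"
    \s*         # optional whitespace
    \+44        # UK country code
    \s*         # optional whitespace
    (\d{10})    # exactly ten digits, captured as group 1
    (?!\d)      # not followed by another digit
    ''', re.X)

samples = [
    'tel:+44 1234567890',     # ten digits
    'tel:+4412345678901',     # eleven digits
    'tel:+33 1234567890',     # wrong country code
]

# True where the line contains a valid number, False otherwise.
results = [bool(STRICT_PATTERN.search(s)) for s in samples]
for s, ok in zip(samples, results):
    print(s, '->', ok)
```

Only the first sample should be accepted; whether the stricter check is wanted depends on what the log file really contains.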
Reply
#3
(Oct-26-2017, 04:14 PM)heiner55 Wrote: Maybe this will help:

I've tried doing this, but I'm having issues when running it.
I've changed "tel.txt" to "phonecalls.txt", which is the name of the file I am extracting the data from. Is this okay to do?
Many thanks
Ronnie
Reply
#4
That is ok.
Reply
#5
(Oct-27-2017, 01:53 PM)heiner55 Wrote: That is ok.

Ok that's great, so my code is the following:

>>> data_file = open("phone_log.txt", "r")
>>> data = data_file.readlines()
>>> import re
>>> pattern = re.compile(r'''tel:\s*?\+44\s*?\d{10,10}''', re.X)
>>> uk_numbers = []
>>> with open('phone_log.txt') as in_file:
        for linenum, line in enumerate(in_file):
            if pattern.search(line) != None:
                uk_numbers.append((linenum, line.rstrip('\n')))


>>> for linenum, line in uk_numbers:
        print("Line", linenum, ":", line)

What must I do to print the UK phone numbers? It would be great to know, since I've been trying for the past hour without success.
Your help is very much appreciated.
Reply
#6
Maybe this helps: https://docs.python.org/3.6/library/re.html

#!/usr/bin/python3
import re
 
pattern = r"""
    tel:       # tel:
    \s*?       # maybe some spaces
    \+44       # +44
    \s*?       # maybe some spaces
   (\d{10,10}) # 10 digits
"""
 
with open ('phone_log.txt') as in_file:
    for linenr, line in enumerate(in_file):
        match = re.search(pattern, line, re.X)
        if match:
            print("Line %d: %s" % (linenr, match.group(1)))
phone_log.txt:
here is some text   "tel:+44 1234567890" hers is some text
tel:    +44         1234567890   text text
text tel text  tel:             +441234567890  text
tel:+441234567890


Output:
Line 0: 1234567890
Line 1: 1234567890
Line 2: 1234567890
Line 3: 1234567890
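
If the numbers should end up in a list such as the uk_numbers from the first post, rather than only being printed, a minimal sketch could look like this. The function name collect_uk_numbers and the choice to store each number as +44 plus the ten digits are assumptions for illustration, not something stated in the thread.

```python
import re

# Compact form of the verbose pattern; group 1 holds the ten digits.
PATTERN = re.compile(r'tel:\s*\+44\s*(\d{10})')

def collect_uk_numbers(lines):
    """Return (line_number, '+44' + digits) for every match in lines."""
    uk_numbers = []
    for linenr, line in enumerate(lines):
        # finditer also picks up several numbers on the same line.
        for match in PATTERN.finditer(line):
            uk_numbers.append((linenr, '+44' + match.group(1)))
    return uk_numbers

sample = [
    'here is some text   "tel:+44 1234567890" here is some text',
    'no number on this line',
    'tel:+441234567890',
]
print(collect_uk_numbers(sample))
```

Because an open file can be iterated line by line, collect_uk_numbers(in_file) should work the same way inside a with open(...) block.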
Reply
#7
(Oct-27-2017, 02:28 PM)heiner55 Wrote: Maybe this helps: https://docs.python.org/3.6/library/re.html

Ahh, yes.
But how do I get that output? What must I do to get it?
Reply
#8
print("Line ", linenr, ": ", match[1], sep='')
Reply
#9
(Oct-27-2017, 03:35 PM)heiner55 Wrote:
print("Line ", linenr, ": ", match[1], sep='')

Do I type this in the Python Shell?
Reply
#10
You get the output if you run the sample program (see above) with the input-file phone_log.txt.
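
For anyone unsure what running the sample program involves, here is a self-contained sketch that first writes a small phone_log.txt into a temporary directory and then reads it back. The temporary directory, the helper name print_uk_numbers, and the FileNotFoundError handling (borrowed from the attempt in the first post) are illustrative assumptions. Saved under a name of your choice, for example extract.py, it can be run with python3 extract.py, or in IDLE from an editor window with Run > Run Module, rather than typed line by line at the >>> prompt.

```python
import os
import re
import tempfile

PATTERN = re.compile(r'tel:\s*\+44\s*(\d{10})')

def print_uk_numbers(path):
    """Print each matching number and return how many were found."""
    found = 0
    try:
        with open(path) as in_file:
            for linenr, line in enumerate(in_file):
                match = PATTERN.search(line)
                if match:
                    print("Line %d: %s" % (linenr, match.group(1)))
                    found += 1
    except FileNotFoundError:
        print("Input file not found")
    return found

with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny input file so the example does not depend on real data.
    path = os.path.join(tmp, 'phone_log.txt')
    with open(path, 'w') as out_file:
        out_file.write('text "tel:+44 1234567890" text\n')
        out_file.write('tel:+441234567890\n')
    count = print_uk_numbers(path)
    # A path that does not exist triggers the error message instead.
    missing = print_uk_numbers(os.path.join(tmp, 'nope.txt'))
```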
Reply

