Python Forum
converting to int


converting to int - Skaperen - Jul-23-2022

i have a string (s) that i want to convert to an int with int(s,0), but it may have other characters appended to it, and i don't know how many of the leading characters are convertible. do i need to just keep trying shorter substrings until it works?

like this:
def intplus(s):
    while s:
        try:
            return int(s,0)
        except ValueError:
            s=s[:-1]
    return 0
the above code is a "thinking draft", so it is untested. it's just what i was thinking about as i wrote this post.
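
For reference, a minimal sketch of the draft above with a few sample inputs. The sample strings are illustrative assumptions, not data from the original post. The trimming loop does behave as described, because int(s, 0) raises ValueError until only a valid literal remains:

```python
def intplus(s):
    """Return the int value of the longest convertible prefix of s, else 0."""
    while s:
        try:
            return int(s, 0)
        except ValueError:
            s = s[:-1]  # drop the last character and try again
    return 0


# Illustrative inputs (assumed for this sketch).
for text in ("345foo99boo123", "0xffgghh", "0b1012", "abc", ""):
    print(repr(text), "->", intplus(text))
```

Each failed attempt rescans the whole remaining string, so the cost grows roughly quadratically with the length of the trailing text. That is fine for short strings but worth keeping in mind for long ones.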


RE: converting to int - ndc85430 - Jul-23-2022

Can't you use a regular expression to extract the numeric part?

>>> import re
>>> s = "32abcd"
>>> re.match(r"(\d+).*", s).groups()
('32',)
>>> s = "123foobar"
>>> re.match(r"(\d+).*", s).groups()
('123',)



RE: converting to int - rob101 - Jul-23-2022

If you're trying to extract the digits (any digits) that are in a string object, then this code may be of help.

I'm currently re-reading a book that I've had for years (The C Programming Language: K&R) and converting the code examples into Python3.

I'll post the code as is; it's annotated with notes regarding the C code from which the Python code has been developed.

I hope it's of use to you.

#!/usr/bin/python3
# Page 20/21

print(20*'\n') # clear 20 lines of the screen

# int nwhite, nother;
nwhite = nother = 0

#--------------------------------------#
# int ndigit[10];                      #
# for (i = 0; i < 10; ++i)             #
#    ndigit[i] = 0;                    #
ndigit = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
#--------------------------------------#

# while ((c = getchar()) !=EOF)
c = input(':~$ ')

for i in range(len(c)): # for loop to simulate the buffer by getting one character at a time, held in c[i]
    # if (c >= '0' && c <= '9')
    #    ++ndigit[c-'0'];
    if c[i] >= '0' and c[i] <= '9':
        ndigit[ord(c[i])-ord('0')]+= 1
    elif c[i] == ' ' or c[i] == '\n' or c[i] == '\t':
        nwhite += 1
    else:
        nother += 1

# output
print('digits =',end='')
for i in range(len(ndigit)):
    print(' '+str(ndigit[i])+' x '+str(i)+' |',end='')
print('\nwhite space =',nwhite,'\tother =',nother,'\n')
print('program exit\n')



RE: converting to int - Skaperen - Jul-23-2022

(Jul-23-2022, 07:30 AM)ndc85430 Wrote: Can't you use a regular expression to extract the numeric part?
perhaps. but how well does that regular expression handle "0xffgghh"? will it come up with 0?
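
A quick check of that question, as a sketch using the pattern exactly as posted above: \d+ matches only decimal digits, so it stops at the "x" and captures just the leading "0".

```python
import re

# The pattern from the earlier reply, applied to the hexadecimal-looking input.
m = re.match(r"(\d+).*", "0xffgghh")
print(m.groups())       # ('0',)
print(int(m.group(1)))  # 0

# With no leading digit there is no match at all, so .groups() would fail.
print(re.match(r"(\d+).*", "abc"))  # None
```

So yes, it comes up with 0 rather than 255, and a string with no leading digit needs a None check before calling .groups().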


RE: converting to int - Skaperen - Jul-23-2022

(Jul-23-2022, 01:56 PM)rob101 Wrote: The C Programming Language: K&R
i loaned mine to a friend after i read it about 20 times. he never gave it back. but that was 3 decades ago and i forgot which friend it was because i had loaned it to 4 other friends, first. but i didn't forget C.

maybe people wonder how i can have so many friends. it's called being sysadmin of a few university unix machines.


RE: converting to int - rob101 - Jul-23-2022

Ah, I now see (from your post above) what you're trying to do -- sorry for introducing noise into your thread.


RE: converting to int - BashBedlam - Jul-23-2022

Here is an example of one way to do that in Python3.
digit_counter = {'0': 0, '1': 0, '2': 0, '3': 0, '4': 0,
	'5': 0, '6': 0, '7': 0, '8': 0, '9': 0}
white_space_counter = {' ': 0, '\n':0, '\t':0}

user_input = input (':~$ ')

for element in user_input :
	if element in digit_counter :
		digit_counter [element] += 1
	if element in white_space_counter :
		white_space_counter [element] += 1

print (digit_counter)
print (white_space_counter) 



RE: converting to int - ibreeden - Jul-24-2022

Yeah, great book, "The C Programming Language" by Kernighan and Ritchie.
But about your problem, @Skaperen , I still believe regular expressions are the most efficient way to solve this. Let me first summarise, to check that I understood you correctly:
  • You have lines that may start with an integer.
  • This integer may be in octal, hexadecimal or decimal format.
  • You need to capture this integer.

These rules can be translated to a regular expression. You need to know that "re" has an "OR" operator: "|". The regular expression must search for:
  1. "0o" followed by octal digits 0-7, or
  2. "0x" followed by hexadecimal digits 0-9 and A-F, or
  3. decimal digits 0-9.
Translated to a regular expression, that is:
r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)"
I tested it with this little program.
import re

# For efficiency reasons: compile this only once.
initial_integer = re.compile(r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)")
""" Meaning:
    Capture the start of a string containing:
    - "0o" followed by digits 0-7
    - "0x" followed by digits 0-f
    - digits 0-9
"""

testdata = ["abcdefg",
            "12789efgh",
            "0o12789efgh",
            "0x12789efgh",
            ""]
for mystr in testdata:
    print(f"Teststring: {mystr}\t", end="")
    print(re.match(initial_integer, mystr).groups()[0])
Output:
Teststring: abcdefg
Teststring: 12789efgh	12789
Teststring: 0o12789efgh	0o127
Teststring: 0x12789efgh	0x12789ef
Teststring:
I hope you can use this solution.
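
To get from the captured text to an actual int, the match can be passed to int(text, 0). The sketch below reuses the pattern exactly as posted. The helper name leading_int and the choice to return None for unconvertible text are assumptions made for illustration. Because every alternative uses "*", the pattern can capture an empty string, a bare "0x", or digits with a leading zero such as "012", all of which int(text, 0) rejects, so the conversion is guarded:

```python
import re

# Pattern as posted above; "*" allows empty and prefix-only captures.
initial_integer = re.compile(r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)")


def leading_int(s):
    """Hypothetical helper: int value of the captured prefix, or None."""
    text = initial_integer.match(s).group(1)  # always matches, maybe ""
    try:
        return int(text, 0)  # base 0 honours the 0o / 0x prefixes
    except ValueError:
        return None  # "", bare "0x"/"0o", or leading zeros like "012"


for mystr in ["abcdefg", "12789efgh", "0o12789efgh", "0x12789efgh", "0x", ""]:
    print(f"{mystr!r}: {leading_int(mystr)}")
```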


RE: converting to int - Skaperen - Jul-24-2022

(Jul-23-2022, 07:52 PM)BashBedlam Wrote: Here is an example of one way to do that in Python3.
just a quick look at it doesn't show me how it would get the value 345 from the string "345foo99boo123" or the value 65533 from the string "0xfffdummyfff". of course, my eyes could be wrong. i didn't try to run it.


RE: converting to int - Skaperen - Jul-24-2022

(Jul-24-2022, 08:51 AM)ibreeden Wrote: Yeah, great book, "The C Programming Language" from Kernighan and Ritchie.
i would need to add support for "0b" for binary, to that, but i think i see how to do it. you expanded my understanding of regular expressions by enough for this.
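
One possible way to add the "0b" case, as a sketch rather than a definitive version. The names prefixed_int and intplus_re, and the default of 0 when nothing converts, are assumptions chosen to mirror the first post. This variant uses "+" instead of "*" so that a prefix with no digits after it (such as "0x" followed by "zz") falls through to the plain decimal alternative and yields 0:

```python
import re

# Hypothetical pattern: binary, octal, hexadecimal, then plain decimal.
prefixed_int = re.compile(
    r"0[bB][01]+"
    r"|0[oO][0-7]+"
    r"|0[xX][0-9a-fA-F]+"
    r"|[0-9]+"
)


def intplus_re(s, default=0):
    """Convert the leading integer literal of s, or return default."""
    m = prefixed_int.match(s)  # match() only looks at the start of s
    if m is None:
        return default
    try:
        return int(m.group(0), 0)
    except ValueError:
        return default  # e.g. "012": base 0 rejects leading zeros


print(intplus_re("345foo99boo123"))  # 345
print(intplus_re("0xfffdummyfff"))   # 65533
print(intplus_re("0b1012"))          # 5
print(intplus_re("0xzz"))            # 0
```

Unlike int(s, 0) itself, this pattern does not accept a sign, surrounding whitespace, or underscores between digits. Those would need extra alternatives if the input can contain them.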