converting to int

Skaperen · Jul-23-2022, 06:45 AM

i have a string (s) that may have other characters appended to it, and i want to convert the leading part to an int with int(s,0). but i don't know how many of its characters are convertible. do i need to just keep trying shorter substrings until it works?

like this:

def intplus(s):
    while s:
        try:
            return int(s,0)
        except ValueError:
            s=s[:-1]
    return 0

the above code is a "thinking draft" and so, is untested. it's just what i was thinking about, as i conjured up this post
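Since the draft is described as untested, here is a minimal check of it, offered as a sketch rather than a verdict. The function is copied from the post; the sample strings are assumptions chosen for illustration.

```python
def intplus(s):
    # Drop one trailing character at a time until int() accepts the rest.
    while s:
        try:
            return int(s, 0)
        except ValueError:
            s = s[:-1]
    return 0

print(intplus("345foo99boo123"))  # 345
print(intplus("0xffgghh"))        # 255
print(intplus("no digits"))       # 0
```

Two things to keep in mind: the loop calls int() once per trailing character removed, so it can get slow on long strings, and a return value of 0 does not distinguish a literal "0" from "no number found".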

ndc85430 · Jul-23-2022, 07:30 AM

Can't you use a regular expression to extract the numeric part?

>>> import re
>>> s = "32abcd"
>>> re.match(r"(\d+).*", s).groups()
('32',)
>>> s = "123foobar"
>>> re.match(r"(\d+).*", s).groups()
('123',)

rob101 · (This post was last modified: Jul-23-2022, 01:56 PM by rob101.)

If you're trying to extract digits (any digit) that's in a string object, then this code may be of help.

I'm currently re-reading a book that I've had for years (The C Programming Language: K&R) and converting the code examples into Python3

I'll post the code as is; it's annotated with notes regarding the C code from which the Python code has been developed.

I hope it's of use to you.

#!/usr/bin/python3
# Page 20/21

print(20*'\n') # clear 20 lines of the screen

# int nwhite, nother;
nwhite = nother = 0

#--------------------------------------#
# int ndigit[10];                      #
# for (i = 0; i < 10; ++i)             #
#    ndigit[i] = 0;                    #
ndigit = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
#--------------------------------------#

# while ((c = getchar()) !=EOF)
c = input(':~$ ')

for i in range(len(c)): # for loop to simulate the buffer by getting one character at a time, held in c[i]
    # if (c >= '0' && c <= '9')
    #    ++ndigit[c-'0'];
    if c[i] >= '0' and c[i] <= '9':
        ndigit[ord(c[i])-ord('0')]+= 1
    elif c[i] == ' ' or c[i] == '\n' or c[i] == '\t':
        nwhite += 1
    else:
        nother += 1

# output
print('digits =',end='')
for i in range(len(ndigit)):
    print(' '+str(ndigit[i])+' x '+str(i)+' |',end='')
print('\nwhite space =',nwhite,'\tother =',nother,'\n')
print('program exit\n')

Skaperen · Jul-23-2022, 05:44 PM

(Jul-23-2022, 07:30 AM)ndc85430 Wrote: Can't you use a regular expression to extract the numeric part?

perhaps. but how well does that regular expression handle "0xffgghh"? will it come up with 0?
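A quick way to check this is to run the `(\d+).*` pattern from the earlier reply against that string. This is only an illustration; the comparison with int("0xff", 0) is added to show the value the question is after.

```python
import re

# \d+ stops at the "x", so only the leading "0" is captured.
m = re.match(r"(\d+).*", "0xffgghh")
print(m.groups())      # ('0',)

# What int(s, 0) gives for the longest convertible prefix.
print(int("0xff", 0))  # 255
```

So the pattern does come up with "0", because it knows nothing about the "0x" prefix.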

Skaperen · (This post was last modified: Jul-23-2022, 05:54 PM by Skaperen.)

(Jul-23-2022, 01:56 PM)rob101 Wrote: The C Programming Language: K&R

i loaned mine to a friend after i read it about 20 times. he never gave it back. but that was 3 decades ago and i forgot which friend it was because i had loaned it to 4 other friends, first. but i didn't forget C.

maybe people wonder how i can have so many friends. it's called being sysadmin of a few university unix machines.

rob101 · Jul-23-2022, 06:09 PM

Ah, I now see (from your post above) what you're trying to do -- sorry for introducing noise into your thread.

BashBedlam · Jul-23-2022, 07:52 PM

Here is an example of one way to do that in Python3.

digit_counter = {'0': 0, '1': 0, '2': 0, '3': 0, '4': 0,
	'5': 0, '6': 0, '7': 0, '8': 0, '9': 0}
white_space_counter = {' ': 0, '\n':0, '\t':0}

user_input = input (':~$ ')

for element in user_input :
	if element in digit_counter :
		digit_counter [element] += 1
	if element in white_space_counter :
		white_space_counter [element] += 1

print (digit_counter)
print (white_space_counter)

ibreeden · Jul-24-2022, 08:51 AM

Yeah, great book, "The C Programming Language" by Kernighan and Ritchie.
But about your problem, @Skaperen, I still believe regular expressions are the most efficient way to solve this. Let me first summarise, to check that I understood you correctly:

You have lines that may start with an integer.
This integer may be in octal, hexadecimal or decimal format.
You need to capture this integer.

These rules can be translated to a regular expression. You need to know "re" has an "OR" operator. It is: "|". The regular expression must search for:

"0o" followed by digits 0-7, or
"0x" followed by digits 0-F, or
digits 0-9.

This translates to a regular expression:

r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)"

I tested it with this little program.

import re

# For efficiency reasons: compile this only once.
initial_integer = re.compile(r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)")
""" Meaning:
    Capture the start of a string containing:
    - "0o" followed by digits 0-7
    - "0x" followed by digits 0-f
    - digits 0-9
"""

testdata = ["abcdefg",
            "12789efgh",
            "0o12789efgh",
            "0x12789efgh",
            ""]
for mystr in testdata:
    print(f"Teststring: {mystr}\t", end="")
    print(re.match(initial_integer, mystr).groups()[0])

Output:
Teststring: abcdefg
Teststring: 12789efgh	12789
Teststring: 0o12789efgh	0o127
Teststring: 0x12789efgh	0x12789ef
Teststring:

I hope you can use this solution.
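One caveat when feeding the captured text to int(s, 0): because every alternative uses `*`, the pattern can capture an empty string or a bare prefix such as "0x", and int() rejects both. The sketch below reuses the pattern from the post; `to_int` is a hypothetical helper name, and returning None on failure is an assumption made for illustration.

```python
import re

initial_integer = re.compile(r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)")

def to_int(s):
    """Hypothetical helper: convert the captured prefix, or return None."""
    # The pattern always matches, since every alternative may be empty.
    prefix = initial_integer.match(s).group(1)
    try:
        return int(prefix, 0)
    except ValueError:
        # Raised for "" and for a bare "0x" / "0o" with no digits after it.
        return None

for s in ["0x12789efgh", "0o12789efgh", "12789efgh", "abcdefg", "0xyz"]:
    print(repr(s), to_int(s))
```

Changing each `*` to `+` would avoid capturing a prefix with no digits, at the cost of `match()` returning None when nothing numeric is found.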

Skaperen · Jul-24-2022, 10:41 PM

(Jul-23-2022, 07:52 PM)BashBedlam Wrote: Here is an example of one way to do that in Python3.


just a quick look at it doesn't show me how it would get the value 345 from the string "345foo99boo123" or the value 65533 from the string "0xfffdummyfff". of course, my eyes could be wrong. i didn't try to run it.

Skaperen · Jul-24-2022, 10:48 PM

(Jul-24-2022, 08:51 AM)ibreeden Wrote: Yeah, great book, "The C Programming Language" from Kernighan and Ritchie.
But about your problem, @Skaperen , I still believe regular expressions are most efficient for solving this.

i would need to add support for "0b" for binary, to that, but i think i see how to do it. you expanded my understanding of regular expressions by enough for this.
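A possible shape for that extension, as a sketch under stated assumptions: the names `LEADING_INT` and `leading_int` are hypothetical, signs and underscores (which int(s, 0) also accepts) are left out, and 0 is returned when no number is found, as in the original draft.

```python
import re

# Hypothetical pattern; the prefixes follow what int(s, 0) accepts.
LEADING_INT = re.compile(
    r"0[xX][0-9a-fA-F]+"   # hexadecimal
    r"|0[oO][0-7]+"        # octal
    r"|0[bB][01]+"         # binary
    r"|[1-9][0-9]*"        # decimal without a leading zero
    r"|0+"                 # plain zero
)

def leading_int(s, default=0):
    """Return the integer at the start of s, or default if there is none."""
    m = LEADING_INT.match(s)
    return int(m.group(), 0) if m else default

for sample in ["345foo99boo123", "0xfffdummyfff", "0b1012", "abcdefg", ""]:
    print(repr(sample), leading_int(sample))
```

Each prefixed alternative requires at least one digit, so a bare "0x" falls through to the plain-zero case. The decimal alternative excludes leading zeros because int("0123", 0) raises ValueError.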
