Python Forum
converting to int
#1
I have a string (s) that may have other stuff appended to it, and I want to convert the leading part to an int with int(s,0). But I don't know how many characters of it are convertible, because other characters are appended after the number. Do I need to just keep trying shorter substrings until it works?

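Like this:
def intplus(s):
    """Return int(s, 0) for the longest convertible prefix of s, else 0."""
    while s:
        try:
            # base 0 accepts 0x (hex), 0o (octal) and 0b (binary) prefixes
            return int(s, 0)
        except ValueError:
            s = s[:-1]  # drop the last character and try again
    return 0  # no prefix of s is convertible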
The above code is a "thinking draft", so it is untested. It's just what I was thinking about as I conjured up this post.
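A sketch of a variant of that draft, assuming you also want to know where the number ends: the hypothetical `intplus_split()` below returns the value together with the unconverted tail. It is still trial and error, so it is roughly quadratic in the worst case, and because int() tolerates surrounding white space and underscores, those can end up in the converted part.

```python
def intplus_split(s, default=0):
    """Hypothetical variant of intplus(): return (value, rest).

    value is int(prefix, 0) for the longest convertible prefix of s and
    rest is whatever follows it; if no prefix converts, return (default, s).
    """
    # try the longest prefix first, then ever shorter ones (worst case O(n**2))
    for end in range(len(s), 0, -1):
        try:
            return int(s[:end], 0), s[end:]
        except ValueError:
            continue  # not a valid literal; drop one more character
    return default, s


print(intplus_split("0xffgghh"))        # (255, 'gghh')
print(intplus_split("345foo99boo123"))  # (345, 'foo99boo123')
```

Note that base 0 rejects decimals with leading zeros, so "012abc" gives (0, '12abc').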
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Can't you use a regular expression to extract the numeric part?

>>> import re
>>> s = "32abcd"
>>> re.match(r"(\d+).*", s).groups()
('32',)
>>> s = "123foobar"
>>> re.match(r"(\d+).*", s).groups()
('123',)
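A small sketch along the same lines, using a hypothetical `leading_decimal()` wrapper: it converts the match to an int and returns a default instead of crashing when the string doesn't start with a digit (`re.match()` returns `None` then). Like the pattern above, it only understands decimal.

```python
import re


def leading_decimal(s, default=0):
    """Return the decimal digits at the start of s as an int, or default.

    re.match() only matches at the beginning of the string, so a trailing
    ".*" is not needed; checking for None avoids an AttributeError when
    the string does not start with a digit.
    """
    m = re.match(r"[0-9]+", s)
    return int(m.group()) if m else default


print(leading_decimal("32abcd"))     # 32
print(leading_decimal("123foobar"))  # 123
print(leading_decimal("foobar"))     # 0
```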
#3
If you're trying to extract the digits in a string object, then this code may be of help.

I'm currently re-reading a book that I've had for years (The C Programming Language, K&R) and converting its code examples into Python 3.

I'll post the code as is; it's annotated with notes regarding the C code from which the Python code has been developed.

I hope it's of use to you.

#!/usr/bin/python3
# Page 20/21

print(20*'\n') # clear 20 lines of the screen

# int nwhite, nother;
nwhite = nother = 0

#--------------------------------------#
# int ndigit[10];                      #
# for (i = 0; i < 10; ++i)             #
#    ndigit[i] = 0;                    #
ndigit = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
#--------------------------------------#

# while ((c = getchar()) !=EOF)
# note: input() strips the trailing newline, so '\n' is never counted below
c = input(':~$ ')

for i in range(len(c)): # for loop to simulate the buffer by getting one character at a time, held in c[i]
    # if (c >= '0' && c <= '9')
    #    ++ndigit[c-'0'];
    if c[i] >= '0' and c[i] <= '9':
        ndigit[ord(c[i])-ord('0')]+= 1
    elif c[i] == ' ' or c[i] == '\n' or c[i] == '\t':
        nwhite += 1
    else:
        nother += 1

# output
print('digits =',end='')
for i in range(len(ndigit)):
    print(' '+str(ndigit[i])+' x '+str(i)+' |',end='')
print('\nwhite space =',nwhite,'\tother =',nother,'\n')
print('program exit\n')
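For comparison, a more compact sketch of the same tally using `collections.Counter`. `count_chars()` is a hypothetical name, and like the program above it treats only space, newline and tab as white space.

```python
from collections import Counter


def count_chars(text):
    """Return (per-digit counts for 0-9, white-space count, other count)."""
    digits = Counter(ch for ch in text if ch in "0123456789")
    nwhite = sum(ch in " \n\t" for ch in text)  # True counts as 1
    nother = len(text) - sum(digits.values()) - nwhite
    return [digits[str(d)] for d in range(10)], nwhite, nother


print(count_chars("a1 22\tb"))  # ([0, 1, 2, 0, 0, 0, 0, 0, 0, 0], 2, 2)
```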
Sig:
>>> import this

The UNIX philosophy: "Do one thing, and do it well."

"The danger of computers becoming like humans is not as great as the danger of humans becoming like computers." :~ Konrad Zuse

"Everything should be made as simple as possible, but not simpler." :~ Albert Einstein
#4
(Jul-23-2022, 07:30 AM)ndc85430 Wrote: Can't you use a regular expression to extract the numeric part?
Perhaps. But how well does that regular expression handle "0xffgghh"? Will it come up with 0?
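A quick check, assuming the pattern from post #2 as written: yes, it comes up with '0' for "0xffgghh", because `\d+` stops at the x, whereas int(s, 0) on the longest valid prefix "0xff" gives 255.

```python
import re

# ndc85430's decimal-only pattern from post #2
decimal_only = re.compile(r"(\d+).*")

for s in ["0xffgghh", "0x1fzz", "32abcd"]:
    # \d+ stops at the 'x', so hex input only yields the leading '0'
    print(s, "->", decimal_only.match(s).group(1))
```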
#5
(Jul-23-2022, 01:56 PM)rob101 Wrote: The C Programming Language: K&R
I loaned mine to a friend after I had read it about 20 times. He never gave it back. But that was 3 decades ago, and I forgot which friend it was because I had loaned it to 4 other friends first. I didn't forget C, though.

Maybe people wonder how I can have so many friends. It's called being the sysadmin of a few university Unix machines.
#6
Ah, I now see (from your post above) what you're trying to do -- sorry for introducing noise into your thread.
#7
Here is an example of one way to do that in Python3.
digit_counter = {'0': 0, '1': 0, '2': 0, '3': 0, '4': 0,
                 '5': 0, '6': 0, '7': 0, '8': 0, '9': 0}
white_space_counter = {' ': 0, '\n': 0, '\t': 0}

# input() strips the trailing newline, so '\n' is never counted here
user_input = input(':~$ ')

# count every character that is a digit or a tracked white-space character
for element in user_input:
    if element in digit_counter:
        digit_counter[element] += 1
    elif element in white_space_counter:
        white_space_counter[element] += 1

print(digit_counter)
print(white_space_counter)
#8
Yeah, great book, "The C Programming Language" from Kernighan and Ritchie.
But about your problem, @Skaperen, I still believe a regular expression is the most efficient way to solve this. Let me first summarise, to check that I understood you correctly:
  • You have lines that may start with an integer.
  • This integer may be in octal, hexadecimal or decimal format.
  • You need to capture this integer.

These rules can be translated into a regular expression. Note that "re" has an "OR" (alternation) operator: "|". The regular expression must search for:
  1. "0o" followed by digits 0-7, or
  2. "0x" followed by hex digits 0-9 and a-f (either case), or
  3. digits 0-9.
Translated into a regular expression:
r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)"
I tested it with this little program.
import re

# For efficiency reasons: compile this only once.
# The pattern captures, at the start of the string:
#   - "0o" followed by octal digits 0-7, or
#   - "0x" followed by hex digits 0-9, a-f (either case), or
#   - decimal digits 0-9
initial_integer = re.compile(r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)")

testdata = ["abcdefg",
            "12789efgh",
            "0o12789efgh",
            "0x12789efgh",
            ""]
for mystr in testdata:
    print(f"Teststring: {mystr}\t", end="")
    print(initial_integer.match(mystr).group(1))
Output:
Teststring: abcdefg
Teststring: 12789efgh      12789
Teststring: 0o12789efgh    0o127
Teststring: 0x12789efgh    0x12789ef
Teststring:
I hope you can use this solution.
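One way to finish the job, as a sketch: feed the captured text to int(text, 0). The hypothetical `convert()` below keeps the pattern above unchanged and falls back to a default when the capture can't be converted. That happens when it is empty (no leading digits), when it is a bare prefix like "0x", and for decimals with leading zeros such as "0123", which int() rejects in base 0.

```python
import re

# ibreeden's pattern from post #8, unchanged
initial_integer = re.compile(r"^(0[oO][0-7]*|0[xX][0-9a-fA-F]*|[0-9]*)")


def convert(s, default=0):
    """Convert the captured prefix of s with int(..., 0), or return default."""
    text = initial_integer.match(s).group(1)  # always matches: [0-9]* may be empty
    try:
        return int(text, 0)
    except ValueError:  # empty capture, bare "0x"/"0o", or "0123"-style decimal
        return default


for mystr in ["abcdefg", "12789efgh", "0o12789efgh", "0x12789efgh", ""]:
    print(mystr, "->", convert(mystr))
```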
#9
(Jul-23-2022, 07:52 PM)BashBedlam Wrote: Here is an example of one way to do that in Python3.

Just a quick look at it doesn't show me how it would get the value 345 from the string "345foo99boo123", or the value 65533 from the string "0xfffdummyfff". Of course, my eyes could be wrong; I didn't try to run it.
#10
(Jul-24-2022, 08:51 AM)ibreeden Wrote: Yeah, great book, "The C Programming Language" from Kernighan and Ritchie.

I would need to add support for "0b" (binary) to that, but I think I see how to do it. You expanded my understanding of regular expressions enough for this.
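For reference, a sketch of the pattern with "0b" added. It is adapted a little so that every match is something int(text, 0) accepts: each prefix must be followed by at least one digit, decimals can't have leading zeros (except all zeros), and an optional sign is allowed. `LEADING_INT` and `leading_int()` are hypothetical names; underscores and leading white space, which int() would also accept, are not handled.

```python
import re

# Hypothetical pattern: sign, then binary, octal, hex or decimal digits,
# restricted to literal forms that int(text, 0) accepts.
LEADING_INT = re.compile(
    r"[+-]?(?:0[bB][01]+|0[oO][0-7]+|0[xX][0-9a-fA-F]+|[1-9][0-9]*|0+)"
)


def leading_int(s, default=0):
    """Return the integer at the start of s, or default if there is none."""
    m = LEADING_INT.match(s)  # match() is anchored at the start of s
    return int(m.group(), 0) if m else default


print(leading_int("345foo99boo123"))  # 345
print(leading_int("0xfffdummyfff"))   # 65533
print(leading_int("0b1012"))          # 5
```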