Python Forum
str.find() not returning correct index.
#6
(Aug-18-2020, 04:44 AM)bowlofred Wrote: Hmm. I'm not sure what's going on then. I wonder if the file is in some odd encoding that your editor is handling automatically?
When I opened the file in Hex Fiend I made sure the encoding was set to ASCII. The only difference is that its ASCII option is "strict 7 bit" rather than just "ascii". When the file is converted there are obviously going to be bytes that shouldn't be there, i.e. unsupported characters. The editor will remove these or replace them with blanks, but Python probably doesn't, and that seems like the most logical explanation.

(Aug-18-2020, 04:44 AM)bowlofred Wrote: If you're looking for the "aHR..." string, then your python program (with a couple tiny updates) works for me. I've saved your upload in a text file.

Output:
$ cat ascii.txt -----ASCII ENCODED DATA------- =aHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL2dldEdKTGV2ZWxzMjEucGhw#lvl_dataaHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL2dldFNhdmVEYXRhLnBocA==&page=%i&secret=%saHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL2dldEdKTWFwUGFja3MyMS5waHA=pack_%igauntlet_%iget_gauntlets&secret=%saHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL2dldEdKR2F1bnRsZXRzMjEucGhw&gauntlet=%i_%i&levelID=%i&inc=%i&extras=%i&secret=%s&rs=%i%i%s%i%s%i%s&chk=aHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL2Rvd25sb2FkR0pMZXZlbDIyLnBocA==%i,%i,%i,%i,%i,%i,%i,%i&levelID=%i&gameVersion=%i&secret=%sgeometry.ach.rateDiff&levelID=%i&stars=%i&secret=%ssg6pUrt0J58281aHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL3JhdGVHSlN0YXJzMjExLnBocA==1128989
And then this program:

b64link = "aHR0cDovL3d3dy5ib29tbGluZ3MuY29tL2RhdGFiYXNlL2Rvd25sb2FkR0pMZXZlbDIyLnBocA=="
with open("ascii.txt", 'r', encoding="ascii") as f:
    t = f.read()
    if(b64link in t):
        location = t.find(b64link)
        print(location)
Generates this output:
Output:
456
That's zero-indexed. If I open the text file in vi and go to 457, that puts me right on the string.
The only difference between yours and mine is that I pass errors="ignore"; otherwise it throws an error like:
Error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)
when decoding. This wouldn't happen with the text I provided, because that is an excerpt from a much bigger block of text and the excerpt itself has no characters that would cause a decoding error.
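One possible explanation for the shifted index, which is not confirmed in this thread, is that errors="ignore" silently drops every byte that is not valid ASCII, so the index in the decoded string is smaller than the byte offset shown by a hex editor. A minimal sketch with made-up data (the bytes and the shortened needle are assumptions for illustration, not taken from the real file):

```python
# Hypothetical data: two non-ASCII bytes (0xcf, 0xfa) ahead of an ASCII needle.
raw = b"\xcf\xfa header " + b"aHR0cDov" + b" trailer"
needle = "aHR0cDov"

# errors="ignore" discards every byte >= 0x80 while decoding ...
text_ignored = raw.decode("ascii", errors="ignore")

# ... so the str index ends up smaller than the real byte offset.
index_in_text = text_ignored.find(needle)
offset_in_bytes = raw.find(needle.encode("ascii"))

# Number of undecodable bytes that sat in front of the needle.
dropped = sum(1 for b in raw[:offset_in_bytes] if b >= 0x80)

print(index_in_text, offset_in_bytes, dropped)  # 8 10 2
```

In this sketch the difference between the two positions is exactly the number of dropped bytes before the match.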

The last thing I tried was:
def getLocation(self):
    location = None

    link = self.link
    b64link = base64.b64encode(link.encode()).decode()  # Base64 text as str

    # b64link is already a str here, so it must not be decoded a second time
    b64link = self.stringXEscape(b64link).encode()
    link = self.stringXEscape(link).encode()

    with open(self.dir, 'rb') as f:
        t = f.read()
        if(b64link in t):
            location = t.index(b64link)
        elif(link in t):
            location = t.index(link)
        else:
            pass

    return location #returns integer location of the string specified in the specified file

def stringXEscape(self, string):
    # turns every character into the literal text \xNN (backslash, x, two hex digits)
    return ''.join('\\x{:02x}'.format(ord(c)) for c in string)
The idea was to compare the raw Python hex-escaped strings, but it doesn't work. It doesn't seem to find the string, so it never makes it past the 'if'. I've also looked for the string myself and I can't find it in the text either. I don't even know which line it would be on, so I can't see what the difference is between my string and the one I am looking for.
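A likely reason the search never matches is that stringXEscape produces the literal characters backslash, x and two hex digits for each input character, and that text does not occur in the file. When a file is opened in 'rb' mode, the Base64 bytes can be searched for directly. A minimal sketch, where the link, the file contents and the helper name x_escape are all hypothetical stand-ins:

```python
import base64

def x_escape(s: str) -> str:
    # Same transformation as stringXEscape above: each character becomes
    # the four literal characters backslash, 'x', and two hex digits.
    return "".join("\\x{:02x}".format(ord(c)) for c in s)

link = "http://example.com/a.php"  # hypothetical link
b64_bytes = base64.b64encode(link.encode("ascii"))  # b64encode returns bytes

# Hypothetical file contents: binary bytes around the Base64 text.
data = b"\x00\xcf\xfa" + b64_bytes + b"\x00more"

escaped = x_escape(b64_bytes.decode("ascii")).encode("ascii")

print(len(escaped) == 4 * len(b64_bytes))  # True: escaped form is four times as long
print(data.find(escaped))                  # -1: the backslash text is not in the data
print(data.find(b64_bytes))                # 3: byte offset of the real match
```

Searching bytes in bytes also avoids the decoding step entirely, so the result is a byte offset that can be checked in a hex editor.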

I'll keep thinking of other ways of finding the string. It might even need to be something stupid like compressing everything with gzip and working with every string in its gzip-compressed form until the very end, where it can be decompressed.

EDIT: I hadn't posted this yet because I had one last idea: to look at the characters that come before the string at the index Python thinks is correct, to see if any were "broken" characters. Couldn't find anything. I then thought about trying to open the file with strict 7-bit ASCII encoding, but I couldn't find how to do that. I ended up trying latin-1 encoding and it actually works (even without ignoring errors). I get 5521001, which I believe is the correct index of the string.
Latin-1 is apparently a sort of extended ASCII, and when I was looking for those "broken" characters I actually had to use an extended ASCII table.
It still doesn't really make sense why it wasn't finding the correct index originally though.
The index returned is still slightly off, but only by a few tens of characters. It seems like the strings are null-terminated, so that could be why.
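The reason latin-1 decodes without errors is that it maps every byte value from 0x00 to 0xff to exactly one character, so an index in the decoded string equals the byte offset in the file. To investigate the remaining difference, it may help to print the raw bytes around the reported index with repr(), which makes NUL terminators visible. A small sketch on made-up data (the bytes below are assumptions, not the real file):

```python
# Hypothetical data: 16 non-ASCII bytes, a NUL-terminated string, then the needle.
raw = bytes(range(0x80, 0x90)) + b"other\x00" + b"aHR0cDov" + b"\x00"
needle = "aHR0cDov"

# latin-1 never fails to decode and keeps one character per byte.
text = raw.decode("latin-1")
index = text.find(needle)
byte_offset = raw.find(needle.encode("latin-1"))

# Show a few bytes before the match and one byte after it.
start = max(0, index - 8)
context = raw[start:index + len(needle) + 1]

print(index, byte_offset)  # 22 22
print(repr(context))
```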


Messages In This Thread
RE: str.find() not returning correct index. - by DreamingInsanity - Aug-18-2020, 08:39 AM
