Python Forum
Contractors
#1
I asked a project manager if the contractors were checking a particular text box for illegal characters, or doing a more robust search for html code. This is the response she forwarded from the contractor:


Quote:I am not comparing individual characters against a list of allowed characters. I am taking the whole input as a string and compared with .NET regular expression to produce a list of what illegal characters was typed into the input textbox. The code is not using any logic that iterate through individual characters


Sorry. I just had to share this with someone before my head exploded.
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#2
(Jan-31-2017, 05:26 PM)ichabod801 Wrote: I asked a project manager if the contractors were checking a particular text box for illegal characters, or doing a more robust search for html code. This is the response she forwarded from the contractor:


Quote:I am not comparing individual characters against a list of allowed characters. I am taking the whole input as a string and compared with .NET regular expression to produce a list of what illegal characters was typed into the input textbox. The code is not using any logic that iterate through individual characters


Sorry. I just had to share this with someone before my head exploded.

The ".NET regular expression" part is strange, but doing something like
illegal = re.sub(r'[a-zA-Z0-9:.;]', '', checkedstring)
to test for non-allowed characters sort of makes sense (especially if you want to output a message telling the user which characters are in error...)
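To make the "tell the user which characters are in error" idea concrete, here is a minimal sketch. The whitelist is just the character class from the line above, and the function name is made up for illustration; a real field for English prose would need a much wider set (spaces and punctuation at the very least).

```python
import re

# Assumed whitelist, copied from the character class in the post above.
ALLOWED = re.compile(r'[a-zA-Z0-9:.;]')

def illegal_characters(checked_string):
    """Return the distinct disallowed characters, in order of first appearance."""
    # Deleting every allowed character leaves only the offenders.
    leftovers = ALLOWED.sub('', checked_string)
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(leftovers))

print(illegal_characters('Cost: $5 (approx.)!'))
```

Note that with this whitelist even the space counts as illegal, which shows how easy it is to reject ordinary text.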
Unless noted otherwise, code in my posts should be understood as "coding suggestions", and its use may require more neurones than the two necessary for Ctrl-C/Ctrl-V.
Your one-stop place for all your GIMP needs: gimp-forum.net
#3
Yes, but that regular expression would be doing what he says he is not doing.
#4
Maybe he means that the logic for iterating through all the characters is encapsulated by the regex?
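If that is what he means, the two approaches should be interchangeable. A rough sketch (the whitelist and function names here are assumptions for illustration, not the contractor's code) comparing an explicit loop with the regex version:

```python
import re
import string

# Assumed whitelist, for illustration only.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + ':.;')
ALLOWED_PATTERN = re.compile(r'[a-zA-Z0-9:.;]')

def illegal_by_loop(text):
    """Explicit character-by-character check against the allowed set."""
    return ''.join(ch for ch in text if ch not in ALLOWED_CHARS)

def illegal_by_regex(text):
    """Same check; the regex engine does the scanning internally."""
    return ALLOWED_PATTERN.sub('', text)

sample = 'Hello, world! <b>hi</b>'
print(illegal_by_loop(sample) == illegal_by_regex(sample))
```

Either way every character gets compared against the allowed set; the regex only hides the loop.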
#5
It turned out to be as bad as I expected. They're just searching for individual characters, including !, $, ;, and :. This is in a web form that is one of the sources for the database I supervise, for a field expecting English text, possibly several paragraphs of it. I checked the full list of "illegal" characters against all of the online submissions from 2016, and 35.8% were rejected. (None of them should have been rejected.)
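A check like that is only a few lines. This is a sketch, not the script I actually ran: the set below holds just the four characters named above (the contractors' full list is longer), and the sample submissions are made up.

```python
# Only the four characters named in the post; the full list is an unknown here.
ILLEGAL = set('!$;:')

def rejection_rate(submissions, illegal=ILLEGAL):
    """Return the fraction of submissions containing any illegal character."""
    if not submissions:
        return 0.0
    # set.intersection accepts any iterable, so a string is checked per character.
    rejected = sum(1 for text in submissions if illegal.intersection(text))
    return rejected / len(submissions)

# Made-up sample data, not the real 2016 submissions.
samples = [
    'The handle broke after two weeks.',
    'It cost $40; I want a refund!',
    'Note: the cord overheats.',
    'No problems so far',
]
print(f'{rejection_rate(samples):.1%}')
```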
#6
If it is text for feedback from users/customers, then rejecting the text because it has ! or $ in it is just plain stupid. Are these the same people who can't accept a cc number because it looks like '1234-5678-9012-3456', or an SSN when it is typed in like '123-45-6789', or a phone number formatted like '987-654-3210'?
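Accepting those formats does not take much. A minimal sketch, assuming the only separators to tolerate are spaces and dashes (the function name and the length argument are made up for illustration):

```python
import re

def digits_only(raw, expected_length):
    """Strip spaces and dashes, then require exactly expected_length ASCII digits."""
    cleaned = re.sub(r'[\s-]', '', raw)
    if len(cleaned) != expected_length or not re.fullmatch(r'[0-9]+', cleaned):
        raise ValueError(f'expected {expected_length} digits, got {raw!r}')
    return cleaned

print(digits_only('1234-5678-9012-3456', 16))
print(digits_only('123-45-6789', 9))
print(digits_only('987-654-3210', 10))
```

The point is to normalize what the user typed instead of rejecting it.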
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#7
Pretty much. We have recalls, each with a recall number. They are in the format 16-018 (the 18th recall of 2016). That format, with the dash, is how they appear in all of our public documents. But for years the system wouldn't recognize recall numbers with a dash. It had to be 16018.
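Handling both spellings is a one-regex job. A sketch, assuming every recall number is a two-digit year followed by a three-digit sequence (I'm inferring that from the single example above, and the names are made up):

```python
import re

# Assumed shape: two-digit year, optional dash, three-digit sequence.
RECALL_PATTERN = re.compile(r'([0-9]{2})-?([0-9]{3})')

def normalize_recall(raw):
    """Accept '16-018' or '16018' and return the dashed public form."""
    match = RECALL_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f'not a recall number: {raw!r}')
    year, sequence = match.groups()
    return f'{year}-{sequence}'

print(normalize_recall('16018'))
print(normalize_recall('16-018'))
```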

We have numeric product codes in the range 100-9999. When they (actually another set of contractors) built the database, they stored them as strings. That meant that when you searched for '801' in the interface, it also pulled up '1801', '2801', '3801', and so on.

I'm having problems with exported data from the interface. When you export an Excel spreadsheet, all the dates are exported as strings. When you sort, '01/01/2017' comes before '12/31/1995'. You have to export it as csv, and then load the csv into Excel, in order to get it to recognize the dates as dates.
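Both problems are easy to reproduce. This is a sketch under assumptions: I'm guessing the search does a substring match, the date format is taken to be month/day/year, and the data is made up.

```python
from datetime import datetime

codes = ['100', '801', '1801', '2801', '3801']

# A substring match reproduces the behavior described: '801' also hits '1801'.
substring_hits = [c for c in codes if '801' in c]

# Comparing the numeric value returns only the requested code.
exact_hits = [c for c in codes if int(c) == 801]

dates = ['12/31/1995', '01/01/2017']

# Sorted as text, the comparison goes character by character, so '0' < '1' wins.
as_text = sorted(dates)

# Sorted as dates, 1995 comes first. The '%m/%d/%Y' format is an assumption.
as_dates = sorted(dates, key=lambda d: datetime.strptime(d, '%m/%d/%Y'))

print(substring_hits, exact_hits)
print(as_text, as_dates)
```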

It just goes on and on and on. We're on our fourth set of contractors now. The second set got fired because they connected the test environment to the public web site, which posted a bunch of reports containing text like 'ka;jd;flakjd;afjkf'.
#8
hey ...... that was my password.
#9
(Feb-05-2017, 11:30 AM)ichabod801 Wrote: We have numeric product codes in the range 100-9999. When they (actually another set of contractors) built the database, they stored them as strings.

Technically, they are. Adding two of these doesn't yield an interesting result. They are just strings restricted to contain only digits, like postal codes and phone numbers. This doesn't mean they can't have proper code to retrieve them...
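As a rough illustration of "strings restricted to digits", here is a sketch. The length bounds come from the 100-9999 range quoted above, and the function name is made up.

```python
def is_digit_code(value, min_len=3, max_len=4):
    """True if value is a string of ASCII digits within the length bounds."""
    return value.isascii() and value.isdigit() and min_len <= len(value) <= max_len

# Postal codes show why the string form matters: a round trip through int
# drops the leading zero.
zip_code = '02134'
print(str(int(zip_code)))

print(is_digit_code('801'), is_digit_code('80a'), is_digit_code('12345'))
```

With validation like this in place, lookups can use plain string equality, which avoids the partial matches.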
#10
Actually, for as far back as I can remember, the US government has used text fields (full ASCII files), often of varying lengths, for numeric data.
Pure binary files are used as well, but are seen less often in public data unless it is specifically targeted at developers.

A search for 'data types used in files' on catalog.data.gov returns:

130,028 datasets found for "data types used in files"

I found that the choice of data types varies from group to group.
Here's a sample of part of a typical public record layout (I have it in a dictionary for my application):

            'VotingDistricts': {
                'GEOID': {
                    'length': '11',
                    'type': 'String',
                    'description': 'Voting district identifier; a concatenation of the '
                                   'state FIPS code, county FIPS code, and voting district '
                                   'code'
                },
                'VTDI': {
                    'length': '1',
                    'type': 'String',
                    'description': '2010 Census voting district indicator'
                },
                'NAMELSAD10': {
                    'length': '100',
                    'type': 'String',
                    'description': '2010 Census name and the translated legal/statistical '
                                   'area description for voting district'
                },
                'LSAD10': {
                    'length': '2',
                    'type': 'String',
                    'description': '2010 Census legal/statistical area description code for '
                                   'voting district'
                },
                'ALAND10': {
                    'length': '14',
                    'type': 'Number',
                    'description': '2010 Census land area'
                },
                'AWATER10': {
                    'length': '14',
                    'type': 'Number',
                    'description': '2010 Census water area'
                },
                'INTPTLAT10': {
                    'length': '11',
                    'type': 'String',
                    'description': '2010 Census latitude of the internal point'
                },
                'INTPTLON10': {
                    'length': '12',
                    'type': 'String',
                    'description': '2010 Census longitude of the internal point'
                }
            },
Note the two Number data types.
This is not unlike storing a numeric field in a database VARCHAR or VARCHAR2 column.
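A layout dictionary like this can drive the conversion when reading the text file. This is a minimal sketch, not my actual application code: it uses a trimmed copy of the layout, assumes Number fields hold integers, and the record values are made up.

```python
# A trimmed copy of the layout above, keeping one String and one Number field.
LAYOUT = {
    'GEOID': {'length': '11', 'type': 'String'},
    'ALAND10': {'length': '14', 'type': 'Number'},
}

def convert_record(raw, layout=LAYOUT):
    """Convert raw text values to Python types according to the layout."""
    converted = {}
    for name, spec in layout.items():
        text = raw[name].strip()
        if len(text) > int(spec['length']):
            raise ValueError(f'{name} exceeds {spec["length"]} characters')
        # Assumption: Number fields are whole numbers; Strings stay as text.
        converted[name] = int(text) if spec['type'] == 'Number' else text
    return converted

# Made-up values for illustration.
record = convert_record({'GEOID': '01001000010', 'ALAND10': '  12345678'})
print(record)
```

Keeping GEOID as a string preserves its leading zero, while ALAND10 becomes a real number that can be summed or sorted.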

Oracle's definition is:

Quote:The VARCHAR2 datatype stores variable-length character strings. When you create a table with a VARCHAR2 column, you specify a maximum column length (in bytes, not characters) between 1 and 2000 for the VARCHAR2 column. For each row, Oracle stores each value in the column as a variable-length field (unless a value exceeds the column's maximum length and Oracle returns an error). For example, assume you declare a column VARCHAR2 with a maximum size of 50 characters. In a single-byte character set, if only 10 characters are given for the VARCHAR2 column value in a particular row, the column in the row's row piece only stores the 10 characters (10 bytes), not 50.