Python Forum
test if a file really has Python code
#1
i have found that the "file" command in Ubuntu Linux does a poor job of recognizing Python files. i ran it on every file under "/usr" whose name ends in ".py" (12661 files), and 1391 of them were not recognized by the "file" command. a few of those files contained only comments, so interpreting them as various other languages would produce no errors. most of them contained ordinary, valid Python statements. in every case i'd say the experienced coders here could quickly recognize the file as Python, though some of the smaller ones might also be error free in some other language.

back when i was in college (mainframe batch card days) someone claimed he could write a program that would read source code, figure out whether it was Fortran or PL/1 (he later added COBOL to his list), and run the proper compiler. i then wrote a program that compiled under either the Fortran compiler or the PL/1 compiler and did the same thing under both. he gave up on his project.

all i need right now is the test "is this file valid as Python". if it also looks like something else, i don't care. i only want to know if it is valid as Python. is there some way to do this without running the code? i'm not concerned about erroneous code; the test can answer either way for it, and that does not matter.
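One way to ask that question without running anything, as a minimal sketch: hand the file's bytes to the standard library's ast.parse, which only builds a syntax tree. The names is_valid_python and count_valid are made up for illustration. The answer depends on the interpreter doing the check, so code that is valid only as Python 2 will be reported as invalid under Python 3.

```python
import ast
import os
import warnings


def is_valid_python(path):
    """Return True if the file parses as Python for the running interpreter.

    Nothing is executed: ast.parse only parses the source.
    """
    try:
        with open(path, "rb") as f:
            source = f.read()  # bytes, so encoding declarations are honored
    except OSError:
        return False
    try:
        with warnings.catch_warnings():
            # Silence SyntaxWarning noise (e.g. odd escapes in strings).
            warnings.simplefilter("ignore")
            ast.parse(source, filename=path)
    except (SyntaxError, ValueError, RecursionError, MemoryError):
        # SyntaxError covers IndentationError; ValueError covers NUL bytes
        # on older interpreters.
        return False
    return True


def count_valid(root, suffix=".py"):
    """Walk root and return (files ending in suffix, how many of them parse)."""
    total = valid = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                total += 1
                valid += is_valid_python(os.path.join(dirpath, name))
    return total, valid
```

Calling count_valid("/usr") would give counts comparable to the ones above. A file that contains only comments parses successfully, so it counts as valid Python here.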
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
There seems to be a Python implementation of what you want:
https://pypi.org/project/guesslang/ (download)
https://guesslang.readthedocs.io/en/latest/ (documentation)

See the following links for possible existing solutions written in other languages:
https://github.com/github/linguist
https://github.com/blackducksoftware/ohcount (Linux Only)
https://github.com/chrislo/sourceclassifier/tree/master

Several commercial applications also exist.

One would think a Python script could be written to identify other Python source code. In your specific environment the script wouldn't have to weed out every language that could generate 'false positives' and 'false negatives', only the constructions that you would actually encounter.

There are several constructions that seem unique to the Python language:
a. Shebang (line 1) that includes the text 'python'
b. Line starting with 'if' and ending with ':'
c. Line starting with 'elif' and ending with ':'
d. Line starting with 'for' and ending with ':'
e. Line with only the text 'else:'

Items that would never occur in the Python language:
a. '#include' or similar used by other languages
b. Line starting with comment symbol unique to other languages
c. Keywords that NEVER occur in Python.
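The two lists above could be turned into a rough scoring script along these lines. This is only a sketch: the function name python_score, the regular expressions, and the weights are assumptions made for illustration, and the result is a hint, not a verdict.

```python
import re

# Patterns that suggest Python (lines like "if ...:", "for ...:", "else:").
POSITIVE = [
    re.compile(r"^\s*(if|elif|for)\b.*:\s*$"),
    re.compile(r"^\s*else:\s*$"),
]

# Patterns that suggest some other language. Only '#include' is listed here;
# more would be added for the languages actually encountered.
NEGATIVE = [
    re.compile(r"^\s*#include\b"),
]


def python_score(text):
    """Return a score: higher means 'looks more like Python'.

    The weights (5 for a python shebang, 1 per matching line) are arbitrary.
    """
    lines = text.splitlines()
    score = 0
    if lines and lines[0].startswith("#!") and "python" in lines[0]:
        score += 5
    for line in lines:
        if any(p.match(line) for p in POSITIVE):
            score += 1
        if any(p.match(line) for p in NEGATIVE):
            score -= 1
    return score
```

A positive score would suggest Python, and a negative one would suggest something else. A score of zero says nothing either way.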

Reference: https://en.wikipedia.org/wiki/Comparison...s_(syntax)

Lewis
To paraphrase: 'Throw out your dead' code. https://www.youtube.com/watch?v=grbSQ6O6kbs Forward to 1:00
#3
so i wonder what guesslang would guess about:

# initialize:
    a = 0
    b = 1
if it were just by itself (and not in this post saying Python Code:)

(Apr-28-2018, 04:14 PM)ljmetzger Wrote: Items that would never occur in the Python language:
a. '#include' or similar used by other languages
b. Line starting with comment symbol unique to other languages
c. Keywords that NEVER occur in Python.

Reference: https://en.wikipedia.org/wiki/Comparison...s_(syntax)

Lewis

a. having '#include', while very common in C, does not rule out Python since it is a valid comment.

b. what if the continuation of a line in the midst of an arithmetic expression began with '//'? it could rule out Python2.

c. they could be valid variable names in Python.
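These three objections can be checked with Python's own parser, again without running anything. A minimal sketch, where the helper name parses and the sample strings are made up for illustration:

```python
import ast


def parses(source):
    """Return True if source is syntactically valid for this interpreter."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


samples = {
    # a. '#include' is just a comment to Python
    "include comment": "#include <stdio.h>\nx = 1\n",
    # b. a continuation line that begins with '//' (floor division)
    "leading //": "x = (10\n// 3)\n",
    # c. other languages' keywords used as ordinary variable names
    "foreign keywords": "begin = 1\nendif = 2\nthen = begin + endif\n",
    # for contrast, a line of C
    "C code": "int main(void) { return 0; }\n",
}

for label, source in samples.items():
    print(label, parses(source))
```

The first three samples parse as Python, and the line of C does not.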