Python Forum

Full Version: getting file type information
there are a number of different functions and methods in a variety of different modules to return a status of whether a named file is of a particular type or not. is there a function that just returns what type the file is?
(Aug-21-2019, 03:39 AM) Skaperen wrote: there are a number of different functions and methods in a variety of different modules to return a status of whether a named file is of a particular type or not. is there a function that just returns what type the file is?

I would like to know how to get the type of a file
There are some third-party libraries that do this, e.g. python-magic, filetype, fleep.
python-magic has the best file-type support of these; for example, it can detect .py files, which the others I mentioned cannot.
Test:
tom@tom:~/Documents/py_files$ ptpython
>>> import magic

>>> magic.from_file('test.html')
'HTML document, ASCII text'

>>> magic.from_file('toss.py')
'Python script, ASCII text executable'

>>> magic.from_file('geckodriver-v0.24.0-linux64.tar.gz')
'gzip compressed data, last modified: Mon Jan 28 22:49:19 2019, from Unix'

# Renamed the file so that the name carries no type information
>>> magic.from_file('unknown')
'gzip compressed data, last modified: Mon Jan 28 22:49:19 2019, from Unix'
i only need file system semantic types, e.g. regular file, directory, symbolic link, character device, block device, named socket, pipe. but this looks interesting. can't it tell that the decompressed content is a tar file? i wonder what other compression formats it understands.
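For the file system semantic types listed above, the standard library's os and stat modules may be enough. What follows is only a minimal sketch: the helper name file_kind and the label strings are made up for illustration, and it uses os.lstat so that a symbolic link is reported as a link rather than as whatever it points to.

```python
import os
import stat

# Pairs of (predicate from the stat module, human-readable label).
# The labels are illustrative; rename them to suit your own code.
_KINDS = (
    (stat.S_ISREG, "regular file"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISSOCK, "socket"),
    (stat.S_ISFIFO, "pipe"),
)


def file_kind(path):
    """Return a label for the file system type of path (hypothetical helper)."""
    # lstat does not follow symbolic links, so links are reported as links.
    mode = os.lstat(path).st_mode
    for predicate, label in _KINDS:
        if predicate(mode):
            return label
    return "unknown"
```

Calling file_kind on a path that does not exist raises FileNotFoundError, the same as os.lstat itself. Swap os.lstat for os.stat if you would rather see the type of a link's target.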
to get the mime type in Linux you can do

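import subprocess

path = "/path/to/myfile.tar.gz"
# Pass the arguments as a list so paths containing spaces are not split apart.
# --brief omits the file name from the output, leaving only the MIME type.
cmd = ['file', '--brief', '--mime-type', path]
# text=True decodes the output to str instead of bytes (Python 3.7+).
mime_type = subprocess.check_output(cmd, text=True).strip()
print(mime_type)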
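If you cannot shell out, the standard library's mimetypes module is a rough alternative. Keep in mind that it only looks at the file name and never opens the file, so a renamed file will fool it. The helper name guess_from_name below is made up for this sketch.

```python
import mimetypes


def guess_from_name(path):
    """Return (mime_type, encoding) guessed from the file name alone.

    Hypothetical helper: it only wraps mimetypes.guess_type, which
    inspects the extension and never reads the file content.
    """
    return mimetypes.guess_type(path)


# guess_from_name("/path/to/myfile.tar.gz") returns ('application/x-tar', 'gzip')
```

Either element of the tuple is None when the extension is not recognized.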
to what extent can the file command (shlex only splits the command string) provide the separate values for the HTTP headers Content-Type: and Content-Encoding:? it would need to recognize the major compression formats and actually do enough decompression to recognize what's inside.
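Here is a rough sketch of that idea using only the standard library: check the leading magic bytes for a few compression formats, decompress just the first 512 bytes, and look for the tar signature. The helper name sniff_headers is made up, only tar is recognized as a payload, and the encoding names follow what the mimetypes module uses (only gzip is a registered Content-Encoding value), so treat it as an illustration and not a complete detector.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of a few common compression formats, paired with an
# encoding name and an opener that can decompress the format.
_COMPRESSORS = (
    (b"\x1f\x8b", "gzip", gzip.open),
    (b"BZh", "bzip2", bz2.open),
    (b"\xfd7zXZ\x00", "xz", lzma.open),
)


def sniff_headers(path):
    """Guess (content_type, content_encoding) from file content.

    Hypothetical helper: only tar payloads are recognized; everything
    else is reported as application/octet-stream.
    """
    with open(path, "rb") as f:
        head = f.read(6)
    encoding, opener = None, open
    for magic_bytes, name, candidate in _COMPRESSORS:
        if head.startswith(magic_bytes):
            encoding, opener = name, candidate
            break
    # Decompress only the first 512 bytes, the size of one tar header block.
    with opener(path, "rb") as f:
        block = f.read(512)
    # Tar archives in the ustar, pax, and GNU formats store "ustar" at offset 257.
    if block[257:262] == b"ustar":
        return "application/x-tar", encoding
    return "application/octet-stream", encoding
```

Because it reads the content, this gives the same answer for a .tar.gz that has been renamed to unknown. A truncated or corrupt compressed file will raise an exception from the decompressor, which a real implementation would need to handle.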