pdfminer package: can't find extract_text function
Python Forum (https://python-forum.io), Forum: General Coding Help, Thread: /thread-32164.html
pdfminer package: can't find extract_text function - Pavel_47 - Jan-25-2021

Hello,
Using the pdfminer package, I ran into the following problem:

    >>> from pdfminer import high_level
    >>> extracted_text = high_level.extract_text(full_filename_inp, "", [4])
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        extracted_text = high_level.extract_text(full_filename_inp, "", [4])
    AttributeError: module 'pdfminer.high_level' has no attribute 'extract_text'

But according to the documentation (link: pdfminer package), the function extract_text does exist in the pdfminer package.
Any suggestions?
Thanks

RE: pdfminer package: can't find extract_text function - Larz60+ - Jan-25-2021

The documentation that you point to is for pdfminer.six. Since 2020, the original pdfminer is dormant, and pdfminer.six is the fork which Euske recommends if you need an actively maintained version of pdfminer.
Which do you have installed?
The install command for pdfminer.six is:

    pip install pdfminer.six
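
A note on the question "Which do you have installed?": one way to answer it from Python itself is to ask the standard library's importlib.metadata (Python 3.8+) for the versions of both distributions. This is only a minimal sketch; the function name and the list of candidate names are made up for illustration and are not part of pdfminer.

```python
from importlib import metadata

# Distribution names to look for (an assumption for this sketch;
# extend the tuple if you suspect other forks are installed).
CANDIDATE_NAMES = ("pdfminer", "pdfminer.six")


def installed_pdfminer_distributions():
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in CANDIDATE_NAMES:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed in the current environment.
            continue
    return found
```

Calling installed_pdfminer_distributions() in the same interpreter that raised the AttributeError shows whether one or both distributions are present. If both show up, they both provide a package named pdfminer, which would explain the confusion in this thread.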

RE: pdfminer package: can't find extract_text function - Pavel_47 - Jan-25-2021

First I installed pdfminer. Then I saw this issue and installed pdfminer.six. So I don't know what's really going on, or which one is actually imported.
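
To see which copy of the package an import would actually pick up, a small check along the following lines may help. It is a hedged sketch using only the standard library; describe_pdfminer_import is a hypothetical helper name, and the only pdfminer names it relies on are the pdfminer.high_level module and the extract_text attribute that already appear in the traceback above.

```python
import importlib
import importlib.util


def describe_pdfminer_import():
    """Report where 'pdfminer' would be imported from and whether
    pdfminer.high_level exposes extract_text."""
    info = {"importable": False, "location": None, "has_extract_text": False}

    spec = importlib.util.find_spec("pdfminer")
    if spec is None:
        # No package named pdfminer is visible to this interpreter.
        return info

    info["importable"] = True
    info["location"] = spec.origin  # path of the package's __init__.py, if any

    try:
        high_level = importlib.import_module("pdfminer.high_level")
    except ImportError:
        # The package exists but high_level (or one of its dependencies) is missing.
        return info

    info["has_extract_text"] = hasattr(high_level, "extract_text")
    return info
```

If has_extract_text comes back False while the package is importable, the interpreter is most likely picking up files from the original pdfminer rather than pdfminer.six.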

RE: pdfminer package: can't find extract_text function - buran - Jan-25-2021

Uninstall both and install just pdfminer.six.

RE: pdfminer package: can't find extract_text function - Pavel_47 - Jan-25-2021

Concerning the error message:

RE: pdfminer package: can't find extract_text function - Pavel_47 - Jan-25-2021

After reinstalling pdfminer.six, the initial example works. Thanks.

RE: pdfminer package: can't find extract_text function - snippsat - Jan-25-2021

Do not change anything; try whether it works, as it probably does now. It's highly unlikely that one version number of BeautifulSoup will break anything in this package, as BS is not even among the required packages for pdfminer.six.

Here is a quick tutorial on using a virtual environment. It's built into Python and takes just a minute to set up. This solves all dependency conflicts, as nothing you installed before is looked at or used; everything is new.

    tom@tom-VirtualBox:~$ python -V
    Python 3.9.1

    # Make
    tom@tom-VirtualBox:~$ python -m venv pdf_env

    # Cd in
    tom@tom-VirtualBox:~$ cd pdf_env/

    # Activate
    tom@tom-VirtualBox:~/pdf_env$ source bin/activate

    # Install
    (pdf_env) tom@tom-VirtualBox:~/pdf_env$ pip install pdfminer.six
    Collecting pdfminer.six
    .....
    Successfully installed cffi-1.14.4 chardet-4.0.0 cryptography-3.3.1 pdfminer.six-20201018 pycparser-2.20 six-1.15.0 sortedcontainers-2.3.0

Test that it works.

    (pdf_env) tom@tom-VirtualBox:~/pdf_env$ python
    Python 3.9.1 (default, Jan 25 2021, 15:34:59)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from pdfminer import high_level
    >>>
    >>> high_level.extract_text
    <function extract_text at 0x7fe2273cc310>
    >>> help(high_level.extract_text)
    .....

When you run pip list, only the packages in this environment are shown, as it is isolated from what's installed at the OS level.

    (pdf_env) tom@tom-VirtualBox:~/pdf_env$ pip list
    Package          Version
    ---------------- --------
    cffi             1.14.4
    chardet          4.0.0
    cryptography     3.3.1
    pdfminer.six     20201018
    pip              21.0
    pycparser        2.20
    setuptools       49.2.1
    six              1.15.0
    sortedcontainers 2.3.0

RE: pdfminer package: can't find extract_text function - Pavel_47 - Jan-25-2021

Ok, now it works. Thanks.
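
For completeness, the "Make" step of the virtual environment tutorial above can also be done from Python with the standard venv module, which is what python -m venv uses under the hood. The following is a rough sketch under that assumption; create_env is a hypothetical helper name, and the bin/Scripts split reflects the usual layout on POSIX systems versus Windows.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return the path
    of the Python interpreter inside it."""
    env_dir = Path(env_dir)
    # Roughly equivalent to: python -m venv <env_dir>
    venv.create(env_dir, with_pip=with_pip)

    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

After create_env("pdf_env"), running the returned interpreter with -m pip install pdfminer.six installs the package into that environment only, without needing to activate it first.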