While investigating this issue, I found the python-magic library, which correctly identifies the XLSX file type. I tested it with the same file and it returns the proper MIME type. However, this solution requires installing libmagic as a system dependency. Would it be acceptable to add this dependency to the project?
@MatheusAbdias Thanks for your suggestion. This comes back to #802, where we track this issue more broadly.
We want to avoid libmagic because it is GPL-licensed, so we cannot distribute it. We try to avoid system library dependencies that we cannot bundle.
Bug
The _guess_format method in the _DocumentConversionInput class incorrectly identifies XLSX files as "application/zip".
Steps to Reproduce
The _guess_format method in _DocumentConversionInput is returning "application/zip".
Inside _guess_format, filetype.guess_mime is returning application/zip.
....
Docling version
Docling version: 2.24.0
Docling Core version: 2.20.0
Docling IBM Models version: 3.4.0
Docling Parse version: 3.4.0
Python: cpython-312 (3.12.8)
Platform: Linux-6.6.75-2-MANJARO-x86_64-with-glibc2.41
...
Python version
Python 3.12.8
...