Incomplete MIME type support #369
Comments
To clarify, this is not a 0.4.0->0.4.1 regression, correct?
Correct, it's not a 0.4.1 regression. This seems to be an old issue, hence the |
I've run some checks on the full dataset that we have amassed in the freedomofpress/dangerzone-test-set repo. I specifically wanted to find which MIME types we don't handle, and which file extensions they are associated with. This test set does not include every possible MIME type, of course, and it probably contains some corrupted files as well, but it's a good starting point:
Indeed, it's looking like mimetypes don't cut it. I took a look at this file as an example and, in fact, the MIME type is reported as:
Could we potentially use the file extension to second-guess the file type when it's one of these types?
I guess this is exactly what mediawiki (Wikipedia's software) does: They use it in this context: |
@apyrgio hinted at the fact that we may not need to specify the export filter in But as we can see in the following lines, it's actually not needed as it guesses the filter as we can see by first running with an explicit filter
And then without providing one:
You'll also notice that the extension doesn't matter. I purposely renamed a dangerzone/container/dangerzone.py Line 176 in 5a0c4d0
Filter Guessing

If it doesn't need a filter explicitly passed to it, it must guess it. Taking a look at the File Conversion Filter Names, we can find all the available filters. Some are export related while others are not. Two of them allow macro execution, which is a common attack vector. In Dangerzone, we assume the possibility of attacker-controlled RCE (see the section "It’s still possible to get hacked with Dangerzone" in https://dangerzone.rocks/about.html), but we won't hand it over for free. So we must ensure that only PDF export filters are chosen.
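To make the "only PDF export filters" requirement concrete, here is a minimal sketch of an allowlist check. The helper name `convert_to_argument` is hypothetical and not taken from Dangerzone's code, and the set below is only a subset of the PDF export filters listed in the LibreOffice filter documentation.

```python
from typing import Optional

# Subset of LibreOffice's PDF export filter names (assumed sufficient for
# illustration; the full list lives in the File Conversion Filter Names docs).
PDF_EXPORT_FILTERS = frozenset(
    {
        "writer_pdf_Export",
        "calc_pdf_Export",
        "impress_pdf_Export",
        "draw_pdf_Export",
    }
)


def convert_to_argument(filter_name: Optional[str] = None) -> str:
    """Build the value passed to LibreOffice's --convert-to option.

    With no filter, return plain "pdf" and let LibreOffice pick the filter.
    With a filter, accept it only if it is a known PDF export filter.
    """
    if filter_name is None:
        return "pdf"
    if filter_name not in PDF_EXPORT_FILTERS:
        raise ValueError(f"Refusing non-PDF export filter: {filter_name!r}")
    return f"pdf:{filter_name}"
```

The point of the design is that any filter name coming from outside the allowlist is rejected before it ever reaches the LibreOffice command line.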
Taking a bit of a look under the hood, I have an assumption that the code that triggers the conversion starts in this 'if'. Then:
Now, I'm just not sure what the export flag The only other option that might restrict this is to see if having the PDF as the output format (which our command does already) somehow limit which of these filters it picks up, limiting them only to the ones that export to |
I will leave this investigation for later.
I actually think we'll be ok for three reasons:
Also, we can check what Dangerzone will do against the |
I tested this out in Alpine Linux and it does work. Here's a way to check what happens in the container:
LibreOffice opens for me, and it greets me saying that macros are disabled. The and the security level is indeed set to "High". |
Remove the association between MIME types and export filters, because LibreOffice is able to auto-detect them on its own. Instead, ask LibreOffice to simply convert the document to a .pdf.

This association was cumbersome for yet another reason: there are MIME types that may be associated with more than one file type. That's why it's better to let LibreOffice decide the proper filter for the conversion.

Our current understanding is that this change won't widen our attack surface for the following reasons:

* The output filters for PDF documents are pretty specific, and we don't affect the input filters in any way.
* The default behavior of LibreOffice on Alpine Linux is to disable macros.
* We preemptively run LibreOffice in safe mode, to remove hardware acceleration and make sure that macros are not invoked as well.

Closes #369
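As a rough illustration of the change described in this commit message, the sketch below builds a LibreOffice command line that asks for a plain PDF conversion without naming an export filter. The function name `build_convert_command` and the `libreoffice` binary name are assumptions (some systems ship it as `soffice`), and the exact command Dangerzone runs is not shown in this thread. The options `--headless`, `--safe-mode`, `--convert-to`, and `--outdir` are LibreOffice command-line options, but combining them this way is an assumption.

```python
from typing import List


def build_convert_command(doc_path: str, out_dir: str) -> List[str]:
    """Return an argument list for converting doc_path to PDF in out_dir.

    No export filter is named after "pdf", so LibreOffice is left to
    choose the filter from the detected input type.
    """
    return [
        "libreoffice",  # assumed binary name
        "--headless",
        "--safe-mode",
        "--convert-to",
        "pdf",
        "--outdir",
        out_dir,
        doc_path,
    ]
```

The resulting list can be handed to `subprocess.run` as-is. Keeping it as a list (rather than a shell string) avoids shell interpretation of untrusted file names.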
Regarding macros, one important consideration is that the LibreOffice docs write the following about security level "High":
The LibreOffice docs do not further explain how LibreOffice validates these signatures. For instance, are self-signed macros accepted? Moreover, we see that LibreOffice has been affected by signature-related CVEs in the past: https://www.libreoffice.org/about-us/security/advisories/cve-2022-26305. Thankfully, the CVE description sheds some light on this:
The key takeaway here is that LibreOffice will check signed macros against stored certificates, so self-signed macros are not allowed. Given that our LibreOffice installation in the container does not have trusted certificates/sources, the security level "High" effectively becomes equal to "Very High".
Remove the association between MIME types and export filters, because LibreOffice is able to auto-detect them on its own. Instead, ask LibreOffice to simply convert the document to a .pdf.

This association was cumbersome for yet another reason: there are MIME types that may be associated with more than one file type. That's why it's better to let LibreOffice decide the proper filter for the conversion.

Our current understanding is that this change won't widen our attack surface for the following reasons:

* The output filters for PDF documents are pretty specific, and we don't affect the input filters in any way.
* The default behavior of LibreOffice on Alpine Linux is to disable macros.

Closes #369
Closed by a1c87a2 in |
When Dangerzone first encounters a file, it needs to detect its MIME type, so that it can choose the proper converter. The list of supported MIME types (and the associated converters) is the following:
dangerzone/container/dangerzone.py
Lines 138 to 203 in a33dcfb
Using Dangerzone on a large set of files, we discovered that there are two MIME types that are very common but are not supported:
For instance, this file currently fails on Dangerzone: https://github.com/freedomofpress/dangerzone-test-set/blob/4cbf14ac31ac986ced60e83867aac8a6d2d4a81b/all_documents/HTMLImage.odt. For an association between MIME types and file extensions, you can see the following, taken from a list of 200 documents:
From this list, it's evident that:

* application/octet-stream can refer to many file types.
* application/zip refers just to .odt, but we can't be definitely sure about that.

Ideally then, if a file does not have a known MIME type and instead uses one of those two, we should also check the file extension.
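The extension fallback proposed here could look roughly like the sketch below. It is not Dangerzone's implementation: `resolve_mime_type` and the `EXTENSION_TO_MIME` table are hypothetical names, and the table holds only a few illustrative entries.

```python
from pathlib import Path
from typing import Collection, Optional

# The two catch-all MIME types named in this issue.
AMBIGUOUS_MIME_TYPES = {"application/octet-stream", "application/zip"}

# Hypothetical extension-to-MIME table; illustrative entries only.
EXTENSION_TO_MIME = {
    ".odt": "application/vnd.oasis.opendocument.text",
    ".doc": "application/msword",
    ".docx": (
        "application/vnd.openxmlformats-officedocument"
        ".wordprocessingml.document"
    ),
}


def resolve_mime_type(
    detected: str, filename: str, supported: Collection[str]
) -> Optional[str]:
    """Return a supported MIME type for the file, or None.

    The detected type wins when it is supported. The file extension is
    consulted only when the detected type is one of the ambiguous ones.
    """
    if detected in supported:
        return detected
    if detected in AMBIGUOUS_MIME_TYPES:
        guess = EXTENSION_TO_MIME.get(Path(filename).suffix.lower())
        if guess in supported:
            return guess
    return None
```

Restricting the fallback to the two ambiguous types means the extension never overrides a specific detection result, so a renamed file with a recognizable MIME type is still handled according to its content.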