Incomplete MIME type support #369

Closed
apyrgio opened this issue Mar 14, 2023 · 12 comments · Fixed by #378
Labels: bug, container, stretch goal

@apyrgio
Contributor

apyrgio commented Mar 14, 2023

When Dangerzone first encounters a file, it needs to detect its MIME type, so that it can choose the proper converter. The list of supported MIME types (and the associated converters) is the following:

# .pdf
"application/pdf": {"type": None},
# .docx
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": {
    "type": "libreoffice",
    "libreoffice_output_filter": "writer_pdf_Export",
},
# .doc
"application/msword": {
    "type": "libreoffice",
    "libreoffice_output_filter": "writer_pdf_Export",
},
# .docm
"application/vnd.ms-word.document.macroEnabled.12": {
    "type": "libreoffice",
    "libreoffice_output_filter": "writer_pdf_Export",
},
# .xlsx
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": {
    "type": "libreoffice",
    "libreoffice_output_filter": "calc_pdf_Export",
},
# .xls
"application/vnd.ms-excel": {
    "type": "libreoffice",
    "libreoffice_output_filter": "calc_pdf_Export",
},
# .pptx
"application/vnd.openxmlformats-officedocument.presentationml.presentation": {
    "type": "libreoffice",
    "libreoffice_output_filter": "impress_pdf_Export",
},
# .ppt
"application/vnd.ms-powerpoint": {
    "type": "libreoffice",
    "libreoffice_output_filter": "impress_pdf_Export",
},
# .odt
"application/vnd.oasis.opendocument.text": {
    "type": "libreoffice",
    "libreoffice_output_filter": "writer_pdf_Export",
},
# .odg
"application/vnd.oasis.opendocument.graphics": {
    "type": "libreoffice",
    "libreoffice_output_filter": "impress_pdf_Export",
},
# .odp
"application/vnd.oasis.opendocument.presentation": {
    "type": "libreoffice",
    "libreoffice_output_filter": "impress_pdf_Export",
},
# .ods
"application/vnd.oasis.opendocument.spreadsheet": {
    "type": "libreoffice",
    "libreoffice_output_filter": "calc_pdf_Export",
},
# .jpg
"image/jpeg": {"type": "convert"},
# .gif
"image/gif": {"type": "convert"},
# .png
"image/png": {"type": "convert"},
# .tif
"image/tiff": {"type": "convert"},
"image/x-tiff": {"type": "convert"},

Using Dangerzone on a large set of files, we discovered two MIME types that are very common but not supported:

application/zip
application/octet-stream

For instance, this file currently fails on Dangerzone: https://github.com/freedomofpress/dangerzone-test-set/blob/4cbf14ac31ac986ced60e83867aac8a6d2d4a81b/all_documents/HTMLImage.odt. For an association between MIME types and file extensions, you can see the following, taken from a list of 200 documents:

02_doc_macros_signed_by_attacker_manipulated.odt: application/zip                                                      
02_doc_signed_by_attacker_manipulated2.odt: application/zip                                                            
02_doc_signed_by_attacker_manipulated.odt: application/zip 
02_doc_signed_by_attacker_manipulated_triple.odt: application/zip                                                      
02_doc_signed_by_trusted_person_manipulated.odt: application/zip                                                       
1_page.docx: application/octet-stream                      
82fff64a-0a21-4b09-bbdc-2914a5a150f0.odt: application/zip                                                              
BackgroundImageTest.odt: application/zip                                                                               
CUSTOM.odt: application/zip                                                                                                                                                                                                                    
CVE-2003-0820-1.doc: application/octet-stream                                                                          
CVE-2005-0941-1.doc: application/octet-stream                                                                          
CVE-2006-2389-1.doc: application/octet-stream                                                                          
CVE-2006-3059-1.xls: application/octet-stream                                                                          
CVE-2006-3086-1.xls: application/octet-stream                                                                          
CVE-2006-3493-1.doc: application/octet-stream                                                                          
CVE-2006-3655-1.ppt: application/octet-stream                                                                          
CVE-2006-3656-1.ppt: application/octet-stream                                                                          
CVE-2006-3660-1.ppt: application/octet-stream                                                                          
CVE-2006-5296-1.ppt: application/octet-stream                                                                          
CVE-2006-6561-1.doc: application/octet-stream                                                                                                                                                                                                  
CVE-2006-6628-1.doc: application/octet-stream                                                                          
CVE-2007-0031-1.xls: application/octet-stream                                                                          
CVE-2007-1347-1.doc: application/octet-stream                                                                          
CVE-2007-3490-1.xls: application/octet-stream                                                                          
CVE-2008-2752-1.doc: application/octet-stream                                                                                                                                                                                                  
CVE-2008-2752-2.doc: application/octet-stream                                                                          
CVE-2008-2752-3.doc: application/octet-stream                                                                          
CVE-2008-2752-4.doc: application/octet-stream                                                                          
CVE-2008-4841-1.doc: application/octet-stream                                                                          
CVE-2009-0200-1.doc: application/octet-stream                                                                          
CVE-2009-0201-1.doc: application/octet-stream              
CVE-2009-0259-1.doc: application/octet-stream                                                                          
CVE-2009-3129-1.xls: application/octet-stream                                                                          
CVE-2009-3301-1.doc: application/octet-stream              
CVE-2009-3302-1.doc: application/octet-stream              
CVE-2009-3302-2.doc: application/octet-stream              
CVE-2010-0033-1.ppt: application/octet-stream              
CVE-2010-1245-1.xls: application/octet-stream              
CVE-2010-1246-1.xls: application/octet-stream                                                                          
CVE-2010-1248-1.xls: application/octet-stream                                                                          
CVE-2010-3200-1.doc: application/octet-stream                                                                          
CVE-2011-0105-1.xls: application/octet-stream              
CVE-2011-0978-1.xls: application/octet-stream                                                                          
CVE-2012-4233-1.odt: application/octet-stream                                                                          
CVE-2012-4233-2.odg: application/octet-stream              
CVE-2014-6356-1.doc: application/octet-stream              
CVE-2014-6361.xls: application/octet-stream                
EDB-18952-1.doc: application/octet-stream                                                                                                                                                                                                      
HTMLImage.odt: application/zip  

From this list, it's evident that application/octet-stream can refer to many file types. In this sample, application/zip appears only for .odt files, but we can't be sure that holds in general. Ideally then, if a file does not have a known MIME type and instead uses one of those two, we should also check the file extension.
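The extension check proposed above could look roughly like the following. This is only a sketch, not Dangerzone's actual code: the names `AMBIGUOUS_MIME_TYPES`, `EXTENSION_TO_MIME`, and `resolve_mime_type` are hypothetical, and the extension table is simply derived from the supported list earlier in this comment.

```python
import os

# MIME types that say little about the real document format (from this issue).
AMBIGUOUS_MIME_TYPES = {"application/zip", "application/octet-stream"}

# Hypothetical extension -> MIME type table, built from the supported list above.
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".doc": "application/msword",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xls": "application/vnd.ms-excel",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".ppt": "application/vnd.ms-powerpoint",
    ".odt": "application/vnd.oasis.opendocument.text",
    ".odg": "application/vnd.oasis.opendocument.graphics",
    ".odp": "application/vnd.oasis.opendocument.presentation",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
}


def resolve_mime_type(detected_mime: str, filename: str) -> str:
    """Fall back to the file extension only when the detected type is ambiguous."""
    if detected_mime not in AMBIGUOUS_MIME_TYPES:
        # A specific detected type always wins over the extension.
        return detected_mime
    extension = os.path.splitext(filename)[1].lower()
    # Unknown extensions keep the (unsupported) detected type.
    return EXTENSION_TO_MIME.get(extension, detected_mime)
```

With this, `resolve_mime_type("application/zip", "HTMLImage.odt")` would map to the .odt MIME type, while a file with an unknown extension would still be reported as unsupported.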

apyrgio added the bug, container, and stretch goal labels Mar 14, 2023
apyrgio added this to the 0.4.1 milestone Mar 14, 2023
@eloquence
Member

To clarify, this is not a 0.4.0->0.4.1 regression, correct?

apyrgio changed the title from "Missing MIME type support" to "Incomplete MIME type support" Mar 15, 2023
@apyrgio
Contributor Author

apyrgio commented Mar 15, 2023

Correct, it's not a 0.4.1 regression. This seems to be an old issue, hence the stretch goal label.

@apyrgio
Contributor Author

apyrgio commented Mar 22, 2023

I've run some checks on the full dataset that we have amassed in the freedomofpress/dangerzone-test-set repo. I specifically wanted to find which MIME types we don't handle, and which file extensions they are associated with. This test set does not include every possible MIME type of course, and it probably contains some corrupted files as well, but it's a good starting point:

application/zip (184 files)

This MIME type seems to be used in file types that bundle several content types within them. It is associated in our dataset with .odt, .docx, .odg, .odp, .ods, and .pptx files.

application/octet-stream (184 files)

This MIME type is used to specify that the file contains binary data. From there on, it's the responsibility of the application to interpret it. In our dataset, we see this MIME type for the following extensions: .doc, .docx, .odg, .odp, .odt, .pdf, .ppt, .pptx, .xls, and .xlsx.

application/x-ole-storage (44 files)

I'm not sure what this MIME type is associated with, but I see lots of issues that have to do with incorrect association of Microsoft Word documents with this MIME type (e.g., mimemagicrb/mimemagic#50). In our dataset, this MIME type is indeed associated with .doc, .ppt, and .xls files.

application/encrypted (4 files)

This MIME type is present in 4 .docx files. It seems to be an undocumented MIME type, but I guess it has to do with document encryption. Since Dangerzone does not handle encrypted documents yet, we can expect conversion to fail for this MIME type.

application/vnd.oasis.opendocument.spreadsheet-template (1 file)

This MIME type is used for .ods templates (.ots), but the file in our dataset happens to have the .ods extension.

application/vnd.oasis.opendocument.text-template (1 file)

This MIME type is used for .odt templates (.ott), but the file in our dataset happens to have the .odt extension.

application/vnd.sun.xml.calc (1 file)

This MIME type is used for Calc 6.0 spreadsheets (.sxc), but the file in our dataset happens to have the .ods extension.
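For anyone who wants to repeat this kind of survey, here is a rough sketch of the tallying step. It is not the script that produced the numbers above. The function name `tally_unsupported` is hypothetical, and the detector is passed in as a callable so that something like `magic.Magic(mime=True).from_file` (from python-magic) could be plugged in.

```python
import os
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def tally_unsupported(
    paths: Iterable[str],
    detect: Callable[[str], str],
    supported: Set[str],
) -> Dict[str, dict]:
    """Group files with unsupported MIME types by type, with counts and extensions."""
    by_mime = defaultdict(lambda: {"count": 0, "extensions": set()})
    for path in paths:
        mime = detect(path)
        if mime in supported:
            # Already handled by a converter, so not interesting here.
            continue
        entry = by_mime[mime]
        entry["count"] += 1
        entry["extensions"].add(os.path.splitext(path)[1].lower())
    return dict(by_mime)
```

The result maps each unhandled MIME type to the number of files and the set of extensions seen for it, which is the shape of the summary in this comment.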

@deeplow
Contributor

deeplow commented Mar 23, 2023

Indeed, it's looking like MIME types don't cut it. I took a look at this file as an example and, in fact, the reported MIME type is:

>>> magic.Magic(mime=True).from_file("timeFormFormats.odt")
'application/zip'

Could we potentially use the file extension to second-guess the file type when it's one of these types?

@deeplow
Contributor

deeplow commented Mar 24, 2023

> Could we potentially use the file extension to second-guess the file type when it's one of these types?

I guess this is exactly what mediawiki (Wikipedia's software) does:

https://github.com/wikimedia/mediawiki/blob/005d20e470741a8020430275425c9dfd1009fe1b/includes/libs/mime/MimeAnalyzer.php#L796-L834

They use it in this context:

https://github.com/wikimedia/mediawiki-extensions-PdfHandler/blob/7579f4f3adf069d84693c9a26414c16496ba4985/includes/PdfHandler.php#L300-L301

@deeplow
Contributor

deeplow commented Mar 27, 2023

@apyrgio hinted at the fact that we may not need to specify the export filter in container/dangerzone.py. Apparently we've been explicitly stating the export filter (based on MIME type) since the initial commit.

But it's actually not needed, because LibreOffice guesses the filter on its own. We can see this by first running with the explicit filter writer_pdf_Export:

$ soffice --headless --convert-to pdf:writer_pdf_Export --outdir /tmp sample-odt.pptx
convert /home/user/dangerzone/tests/test_docs/sample-odt.pptx -> /tmp/sample-odt.pdf using filter : writer_pdf_Export

And then without providing one:

$ soffice --headless --convert-to pdf --outdir /tmp sample-odt.pptx
convert /home/user/dangerzone/tests/test_docs/sample-odt.pptx -> /tmp/sample-odt.pdf using filter : writer_pdf_Export

You'll also notice that the extension doesn't matter. I purposely renamed a .odt file to .pptx in the above example and it played no role in the filter selection.
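To make the two invocations above concrete, here is a small sketch of how the command line could be assembled with an optional filter. `build_convert_command` is a hypothetical helper, not the function used in container/dangerzone.py; the flags are the ones shown in the commands above.

```python
from typing import List, Optional


def build_convert_command(
    input_path: str, outdir: str, export_filter: Optional[str] = None
) -> List[str]:
    """Build the soffice argument list, with or without an explicit export filter."""
    # "pdf" lets LibreOffice pick the filter; "pdf:<filter>" forces one.
    target = "pdf" if export_filter is None else f"pdf:{export_filter}"
    return [
        "soffice",
        "--headless",
        "--convert-to",
        target,
        "--outdir",
        outdir,
        input_path,
    ]
```

The resulting list can be handed to `subprocess.run` as-is, which avoids any shell quoting of the input path.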


Filter Guessing

If LibreOffice doesn't need a filter explicitly passed to it, it must guess one. Taking a look at the File Conversion Filter Names, we can find all the available filters. Some are export-related while others are not. Two of them allow macro execution, which is a common attack vector.

In Dangerzone, we assume the possibility of an attacker-controlled RCE (see the section "It’s still possible to get hacked with Dangerzone" in https://dangerzone.rocks/about.html). But we won't give it away for free. So we must ensure that only PDF export filters are chosen.
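One way to check which filter was actually chosen is to parse the "using filter" line that soffice prints, as seen in the output above, and compare it against an allowlist. This is only a sketch under assumptions: the names `ALLOWED_PDF_FILTERS`, `filter_used`, and `is_pdf_export` are hypothetical, the allowlist contains just the three filters named in this issue, and the log format is taken from the output above and may differ between LibreOffice versions.

```python
import re
from typing import Optional

# Assumption: only the PDF export filters named in this issue are allowed.
ALLOWED_PDF_FILTERS = {"writer_pdf_Export", "calc_pdf_Export", "impress_pdf_Export"}

# Matches the trailing "using filter : <name>" part of the soffice output line.
_FILTER_RE = re.compile(r"using filter\s*:\s*(\S+)")


def filter_used(log_line: str) -> Optional[str]:
    """Return the filter name reported in a soffice output line, if any."""
    match = _FILTER_RE.search(log_line)
    return match.group(1) if match else None


def is_pdf_export(log_line: str) -> bool:
    """True only if the line reports a filter from the PDF export allowlist."""
    return filter_used(log_line) in ALLOWED_PDF_FILTERS
```

Note that this only verifies the filter after the fact; it does not prevent LibreOffice from selecting a different one.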

@deeplow
Contributor

deeplow commented Mar 27, 2023

Taking a bit of a look under the hood, I have an assumption that the code that triggers the conversion starts in this 'if'.

Then:

Now, I'm just not sure what the export flag SfxFilterFlags::EXPORT means in practice, and the code is a bit complex. If I were to guess, I'd say it comes from the filters listed here. The filters listed there are the ones mentioned in the documentation, and their XML lists some flags. One of them is EXPORT. However, this is not good news, since almost all of them have the EXPORT flag, including the VBA ones.

The only other option that might restrict this is to check whether having PDF as the output format (which our command already does) somehow limits which of these filters it picks up, restricting them to the ones that export to application/pdf.

@deeplow
Contributor

deeplow commented Mar 27, 2023

> The only other option that might restrict this is to check whether having PDF as the output format (which our command already does) somehow limits which of these filters it picks up, restricting them to the ones that export to application/pdf.

I will leave this investigation for later.

@apyrgio
Contributor Author

apyrgio commented Mar 27, 2023

I actually think we'll be ok for three reasons:

  1. The default behavior of LibreOffice seems to be (at least on my system; we can verify that in Alpine Linux as well) that it opens documents with Macro Security Level High (see here for more info).
  2. We can start LibreOffice with the --safe-mode flag, which I think (again, on my system) opens documents with Macro Security Level High.
  3. From the list of filters you provided, I don't see any filter for application/pdf that refers to macros. So, we won't be any worse off than we were before.

Also, we can check what Dangerzone will do against the vba_macro_functions.ods doc from our large dataset. If the macro does not run and we see only $VALUE, then we should be safe.

@apyrgio
Contributor Author

apyrgio commented Mar 27, 2023

I tested this out in Alpine Linux and it does work. Here's a way to check what happens in the container:

$ podman run --rm -it -v /etc/localtime:/etc/localtime:ro --security-opt seccomp=unconfined --privileged --userns keep-id \
    -v ./vba_macro_functions.ods:/tmp/input_file -e DISPLAY=:0 -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    -e XAUTHORITY=<xauthority_path> -v <xauthority_path> \
     dangerzone.rocks/dangerzone libreoffice /tmp/input_file

LibreOffice opens for me, and it greets me saying that macros are disabled. The .ods file renders as follows:

[Image: vba_macro_functions.ods as rendered in LibreOffice]

and the security level is indeed set to "High".

apyrgio added a commit that referenced this issue Mar 27, 2023
Remove the association between MIME types and export filters, because
LibreOffice is able to auto-detect them on its own. Instead, ask
LibreOffice to simply convert the document to a .pdf.

This association was cumbersome for yet another reason; there are MIME
types that may be associated with more than one file type. That's why
it's better to let LibreOffice decide the proper filter for the
conversion.

Our current understanding is that this change won't widen our attack
surface for the following reasons:

* The output filters for PDF documents are pretty specific, and we don't
  affect the input filters somehow.
* The default behavior of LibreOffice on Alpine Linux is to disable
  macros.
* We preemptively run LibreOffice in safe mode, to remove hardware
  acceleration and make sure that macros are not invoked as well.

Closes #369
@apyrgio
Contributor Author

apyrgio commented Mar 28, 2023

Regarding macros, one important consideration is that the LibreOffice docs write the following about security level "High":

Only macros from trusted sources and signed macros (from any source) are allowed to run.

LibreOffice docs do not further explain how LibreOffice validates these signatures. For instance, are self-signed macros accepted? Moreover, we see that LibreOffice has been affected by signature-related CVEs in the past: https://www.libreoffice.org/about-us/security/advisories/cve-2022-26305

Thankfully, the CVE description sheds some light on this:

To determine whether a macro is signed by a trusted author, LibreOffice matches the used certificate with the list of trusted certificates stored in the user's configuration database.
[...]
This vulnerability is not exploitable if macro security level is set to very high or if the user has no trusted certificates.

The key takeaway here is that LibreOffice will check signed macros against stored certificates, so self-signed macros are not allowed. Given that our LibreOffice installation in the container does not have trusted certificates/sources, the security level "High" effectively becomes equal to "Very High".

apyrgio added a commit that referenced this issue Mar 28, 2023
Remove the association between MIME types and export filters, because
LibreOffice is able to auto-detect them on its own. Instead, ask
LibreOffice to simply convert the document to a .pdf.

This association was cumbersome for yet another reason; there are MIME
types that may be associated with more than one file type. That's why
it's better to let LibreOffice decide the proper filter for the
conversion.

Our current understanding is that this change won't widen our attack
surface for the following reasons:

* The output filters for PDF documents are pretty specific, and we don't
  affect the input filters somehow.
* The default behavior of LibreOffice on Alpine Linux is to disable
  macros.

Closes #369
@apyrgio
Contributor Author

apyrgio commented Apr 3, 2023

Closed by a1c87a2 in release-0.4.1 branch.
