Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Qubes: Error handling #430

Closed
10 tasks done
Tracked by #412
deeplow opened this issue May 29, 2023 · 3 comments · Fixed by #546 or #568
Closed
10 tasks done
Tracked by #412

Qubes: Error handling #430

deeplow opened this issue May 29, 2023 · 3 comments · Fixed by #546 or #568
Assignees
Labels
P:Qubes QubesOS integration
Milestone

Comments

@deeplow
Copy link
Contributor

deeplow commented May 29, 2023

We need to accommodate exceptions for all edge-cases in the client binary-protocol-parsing code. This was out of scope of the alpha stage (#411).

Errors to check on server:

  • document wasn't immediately received:
    probably the user exited dangerzone mid-conversion. Safely exit from dangerzone so that there isn't a dangling disposable qube
  • The number of received pages was (> MAX_PAGES)
    • note: this one is probably going to have to be done server-side. The (Qubes) client retrieves 2 bytes for pages. "Overflows" won't be detected.

Errors to check on the client:

  • The disposable qube could not start due to memory issues.
  • The disposable qube could not start because dz-dvm does not exist. (**update: done in Generalize "out of RAM" error to reflect other issues #564)
  • The width and the height of the pages are too large.
  • Page conversion timeout reached
  • Timed out receiving a page
  • Consume stderr (source)
  • Show helpful message if OCR step fails (e.g., OCR failed)
  • Stop disposable VMs when aborting conversions update: will be tackled separately since it also affects containers. Moved to Shut down dangling containers / VMs #563

Errors to check on the client:

Also, we need to sanitize tracebacks and errors from the disposable qube, in a way that does not affect the user's terminal (e.g., remove control characters). update: this will be done in #386

@deeplow deeplow added the P:Qubes QubesOS integration label May 29, 2023
@deeplow deeplow changed the title Qubes Integ: Error handling and hardening Qubes Integ: Error handling, timeouts and hardening May 29, 2023
@deeplow deeplow mentioned this issue May 29, 2023
7 tasks
@apyrgio apyrgio added this to the 0.5.0 milestone Jun 13, 2023
@apyrgio apyrgio changed the title Qubes Integ: Error handling, timeouts and hardening Qubes: Error handling Jun 14, 2023
@deeplow
Copy link
Contributor Author

deeplow commented Aug 28, 2023

I moved the "number of pages received" to be a server-side check. The client won't have a way of knowing this.

apyrgio added a commit that referenced this issue Sep 13, 2023
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
@apyrgio
Copy link
Contributor

apyrgio commented Sep 19, 2023

Note that it's possible that one of our read functions may receive an early EOF, when its the process in the disp qube that has died. In that case, we should always check first the exit code of the process, and then raise the proper exception.

apyrgio added a commit that referenced this issue Sep 20, 2023
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
apyrgio added a commit that referenced this issue Sep 20, 2023
Extend the client-side capabilities of the Qubes isolation provider, by
adding client-side timeout logic.

This implementation brings the same logic that we used server-side to
the client, by taking into account the original file size and the number
of pages that the server returns.

Since the code does not have the exact same insight as the server has,
the calculated timeouts are in two places:

1. The timeout for getting the number of pages. This timeout takes into
   account:
   * the disposable qube startup time, and
   * the time it takes to convert a file type to PDF
2. The total timeout for converting the PDF into pixels, in the same way
   that we do it on the server-side.

Besides these changes, we also ensure that partial reads (e.g., due to
EOF) are detected (see exact=... argument)

Some things that are not resolved in this commit are:
* We have both client-side and server-side timeouts for the first phase
  of the conversion. Once containers can stream data back to the
  application (see #443), these server-side timeouts can be removed.
* We do not show a proper error message when a timeout occurs. This will
  be part of the error handling PR (see #430)

Fixes #446
Refs #443
Refs #430
apyrgio added a commit that referenced this issue Sep 25, 2023
Handle incomplete reads due to EOF by checking if the underlying command
has exited. If so, raise its exception.

Refs #430
apyrgio added a commit that referenced this issue Sep 25, 2023
Handle incomplete reads due to EOF by checking if the underlying command
has exited. If so, raise the corresponding exception.

Refs #430
@apyrgio apyrgio reopened this Sep 28, 2023
@apyrgio
Copy link
Contributor

apyrgio commented Sep 28, 2023

We were a bit overeager to close this issue, as there are still some error cases that are missing:

  • The disposable qube could not start because dz-dvm does not exist. (**Update: done in Generalize "out of RAM" error to reflect other issues #564)
    • Currently, we return a wrong error message here, that the system was out of RAM, but this is not the case.
    • It's probably best to return a generic message here "Could not start disp qube dz-dvm. Consult your system logs". Qubes UI shows a notification with more context, after all.
  • If we cannot start disposable qubes due to RAM issues, then the user will be bombarded with error messages. moved to Qubes: handle remaining error cases #567
    • We should check if it makes sense to stop after the first failed conversion, assuming that there are no transient issues with qrexec.
  • Show helpful message if OCR step fails (e.g., unsupported language)
    • Admittedly, this error is unlikely to happen, because we now install every OCR language available.
    • I propose moving this to the stabilization effort. We ended up fixing this in our beta phase.
  • Stop disposable VMs when aborting conversions (update: to be tackled separately Shut down dangling containers / VMs #563)
    • This is important and we need to fix it. I haven't managed to do so yet, sorry.
    • EDIT: This actually applies to containers as well. When we quit the Dangerzone window, we don't kill the container, so we should fix it properly.
  • Get notified about the command that failed while doing document conversion: moved to Qubes: handle remaining error cases #567
    • In practice, we have at most three commands that may run:

      1. [not required for PDFs] libreoffice , gm
      2. pdfinfo
      3. pdftoppm

      An error during pdftoppm is easy to detect, as we are in the middle of getting pages back. Errors between pdfinfo and libreoffice | gm are less easy to detect, so this still warrants a fix, although it's not very pressing. I'd move this to the stabilization effort as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P:Qubes QubesOS integration
Projects
None yet
2 participants