Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple Dash or Flask apps for "du.configure_upload". #39

Open
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

cainmagi
Copy link
Contributor

@cainmagi cainmagi commented May 3, 2021

Introduction

This PR preserves all existed APIs. In other words, the modifications are compatible with previous APIs.

This PR is designed for providing the following new API:

du.configure_upload(app, UPLOAD_FOLDER_ROOT, upload_api='my-awesome-API/upload')
du.configure_upload(app, UPLOAD_FOLDER_ROOT, upload_api='my-awesome-API/upload-second', upload_component_ids='second')

where we have:

  1. du.configure_upload could be used several times.
  2. The argument app could be a dash.Dash instance or a flask.Flask instance. The argument type would be checked during the configuration.
  3. A new argument upload_component_ids is provided, it could be
    • None: This is the default value. It means that this du.configure_upload is designed for default configurations.
    • str: This is the same as (str, )
    • tuple or list: A sequence of str. Each str is a component id. The component with the provided id would be configured by the provided configurations.
  4. A simple check has been implemented for upload_component_ids. If it is not None, each item requires to be a non-empty str. And each item should be unique, i.e. repeating configurations for the same component is not allowed.

Example

For dash

import dash
import dash_html_components as html
from dash.dependencies import Output
import dash_uploader as du

app = dash.Dash(
    __name__,
    assets_folder='./assets',
    assets_url_path='/assets',
    assets_ignore=r'.*.js'
)
app.title = 'A testing app.'
du.configure_upload(app, folder='upload', use_upload_id=False)
du.configure_upload(app, folder='upload', use_upload_id=False, upload_api='/api-uploader', upload_component_ids='uploader')

app.layout = html.Div(
    children=[
        du.Upload(
            id='uploader',
            text='Drag & Drop or Select a File',
            filetypes=['csv', 'zip'],
            upload_id='.',
            max_files=1,
            default_style={
                'width': '100%',
                'paddingLeft': '.5rem',
                'paddingRight': '.5rem',
                'minHeight': 'min-content',
                'lineHeight': '50px',
                'borderWidth': '1px',
                'borderStyle': 'dashed',
                'borderRadius': '5px',
                'textAlign': 'center'
            }
        ),
        html.Div(id='trigger-upload')
    ]
)


@du.callback(
    output=Output('trigger-upload', 'children'),
    id='uploader',
)
def trigger_upload(filenames):
    return html.Ul([html.Li(filenames)])


@app.server.after_request
def add_header(response):  # pylint: disable=unused-variable
    response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
    response.headers["Pragma"] = "no-cache"
    response.headers["Expires"] = "0"
    response.headers['Cache-Control'] = 'public, max-age=0'
    response.cache_control.no_cache = True
    return response


if __name__ == '__main__':
    import colorama

    colorama.init()
    app.run_server(host='localhost', port=8060, debug=False)

Users could change the upload_component_ids in the second du.configure_upload and check what happens.

For Flask

import os
import flask

import dash_uploader as du

app = flask.Flask('test-uploader')

os.makedirs('upload', exist_ok=True)
du.configure_upload(app, folder='upload', upload_api='/dash-uploader', use_upload_id=False)
du.configure_upload(app, folder='upload', use_upload_id=False, upload_api='/uploader2', upload_component_ids='uploader2')

if __name__ == '__main__':
    import colorama

    colorama.init()
    app.run(host='localhost', port=8061, debug=False)

In this case, actually, we do not need to configure upload_component_ids, because a Flask app does not have components. It should be only used for managing APIs.

For the next step

  1. This PR is required for implementing the next feature (cross-domain) supports. I would not create the next PR before this PR is merged or closed.
  2. If this PR is acceptable, I could start working on the documentation before merging it to the master branch.
  3. We need to consider the following two topics:
    1. Currently, I add a check for the du.callback. If the component is not configured for a dash app (not configured or configured for a flask app), the du.callback would be forbidden. However, it is possible to implement a "Flask callback". We could invoke the callback by using the request handler after the final chunk uploaded. I am wondering whether we need to implement this feature (maybe this feature should be split into another PR).
    2. If we decide to implement the "Flask callback", there is an important problem. Because Flask app does not maintain the component ids, we may have two options for the implementation: (1) Add the component id in the arguments of the request. Then the upload_component_ids should be preserved because the Flask app could identify the component id. (2) Each "Flask callback" is designed for an API. In this case, we only need to add a check for du.configure_upload and forbid users to set upload_component_ids when app is a Flask app.

Update report

  1. Change the format of du.settings.
  2. Move update_upload_api from du.upload to du.configure_upload.
  3. Add an option upload_component_ids to du.configure_upload. This enables users to set different configurations for different components.
  4. Support Flask apps for du.configure_upload.

cainmagi added 3 commits May 3, 2021 09:54
The user configurations are moved to a dict. It is designed for supporting multiple du.configure_upload.
1. Change the format of du.settings.
2. Move update_upload_api from du.upload to du.configure_upload.
3. Add an option "upload_component_ids" to du.configure_upload. This enables users to set different configurations for different components.
4. Support for flask app for du.configure_upload.
@cainmagi cainmagi changed the title Support multiple Dash or Flask apps for du.configure_upload. Support multiple Dash or Flask apps for "du.configure_upload". May 3, 2021
@fohrloop
Copy link
Owner

fohrloop commented May 3, 2021

Thanks for the detailed PR! I'm happy to see this progressing. One thing that I'm considered that there might be some side effects that I cannot foresee, and the complexity of the things this package can do is increasing. This can lead into difficulties in maintaining the dash-uploader in the future. What I would suggest, is that there will be some automatic tests added to all the new features. This way we can see that nothing will break when something is changed.

@cainmagi
Copy link
Contributor Author

cainmagi commented May 3, 2021

I have to admit that this PR is large even though the API changes are very small. At least, I could confirm that the two examples I provide (as well as the usage.py in the root folder) could work well.

For the automatic tests stuff, should it be related to the other issue #34? I am guessing that you want to use pytest, but currently, there are no available testing codes in ./test.

I know that pytest could be integrated with Github Actions, and I have tried to do that before. I know how to write testing codes for dash (although I am not an expert on that), but I find that it is difficult to set up Chrome drivers for Github Actions. I guess that a testing container may be required before setting up Github Actions. I am still learning how to do it now. If you do not mind my poor code style, and only need to perform tests locally, I could try to write some testing codes and start another PR.

@fohrloop
Copy link
Owner

fohrloop commented May 4, 2021

I would be delighted to see a PR for some basic testing! :) No need to add GitHub Actions etc. at this stage, just bare minimum to get testing started. This is exactly what issue #34 is about.

One thing what I'm now thinking that the du.configure_upload does a bit different thing on Flask and on Dash apps, right? I know that you had the flask part in a separate function in a previous PR, but I'm now thinking (just initially) that if it would make more sense for the user that it would be different function (if it does a different thing). Something like du.configure_flask. The du.configure_upload could call it internally to prevent duplicate code. But this is just initial thinking. I really should take time to look through the additions in this PR and try to understand the implications from all the possible angles.

One thing I'm thinking is that what is the motivation behind supporting multiple calls to du.configure_upload, if there is still no full support for multiple du.Upload components on same app? (referring to #35) Or is there some chicken-and-egg -problem here?

@cainmagi
Copy link
Contributor Author

cainmagi commented May 4, 2021

Please do not try to merge it now. We may need to change the API according to the previous discussion.

If we discard the implementation based on multiple calls to du.configure_upload, we would need to add an argument like service_addr or upload_api to du.Upload, and make the upload_api configured in du.configure_upload become a default value. We need to do that because we need to support cross-domain uploading based on this PR in the next step.

If you think the implementation of multiple calls to du.configure_upload is preferred. I think we may do not need to split du.configure_flask from it. Because in the next step, we may need to provide a du.configure_remote_upload like this

du.configure_remote_upload(app, remote_addr, upload_component_ids)

because an uploader directed to the remote server only needs to handle the callback. The uploading folder, API, and requests are handled by the remote side.

I have realized that in my case, I do not need to use du.configure_upload or something like du.configure_flask multiple times, because it is not necessary to configure multiple different APIs for the same app, and it is not possible to serve multiple different apps. But I need to do something like:

du.configure_upload(app, ...)
du.configure_remote_upload(app, remote_addr_1, upload_component_ids=...)
du.configure_remote_upload(app, remote_addr_2, upload_component_ids=...)
...

For the remote side, I only need to configure something like this

du.configure_upload(flask_app, ...)

So serving multiple calls of du.configure_upload may not be a good choice. But we need to serve multiple calls to something like du.configure_remote_upload.

How do you think of that? If the multiple calls to du.configure_upload is not preferred, I could start another PR and close this one. Before that, we may also need to decide whether it is necessary to split du.configure_flask.

@fohrloop fohrloop changed the base branch from master to dev May 4, 2021 21:06
@fohrloop
Copy link
Owner

fohrloop commented May 4, 2021

This clearly needs some more thinking with better good night's sleep. Here are some fast notes. These are not necessarily answering to anything, but they can help us later when designing the future API & functionality of dash-uploader.

Dash

The Dash instance will have

  • url_base_pathname, that will default to /. This is not now used in dash-uploader as the following two give more fine-grained control
  • request_pathname_prefix. It is the prefix for AJAX calls from client (browser). It must end with routes_pathname_prefix. This defaults to url_base_pathname if not given.
  • routes_pathname_prefix. It is the prefix for API routes on backend (flask).

There values are read from the dash app instance and saved to du.settings, when du.configure_upload is called.

du.configure_upload

saves some settings

  • Reads the request_pathname_prefix and routes_pathname_prefix and saves it to du.settings
  • Saves app to du.settings.app. This needs to be saved for the du.callback to function.
  • Saves the UPLOAD_FOLDER_ROOT to du.settings.

opens routes

  • Opens route for <routes_pathname_prefix>/<upload_api> or just <upload_api>, if routes_pathname_prefix is missing
  • Also sets the handling of the HTTP GET and POST requests for the routes
  • The backend could be using dash or flask (as dash is essentially using flask).

du.Upload

  • Gives the address for the AJAX calls to the JS component. This will be <request_pathname_prefix>/<upload_api> or just <upload_api>, if request_pathname_prefix is missing.
  • The configuration is read under the hood from du.settings (which are saved when calling du.configure_upload)

HttpRequestHandler

  • Knows how to handle GET and POST requests.
  • Does not need to know anything about du.settings or the app sending the requests(?). (Possible problem: How to validate users in multi-server case? Would it be better to route the data through the dash app anyway?)

@cainmagi
Copy link
Contributor Author

cainmagi commented May 5, 2021

Thank you for replying to me. Currently, I tend to drop the implementation of this PR, because I think we may not need the multiple calls for du.configure_upload. I may continue to work on this project in the next week. Before that, I want to know how you think of du.configure_upload. Even we drop the multiple-call-based implementation, we could still make the du.configure_upload supports the Flask app.

Authentication

First, let me explain my view about authentication. Currently, I have to acknowledge that I have not considered this problem. To my knowledge, the authentication could be maintained in a very complicated (but also very safe) way or a very simple way.

  • For a Flask app, the most ideal authentication is token-based (see here). To implement it, we need to modify the requesting handler and add APIs for handling tokens.
  • However, there is also an example for simple (but not safe) authentication, dash_auth.BasicAuth. I have looked at the source codes, and find that it may be not very difficult to modify it as a Flask app version. If adopt a similar approach, the authentication could be provided by an independent package, and we would not need to modify the request handler.

But an important problem is that the authentication is an optional feature for dash. We could not expect that each dash app is equipped with authentication, and we may not know how do the users apply the authentication. For the Flask case, we may have a similar situation. If we want to make the authentication a built-in feature of dash-uploader for multi-services, we may need to add the following features for the users:

  • Provide a function for configuring valid users and passwords, like dash_auth.BasicAuth.
  • Provide a flask API like /login, the GET/POST method would accept a user name and hashed password, and return a random-generated token. The user name and password may be able to be automatically extracted from the dash app. This flask API could be also used for validating whether the remote service is online.
  • For each uploading, the token is required to be added in the header of the request. The request handler would validate the token and determine whether to accept the uploading files. Because the token may need to be destroyed periodically. If the validation fails, the request side may need to send requests to /login again. This step may need to be automatic.

Because I am not a professional web developer, I am not sure whether these steps are widely acceptable. I know that there are some plugins like flask-login or flask-jwt that may make it easier to develop the authentication feature if you do not mind adding them into the dependencies. But in my opinion, providing the authentication feature maybe not necessary, because users could implement the authentication by themselves and use your HttpRequestHandler to add customized codes for the authentication.

For this PR

Before the next step, I think we may need to determine the one thing:

  • Do we need multiple calls for du.configure_upload? My project only requires one configure for either the dash app or the flask app. We may only benefit from the multiple calls for configuring different uploading folders and different uploading APIs. But I think in most cases, we do not need this feature.

For the remote service case, we may also need to determine the functionalities. But I think we could discuss it in the next PR about multi-services.

@fohrloop
Copy link
Owner

fohrloop commented May 5, 2021

I was thinking that I would have had more time today to think this through, but unfortunately that did not happen. I wish I had few hours just to think on this, so I could give my opinion. I appreciate your time and contribution.

Authentication

For the authentication case, dash-uploader was first released without any possibility (without changing the source code) of having any kind of authentication. Now there is HttpRequestHandler that I made partly for the support for authentication. I have not much experience on the topic myself either, and as you said, most dash apps are without authentication. I would recommend anyone using dash-uploader or similar to use some sort of authentication and validation for the inputs, and perhaps some upload rate limits per user. I have a feeling that adding authentication to dash-uploader might be a bit complicated, as there might be very different authentication schemes people use. So for now, I think it can be left as the responsibility of the user to subclass the HttpRequestHandler and add their own logic. In the future, perhaps there could be some examples or helper functions, though.

Of course, the dash-uploader should be designed in such a way that authentication is possible (and relatively easy). So the topic is good to be kept in mind.

About this PR

Yeah that is a valid point, do we need multiple calls to du.configure_upload. Actually, what I was thinking what would be really helpful as the next step, is to make a list of requirements / wishes on what we really wish to accomplish, and how the user of dash-uploader would use the features. In my opinion, having multiple dash-uploader components on same web app would be "nice to have", but I think I remember only one person asking about if it could be made possible. The React component should be only ever sending AJAX calls to one address, that is for sure. But, should it be possible to send them directly cross-domain (to a flask server?). Or, would it be better to make the Javascript always talk with the "main dash app", so that authentication is easy, and then "pipe" the calls forward to somewhere else, if needed?

Here is a simple diagram which explains what I mean:
image

Some random thoughts

  • If the requests would be sent from the browser directly to some other place than the main Dash app, then it would not be possible (or it would be really difficult) to have callbacks that will do something visible to the user (to browser). Or would it?
  • If requests are sent through Dash app, they are "quite easy" to be authenticated
  • There might be possibility to extend dash-uploader to support uploading to some 3rd party service like AWS S3 or Dropbox. This should be even now possible with custom HttpRequestHandler.
  • Configuring callbacks (browser callbacks) needs the du.configure_upload to be called. This means that if there would be multiple du.Upload components on same app, there should be one call to du.configure_upload per component.

@cainmagi
Copy link
Contributor Author

Hello! I am back for this project now. First, please let me explain my understanding of your thoughts:

  • Indeed, only dash could handle the callbacks. As I have proposed before, the locally fired callback function needs to send a request to the remote service, and use the response of the service to provide the output value of the local callback. To implement this workflow. We need to write codes for sending requests (by urllib3 or requests) from the dashboard side, and codes for handling the requests (by Flask) from the remote side. I have written some codes for remote forwarding, and successfully worked out a dashboard connected with a remote service. Unfortunately, these codes need to be kept secret by my organization. However, I could still start another PR for providing remote services and explain the idea in detail. Since the mechanism is a little bit complicated, I am wondering whether it should be integrated as a part of dash-uploader (maybe it is better to let users implement their own APIs?). We could discuss this topic in that PR.
  • Yes. Actually, the page served by dash is usually equipped with the basic authentication mentioned above. In my opinion, if we need to connect to an existing remote service. It should provide some kinds of authentication (if needed). In most cases, what we need to do is to integrate the provided authentication with dash-uploader, so I think your HttpRequestHandler is already a good enough solution. (In my case, I need to write the service from scratch.)
  • I am not familiar with those 3rd party services like Dropbox, but I have written some codes about Github API v3 before. In my experience, HttpRequestHandler may not be suitable in this case. Because we could not hack into the server-side codes of those services. The only thing we could do is to use the APIs, which may be quite different from the APIs of resumablejs. It seems to be more practical to call the 3rd party APIs after the file gets uploaded to the dashboard. This process could be finished in the callback.
  • For the du.configure_upload, I am a little bit confused. If we need to configure du.configure_upload for each component, why not merging it with du.Upload together? I think it may look more reasonable to modify the APIs like this:
    du.configure_upload(app=app)
    ....
    du.Upload(
        id='upload'
        ...,
        upload_api=...,
        upload_folder=...,
    )
    The du.configure_upload should only be used for maintaining the app. It is not necessary to save something like routes_pathname_prefix, because we could get the value by settings.app.config.get('routes_pathname_prefix', ...).
    Your reply about this point would give me ideas for the next step of this PR. Thank you!

@fohrloop
Copy link
Owner

Hi @cainmagi ,

If I understood this correctly, this

As I have proposed before, the locally fired callback function needs to send a request to the remote service, and use the response of the service to provide the output value of the local callback. To implement this workflow. We need to write codes for sending requests (by urllib3 or requests) from the dashboard side, and codes for handling the requests (by Flask) from the remote side

..says that you were also about to pipe all the AJAX calls (from browser) through the Dash app to your Flask app (and not sending from browser directly to Flask)?

Thank you for the question! I would really much like to simplify the user experience by using du.Upload independently from du.configure_upload as you suggested. However, there are some restrictions that lead to the current dash-uploader API

  • The du.configure_upload configures the Flask routes that the dash app is using. I.e. it configures the server to receive HTTP requests at some endpoint(s). This route configuration needs to be done before the dash app is started. However, some du.Upload components might not have been created when the app is started. This is the reason that the upload_api must be in the du.configure_upload as argument.
  • I am not entirely sure if the folder argument of the du.configure_upload is absolutely necessary to be there (or could it be moved to the du.Upload component). I should take a bit time to investigate this, but before I would do that, I should ask a question of "why" it would be beneficial if the du.Upload component would have the folder as argument, and not the du.configure_upload.

@fohrloop
Copy link
Owner

What I am basically saying is that one needs to have one call to du.configure_upload (to open a route in the Flask app and define output folder) for each du.Upload component.

  • Each du.Upload component should upload to different folder, or, if same folder is accepted for multiple du.Upload components, there should be some logic to make sure that if two files with same name are uploaded at same time with two different components, that there would be no clash between them. So, the easiest way would be just to enforce using different upload folders

@cainmagi
Copy link
Contributor Author

About request forwarding

Not really. Because

  1. We do not need to forward dash-uploader requests to the remote Flask service. Because with the cross-domain modifications, these requests have been already sent from the browser to the remote service directly. The dash app would not receive those requests in this case.
  2. Although the dash-uploader requests have been directed to the remote service, the /_dash-update-component request would still be sent to the local dash app, i.e. the callback would be still fired locally. However, we do not need to forward the /_dash-update-component either, because the remote service is a Flask app, it is not worth writing codes to forward the dash formatted requests and handle them.
  3. Instead, we send a different request in the fired callback function. This request directs to a totally customized API on the remote side. The request is not sent by the browser but sent by the callback function (using something like urllib3 or requests). Because this request is totally customized, it is easy to control what we want to do on the remote side. In this way, the request would serve as "a flask callback". Because it is always sent by the callback function, if the callback is not fired, the request would not be sent.

I think I should draw a figure in the next PR (about the cross-domain implementation) to make this workflow clearer.

About configure_upload

Thank you for your explanation. I find that I misunderstand your post again 😅. In the previous post, I suggest moving the folder and API in du.Upload because I think the du.configure_upload needs to be called multiple times. But now I find that it is not necessary to do that, Because

  1. If there are no cross-domain services, there is no need to provide multiple APIs. One API is enough for each app (either dash app or flask app).
  2. If there are cross-domain services, compared to the following style:
    du.configure_remote_upload(app, remote_addr_1, upload_component_ids=['id1', ...])
    du.Upload(id='id1', ...)
    I prefer the following style
    du.Upload(id='id1', service=remote_addr_1, ...)
    Then we do not need to provide multiple calls to du.configure_upload. Because each app only requires one configuration. For example, the remote services should be configured from the remote side, while the dash app only needs one configuration.
  3. But we could still modify the du.configure_upload and let it support the Flask app. This would be a prerequisite step for the cross-domain implementation.

If you think the above statements are OK, I could close this PR and start another PR for supporting the Flask app first. Thank you!

@fohrloop
Copy link
Owner

Thanks for the explanations! A diagram would surely help to understand the working principle much faster!

If we would use du.configure_upload for Flask apps the upload_api parameter documentation does not make sense as the requests_pathname_prefix is specific to Dash, I think. Not sure if it should be a separate du.configure_flask function or not. But that can be also decided after some initial implementation when we see how it goes.

If there are cross-domain services, compared to the following style:
du.configure_remote_upload(app, remote_addr_1, upload_component_ids=['id1', ...])
du.Upload(id='id1', ...)
I prefer the following style
du.Upload(id='id1', service=remote_addr_1, ...)

Do you think it would work just just du.Upload(id='id1', service=remote_addr_1, ...)? I mean, without any du.configure_remote_upload calls in the dash app? If so, that would be awesome! There probably needs to be at least du.configure_upload call in the app, even if there was only one Upload component that would send directly to a remote Flask app, because the callback functionality needs it. I mean, if you want to use du.callback. This should be something like

configure_upload(app, folder=None, use_upload_id=False, upload_api=None, http_request_handler=None)

But what if you have two du.Upload components, and you want another one to be configured to send to the main Dash app and another component to send directly to the remote Flask app?

@cainmagi
Copy link
Contributor Author

cainmagi commented May 11, 2021

About Flask

Yes, there some dash-special configurations like requests_pathname_prefix. However, these configurations are only coded in the function, and not exposed as arguments. My concern is, the dash app and the Flask app share the same API:

# If this is a dash app
du.configure_upload(app, folder=None, upload_api=None, use_upload_id=None, http_request_handler=None)
# If this is a flask app
du.configure_upload(app, folder=None, upload_api=None, use_upload_id=None, http_request_handler=None)

All of the arguments would be used for configuring the app in both cases. To reduce the repeats of the codes, we could add a check in the implementation of du.configure_upload and disable some dash features if the app is a Flask app.

About cross-domain service

Yes. I am quite sure that du.Upload(id='id1', service=remote_addr_1, ...) works, because I have already implemented it in my project before. The reason is that such a component does not need either a local upload folder or a local upload API. In this case, there may be a problem in du.callback. See the following line:
https://github.com/np-8/dash-uploader/blob/f76252936ffc456a00c3930c569a458af44acd40/dash_uploader/callbacks.py#L36

To make du.callback correct, we may need to add a check. My idea is to add a hard-coded state. If the property service is cross-domain, du.callback would not append the settings.UPLOAD_FOLDER_ROOT to the file name. Currently, I remove it in the callback function body.

Let me use your example to make a simple explanation:

  • Because there is a local component, we need to call du.configure_upload.
  • The first component (local uploading) would use the configurations in the settings to set the service. The second component (remote uploading) is configured by something like du.Upload(..., service='http://xx.xx.xx.xx:xxxx/api-upload'), the configurations in settings are not used.
  • Because du.callback only check the existence of settings.app, although we call du.configure_upload for the first component, actually we could use du.callback for both components.
  • The du.callback for the second component needs to follow the request-based workflow I mentioned before.

The only problem is that, in some cases, we do not need the first component, i.e. we may only have remote uploading components. Then we actually do not need to configure a local upload API. But I think we should still force users to call du.configure_upload. Because it would not cause any conflict, and it is necessary to register the dash app for du.callback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants