
Introduce plotting.backend configuration with Plotly support #1639

Merged: 24 commits merged on Jul 14, 2020


@DumbMachine (Contributor) commented Jul 8, 2020

Aims to fix #1626.
Each backend returns the figure in its own native format, allowing further editing or customization if required.

How to use:

import databricks.koalas as ks

kdf = ks.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["A", "B", "C", "D"])
kdf.plot(title="Example Figure") # defaults to backend="matplotlib"

[Figure: output with the default matplotlib backend]

kdf.plot(backend="pandas_bokeh", title="Example Figure")
## same as:
# ks.options.plotting.backend = "pandas_bokeh"
# kdf.plot(title="Example Figure")

[Figure: output with the pandas_bokeh backend]

fig = kdf.plot(backend="plotly", title="Example Figure", height=500, width=500)
fig.show()

[Figure: output with the plotly backend]

# further edits can be made to the figure
fig.update_layout(template="plotly_dark")
fig.show()

[Figure: the plotly figure after applying the plotly_dark template]

@ueshin (Collaborator) commented Jul 8, 2020

@DumbMachine What version of black are you using? The formatting seems to be off.
Ah, it seems the black check passes now.

@DumbMachine (Contributor, Author)

I have the version required by requirements-dev.txt:

(base) ➜  koalas git:(plotly-support) ✗ black --version                        
black, version 19.10b0

@@ -220,6 +220,12 @@ def validate(self, v: Any) -> None:
"'plotting.sample_ratio' should be 1.0 >= value >= 0.0.",
),
),
Option(
key="plotting.backend",
doc=("Backend to use for plotting. Default is matplotlib."),
Review comment (Collaborator):

Could you add a list of currently supported backends?

Review comment (Collaborator):

Also, could you add a check_func to avoid setting an invalid backend?
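A minimal sketch of what the suggested check_func could look like, assuming an Option class that takes a (predicate, error message) pair, as the 'plotting.sample_ratio' message in the diff context suggests. The Option class below is a simplified stand-in, not Koalas' actual implementation, and SUPPORTED_BACKENDS is a hypothetical list built from the backends named in this PR.

```python
from typing import Any, Callable, Tuple

# Hypothetical list for illustration: the backends mentioned in this PR.
SUPPORTED_BACKENDS = ("matplotlib", "plotly", "pandas_bokeh")


class Option:
    """Simplified stand-in for a config option with a validation hook."""

    def __init__(
        self,
        key: str,
        doc: str,
        default: Any,
        types: type = str,
        check_func: Tuple[Callable[[Any], bool], str] = (lambda v: True, ""),
    ) -> None:
        self.key = key
        self.doc = doc
        self.default = default
        self.types = types
        self.check_func = check_func

    def validate(self, v: Any) -> None:
        # Reject values of the wrong type first.
        if not isinstance(v, self.types):
            raise ValueError(
                "The value for option '%s' was %s; expected type %s."
                % (self.key, type(v), self.types)
            )
        # Then apply the option-specific check with its own error message.
        check, message = self.check_func
        if not check(v):
            raise ValueError(message)


backend_option = Option(
    key="plotting.backend",
    doc="Backend to use for plotting. Default is matplotlib. "
    "Supported backends: " + ", ".join(SUPPORTED_BACKENDS) + ".",
    default="matplotlib",
    types=str,
    check_func=(
        lambda v: v in SUPPORTED_BACKENDS,
        "'plotting.backend' should be one of: " + ", ".join(SUPPORTED_BACKENDS) + ".",
    ),
)

backend_option.validate("plotly")  # a supported backend passes silently
```

Listing the supported backends in doc and reusing the same tuple in check_func keeps the documentation and the validation from drifting apart.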

@DumbMachine (Contributor, Author)

I'm having trouble understanding the error here. Some checks complain about a trailing ",", while at the same time others recommend running dev/reformat, which has the following effect (it adds the trailing ","):
[Image: output of git diff plot.py after running dev/reformat]

@ueshin (Collaborator) commented Jul 8, 2020

Let me try to fix it.

@ueshin (Collaborator) left a comment

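black seems to autodetect the target Python version per file, so these files can't include newer syntax (otherwise black assumes a newer Python version and adds trailing commas that older versions reject).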

@ueshin (Collaborator) commented Jul 8, 2020

Btw, I'm wondering how the other backends, pandas_bokeh and pandas_altair, work.
I tried pandas_bokeh locally, but it showed the same figure as matplotlib. Am I missing something?

@HyukjinKwon (Member)

Awesome! @DumbMachine

"""
# function copied from pandas.plotting._core

import pkg_resources # Delay import for performance.
Review comment (Contributor):

Could we move this comment above import pkg_resources, for consistency with the other comments, unless there is a specific reason not to?

        _backends[entry_point.name] = entry_point.load()

    try:
        return _backends[backend]
@itholic (Contributor) commented Jul 9, 2020

Can we check whether backend is in _backends and raise a KeyError manually, rather than using try?

@DumbMachine (Contributor, Author):

We use try because _backends is initially empty. Modules are added to _backends as they are loaded, when the user requests a plotting backend. The point of:

        return _backends[backend]

is that if the module is already cached in the _backends dict, there is no need to import the package again. The image below explains it better:
[Image]

Reply (Contributor):

@DumbMachine Thanks for the explanation. Let's keep this as it is.

@@ -1507,6 +1615,37 @@ class KoalasFramePlotMethods(PandasObject):
``df.plot(kind='hist')`` is equivalent to ``df.plot.hist()``
"""

    @staticmethod
    def _get_args_map(backend_name, data, kind, kwargs):
        """Appropriate call args and data preprocessing mapping for the backend
Review comment (Contributor):

Could we have a simple example for this method?
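One possible simple example, written as a hypothetical sketch rather than the PR's actual mapping: get_args_map below converts matplotlib's figsize (inches) into plotly's width and height (pixels), assuming matplotlib's default of 100 dpi, and forwards every other argument unchanged. Data preprocessing is left as a pass-through.

```python
from typing import Any, Dict, Tuple

# Assumption: matplotlib's default figure dpi (100) is used for the conversion.
_ASSUMED_DPI = 100


def get_args_map(
    backend_name: str, data: Any, kind: str, kwargs: Dict[str, Any]
) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical sketch: adapt pandas/matplotlib-style plot arguments for a backend.

    Returns the (unchanged) data and the keyword arguments to forward.
    """
    kwargs = dict(kwargs)  # copy so the caller's dict is not mutated
    kwargs["kind"] = kind

    if backend_name == "plotly":
        # Plotly sizes figures in pixels via width/height, whereas
        # matplotlib's figsize is (width, height) in inches.
        figsize = kwargs.pop("figsize", None)
        if figsize is not None:
            width, height = figsize
            # Explicit width/height from the user take precedence.
            kwargs.setdefault("width", int(width * _ASSUMED_DPI))
            kwargs.setdefault("height", int(height * _ASSUMED_DPI))

    # Other backends receive the arguments unchanged.
    return data, kwargs
```

For instance, get_args_map("plotly", data, "line", {"figsize": (5, 4)}) forwards kind="line", width=500 and height=400, while the matplotlib backend keeps figsize as is.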

@itholic (Contributor) commented Jul 9, 2020

@DumbMachine Thanks for working on this!! 👍

@DumbMachine (Contributor, Author)

I believe the tests fail because plotly is not installed in the testing environment. What should be done here?

@ueshin (Collaborator) commented Jul 9, 2020

@DumbMachine You can add the libraries you need to the Test section of the requirements-dev.txt file.

@DumbMachine (Contributor, Author)

The failing tests fail because pandas_bokeh is missing, even though I did add the package to the test requirements.

@ueshin (Collaborator) commented Jul 9, 2020

I'll fix the conda builds.

@ueshin (Collaborator) commented Jul 9, 2020

Ah, pandas_bokeh is not in conda-forge. Maybe we should remove the test that uses it.
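As an alternative to removing the test, one option is to skip it when the optional package is not installed. A sketch using only the standard library; PlottingBackendTest and has_module are hypothetical names, and the body of the test is a placeholder for the real plotting check.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


class PlottingBackendTest(unittest.TestCase):
    # Skipped (not failed) in environments such as conda builds where
    # pandas_bokeh is unavailable.
    @unittest.skipIf(not has_module("pandas_bokeh"), "pandas_bokeh is not installed")
    def test_pandas_bokeh_backend(self):
        # Placeholder: the real test would plot with backend="pandas_bokeh".
        self.assertIsNotNone(importlib.util.find_spec("pandas_bokeh"))
```

The skip reason shows up in the test report, so the missing dependency stays visible instead of silently disappearing with the deleted test.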

@ueshin (Collaborator) left a comment

Otherwise, LGTM. Awesome work!
@HyukjinKwon @itholic Could you take another look?

@DumbMachine (Contributor, Author)

Made the changes and all types of arguments are now supported. Please test it out.
[Screenshots]

Comment on lines 1281 to 1284
# A value different from its default implies the user passed the argument explicitly
for arg, def_val in positional_args:
    if arg in kwargs and kwargs[arg] == def_val:
        kwargs.pop(arg, None)
Review comment (Collaborator):

I guess we should apply this before line 1274; otherwise, plotly-specific arguments could end up with different values?

@DumbMachine (Contributor, Author):

Made the changes. Now we map arguments where possible and remove those that are set to their default values.
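A hypothetical sketch of why the order matters: if arguments are first renamed to backend-specific names, a value still equal to its default no longer matches the default's key and leaks through to the backend. The defaults and the legend to show_legend rename below are illustrative assumptions, not the PR's actual tables.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical declared defaults and a hypothetical backend-specific rename.
POSITIONAL_ARGS: List[Tuple[str, Any]] = [("title", None), ("legend", False)]
RENAMES: Dict[str, str] = {"legend": "show_legend"}


def strip_defaults(
    kwargs: Dict[str, Any], positional_args: List[Tuple[str, Any]]
) -> Dict[str, Any]:
    """Drop arguments still equal to their default (treated as not passed)."""
    defaults = dict(positional_args)
    return {k: v for k, v in kwargs.items() if not (k in defaults and defaults[k] == v)}


def map_args(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename arguments to backend-specific names where a mapping exists."""
    return {RENAMES.get(k, k): v for k, v in kwargs.items()}


def prepare(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Strip defaults first: after renaming, "show_legend" would no longer
    # match the "legend" default, so an untouched default would leak through.
    return map_args(strip_defaults(kwargs, POSITIONAL_ARGS))
```

With {"title": "Example Figure", "legend": False}, prepare forwards only the title, whereas mapping first would also forward show_legend=False. Like the original logic, this cannot tell an untouched default from a value the user explicitly set to the same thing.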

@ueshin (Collaborator) commented Jul 10, 2020

@DumbMachine Can I ask you another favor while we are here?
Could you update the PR description with some simple example code that uses the plotting backend, along with the resulting figures?

e.g.:

kdf = ...
kdf.plot(...)

figure

ks.options.plotting.backend = ...
kdf.plot(...)

figure

...

@HyukjinKwon (Member) left a comment

Looks good to me too.

@HyukjinKwon HyukjinKwon changed the title Plotly support Introduce plotting.backend configuration with Plotly support Jul 14, 2020
@HyukjinKwon HyukjinKwon merged commit 6361053 into databricks:master Jul 14, 2020
@HyukjinKwon (Member)

I will double-check it myself tomorrow and make a follow-up to clean up.
Thank you @DumbMachine for working on this!

@itholic (Contributor) commented Jul 16, 2020

Sorry for the late reply.
LGTM, too.
Thanks for the work on this, @DumbMachine 👍

@DumbMachine (Contributor, Author)

Always happy to help.
Is there a reason to have two separate plotter classes, KoalasFramePlotMethods and KoalasSeriesPlotMethods? Most of the documentation and function calls are the same for both. I believe this could be simplified to a single class (similar to pandas' PlotAccessor), since separate plotting functions already exist (plot_series and plot_frame).
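A rough sketch of the single-accessor idea, assuming plot_series and plot_frame keep their current roles. Here they are stubs that just return a label, and PlotAccessor with its is_series flag is hypothetical. The frame-only check mirrors pandas' behavior of rejecting scatter and hexbin plots on a Series.

```python
from typing import Any


def plot_series(data: Any, kind: str, **kwargs: Any) -> str:
    # Hypothetical stub standing in for the real Series plotting function.
    return "series:" + kind


def plot_frame(data: Any, kind: str, **kwargs: Any) -> str:
    # Hypothetical stub standing in for the real DataFrame plotting function.
    return "frame:" + kind


class PlotAccessor:
    """One plotter shared by Series and DataFrame; dispatches on the data kind."""

    # Kinds that only make sense for two-dimensional data.
    _frame_only_kinds = ("scatter", "hexbin")

    def __init__(self, data: Any, is_series: bool) -> None:
        self._data = data
        self._is_series = is_series

    def __call__(self, kind: str = "line", **kwargs: Any) -> Any:
        if self._is_series:
            if kind in self._frame_only_kinds:
                raise ValueError("plot kind %s can only be used for data frames" % kind)
            return plot_series(self._data, kind=kind, **kwargs)
        return plot_frame(self._data, kind=kind, **kwargs)

    # Shortcuts written once for both data types: .line() == (kind="line").
    def line(self, **kwargs: Any) -> Any:
        return self(kind="line", **kwargs)

    def bar(self, **kwargs: Any) -> Any:
        return self(kind="bar", **kwargs)

    def scatter(self, x: str, y: str, **kwargs: Any) -> Any:
        return self(kind="scatter", x=x, y=y, **kwargs)
```

Because every shortcut routes through __call__, the docstrings and argument handling live in one place, and only the final dispatch differs between Series and DataFrame.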

@ueshin (Collaborator) commented Jul 16, 2020

@DumbMachine I'm not sure about the original motivation for having two plotter classes, but it'd be great if we could simplify the plotters. Would you like to work on it? cc @HyukjinKwon

@DumbMachine (Contributor, Author) commented Jul 17, 2020 via email

@HyukjinKwon (Member)

Sure, nice!


Linked issue: Plotly on koalas dataframes
4 participants