
Allow for pymc native samplers to resume sampling from ZarrTrace #7687

Draft · wants to merge 12 commits into base: main

Conversation

@lucianopaz (Contributor) commented Feb 21, 2025

Description

Big PR approaching! This finishes adding the ability for pymc native step methods to resume sampling from an existing trace (as long as it's a ZarrTrace!). This means that you can now continue tuning or sampling from a pre-existing sampling run. For example:

with model:
    # First tuning run
    pm.sample(tune=400, draws=0, trace=trace)

    # Do whatever to decide if you want to continue tuning   
    pm.sample(tune=800, draws=0, trace=trace)

    # Switch to sampling
    pm.sample(tune=800, draws=1000, trace=trace)

Another thing: the chunks_per_draw from ZarrTrace, together with its persistent storage backends (like ZipStore or DirectoryStore), makes sampling store the results and the final sampling state periodically. So in case of a crash during sampling, you can load the trace from the existing store using ZarrTrace.from_store and then resume sampling from there.
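The checkpoint-and-resume idea can be illustrated with a small self-contained sketch. This is plain Python, deliberately independent of PyMC's actual API: the `sample` function, the JSON state file, and `chunk_size` are all made up here to mirror how ZarrTrace periodically persists results and sampling state so a crashed run can be continued.

```python
import json
import os


def sample(n_draws, state_path, chunk_size=100):
    """Toy sampler that checkpoints its full state every chunk_size
    draws, so a later call with the same state_path resumes where the
    previous run stopped (or crashed)."""
    # Resume from a previous checkpoint if one exists.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"draw_idx": 0, "draws": []}

    while state["draw_idx"] < n_draws:
        # A toy "draw": deterministic, so a resumed run produces
        # exactly the same values a fresh uninterrupted run would.
        state["draws"].append(state["draw_idx"] * 0.5)
        state["draw_idx"] += 1
        # Persist partial results at every chunk boundary, analogous
        # to ZarrTrace writing to its persistent store per chunk.
        if state["draw_idx"] % chunk_size == 0:
            with open(state_path, "w") as f:
                json.dump(state, f)

    # Final checkpoint so a follow-up run can extend this one.
    with open(state_path, "w") as f:
        json.dump(state, f)
    return state["draws"]
```

A first run of 120 draws followed by a second call asking for 250 draws will reuse the stored state and only compute the missing 130 draws.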

The only thing I haven't tested yet is adding an Op that makes pm.sample crash, to check whether I can reload the partial results from the store and resume sampling. @ricardoV94 gave me some pointers on that, but I won't be working on this for the rest of the month, so I thought it best to open a draft PR to kick off any discussion and collect feedback.

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc--7687.org.readthedocs.build/en/7687/

@lucianopaz lucianopaz added enhancements trace-backend Traces and ArviZ stuff major Include in major changes release notes section labels Feb 21, 2025
@lucianopaz lucianopaz changed the title Zarr continue Allow for pymc native samplers to resume sampling from ZarrTrace Feb 21, 2025
vars=trace_vars,
test_point=initial_point,
)
except TraceAlreadyInitialized:

Maybe just InitializedTrace? Seems a little verbose!

Sounds fine to me, it's an internal thing

Comment on lines +1161 to +1169
if isinstance(trace, ZarrChain):
progress_manager.set_initial_state(*trace.completed_draws_and_divergences())
progress_manager._progress.update(
progress_manager.tasks[i],
draws=progress_manager.completed_draws
if progress_manager.combined_progress
else progress_manager.draws,
divergences=progress_manager.divergences,
refresh=True,

I still don't like this abstraction leaking elsewhere; just provide a default to the Ndarray backend that makes it work for either method. In that case I suppose everything starts at zero.

if isinstance(trace, ZarrChain):
trace.link_stepper(step)
stored_draw_idx = trace._sampling_state.draw_idx[chain]

Same here: all this logic, including the old link_stepper, can have a sensible default in the base trace class, so you don't need to worry about what kind of trace you have here. Just make link_stepper a no-op and have stored_draw_idx default to zero?
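The suggested default could look something like this sketch. The class and method names follow the review comment, but the surrounding BaseTrace API is assumed here, not taken from the PR; `ZarrChainSketch` is a stand-in for the real ZarrChain.

```python
class BaseTrace:
    """Sketch of the suggestion: give the base class sensible no-op
    defaults so callers never need isinstance(..., ZarrChain) checks."""

    def link_stepper(self, step_method):
        # No-op by default; only backends that persist sampler state
        # need to remember the step method.
        pass

    @property
    def stored_draw_idx(self):
        # Backends without persisted state always start from draw 0.
        return 0


class ZarrChainSketch(BaseTrace):
    """Only the persistent backend overrides the defaults."""

    def __init__(self, draw_idx):
        self._draw_idx = draw_idx
        self.stepper = None

    def link_stepper(self, step_method):
        self.stepper = step_method

    @property
    def stored_draw_idx(self):
        return self._draw_idx


def resume_point(trace):
    # Caller code now works uniformly for any backend, with no
    # isinstance branching: plain traces resume from 0.
    trace.link_stepper("step")
    return trace.stored_draw_idx
```

With these defaults, the sampling loop can always call `link_stepper` and read `stored_draw_idx`, and non-persistent backends simply behave as if sampling starts from scratch.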

Comment on lines +201 to +211
if stored_draw_idx > 0:
if stored_sampling_state is not None:
self._step_method.sampling_state = stored_sampling_state
else:
raise RuntimeError(
"Cannot use the supplied ZarrTrace to restart sampling because "
"it has no sampling_state information stored. You will have to "
"resample from scratch."
)
draw = stored_draw_idx
self._write_point(trace.get_mcmc_point())

Duplicated logic, should be a property of the backend object?

@@ -491,6 +509,10 @@ def __init__(
progressbar=progressbar,
progressbar_theme=progressbar_theme,
)
if self.zarr_recording:

abstraction leaking

@ricardoV94 (Member) left a comment


I like the new functionality, but I am deeply against all the if isinstance(..., ZarrTrace) checks in the codebase. Either our code is supposed to allow different trace backends or it is not; this suggests you want to drop the Ndarray backend altogether, which is fine if you do.

Otherwise, all these cases seem like they could be handled by BaseTrace having sensible defaults for these methods. We used to have trace continuation with Ndarray in the past, and I don't see anything that fundamentally needs ZarrTrace other than dev interest in it. So just make them raise NotImplementedError or make them no-ops, and adjust the external code appropriately.

I stopped half-way, so this was not an extensive review. I think this is a bigger design point that needs a decision before settling on the details of the PR.

Labels
enhancements feature request major Include in major changes release notes section request discussion samplers trace-backend Traces and ArviZ stuff
Development

Successfully merging this pull request may close these issues.

ENH: Add checkpoints during sampling
3 participants