.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/advanced_features/caching.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_advanced_features_caching.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_advanced_features_caching.py:


.. _caching:

Caching algorithm outputs
=========================

Many algorithms implemented in gaitmap have a runtime of multiple seconds on larger datasets.
In the context of algorithm evaluation, for example when performing a cross-validation, algorithms are sometimes
repeatedly called on the same data and even with the same parameters.
In these cases, it can be helpful to cache results to ensure that you do not need to recalculate values.

The `joblib` Python package makes cashing extremely easy and you should read their
`guide <https://joblib.readthedocs.io/en/latest/memory.html>`__ first, before continuing with this example.
However, one of the caveats with joblib caching is that it only works on pure functions without side effects and should
not be used with methods.

Unfortunately, gaitmap is mostly object oriented and all the computational expensive things you might want to do are
hidden behind a method call.

Therefore, many gaitmap algorithms have caching built-in.
These algorithms support an additional keyword argument called `memory` in their init-function.
If you pass a `joblib.Memory` object to these, it will be used to cache the most time consuming function calls.
Note, that this will usually not cache all the calculations in a method, but only the ones that are considered worth
caching by the algorithm developer.

If you really want to cache the full method calls (on your own risk), see the last section of this example.

.. GENERATED FROM PYTHON SOURCE LINES 31-34

Example Pipeline
----------------
We will simply copy the stride segmentation example to have some data to work with.

.. GENERATED FROM PYTHON SOURCE LINES 34-42

.. code-block:: default

    from gaitmap.example_data import get_healthy_example_imu_data
    from gaitmap.utils.coordinate_conversion import convert_to_fbf

    data = get_healthy_example_imu_data().iloc[:2000]
    sampling_rate_hz = 204.8

    data = convert_to_fbf(data, left_like="left_", right_like="right_")


.. GENERATED FROM PYTHON SOURCE LINES 43-50

Creating the cash
-----------------
First we will create a memory instance for our cash.
We can use the same cash to cash the output of multiple algorithms.
The cash stays valid even after you restart Python, if you didn't delete the folder.

However, in this example, we will use a temp-directory that will be deleted at the end of the example.

.. GENERATED FROM PYTHON SOURCE LINES 50-58

.. code-block:: default

    from tempfile import TemporaryDirectory

    from joblib import Memory

    tmp_dir = TemporaryDirectory()
    # We will activate some more debug output for this example
    mem = Memory(tmp_dir.name, verbose=2)


.. GENERATED FROM PYTHON SOURCE LINES 59-62

Initialize algorithm
--------------------
We initialize our algorithm as normal, but pass the memory instance as an additional parameter.

.. GENERATED FROM PYTHON SOURCE LINES 62-66

.. code-block:: default

    from gaitmap.stride_segmentation import BarthDtw

    dtw = BarthDtw(memory=mem)


.. GENERATED FROM PYTHON SOURCE LINES 67-74

Calling cached methods
----------------------
The first time we call `segment` now, all calculation will run as normal, but the output of certain calculations will
be cached.
They are then reused when we call `segment` again with the same data and configuration.

Observe the print output to see what happens.

.. GENERATED FROM PYTHON SOURCE LINES 74-77

.. code-block:: default

    first_call_results = dtw.segment(data=data, sampling_rate_hz=204.8)
    first_call_stride_list = first_call_results.stride_list_.copy()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ________________________________________________________________________________
    [Memory] Calling gaitmap_mad.stride_segmentation.dtw._vendored_tslearn.subsequence_cost_matrix...
    subsequence_cost_matrix(array([[-0.23287, ..., -0.97019],
           ...,
           [-0.24131, ..., -0.99925]]), array([[ 0.000225, ...,  0.000064],
           ...,
           [-0.031871, ..., -0.001388]]))
    __________________________________________subsequence_cost_matrix - 0.0s, 0.0min
    ________________________________________________________________________________
    [Memory] Calling gaitmap_mad.stride_segmentation.dtw._vendored_tslearn.subsequence_cost_matrix...
    subsequence_cost_matrix(array([[-0.23287, ..., -0.97019],
           ...,
           [-0.24131, ..., -0.99925]]), array([[-0.000646, ..., -0.000169],
           ...,
           [-0.001773, ..., -0.090264]]))
    __________________________________________subsequence_cost_matrix - 0.0s, 0.0min


.. GENERATED FROM PYTHON SOURCE LINES 78-85

According to the debug output, two internal functions of `BarthDtw` are cached.
Each twice with different value inputs, because our data had two sensors.
It depends on the actual algorithm, which and how internal components are cached.

Independent of that if we call the method again, we can see in the debug output that the results of these methods
are now loaded from disk.
If we would use a larger dataset, we would see dramatic speed improvements.

.. GENERATED FROM PYTHON SOURCE LINES 85-88

.. code-block:: default

    second_call_results = dtw.segment(data=data, sampling_rate_hz=204.8)
    second_call_stride_list = second_call_results.stride_list_.copy()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Memory]0.9s, 0.0min    : Loading subsequence_cost_matrix...
    [Memory]0.9s, 0.0min    : Loading subsequence_cost_matrix...


.. GENERATED FROM PYTHON SOURCE LINES 89-90

We can verify that the results are actually identical

.. GENERATED FROM PYTHON SOURCE LINES 90-92

.. code-block:: default

    first_call_stride_list["left_sensor"].equals(second_call_stride_list["left_sensor"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    True


.. GENERATED FROM PYTHON SOURCE LINES 93-102

Partially cached calls
----------------------
As you have seen before, `BarthDtw` caches its internal call to `subsequence_cost_matrix`.
This is only part of the processing.
This ensures that we can change some parameters while still making use of the some cached results.
As the cost-matrix only depends only on the template and the constrains, we can reuse the cash, if we change any other
parameter.

If we change the `max_cost` for example, only the stride detection part needs to be recalculated.

.. GENERATED FROM PYTHON SOURCE LINES 102-105

.. code-block:: default

    new_instance = BarthDtw(max_cost=5.0, memory=mem)
    new_instance.segment(data=data, sampling_rate_hz=204.8)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Memory]2.0s, 0.0min    : Loading subsequence_cost_matrix...
    [Memory]2.0s, 0.0min    : Loading subsequence_cost_matrix...

    BarthDtw(conflict_resolution=True, find_matches_method='find_peaks', max_cost=5.0, max_match_length_s=3.0, max_signal_stretch_ms=None, max_template_stretch_ms=None, memory=Memory(location=/tmp/tmpj26mhm6f/joblib), min_match_length_s=0.6, resample_template=True, snap_to_min_axis='gyr_ml', snap_to_min_win_ms=300, template=BarthOriginalTemplate(scaling=FixedScaler(offset=0, scale=500.0), use_cols=None))


.. GENERATED FROM PYTHON SOURCE LINES 106-108

As you can see in the debug output, we loaded the results of `subsequence_cost_matrix`, but recalculated the second
step.

.. GENERATED FROM PYTHON SOURCE LINES 110-118

Some Note
---------
- Caching support will vary from algorithm to algorithm
- Caching supports multi-processing
- Do **not** use you cache as permanent storage of results. It is way too easy to delete it.
- If you try a lot of things with a lot of data, your cache can become really large.
- Clear your cache, before you do your final calculations for a publication!
- Make sure you add you cache dir to your ".gitignore" file.

.. GENERATED FROM PYTHON SOURCE LINES 121-129

Caching Full method calls
-------------------------
In some cases it might still be desirable to cache the entire output of an algorithm.
To do this safely you need to be aware of how cashing works under the hood.
The `Memory` class calculates a hash of all inputs to a function and stores a pickeled version of the results together
with this input-hast.
If the function is called again, the hash of the input is compared with hashes stored on the disk.
Depending on this, a cached result can be selected.

.. GENERATED FROM PYTHON SOURCE LINES 129-131

.. code-block:: default

    import joblib


.. GENERATED FROM PYTHON SOURCE LINES 132-133

We can calculate the hash of our algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 133-135

.. code-block:: default

    joblib.hash(dtw)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    '1ca6757c44aade987dc5515b59d43a75'


.. GENERATED FROM PYTHON SOURCE LINES 136-137

If we recreate the object with the same parameters, the hash is identical.

.. GENERATED FROM PYTHON SOURCE LINES 137-139

.. code-block:: default

    joblib.hash(BarthDtw())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    '050b7eb3aa62e4807d8dface0085fb56'


.. GENERATED FROM PYTHON SOURCE LINES 140-141

The same is true for cloning

.. GENERATED FROM PYTHON SOURCE LINES 141-143

.. code-block:: default

    joblib.hash(dtw.clone())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    '457b73b21546c0919c15d0112e4c22e3'


.. GENERATED FROM PYTHON SOURCE LINES 144-145

However, if we change any parameters the hash of the object changes.

.. GENERATED FROM PYTHON SOURCE LINES 145-147

.. code-block:: default

    joblib.hash(BarthDtw(max_cost=100))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    'fe7b97e7289e8bea9c8d274a200e5692'


.. GENERATED FROM PYTHON SOURCE LINES 148-154

It is important to note that the hash always changes, if **any** of the attributes are modified, not just the ones
accessible through the init.
This means, if e.g. after you call `segment` and the algorithm object will have all results stored, the hash will
change.
The same will happen, if you add custom attributes to the instance.
The hash will change and the cache would be invalidated.

.. GENERATED FROM PYTHON SOURCE LINES 154-158

.. code-block:: default

    test_dtw = BarthDtw()
    test_dtw.custom_value = 4
    joblib.hash(test_dtw)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    '575ae6008b1330d505058c59ecf849b4'


.. GENERATED FROM PYTHON SOURCE LINES 159-168

This observation becomes an issue when caching class methods.
As python passes the class instance itself as the first argument to this method.
This means the input-hash used for caching will change whenever anything on the class instance changes, even if the
change might not affect the actual output of the method.
In many cases this is less of an issue with gaitmap, as we can reasonably assume that the main action method should
only depend on the params of an algorithm (`self.get_params()`) and the actual action method.

Therefore, we can cache action methods reliably when cloning the algorithm before hand and using a wrapper method.
Cloning the algorithm instance ensures that all instance data, except the params are reset.

.. GENERATED FROM PYTHON SOURCE LINES 168-181

.. code-block:: default


    def call_segment(algo, data, sampling_rate_hz):
        return algo.segment(data=data, sampling_rate_hz=sampling_rate_hz)


    # Cache the wrapper:
    cached_call_segment = Memory(tmp_dir.name, verbose=2).cache(call_segment)

    # Then we need to clone the algorithm every time we call the cached wrapper, to reset the params:
    reset_dtw = dtw.clone()
    results = cached_call_segment(reset_dtw, data, sampling_rate_hz)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ________________________________________________________________________________
    [Memory] Calling __main__--home-docs-checkouts-readthedocs.org-user_builds-gaitmap-checkouts-v2.6.0-examples-advanced_features-caching.call_segment...
    call_segment(BarthDtw(conflict_resolution=True, find_matches_method='find_peaks', max_cost=4.0, max_match_length_s=3.0, max_signal_stretch_ms=None, max_template_stretch_ms=None, memory=Memory(location=/tmp/tmpj26mhm6f/joblib), min_match_length_s=0.6, resample_template=True, snap_to_min_axis='gyr_ml', snap_to_min_win_ms=300, template=BarthOriginalTemplate(scaling=FixedScaler(offset=0, scale=500.0), use_cols=None)), 
             left_sensor                      ... right_sensor                        
                  acc_pa    acc_ml    acc_si  ...       gyr_pa      gyr_ml      gyr_si
    0.000000    0.880811  2.762208 -9.408650  ...    -0.323037   -0.084604   -0.025288
    0.004883    0.885007  2.746448 -9.465895  ...    -0.075961    0.035851    0.152090
    0.009766    0.865777  2.686106 -9.436033  ...    -0.200378   -0.206538   -0.028626
    0.014648    0.876128  2.771787 -9.403943  ...     0.347912   -0.075574   -0.390202
    0.019531    0.928267  2.682286 -9.393766  ...    -0.260534   -0.025164    0.093895
    ...              ...       ...       ...  ...          ...         ...         ...
    9.741211    0.445489  2.638538 -9.353027  ...  -327.681622 -627.092147  196.539542
    9.746094    0.679787  2.586746 -9.312401  ...  -234.217050 -538.949858   78.782630
    9.750977    0.828875  2.607384 -9.223829  ...  -225.952242 -385.386142   71.534917
    9.755859    0.743285  2.789532 -9.297119  ...  -123.250826 -209.382027  -25.810329
    9.760742    0.505652  2.906063 -9.214244  ...    -0.886364  -45.132010  -76.377843

    [2000 rows x 12 columns], 
    204.8)
    [Memory]0.0s, 0.0min    : Loading subsequence_cost_matrix...
    [Memory]0.0s, 0.0min    : Loading subsequence_cost_matrix...
    _____________________________________________________call_segment - 0.0s, 0.0min


.. GENERATED FROM PYTHON SOURCE LINES 182-183

On this first call, we can see that the cached call actually modified the `reset_dtw` object in place.

.. GENERATED FROM PYTHON SOURCE LINES 183-185

.. code-block:: default

    id(reset_dtw) == id(results)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    True


.. GENERATED FROM PYTHON SOURCE LINES 186-187

However, on the second call, it will return a copy (loaded from the cache)

.. GENERATED FROM PYTHON SOURCE LINES 187-191

.. code-block:: default

    reset_dtw = dtw.clone()
    results = cached_call_segment(reset_dtw, data, sampling_rate_hz)
    id(reset_dtw) == id(results)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Memory]0.9s, 0.0min    : Loading call_segment...

    False


.. GENERATED FROM PYTHON SOURCE LINES 192-204

While it is possible to cache methods this way, this might be error prone.
The safest option (and remember, we are already in the unsafe territory), is to use a nested wrapper resolve potential
user errors.

In the general case you can use the recipe below.
It will always ensure that the algo object is cloned and will return a copy of the algorithm in any case.

.. warning::
   While this expected to work, cashing an entire algorithm object as return value can take **a lot** of storage space
   as it usually stores a copy of the input data.
   Whenever possible you should only return the parts of the result you are really interested inside the cached
   function.

.. GENERATED FROM PYTHON SOURCE LINES 204-253

.. code-block:: default


    def cached_call_method(_algo, _method_name: str, _memory: Memory, *args, **kwargs):
        """Call a method on the algo object and cache the output.

        Repeated calls to this function with the same algorithm and the same args, and kwargs, will return cached results
        saved on disk.

        .. warning ::
            This method will clone the algorithm object before calling the method.
            This ensures that the cache is not invalidated because of results stored on the object.

        Parameters
        ----------
        _algo
            The algorithm instance to use
        _method_name
            The name of the method to call
        _memory
            A instance of `joblib.memory` used for caching
        args
            Positional arguments passed to the called method
        kwargs
            Keyword arguments passed to the called method.

        Returns
        -------
        method_return
            The return value of the called methods either calculated or cached.

        See Also
        --------
        gaitmap.utils.caching.cached_call_action

        """

        def _call_method(_algo, _method_name, *args, **kwargs):
            return getattr(_algo, _method_name)(*args, **kwargs)

        _algo = _algo.clone()
        return _memory.cache(_call_method)(_algo, _method_name, *args, **kwargs)


    mem = Memory(tmp_dir.name, verbose=2)

    cached_result = cached_call_method(
        BarthDtw(), _method_name="segment", _memory=mem, data=data, sampling_rate_hz=sampling_rate_hz
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    ________________________________________________________________________________
    [Memory] Calling __main__--home-docs-checkouts-readthedocs.org-user_builds-gaitmap-checkouts-v2.6.0-examples-advanced_features-caching.cached_call_method.<locals>._call_method...
    _call_method(BarthDtw(conflict_resolution=True, find_matches_method='find_peaks', max_cost=4.0, max_match_length_s=3.0, max_signal_stretch_ms=None, max_template_stretch_ms=None, memory=None, min_match_length_s=0.6, resample_template=True, snap_to_min_axis='gyr_ml', snap_to_min_win_ms=300, template=BarthOriginalTemplate(scaling=FixedScaler(offset=0, scale=500.0), use_cols=None)), 
    'segment', data=         left_sensor                      ... right_sensor                        
                  acc_pa    acc_ml    acc_si  ...       gyr_pa      gyr_ml      gyr_si
    0.000000    0.880811  2.762208 -9.408650  ...    -0.323037   -0.084604   -0.025288
    0.004883    0.885007  2.746448 -9.465895  ...    -0.075961    0.035851    0.152090
    0.009766    0.865777  2.686106 -9.436033  ...    -0.200378   -0.206538   -0.028626
    0.014648    0.876128  2.771787 -9.403943  ...     0.347912   -0.075574   -0.390202
    0.019531    0.928267  2.682286 -9.393766  ...    -0.260534   -0.025164    0.093895
    ...              ...       ...       ...  ...          ...         ...         ...
    9.741211    0.445489  2.638538 -9.353027  ...  -327.681622 -627.092147  196.539542
    9.746094    0.679787  2.586746 -9.312401  ...  -234.217050 -538.949858   78.782630
    9.750977    0.828875  2.607384 -9.223829  ...  -225.952242 -385.386142   71.534917
    9.755859    0.743285  2.789532 -9.297119  ...  -123.250826 -209.382027  -25.810329
    9.760742    0.505652  2.906063 -9.214244  ...    -0.886364  -45.132010  -76.377843

    [2000 rows x 12 columns], sampling_rate_hz=204.8)
    ______________________________________________________call_method - 0.0s, 0.0min


.. GENERATED FROM PYTHON SOURCE LINES 254-255

And the second call will load the results.

.. GENERATED FROM PYTHON SOURCE LINES 255-259

.. code-block:: default

    cached_result = cached_call_method(
        BarthDtw(), _method_name="segment", _memory=mem, data=data, sampling_rate_hz=sampling_rate_hz
    )


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [Memory]0.4s, 0.0min    : Loading _call_method...


.. GENERATED FROM PYTHON SOURCE LINES 260-261

Finally remove the tempdir

.. GENERATED FROM PYTHON SOURCE LINES 261-262

.. code-block:: default

    tmp_dir.cleanup()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  8.256 seconds)

**Estimated memory usage:**  9 MB


.. _sphx_glr_download_auto_examples_advanced_features_caching.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: caching.py <caching.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: caching.ipynb <caching.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_