.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/datasets_and_pipelines/cross_validation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_datasets_and_pipelines_cross_validation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_datasets_and_pipelines_cross_validation.py:


.. _cross_validation:

Cross Validation
================

.. note:: These examples are largely copies of the corresponding tpcp examples, adapted to use gait algorithms.
          They are updated less often than the official tpcp examples.
          Hence, it makes sense to cross-check against the official examples.

Whenever you use a trainable algorithm, it is important to clearly separate the training and the testing data to
get an unbiased result.
Usually this is achieved by a train-test split.
However, if you don't have much data, there is always a risk that one random train-test split will provide
better (or worse) results than another.
In these cases, it is a good idea to use cross-validation.
In this procedure, you perform multiple train-test splits and average the results over all "folds".
For more information, see our :ref:`evaluation guide <algorithm_evaluation>` and the `sklearn guide on cross
validation <https://scikit-learn.org/stable/modules/cross_validation.html>`_.
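
As a minimal sketch of this procedure, the following plain-Python snippet splits a set of sample indices into
contiguous folds and averages a per-fold score.
It is independent of tpcp and gaitmap; the helper name ``k_fold_indices``, the toy data and the toy "model"
(the mean of the training values) are assumptions made purely for illustration:

.. code-block:: default

    from statistics import mean


    def k_fold_indices(n_samples, n_splits):
        """Yield (train, test) index lists for contiguous, non-shuffled folds."""
        # The first `n_samples % n_splits` folds get one extra sample.
        fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0) for i in range(n_splits)]
        start = 0
        for size in fold_sizes:
            test = list(range(start, start + size))
            train = [i for i in range(n_samples) if i not in test]
            yield train, test
            start += size


    # Toy data: the "model" predicts the mean of the training values,
    # the score is the mean absolute error on the test values.
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

    fold_scores = []
    for train, test in k_fold_indices(len(values), n_splits=3):
        prediction = mean(values[i] for i in train)
        fold_scores.append(mean(abs(values[i] - prediction) for i in test))

    # The cross-validated score is the average over all folds.
    cv_score = mean(fold_scores)

Every sample ends up in the test set exactly once, and the model is never scored on data it was trained on.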

In this example, we will learn how to use the :func:`~tpcp.validate.cross_validate` function provided by tpcp
together with gaitmap algorithms.
For this, we will redo the example on :ref:`optimizable pipelines <optimize_pipelines>`, but we will perform the final
evaluation via cross-validation.
If you want more information on how the dataset and pipeline are built, head over to that example.
Here we will just copy the code over.

.. GENERATED FROM PYTHON SOURCE LINES 28-120

.. code-block:: default


    import numpy as np
    import pandas as pd
    from tpcp import CloneFactory, Dataset, OptimizableParameter, OptimizablePipeline, Parameter

    from gaitmap.data_transform import TrainableAbsMaxScaler
    from gaitmap.example_data import get_healthy_example_imu_data, get_healthy_example_stride_borders
    from gaitmap.stride_segmentation import (
        BarthDtw,
        BarthOriginalTemplate,
        BaseDtwTemplate,
        InterpolatedDtwTemplate,
        TrainableTemplateMixin,
    )
    from gaitmap.utils.array_handling import iterate_region_data
    from gaitmap.utils.coordinate_conversion import convert_left_foot_to_fbf, convert_right_foot_to_fbf
    from gaitmap.utils.datatype_helper import SingleSensorStrideList


    class MyDataset(Dataset):
        @property
        def sampling_rate_hz(self) -> float:
            return 204.8

        @property
        def data(self):
            self.assert_is_single(None, "data")
            return get_healthy_example_imu_data()[self.index.iloc[0]["foot"] + "_sensor"]

        @property
        def segmented_stride_list_(self):
            self.assert_is_single(None, "data")
            return get_healthy_example_stride_borders()[self.index.iloc[0]["foot"] + "_sensor"].set_index("s_id")

        def create_index(self) -> pd.DataFrame:
            return pd.DataFrame({"participant": ["test", "test"], "foot": ["left", "right"]})


    class MyPipeline(OptimizablePipeline):
        max_cost: Parameter[float]
        template: OptimizableParameter[BaseDtwTemplate]

        segmented_stride_list_: SingleSensorStrideList
        cost_func_: np.ndarray

        # We need to wrap the template in a `CloneFactory` call here to prevent issues with mutable defaults!
        def __init__(self, max_cost: float = 3, template: BaseDtwTemplate = CloneFactory(BarthOriginalTemplate())) -> None:
            self.max_cost = max_cost
            self.template = template

        def self_optimize(self, dataset: MyDataset, **kwargs):
            if not isinstance(self.template, TrainableTemplateMixin):
                raise ValueError(
                    "The template must be optimizable! If you are using a fixed template (e.g. "
                    "BarthOriginalTemplate), switch to an optimizable base class."
                )
            # Our training consists of cutting all strides from the dataset and then creating a new template from all
            # strides in the dataset

            # We expect multiple datapoints in the dataset
            sampling_rate = dataset[0].sampling_rate_hz
            # We create a generator for the data and the stride labels
            data_sequences = (
                self._convert_cord_system(datapoint.data, datapoint.groups[0][1]).filter(like="gyr")
                for datapoint in dataset
            )
            stride_labels = (datapoint.segmented_stride_list_ for datapoint in dataset)

            stride_generator = iterate_region_data(data_sequences, stride_labels)

            self.template.self_optimize(
                stride_generator, columns=["gyr_pa", "gyr_ml", "gyr_si"], sampling_rate_hz=sampling_rate
            )
            return self

        def _convert_cord_system(self, data, foot):
            converter = {"left": convert_left_foot_to_fbf, "right": convert_right_foot_to_fbf}
            return converter[foot](data)

        def run(self, datapoint: MyDataset):
            # `datapoint.groups[0]` gives us the identifier of the datapoint (e.g. `("test", "left")`).
            # And `datapoint.groups[0][1]` is the foot.
            data = self._convert_cord_system(datapoint.data, datapoint.groups[0][1])

            dtw = BarthDtw(max_cost=self.max_cost, template=self.template)
            dtw.segment(data, datapoint.sampling_rate_hz)

            self.segmented_stride_list_ = dtw.stride_list_
            self.cost_func_ = dtw.cost_function_
            return self


.. GENERATED FROM PYTHON SOURCE LINES 121-127

The Scorer
----------
When using cross validation, we usually want to calculate performance parameters for each fold, so that we can
use the average performance as an estimate of the expected "generalization" performance.
For this example, we will use the "precision", the "recall" and the "f1_score" to score the stride detection
performance.
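
For reference, the three metrics follow from the number of true positives, false positives and false negatives.
The sketch below uses the standard textbook definitions with made-up counts.
The function name ``precision_recall_f1`` is hypothetical, this is not the gaitmap implementation, and the
sketch assumes that none of the denominators is zero:

.. code-block:: default

    def precision_recall_f1(tp, fp, fn):
        """Compute precision, recall and F1 score from match counts."""
        precision = tp / (tp + fp)  # fraction of detected strides that are correct
        recall = tp / (tp + fn)  # fraction of true strides that were detected
        f1_score = 2 * precision * recall / (precision + recall)  # harmonic mean of both
        return {"precision": precision, "recall": recall, "f1_score": f1_score}


    # Hypothetical counts: 13 strides matched, no spurious detections, 2 strides missed.
    scores = precision_recall_f1(tp=13, fp=0, fn=2)

With no false positives the precision is 1, while the two missed strides lower both the recall and the F1 score.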

.. GENERATED FROM PYTHON SOURCE LINES 127-138

.. code-block:: default

    from gaitmap.evaluation_utils import evaluate_segmented_stride_list, precision_recall_f1_score


    def score(pipeline: MyPipeline, datapoint: MyDataset):
        pipeline.safe_run(datapoint)
        matches_df = evaluate_segmented_stride_list(
            ground_truth=datapoint.segmented_stride_list_, segmented_stride_list=pipeline.segmented_stride_list_
        )
        return precision_recall_f1_score(matches_df)


.. GENERATED FROM PYTHON SOURCE LINES 139-147

Data Splitting
--------------
Before performing a cross validation, we need to decide on the number of folds and the type of splits.
In gaitmap we support all cross validation iterators provided in `sklearn
<https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation-iterators>`_.
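
As a rough illustration of what such an iterator yields, the sketch below applies ``KFold`` to two placeholder
labels.
Only the sklearn side is shown; the label tuples are stand-ins chosen for illustration and no tpcp objects are
involved:

.. code-block:: default

    from sklearn.model_selection import KFold

    # Placeholder labels standing in for two datapoints.
    labels = [("test", "left"), ("test", "right")]

    splits = []
    for train_idx, test_idx in KFold(n_splits=2).split(labels):
        # `split` yields integer index arrays; map them back to the labels.
        train_labels = [labels[i] for i in train_idx]
        test_labels = [labels[i] for i in test_idx]
        splits.append((train_labels, test_labels))

Without shuffling, the first fold tests on the first entry and trains on the second, and the second fold swaps
the roles.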

In this example, we only have two datapoints.
This means we can only use a 2-fold cross-validation:

.. GENERATED FROM PYTHON SOURCE LINES 147-151

.. code-block:: default

    from sklearn.model_selection import KFold

    cv = KFold(n_splits=2)


.. GENERATED FROM PYTHON SOURCE LINES 152-158

Cross Validation
----------------
Now we have all the pieces for the final cross validation.
First we need to create instances of our data and pipeline.
Then we need to wrap our pipeline instance into an :class:`~tpcp.Optimize` wrapper.
Finally, we can call `cross_validate`.

.. GENERATED FROM PYTHON SOURCE LINES 158-169

.. code-block:: default

    from tpcp.optimize import Optimize
    from tpcp.validate import cross_validate

    ds = MyDataset()
    pipe = MyPipeline(template=InterpolatedDtwTemplate(scaling=TrainableAbsMaxScaler()))
    optimizable_pipe = Optimize(pipe)

    results = cross_validate(optimizable_pipe, ds, scoring=score, cv=cv, return_optimizer=True, return_train_score=True)
    result_df = pd.DataFrame(results)
    result_df



.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>score_time</th>
          <th>optimize_time</th>
          <th>train_data_labels</th>
          <th>test_data_labels</th>
          <th>optimizer</th>
          <th>test_precision</th>
          <th>test_recall</th>
          <th>test_f1_score</th>
          <th>test_single_precision</th>
          <th>test_single_recall</th>
          <th>test_single_f1_score</th>
          <th>train_precision</th>
          <th>train_recall</th>
          <th>train_f1_score</th>
          <th>train_single_precision</th>
          <th>train_single_recall</th>
          <th>train_single_f1_score</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>0.099674</td>
          <td>0.081988</td>
          <td>[(test, right)]</td>
          <td>[(test, left)]</td>
          <td>Optimize(optimize_with_info=True, pipeline=MyP...</td>
          <td>1.0</td>
          <td>1.000000</td>
          <td>1.000000</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>1.0</td>
          <td>0.9</td>
          <td>0.947368</td>
          <td>[1.0]</td>
          <td>[0.9]</td>
          <td>[0.9473684210526316]</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.081657</td>
          <td>0.063876</td>
          <td>[(test, left)]</td>
          <td>[(test, right)]</td>
          <td>Optimize(optimize_with_info=True, pipeline=MyP...</td>
          <td>1.0</td>
          <td>0.866667</td>
          <td>0.928571</td>
          <td>[1.0]</td>
          <td>[0.8666666666666667]</td>
          <td>[0.9285714285714286]</td>
          <td>1.0</td>
          <td>1.0</td>
          <td>1.000000</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 170-177

Understanding the Results
-------------------------
The cross validation provides a lot of outputs (some of them can be disabled using the function parameters).
To simplify things a little, we will split the output into four parts:

The main output is the set of test performance values.
Each row corresponds to the performance in the respective fold.

.. GENERATED FROM PYTHON SOURCE LINES 177-180

.. code-block:: default

    performance = result_df[["test_precision", "test_recall", "test_f1_score"]]
    performance


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>test_precision</th>
          <th>test_recall</th>
          <th>test_f1_score</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1.0</td>
          <td>1.000000</td>
          <td>1.000000</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1.0</td>
          <td>0.866667</td>
          <td>0.928571</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 181-184

The final generalization performance you would report is usually the average over all folds.
The standard deviation can also be interesting, as it tells you how stable your optimization is and whether your
splits provide comparable data distributions.
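
When comparing such numbers across tools, it helps to know which standard deviation is reported.
The sketch below recomputes mean and standard deviation for the two per-fold recall values of this example
using only the standard library.
It relies on the fact that pandas' ``std`` defaults to the sample standard deviation (``ddof=1``), whereas
NumPy's ``np.std`` defaults to the population standard deviation (``ddof=0``):

.. code-block:: default

    from statistics import mean, pstdev, stdev

    # Per-fold test recall values taken from the result table of this example.
    recall_per_fold = [1.0, 0.8666666666666667]

    avg = mean(recall_per_fold)
    sample_std = stdev(recall_per_fold)  # divides by n - 1, like pandas' default
    population_std = pstdev(recall_per_fold)  # divides by n, like NumPy's default

With only two folds, the two definitions differ by a factor of the square root of two, so it is worth stating
which one is reported.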

.. GENERATED FROM PYTHON SOURCE LINES 184-187

.. code-block:: default

    generalization_performance = performance.agg(["mean", "std"])
    generalization_performance


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>test_precision</th>
          <th>test_recall</th>
          <th>test_f1_score</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>mean</th>
          <td>1.0</td>
          <td>0.933333</td>
          <td>0.964286</td>
        </tr>
        <tr>
          <th>std</th>
          <td>0.0</td>
          <td>0.094281</td>
          <td>0.050508</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 188-194

If you need more insight into the results (e.g. when the standard deviation of your results is high), you can
inspect the individual scores for each datapoint.
In this example, each list contains only a single element per score, as we only had a single datapoint per fold.
In a real scenario, each list will contain one score per datapoint in the test set of the respective fold.
Inspecting these lists can help to identify potential issues with certain parts of your dataset.
To link the performance values to a specific datapoint, you can look at the `test_data_labels` field.
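
A minimal sketch of such a lookup is given below.
It uses a hand-written dictionary with made-up labels and scores instead of a real result row, and it assumes
that the per-datapoint scores are stored in the same order as the labels:

.. code-block:: default

    # Made-up result of a single fold with three test datapoints (not real output).
    fold_result = {
        "test_data_labels": [("p1", "left"), ("p1", "right"), ("p2", "left")],
        "test_single_f1_score": [0.98, 0.95, 0.71],
    }

    # Pair every label with its score (assumes both lists share the same order).
    per_datapoint = dict(zip(fold_result["test_data_labels"], fold_result["test_single_f1_score"]))

    # Find the datapoint with the lowest score as a starting point for debugging.
    worst_label = min(per_datapoint, key=per_datapoint.get)

Here ``worst_label`` points to the datapoint that pulls the fold average down the most.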

.. GENERATED FROM PYTHON SOURCE LINES 194-199

.. code-block:: default

    single_performance = result_df[
        ["test_single_precision", "test_single_recall", "test_single_f1_score", "test_data_labels"]
    ]
    single_performance


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>test_single_precision</th>
          <th>test_single_recall</th>
          <th>test_single_f1_score</th>
          <th>test_data_labels</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>[(test, left)]</td>
        </tr>
        <tr>
          <th>1</th>
          <td>[1.0]</td>
          <td>[0.8666666666666667]</td>
          <td>[0.9285714285714286]</td>
          <td>[(test, right)]</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 200-204

Even further insight is provided by the train results (if activated via `return_train_score=True`).
These are the performance results on the train set.
They indicate whether the training provided meaningful results and can reveal over-fitting, if the performance on
the test set is much worse than the performance on the train set.
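
One simple way to look for such a gap is to subtract the test score from the train score for each fold.
The sketch below does this for the F1 scores reported in this example.
The threshold of 0.05 is an arbitrary assumption for illustration, and with a single datapoint per fold the
gaps are not a reliable indicator:

.. code-block:: default

    # Per-fold F1 scores taken from the result tables of this example.
    train_f1 = [0.947368, 1.000000]
    test_f1 = [1.000000, 0.928571]

    # Positive gap: the pipeline performed better on the train set than on the test set.
    gaps = [round(train - test, 6) for train, test in zip(train_f1, test_f1)]

    # Arbitrary, illustrative threshold for flagging a fold for closer inspection.
    threshold = 0.05
    folds_to_inspect = [fold for fold, gap in enumerate(gaps) if gap > threshold]

A consistently large positive gap across many folds would be a stronger hint of over-fitting than a single
flagged fold.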

.. GENERATED FROM PYTHON SOURCE LINES 204-217

.. code-block:: default

    train_performance = result_df[
        [
            "train_precision",
            "train_recall",
            "train_f1_score",
            "train_single_precision",
            "train_single_recall",
            "train_single_f1_score",
            "train_data_labels",
        ]
    ]
    train_performance


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>train_precision</th>
          <th>train_recall</th>
          <th>train_f1_score</th>
          <th>train_single_precision</th>
          <th>train_single_recall</th>
          <th>train_single_f1_score</th>
          <th>train_data_labels</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>1.0</td>
          <td>0.9</td>
          <td>0.947368</td>
          <td>[1.0]</td>
          <td>[0.9]</td>
          <td>[0.9473684210526316]</td>
          <td>[(test, right)]</td>
        </tr>
        <tr>
          <th>1</th>
          <td>1.0</td>
          <td>1.0</td>
          <td>1.000000</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>[1.0]</td>
          <td>[(test, left)]</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 218-220

The final level of debug information is provided via the timings (note that the longer runtime in fold 0 can be
explained by the jit-compiler used in `BarthDtw`) ...

.. GENERATED FROM PYTHON SOURCE LINES 220-223

.. code-block:: default

    timings = result_df[["score_time", "optimize_time"]]
    timings


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>score_time</th>
          <th>optimize_time</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>0.099674</td>
          <td>0.081988</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.081657</td>
          <td>0.063876</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 224-228

... and the optimized pipeline object.
This is the actual trained object generated in this fold.
You can apply it to other data for testing or inspect the actual object for further debug information that might be
stored on it.

.. GENERATED FROM PYTHON SOURCE LINES 228-231

.. code-block:: default

    optimized_pipeline = result_df["optimizer"][0]
    optimized_pipeline


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Optimize(optimize_with_info=True, pipeline=MyPipeline(max_cost=3, template=InterpolatedDtwTemplate(data=None, interpolation_method='linear', n_samples=None, sampling_rate_hz=None, scaling=TrainableAbsMaxScaler(data_max=None, out_max=1), use_cols=None)), safe_optimize=True)


.. GENERATED FROM PYTHON SOURCE LINES 232-234

.. code-block:: default

    optimized_pipeline.optimized_pipeline_.get_params()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    {'max_cost': 3, 'template__data':          gyr_pa      gyr_ml      gyr_si
    0   -254.320257 -504.602792  128.012638
    1   -226.083031 -425.141517   30.978740
    2   -206.995299 -283.505209   21.408405
    3   -120.319427 -144.344856  -49.580293
    4    -14.847980  -29.578985  -84.786314
    ..          ...         ...         ...
    222 -254.506784 -411.835243  140.743087
    223 -255.752408 -414.280737  149.520945
    224 -251.221218 -420.958954  150.130405
    225 -249.596548 -432.610208  146.034873
    226 -255.449229 -456.521262  140.199006

    [227 rows x 3 columns], 'template__interpolation_method': 'linear', 'template__n_samples': None, 'template__sampling_rate_hz': 204.8, 'template__scaling__data_max': 504.60279162067746, 'template__scaling__out_max': 1, 'template__scaling': TrainableAbsMaxScaler(data_max=504.60279162067746, out_max=1), 'template__use_cols': None, 'template': InterpolatedDtwTemplate(data=         gyr_pa      gyr_ml      gyr_si
    0   -254.320257 -504.602792  128.012638
    1   -226.083031 -425.141517   30.978740
    2   -206.995299 -283.505209   21.408405
    3   -120.319427 -144.344856  -49.580293
    4    -14.847980  -29.578985  -84.786314
    ..          ...         ...         ...
    222 -254.506784 -411.835243  140.743087
    223 -255.752408 -414.280737  149.520945
    224 -251.221218 -420.958954  150.130405
    225 -249.596548 -432.610208  146.034873
    226 -255.449229 -456.521262  140.199006

    [227 rows x 3 columns], interpolation_method='linear', n_samples=None, sampling_rate_hz=204.8, scaling=TrainableAbsMaxScaler(data_max=504.60279162067746, out_max=1), use_cols=None)}


.. GENERATED FROM PYTHON SOURCE LINES 235-243

Further Notes
-------------
We also support grouped cross validation.
Check the :ref:`dataset guide <custom_dataset>` on how you can group the data before cross-validation or generate
data labels to be used with `GroupKFold`.
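
To illustrate what grouping achieves, the sketch below runs sklearn's ``GroupKFold`` on four placeholder
datapoints from two hypothetical participants.
Only the sklearn side is shown; how the group labels are generated from a tpcp dataset is described in the
dataset guide and is not reproduced here:

.. code-block:: default

    from sklearn.model_selection import GroupKFold

    # Placeholder datapoints and the (hypothetical) participant each one belongs to.
    datapoints = ["p1_left", "p1_right", "p2_left", "p2_right"]
    groups = ["p1", "p1", "p2", "p2"]

    folds = []
    for train_idx, test_idx in GroupKFold(n_splits=2).split(datapoints, groups=groups):
        train_groups = {groups[i] for i in train_idx}
        test_groups = {groups[i] for i in test_idx}
        folds.append((train_groups, test_groups))

In every fold, the train and test groups are disjoint, so data from one participant never appears on both sides
of a split.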

`Optimize` is just an example of an optimizer that can be passed to cross validation.
You can pass any gaitmap optimizer like `GridSearch` or `GridSearchCV`.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  2.403 seconds)

**Estimated memory usage:**  11 MB


.. _sphx_glr_download_auto_examples_datasets_and_pipelines_cross_validation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: cross_validation.py <cross_validation.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: cross_validation.ipynb <cross_validation.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_