gaitmap.evaluation_utils.match_stride_lists

gaitmap.evaluation_utils.match_stride_lists(*, stride_list_a: DataFrame | dict[Hashable | str, DataFrame], stride_list_b: DataFrame | dict[Hashable | str, DataFrame], match_cols: str | Sequence[str] = ('start', 'end'), tolerance: int | float = 0, one_to_one: bool = True, postfix_a: str = '_a', postfix_b: str = '_b') → DataFrame | dict[Hashable | str, DataFrame]

Find matching strides in two stride lists with a certain tolerance.

This function finds matching strides in two stride lists, as long as every selected column/event of a stride differs from the corresponding value of its matching stride by at most the selected tolerance. This is helpful for comparing the result of a segmentation or event detection to a ground truth. If both stride lists are multi-sensor stride lists, matching is performed separately for every sensor present in both of them; sensors that appear in only one of the inputs are ignored.
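The per-pair criterion can be written as a small check. strides_match is a hypothetical helper for illustration, not part of gaitmap. It uses an inclusive comparison (<=), because in the multi-sensor example below strides whose events differ by exactly 1 are matched with tolerance=1.

```python
import pandas as pd


def strides_match(stride_a: pd.Series, stride_b: pd.Series, match_cols=("start", "end"), tolerance=0) -> bool:
    """Hypothetical helper: True if every selected event differs by at most `tolerance`."""
    # match_cols may be a single column name or a sequence of names
    cols = [match_cols] if isinstance(match_cols, str) else list(match_cols)
    diffs = (stride_a[cols] - stride_b[cols]).abs()
    return bool((diffs <= tolerance).all())


# Stride 1 of stride_list_left_11 and stride 2 of stride_list_left_21 from the multi-sensor example
a = pd.Series({"start": 21, "end": 30})
b = pd.Series({"start": 21, "end": 31})
strides_match(a, b, tolerance=1)  # True: the "end" values differ by exactly 1
strides_match(a, b, tolerance=0)  # False
```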

Matches are searched in both directions, and mappings from the s_id of the left stride list (stride_list_a) to the s_id of the right stride list (stride_list_b), and vice versa, are returned. A stride without a valid match is mapped to NaN. If one_to_one is False, a stride can have multiple matches. This can happen if the tolerance is very large or if strides within a stride list overlap. If one_to_one is True (the default), only a single match is returned per stride: the one with the lowest combined difference over all selected columns/events. If multiple candidates have the same combined difference, the one that occurs first in the list is chosen. This can still lead to unexpected results in certain cases, so it is highly recommended to sort the stride lists and remove strides with large overlaps before applying this function.
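The matching behaviour described above can be approximated with a brute-force sketch in NumPy and pandas. This is not gaitmap's implementation: match_sketch is a hypothetical name, the "combined difference" is assumed to be the sum of absolute column differences, and one-to-one matches are assumed to be mutual best candidates (a stride's lowest-cost partner must also pick it back). np.argmin returns the first minimum, which mirrors the first-occurrence tie-break.

```python
import numpy as np
import pandas as pd


def match_sketch(list_a, list_b, match_cols=("start", "end"), tolerance=0,
                 one_to_one=True, postfix_a="_a", postfix_b="_b"):
    """Hypothetical brute-force matcher; a sketch, not gaitmap's implementation."""
    cols = [match_cols] if isinstance(match_cols, str) else list(match_cols)
    a = list_a[cols].to_numpy(dtype=float)
    b = list_b[cols].to_numpy(dtype=float)

    # Pairwise absolute differences, shape (len(a), len(b), len(cols))
    diff = np.abs(a[:, None, :] - b[None, :, :])
    # A pair is a candidate if every selected column is within the tolerance
    valid = (diff <= tolerance).all(axis=-1)

    if one_to_one and valid.size:
        # Assumption: "combined difference" = sum of absolute column differences
        cost = np.where(valid, diff.sum(axis=-1), np.inf)
        best_b = cost.argmin(axis=1)  # argmin keeps the first minimum on ties
        best_a = cost.argmin(axis=0)
        keep = np.zeros_like(valid)
        for i, j in enumerate(best_b):
            # Keep only mutual best candidates, so each stride is used at most once
            if valid[i, j] and best_a[j] == i:
                keep[i, j] = True
        valid = keep

    ia, ib = np.nonzero(valid)
    idx_a, idx_b = list_a.index.to_numpy(), list_b.index.to_numpy()
    name_a, name_b = f"s_id{postfix_a}", f"s_id{postfix_b}"

    rows = list(zip(idx_a[ia], idx_b[ib]))
    # Strides without a match are paired with NaN
    rows += [(i, np.nan) for i in np.setdiff1d(idx_a, idx_a[ia])]
    rows += [(np.nan, j) for j in np.setdiff1d(idx_b, idx_b[ib])]
    out = pd.DataFrame(rows, columns=[name_a, name_b])
    # Sort by the left index; rows with a NaN left index end up last
    return out.sort_values(name_a, kind="mergesort").reset_index(drop=True)
```

Applied to the single-sensor example data with tolerance=2, this sketch produces the same pairs as the documented output.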

Parameters:

stride_list_a: The first ("left") stride list used for comparison.
stride_list_b: The second ("right") stride list used for comparison.
match_cols: A column name or a sequence of column names (events) that should be matched. Default is ("start", "end").
tolerance: The allowed difference between the values of matched columns. Its unit is the unit of the values in the stride lists.
one_to_one: If True, at most a single match is returned per stride. If False, multiple matches are possible.
postfix_a: A postfix appended to the index name of the left stride list in the output.
postfix_b: A postfix appended to the index name of the right stride list in the output.
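Because tolerance is expressed in the units of the stride lists, a tolerance given in seconds has to be converted first. The arithmetic below assumes, purely for illustration, stride lists that store events as sample indices recorded at a hypothetical sampling rate of 204.8 Hz:

```python
# Hypothetical setup: events are stored as sample indices at 204.8 Hz
sampling_rate_hz = 204.8
tolerance_s = 0.05  # desired tolerance: 50 ms

# Convert the tolerance into the unit of the stride lists (samples)
tolerance_samples = tolerance_s * sampling_rate_hz  # about 10.24 samples
```

The result can then be passed as tolerance=tolerance_samples.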

Returns:

matches: A two-column DataFrame with the column names s_id{postfix_a} and s_id{postfix_b}. Each row contains the index values of a stride from the left list and a stride from the right list that belong together. Strides without a match are paired with NaN. The DataFrame is sorted by the index values of the left stride list. If multi-sensor stride lists were passed as inputs, a dictionary of such DataFrames (one per common sensor) is returned.

See also

evaluate_segmented_stride_list: Find true positives, false positives and false negatives by comparing two stride lists.
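The returned match table already contains the information needed for such an evaluation. As a hedged sketch (classify_matches is a hypothetical helper and not gaitmap's evaluate_segmented_stride_list), assuming stride_list_a holds the detected strides and stride_list_b the ground truth: rows with both ids are true positives, rows with only a left id are false positives, and rows with only a right id are false negatives. The match table is taken from the single-sensor example, with the default postfixes.

```python
import numpy as np
import pandas as pd


def classify_matches(matches: pd.DataFrame, col_a: str = "s_id_a", col_b: str = "s_id_b") -> pd.Series:
    """Hypothetical helper: label each row of a match table as tp, fp or fn.

    Assumes col_a refers to detected strides and col_b to ground-truth strides.
    """
    has_a = matches[col_a].notna().to_numpy()
    has_b = matches[col_b].notna().to_numpy()
    # Both ids -> tp, only a detected id -> fp, otherwise only a ground-truth id -> fn
    labels = np.select([has_a & has_b, has_a], ["tp", "fp"], default="fn")
    return pd.Series(labels, index=matches.index, name="match_type")


# Match table from the single-sensor example (columns renamed to the default postfixes)
matches = pd.DataFrame({"s_id_a": [0, 1, 2, 3, np.nan], "s_id_b": [0, np.nan, 2, np.nan, 1]})
match_type = classify_matches(matches)
counts = match_type.value_counts()  # tp: 2, fp: 2, fn: 1
```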

Examples

Single Sensor:

>>> import pandas as pd
>>> from gaitmap.evaluation_utils import match_stride_lists
>>> stride_list_left = pd.DataFrame([[10, 20], [21, 30], [31, 40], [50, 60]], columns=["start", "end"]).rename_axis(
...     "s_id"
... )
>>> stride_list_right = pd.DataFrame([[10, 21], [20, 34], [31, 40]], columns=["start", "end"]).rename_axis("s_id")
>>> match_stride_lists(
...     stride_list_a=stride_list_left,
...     stride_list_b=stride_list_right,
...     tolerance=2,
...     postfix_a="_left",
...     postfix_b="_right",
... )
  s_id_left s_id_right
0         0          0
1         1        NaN
2         2          2
3         3        NaN
4       NaN          1
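To inspect how close the matched strides actually are, the match table can be joined back to the original stride lists. A minimal pandas sketch, reusing the example data and copying the match table from the output above instead of calling gaitmap:

```python
import numpy as np
import pandas as pd

stride_list_left = pd.DataFrame([[10, 20], [21, 30], [31, 40], [50, 60]], columns=["start", "end"]).rename_axis("s_id")
stride_list_right = pd.DataFrame([[10, 21], [20, 34], [31, 40]], columns=["start", "end"]).rename_axis("s_id")
# Match table copied from the example output above
matches = pd.DataFrame({"s_id_left": [0, 1, 2, 3, np.nan], "s_id_right": [0, np.nan, 2, np.nan, 1]})

# Keep only rows where both sides have a stride, then restore integer ids
paired = matches.dropna().astype(int)
# Look up the event values of both strides via their index
side_by_side = paired.join(stride_list_left.add_suffix("_left"), on="s_id_left").join(
    stride_list_right.add_suffix("_right"), on="s_id_right"
)
# Per-event deviation between matched strides
side_by_side["start_diff"] = side_by_side["start_left"] - side_by_side["start_right"]
side_by_side["end_diff"] = side_by_side["end_left"] - side_by_side["end_right"]
```

For the two matched pairs this yields start differences of 0 and 0 and end differences of -1 and 0.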

Multi Sensor:

>>> stride_list_left_11 = pd.DataFrame(
...     [[10, 20], [21, 30], [31, 40], [50, 60]], columns=["start", "end"]
... ).rename_axis("s_id")
>>> stride_list_right_12 = pd.DataFrame([[10, 21], [20, 34], [31, 40]], columns=["start", "end"]).rename_axis(
...     "s_id"
... )
>>> stride_list_left_21 = pd.DataFrame(
...     [[10, 20], [31, 41], [21, 31], [50, 60]], columns=["start", "end"]
... ).rename_axis("s_id")
>>> stride_list_right_22 = pd.DataFrame([[10, 22], [31, 41], [20, 36]], columns=["start", "end"]).rename_axis(
...     "s_id"
... )
>>> test_output = match_stride_lists(
...     stride_list_a={"left_sensor": stride_list_left_11, "right_sensor": stride_list_right_12},
...     stride_list_b={"left_sensor": stride_list_left_21, "right_sensor": stride_list_right_22},
...     tolerance=1,
... )
>>> test_output["left_sensor"]
   s_id_a  s_id_b
0       0       0
1       1       2
2       2       1
3       3       3
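The per-sensor behaviour (only sensors present in both dictionaries are matched) can be sketched as a dictionary comprehension over the common keys. Both match_per_sensor and the toy matcher exact_matches are hypothetical helpers for illustration; exact_matches only finds strides with identical events (like a tolerance of 0) and, unlike match_stride_lists, does not add NaN rows for unmatched strides. The sensor name "hip_sensor" is also made up.

```python
import pandas as pd


def match_per_sensor(stride_lists_a: dict, stride_lists_b: dict, match_fn, **kwargs) -> dict:
    """Hypothetical helper: apply a single-sensor matcher to every sensor present in both dicts."""
    # Sensors that appear in only one of the dicts are skipped
    common = [sensor for sensor in stride_lists_a if sensor in stride_lists_b]
    return {sensor: match_fn(stride_lists_a[sensor], stride_lists_b[sensor], **kwargs) for sensor in common}


def exact_matches(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Toy matcher: pairs strides whose start and end values are identical."""
    merged = a.reset_index().merge(b.reset_index(), on=["start", "end"], suffixes=("_a", "_b"))
    return merged[["s_id_a", "s_id_b"]]


left_11 = pd.DataFrame([[10, 20], [21, 30], [31, 40], [50, 60]], columns=["start", "end"]).rename_axis("s_id")
left_21 = pd.DataFrame([[10, 20], [31, 41], [21, 31], [50, 60]], columns=["start", "end"]).rename_axis("s_id")
result = match_per_sensor(
    {"left_sensor": left_11, "hip_sensor": left_11},  # "hip_sensor" exists only in the first dict
    {"left_sensor": left_21},
    exact_matches,
)
```

Here result contains only "left_sensor", with the exact pairs (0, 0) and (3, 3).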