
This convenience function returns the row(s) with the highest confidence value within each time interval. If a filter is supplied, only the matching time interval is considered.

Usage

get_top_prediction(data, filter = NULL)

Arguments

data

A data frame with columns 'start', 'end', 'scientific_name', 'common_name', and 'confidence'. This data frame is typically the output from predictions_to_df.

filter

A list containing 'start' and 'end' values used to restrict the data to a single time interval before the top prediction is determined. If NULL, the function processes all time intervals.
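
As a rough illustration of what such a filter amounts to, the sketch below subsets a data frame by hand. It assumes the filter matches rows whose 'start' and 'end' equal the given values exactly; the function's actual matching rule is not stated on this page. The names `preds`, `interval`, and `subset_rows` are made up for the example.

```r
# Illustrative data; column names follow the 'data' argument described above.
preds <- data.frame(
  start = c(0, 0, 1, 1),
  end = c(1, 1, 2, 2),
  scientific_name = c("Species A", "Species B", "Species A", "Species B"),
  common_name = c("Common A", "Common B", "Common A", "Common B"),
  confidence = c(0.1, 0.2, 0.5, 0.3)
)

# A filter is a named list with 'start' and 'end' elements.
interval <- list(start = 1, end = 2)

# Assumption: exact matching on both columns (not confirmed by the package docs).
subset_rows <- preds[
  preds$start == interval$start & preds$end == interval$end,
]
subset_rows
```

Under that assumption, only the two rows of the 1 to 2 interval remain, and the top prediction would then be chosen among them.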

Value

A data frame containing the row(s) with the highest confidence in each time interval (each unique combination of 'start' and 'end'), or in the specified interval only when filter is given.
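
The selection described here can be sketched in base R as follows. This is not the package's implementation: `top_rows_by_interval` is a hypothetical helper, and it assumes that tied rows are all kept, which is what the "row(s)" wording suggests.

```r
# Hypothetical helper, not part of the package: keep the rows whose
# confidence equals the maximum within their (start, end) interval.
top_rows_by_interval <- function(data) {
  # One grouping key per time interval
  key <- paste(data$start, data$end)
  # Per-interval maximum, repeated so it lines up with each row
  group_max <- ave(data$confidence, key, FUN = max)
  result <- data[data$confidence == group_max, ]
  rownames(result) <- NULL
  result
}

# Illustrative data with a tie in the first interval
tied <- data.frame(
  start = c(0, 0, 1),
  end = c(1, 1, 2),
  scientific_name = c("Species A", "Species B", "Species A"),
  common_name = c("Common A", "Common B", "Common A"),
  confidence = c(0.4, 0.4, 0.9)
)

# Assumption: ties are kept, so both rows of interval 0 to 1 are returned
top_rows_by_interval(tied)
```

With this sketch, a result can have more rows than there are intervals whenever two predictions share the top confidence.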

Examples

# Example data
data <- data.frame(
  start = c(0, 0, 1, 1, 2, 2),
  end = c(1, 1, 2, 2, 3, 3),
  scientific_name = c(
    "Species A",
    "Species B",
    "Species A",
    "Species B",
    "Species A",
    "Species B"
  ),
  common_name = c(
    "Common A",
    "Common B",
    "Common A",
    "Common B",
    "Common A",
    "Common B"
  ),
  confidence = c(0.1, 0.2, 0.5, 0.3, 0.7, 0.8)
)
data
#>   start end scientific_name common_name confidence
#> 1     0   1       Species A    Common A        0.1
#> 2     0   1       Species B    Common B        0.2
#> 3     1   2       Species A    Common A        0.5
#> 4     1   2       Species B    Common B        0.3
#> 5     2   3       Species A    Common A        0.7
#> 6     2   3       Species B    Common B        0.8

# Get top prediction for each time interval
get_top_prediction(data)
#>   start end scientific_name common_name confidence
#> 1     0   1       Species B    Common B        0.2
#> 2     1   2       Species A    Common A        0.5
#> 3     2   3       Species B    Common B        0.8

# Get top prediction for a specific time interval
get_top_prediction(data, filter = list(start = 1, end = 2))
#>   start end scientific_name common_name confidence
#> 1     1   2       Species A    Common A        0.5

# The same thing can be done using dplyr
if (FALSE) { # \dontrun{
  data |>
    dplyr::group_by(start, end) |>
    dplyr::slice_max(order_by = confidence)
} # }