DetectionFilter

Filter raw bird call detections by confidence threshold and merge them into song segments, without re-running inference. DetectionFilter is the post-processing companion to BirdCallDetector(..., no_merge=True): once the raw detection file exists, you can try different thresholds on it in seconds.


Import

from evaluation.filter_and_merge_detections import DetectionFilter

DetectionFilter

Constructor

DetectionFilter(use_max_confidence=True)
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| use_max_confidence | bool / True | No | Kept for API compatibility with FBetaScoreAnalyzer. Has no effect on raw-detection filtering, which always uses the per-clip confidence field. |

Methods

| Method | Returns | Description |
|---|---|---|
| load_detections(input_path) | Dict | Load raw_detections.json. |
| filter_detections(data, conf_threshold, song_gap) | List[Dict] | Filter by confidence, then merge. Primary entry point. |
| save_results(data, filtered_detections, output_path, conf_threshold, song_gap, output_formats) | None | Write filtered results in one or more formats. |
| save_filtered_json(data, filtered_detections, output_path, conf_threshold, song_gap) | None | Write JSON with algorithm metadata. |
| save_filtered_csv(filtered_detections, output_path) | None | Write simplified CSV. |
| save_filtered_xc_json(data, filtered_detections, output_path) | None | Write Xeno-Canto Annota-JSON. |
| save_filtered_raven_txt(filtered_detections, output_path) | None | Write Raven Selection Table. |

load_detections

data = df.load_detections(input_path)

Load a raw_detections.json file produced by BirdCallDetector with no_merge=True.

| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| input_path | str / — | Yes | Path to the raw detections JSON file or a results/ directory. When a directory is passed, raw_detections.json is resolved automatically (follows results/.active_run). |

Returns: Dict with keys detections (list of raw clip dicts), model_config, and metadata fields.


filter_detections

merged = df.filter_detections(data, conf_threshold, song_gap=None)

Apply a confidence threshold to raw detections, then merge surviving clips into song segments using reconstruct_songs. This is the same filter-then-merge order used by BirdCallDetector internally.

| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| data | Dict / — | Yes | Dict returned by load_detections(). |
| conf_threshold | float / — | Yes | Confidence threshold (0.0–1.0). All detections with confidence < threshold are discarded before merging. |
| song_gap | float / None | No | Max gap in seconds between two detections of the same species to merge into one song. When None, reads model_config.song_gap_threshold from the JSON, falling back to 0.1. |

Returns: List[Dict] — merged song segments.
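
The fallback rule for song_gap described in the table above can be sketched as follows. This is only an illustration of the documented behaviour, not the class's actual code: resolve_song_gap is a hypothetical helper name, and treating a stored null value like a missing one is an assumption.

```python
from typing import Any, Dict, Optional

DEFAULT_SONG_GAP = 0.1  # documented fallback when the JSON carries no value


def resolve_song_gap(data: Dict[str, Any], song_gap: Optional[float] = None) -> float:
    """Hypothetical helper: choose the gap the way the parameter table describes."""
    if song_gap is not None:
        return float(song_gap)  # an explicit argument always wins
    model_config = data.get("model_config") or {}
    stored = model_config.get("song_gap_threshold")
    # Assumption: a stored null is treated the same as a missing key.
    return DEFAULT_SONG_GAP if stored is None else float(stored)


print(resolve_song_gap({"model_config": {"song_gap_threshold": 0.25}}))  # value from the JSON
print(resolve_song_gap({"model_config": {}}))                            # fallback
print(resolve_song_gap({}, song_gap=0.5))                                # explicit argument
```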

Why filter raw clips and not merged songs?

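The filter-then-merge order matches the app's internal workflow. Filtering already-merged songs by average or max confidence gives different (and generally worse) results, because by then low-confidence clips have already been absorbed into segments and have shaped their boundaries.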
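
A toy comparison of the two orders may make the difference concrete. The merge function below is a simplified stand-in written for this illustration, not reconstruct_songs, and the clip field names (species, start, end, confidence) are assumptions about the raw detection format.

```python
from typing import Dict, List

# Toy clips; the field names are assumed for illustration only.
clips = [
    {"species": "robin", "start": 0.0, "end": 1.0, "confidence": 0.90},
    {"species": "robin", "start": 1.0, "end": 2.0, "confidence": 0.10},
    {"species": "robin", "start": 2.0, "end": 3.0, "confidence": 0.80},
]


def merge(clips: List[Dict], gap: float) -> List[Dict]:
    """Fuse same-species clips separated by at most `gap` seconds (simplified)."""
    songs: List[Dict] = []
    for clip in sorted(clips, key=lambda c: (c["species"], c["start"])):
        last = songs[-1] if songs else None
        if (
            last is not None
            and last["species"] == clip["species"]
            and clip["start"] - last["end"] <= gap
        ):
            last["end"] = max(last["end"], clip["end"])
            last["confidence"] = max(last["confidence"], clip["confidence"])
        else:
            songs.append(dict(clip))  # copy so the input clips stay untouched
    return songs


threshold, gap = 0.5, 0.1

# Filter, then merge: the weak middle clip is dropped first, so two songs remain.
filter_then_merge = merge([c for c in clips if c["confidence"] >= threshold], gap)

# Merge, then filter on max confidence: the weak clip is absorbed into one long song.
merge_then_filter = [s for s in merge(clips, gap) if s["confidence"] >= threshold]

print([(s["start"], s["end"]) for s in filter_then_merge])
print([(s["start"], s["end"]) for s in merge_then_filter])
```

In this toy case the first order yields two songs (0.0 to 1.0 s and 2.0 to 3.0 s), while the second yields a single 0.0 to 3.0 s song that still contains the clip the threshold was meant to remove.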


save_results

df.save_results(
    data,
    filtered_detections,
    output_path,
    conf_threshold,
    song_gap,
    output_formats=None,
)

Write filtered detections to one or more output formats under output_path.

| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| data | Dict / — | Yes | Original data dict from load_detections(). Used to preserve model_config in JSON output. |
| filtered_detections | List[Dict] / — | Yes | Merged detections returned by filter_detections(). |
| output_path | str / — | Yes | Output directory. Created automatically if it does not exist. |
| conf_threshold | float / — | Yes | Threshold applied (recorded in output metadata). |
| song_gap | float / — | Yes | Song gap used (recorded in output metadata). |
| output_formats | List[str] / ['json-with-algorithm-metadata', 'simplified-csv'] | No | One or more format keys: json-with-algorithm-metadata, simplified-csv, xeno-canto-annota-json, raven-selection-table, or all. |

Parameter Deep-Dives

conf_threshold — Confidence Threshold

The threshold is applied to raw per-clip detections before song merging. It is directly comparable to the conf_threshold you would pass to BirdCallDetector.
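
As a minimal sketch of the filtering step alone: detections with confidence below the threshold are dropped, so a clip whose confidence equals the threshold survives. The helper name keep_confident and the sample values are hypothetical; only the confidence field comes from this page.

```python
from typing import Dict, List


def keep_confident(detections: List[Dict], conf_threshold: float) -> List[Dict]:
    """Drop clips with confidence < conf_threshold; equal values are kept."""
    return [d for d in detections if d["confidence"] >= conf_threshold]


# Hypothetical raw clips carrying only a confidence value.
raw = [{"confidence": c} for c in (0.05, 0.20, 0.35, 0.35, 0.60, 0.90)]

for threshold in (0.20, 0.35, 0.50):
    kept = keep_confident(raw, threshold)
    print(f"conf={threshold:.2f}: {len(kept)} clips kept")
```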

song_gap — Song Gap

Controls how aggressively surviving detections are fused into continuous song segments. Omitting it reuses the song_gap_threshold from the JSON's model_config, ensuring consistency with the original detection run.

| song_gap | Effect |
|---|---|
| None (recommended) | Reads model_config.song_gap_threshold from the JSON, falling back to 0.1 |
| 0.1 | Merge detections ≤ 0.1 s apart |
| 0.5 | Merge detections ≤ 0.5 s apart (moderate) |
| 2.0 | Merge detections ≤ 2 s apart (aggressive, may over-merge) |
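
To get a feel for these values, the sketch below counts how many segments a single species' clips would form for each gap. It assumes the gap is measured from the end of one detection to the start of the next; count_segments and the clip times are hypothetical and stand in for the real merging logic.

```python
from typing import List, Tuple


def count_segments(clips: List[Tuple[float, float]], song_gap: float) -> int:
    """Count songs formed from one species' (start, end) clips for a given gap."""
    if not clips:
        return 0
    ordered = sorted(clips)
    segments = 1
    current_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start - current_end > song_gap:
            segments += 1  # silence longer than the gap starts a new song
            current_end = end
        else:
            current_end = max(current_end, end)  # extend the current song
    return segments


# Hypothetical clip times in seconds, separated by 0.25 s, 1.5 s and 3.0 s of silence.
clips = [(0.0, 1.0), (1.25, 2.0), (3.5, 4.0), (7.0, 8.0)]

for gap in (0.1, 0.5, 2.0):
    print(f"song_gap={gap}: {count_segments(clips, gap)} segments")
```

With these made-up times, the three gaps give 4, 3 and 2 segments respectively, which shows how a larger gap trades segment count for longer, possibly over-merged songs.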

Output Files

Files written by save_results() to output_path:

output_path/with_algorithm_metadata.json   ← json-with-algorithm-metadata
output_path/simplified.csv                 ← simplified-csv
output_path/xeno-canto-annota.json         ← xeno-canto-annota-json
output_path/raven_selection_table.txt      ← raven-selection-table

For the schema of each format see Detection Output Formats.
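
The mapping from format keys to file names can be written down as a small lookup, which is handy for checking that a run produced everything it should. This is a sketch built from the list above: expected_files is a hypothetical helper, and both the handling of all and the error on unknown keys are assumptions rather than documented behaviour of save_results().

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Format key -> file name, as listed in the Output Files section.
FORMAT_FILES: Dict[str, str] = {
    "json-with-algorithm-metadata": "with_algorithm_metadata.json",
    "simplified-csv": "simplified.csv",
    "xeno-canto-annota-json": "xeno-canto-annota.json",
    "raven-selection-table": "raven_selection_table.txt",
}
DEFAULT_FORMATS = ["json-with-algorithm-metadata", "simplified-csv"]


def expected_files(output_path: str, output_formats: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: list the files a save_results() call should write."""
    keys = list(output_formats) if output_formats else list(DEFAULT_FORMATS)
    if "all" in keys:
        keys = list(FORMAT_FILES)  # assumption: "all" expands to every known format
    unknown = [k for k in keys if k not in FORMAT_FILES]
    if unknown:
        raise ValueError(f"Unknown format keys: {unknown}")
    return [str(PurePosixPath(output_path) / FORMAT_FILES[k]) for k in keys]


print(expected_files("results/filtered_0.35"))
print(expected_files("results/filtered_0.25", ["all"]))
```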


Examples

Basic filter and merge

from evaluation.filter_and_merge_detections import DetectionFilter

df = DetectionFilter()
data = df.load_detections("results/raw_detections.json")

merged = df.filter_detections(data, conf_threshold=0.35)
print(f"Merged: {len(merged)} song segments")

df.save_results(
    data,
    merged,
    output_path="results/filtered_0.35",
    conf_threshold=0.35,
    song_gap=data["model_config"].get("song_gap_threshold", 0.1),  # the gap filter_detections() used
)

All formats

from evaluation.filter_and_merge_detections import DetectionFilter

df = DetectionFilter()
data = df.load_detections("results/raw_detections.json")
merged = df.filter_detections(data, conf_threshold=0.25)

df.save_results(
    data,
    merged,
    output_path="results/filtered_0.25",
    conf_threshold=0.25,
    song_gap=data["model_config"].get("song_gap_threshold", 0.1),  # the gap filter_detections() used
    output_formats=["all"],
)

Rapid threshold comparison

from evaluation.filter_and_merge_detections import DetectionFilter

df = DetectionFilter()
data = df.load_detections("results/raw_detections.json")

for threshold in [0.20, 0.30, 0.40, 0.50]:
    merged = df.filter_detections(data, conf_threshold=threshold)
    print(f"conf={threshold:.2f}: {len(merged)} segments")

Apply threshold from F-beta analysis

import pandas as pd
from evaluation.filter_and_merge_detections import DetectionFilter

optimal = pd.read_csv("results/f_beta_score_analysis/optimal_thresholds.csv")
best_threshold = optimal[optimal["species"] == "Overall_Micro"]["optimal_threshold"].iloc[0]

df = DetectionFilter()
data = df.load_detections("results/raw_detections.json")
merged = df.filter_detections(data, conf_threshold=best_threshold)

df.save_results(
    data,
    merged,
    output_path="results/final",
    conf_threshold=best_threshold,
    song_gap=data["model_config"]["song_gap_threshold"],
    output_formats=["simplified-csv", "json-with-algorithm-metadata"],
)

Typical Workflow

This class sits at Step 3 of the standard evaluation pipeline:

Step 1  BirdCallDetector(conf_threshold=0.001).detect(..., no_merge=True)  →  raw_detections.json
Step 2  FBetaScoreAnalyzer().analyze_confidence_thresholds(...)             →  optimal_thresholds.csv
Step 3  DetectionFilter().filter_detections(data, conf_threshold=0.35)     →  simplified.csv
Step 4  ConfusionMatrixAnalyzer().analyze(...)                              →  confusion_matrix/

When to Use This Class

  • After F-beta analysis: Apply the optimal threshold without re-running inference.
  • For rapid threshold experiments: Test multiple thresholds on the same raw file in seconds.
  • To generate CSV files: Produce output compatible with ConfusionMatrixAnalyzer or external tools.
  • For custom thresholds: Apply any threshold outside your original sweep range.