Confusion Matrix Analysis
Compute a confusion matrix comparing filtered detection results against ground truth labels. Matches detections to labels using IoU with the Hungarian (optimal) algorithm, then builds a per-species confusion matrix showing correct classifications, misclassifications, false positives, and false negatives.
Usage Synopsis
Parameters
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| --detections | PATH / results | No | Detections CSV file or results/ directory from filter_and_merge_detections.py. Default resolves to simplified.csv (follows results/.active_run). |
| --labels | PATH / — | Yes | Path to the ground truth labels CSV file. Must use the same column format as the detections file. |
| --iou-threshold | FLOAT / 0.25 | No | IoU threshold for matching a detection to a ground truth label. A detection is a True Positive only if its IoU with a matched label meets or exceeds this value. |
| --use-1d-iou | flag / off | No | Use 1D IoU (temporal overlap only) instead of the default 2D IoU (time × frequency). Faster but less accurate. |
| --no-background | flag / off | No | Omit the background class from the confusion matrix. By default, unmatched detections and unmatched labels are recorded in a background row/column. |
| --single-cls | flag / off | No | Collapse all species into a single class for binary bird-detection evaluation. |
| --single-cls-name | STR / bird | No | Class name to use when --single-cls is active. |
| --output-path | PATH / results/confusion_matrix_analysis | No | Output directory for all result files. Created automatically if it does not exist. |
Parameter Deep-Dives
--iou-threshold — IoU Matching Threshold
Determines how strictly a detection must overlap a ground truth label to count as a True Positive. The optimal value depends on how precisely your annotations define the temporal and spectral extent of each call.
| Value | Strictness | Typical use |
|---|---|---|
| 0.1 | Very lenient | Useful when annotations are coarse |
| 0.25 (default) | Moderate | Good starting point for most datasets |
| 0.5 | Strict | When annotations are precise and detailed |
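To make the effect of the threshold concrete, the sketch below counts how many matched pairs would qualify as True Positives at each value in the table. The IoU values and the helper name `count_true_positives` are invented for illustration and are not taken from the script.

```python
def count_true_positives(matched_ious, threshold=0.25):
    """Count matched pairs whose IoU meets or exceeds the threshold."""
    return sum(1 for iou in matched_ious if iou >= threshold)


# Hypothetical IoU values for five matched detection/label pairs.
matched_ious = [0.08, 0.12, 0.25, 0.40, 0.75]

for threshold in (0.1, 0.25, 0.5):
    # A pair with IoU exactly equal to the threshold still counts.
    print(threshold, count_true_positives(matched_ious, threshold))
```

With these example values, raising the threshold from 0.1 to 0.5 drops the count from four pairs to one, which is why coarse annotations usually call for a lower setting.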
Matching Algorithm
The script always uses the Hungarian algorithm (optimal matching). This means the globally best assignment of detections to labels is found regardless of their order in the file. Results are fully reproducible.
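The difference between optimal and order-dependent matching can be illustrated with a small sketch. It does not reproduce the script's code: for brevity it finds the best assignment by brute force over all permutations rather than with the Hungarian algorithm (SciPy's `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently). The function name `optimal_match`, the IoU values, and the choice to apply the threshold after the assignment are all assumptions.

```python
from itertools import permutations


def optimal_match(iou, threshold=0.25):
    """Return (detection, label) index pairs that maximize total IoU.

    `iou[d][l]` is the IoU between detection d and label l. Brute force,
    so only suitable for tiny illustrative inputs.
    """
    if not iou or not iou[0]:
        return []
    n_det, n_lab = len(iou), len(iou[0])
    if n_det <= n_lab:
        # Assign every detection to a distinct label.
        candidates = [list(zip(range(n_det), p))
                      for p in permutations(range(n_lab), n_det)]
    else:
        # More detections than labels: assign every label to a distinct detection.
        candidates = [list(zip(p, range(n_lab)))
                      for p in permutations(range(n_det), n_lab)]
    best = max(candidates, key=lambda pairs: sum(iou[d][l] for d, l in pairs))
    # Assumption: pairs below the threshold are discarded after assignment.
    return [(d, l) for d, l in best if iou[d][l] >= threshold]


# Hypothetical IoU matrix: 2 detections (rows) x 2 labels (columns).
iou = [
    [0.60, 0.50],
    [0.55, 0.00],
]
print(optimal_match(iou))
```

A greedy pass in file order would give detection 0 its best label (label 0, IoU 0.60) and leave detection 1 with no usable overlap. The optimal assignment pairs detection 0 with label 1 and detection 1 with label 0, so both detections are matched.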
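--use-1d-iou vs 2D IoU (default)
2D IoU (default) measures overlap in both time and frequency: the area where the two time-frequency boxes intersect, divided by the area of their union.
1D IoU measures overlap in time only: the length of the shared time interval, divided by the length of the union of the two intervals.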
Which to use?
2D IoU is more accurate because it accounts for both when and at what frequency a call occurs. Use --use-1d-iou only if your frequency annotations are unreliable or absent.
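As a rough sketch of the two measures, the functions below compute IoU for time intervals and for time-frequency boxes. The tuple layout `(t_start, t_end, f_low, f_high)`, the function names, and the example values are assumptions made for illustration; the script's own column names and implementation may differ.

```python
def iou_1d(a, b):
    """IoU of two time intervals given as (t_start, t_end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_2d(a, b):
    """IoU of two boxes given as (t_start, t_end, f_low, f_high)."""
    dt = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))  # shared time
    df = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))  # shared frequency
    inter = dt * df
    area_a = (a[1] - a[0]) * (a[3] - a[2])
    area_b = (b[1] - b[0]) * (b[3] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical detection and label: seconds and Hz.
detection = (0.0, 2.0, 1000.0, 3000.0)
label = (1.0, 3.0, 2000.0, 4000.0)

print(iou_1d(detection[:2], label[:2]))  # time overlap only
print(iou_2d(detection, label))          # time and frequency overlap
```

For this pair the 1D IoU is 1/3 while the 2D IoU is 1/7, so the same detection would pass the default threshold of 0.25 in 1D mode and fail it in 2D mode.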
--no-background — Background Class
By default, the confusion matrix includes a background class:
- Background column (last column) — Ground truth labels that were not matched by any detection. These are False Negatives.
- Background row (last row) — Detections that did not match any ground truth label. These are False Positives.
Use --no-background if you only want to see the species-level confusion (correct vs wrong species label), ignoring FP/FN counts.
--single-cls — Single-Class Mode
Remaps all species labels in both detections and ground truth to the same class name (default: bird). Useful for evaluating whether the model finds bird calls at all, independent of species classification accuracy.
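The remapping amounts to overwriting the class column in every row before matching. A minimal sketch, assuming rows are dictionaries and the class column is called `species` (both hypothetical; the real column name comes from the CSV schema):

```python
def collapse_to_single_class(rows, class_name="bird", column="species"):
    """Return copies of the rows with the class column set to one name."""
    return [{**row, column: class_name} for row in rows]


# Hypothetical rows; only the class column matters here.
detections = [
    {"filename": "rec_001", "species": "amerob"},
    {"filename": "rec_001", "species": "yelwar"},
]

collapsed = collapse_to_single_class(detections)
print(collapsed)
```

Applying the same function to both the detections and the ground truth labels means every overlapping pair counts as a correct classification, so the result reflects detection quality only.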
Mutually Exclusive Interpretation
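--single-cls and multi-species evaluation are mutually exclusive ways of reading the same data. When --single-cls is active and the background class is included, the confusion matrix reduces to a 2×2 table of the single class versus background. With --no-background, only the single-class cell remains.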
Output Files
All files are written to the --output-path directory.
| File | Description |
|---|---|
| confusion_matrix.csv | Raw confusion matrix as a CSV. |
| confusion_matrix_detailed.csv | Same matrix with labeled pred_<class> rows and true_<class> columns. |
| confusion_matrix_normalized.png | Heatmap visualization showing row-normalized percentages. |
| confusion_matrix_raw.png | Heatmap visualization showing raw counts. |
| metadata.txt | Analysis parameters, file paths, species list, and detection/label counts. |
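Row normalization, as used for the normalized heatmap, divides each cell by the sum of its row and expresses the result as a percentage. The sketch below shows the arithmetic on made-up counts; the function name `row_normalize` and the handling of all-zero rows (left at 0.0) are assumptions, not a description of the script's plotting code.

```python
def row_normalize(matrix):
    """Convert raw counts to row-wise percentages (each non-empty row sums to 100)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Assumption: an all-zero row stays at 0.0 to avoid dividing by zero.
        normalized.append([100.0 * value / total if total else 0.0 for value in row])
    return normalized


# Hypothetical raw counts for three classes.
counts = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],
]

for row in row_normalize(counts):
    print(row)
```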
Reading the Confusion Matrix
The matrix uses predicted class (rows) × true class (columns) convention.
| Cell | Meaning |
|---|---|
| Diagonal cell [i, i] | True Positives for species i |
| Off-diagonal cell [i, j] (i ≠ j) | Species i detected but actually species j (misclassification) |
| background column cell [i, bg] | Ground truth labels for species i that had no matching detection (False Negatives) |
| background row cell [bg, j] | Detections that did not match any ground truth label for species j (False Positives) |
Examples
Basic confusion matrix
================================================================================
CONFUSION MATRIX ANALYSIS
================================================================================
Detections: results/simplified.csv
Labels: data/ground_truth.csv
IoU threshold: 0.25
IoU type: 2D (time-frequency)
Matching method: Optimal (Hungarian)
Include background: True
Loading detections from: results/simplified.csv
Loaded 23 detections
Loading ground truth labels from: data/ground_truth.csv
Loaded 31 ground truth labels
Found 3 unique species: amerob, herthr, yelwar
Building confusion matrix...
Strict 1D IoU matching
Without background class
Binary bird detection
Typical Workflow
This script sits at Step 4 of the standard evaluation pipeline:
Step 1 detect_birds.py --conf 0.001 --no-merge → raw_detections.json
Step 2 f_beta_score_analysis.py → optimal_thresholds.csv
Step 3 filter_and_merge_detections.py --conf 0.35 → simplified.csv
Step 4 confusion_matrix_analysis.py → confusion_matrix/
Input File Format
Both --detections and --labels must use the six-column schema described in Ground-Truth Labels — CSV schema. Filenames are matched by stem only; for details, see Filename matching.
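Matching by stem means the directory and file extension are ignored when deciding whether a detection and a label refer to the same recording. A minimal sketch of that idea using `pathlib`, with a hypothetical helper name and example paths (the exact rule the script applies is the one described in Filename matching):

```python
from pathlib import Path


def same_recording(detection_file, label_file):
    """Compare two filenames by stem, ignoring directory and extension."""
    return Path(detection_file).stem == Path(label_file).stem


# Hypothetical filenames.
print(same_recording("audio/rec_001.wav", "rec_001.flac"))
print(same_recording("rec_001.wav", "rec_002.wav"))
```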