Confusion Matrix Analysis

Compute a confusion matrix comparing filtered detection results against ground truth labels. Matches detections to labels using IoU with the Hungarian (optimal) algorithm, then builds a per-species confusion matrix showing correct classifications, misclassifications, false positives, and false negatives.

Usage Synopsis

Linux / macOS

python src/evaluation/confusion_matrix_analysis.py \
    --detections results \
    --labels path/to/labels.csv

Windows (PowerShell)

python src/evaluation/confusion_matrix_analysis.py `
    --detections results `
    --labels path/to/labels.csv

Windows (CMD)

python src/evaluation/confusion_matrix_analysis.py ^
    --detections results ^
    --labels path/to/labels.csv

Parameters

Parameter	Type / Default	Required?	Description
`--detections`	`PATH` / `results`	No	Detections CSV file or `results/` directory from `filter_and_merge_detections.py`. Default resolves to `simplified.csv` (follows `results/.active_run`).
`--labels`	`PATH` / —	Yes	Path to the ground truth labels CSV file. Must use the same column format as the detections file.
`--iou-threshold`	`FLOAT` / `0.25`	No	IoU threshold for matching a detection to a ground truth label. A detection is a True Positive only if its IoU with a matched label meets or exceeds this value.
`--use-1d-iou`	flag / off	No	Use 1D IoU (temporal overlap only) instead of the default 2D IoU (time × frequency). Faster but less accurate.
`--no-background`	flag / off	No	Omit the background class from the confusion matrix. By default, unmatched detections and unmatched labels are recorded in a `background` row/column.
`--single-cls`	flag / off	No	Collapse all species into a single class for binary bird-detection evaluation.
`--single-cls-name`	`STR` / `bird`	No	Class name to use when `--single-cls` is active.
`--output-path`	`PATH` / `results/confusion_matrix_analysis`	No	Output directory for all result files. Created automatically if it does not exist.

Parameter Deep-Dives

`--iou-threshold`: IoU Matching Threshold

Determines how strictly a detection must overlap a ground truth label to count as a True Positive. The optimal value depends on how precisely your annotations define the temporal and spectral extent of each call.

Value	Strictness	Typical use
`0.1`	Very lenient	Useful when annotations are coarse
`0.25` (default)	Moderate	Good starting point for most datasets
`0.5`	Strict	When annotations are precise and detailed
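
As a rough illustration of how the threshold changes the True Positive count, the sketch below applies the three values from the table to a handful of IoU scores. The scores are invented for the example, and the helper `count_true_positives` is hypothetical rather than part of the script. It also assumes the detection/label pairs are already fixed.

```python
# Hypothetical IoU values for four already-matched detection/label pairs.
matched_ious = [0.12, 0.30, 0.55, 0.80]


def count_true_positives(ious, threshold):
    """Count pairs whose IoU meets or exceeds the threshold."""
    return sum(1 for iou in ious if iou >= threshold)


for threshold in (0.1, 0.25, 0.5):
    n_tp = count_true_positives(matched_ious, threshold)
    print(f"threshold={threshold}: {n_tp} true positives")
# threshold=0.1: 4 true positives
# threshold=0.25: 3 true positives
# threshold=0.5: 2 true positives
```

Pairs that fall below the threshold are no longer counted as matches, so they move into the False Positive and False Negative counts instead.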

Matching Algorithm

The script always uses the Hungarian algorithm (optimal matching). This means the globally best assignment of detections to labels is found regardless of their order in the file. Results are fully reproducible.
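
To see why optimal matching matters, the sketch below compares it with a greedy, file-order matcher on a tiny hand-made IoU matrix. It is not the script's code. To stay dependency-free it finds the optimum by brute force over permutations, which gives the same assignment as the Hungarian algorithm on small inputs; `scipy.optimize.linear_sum_assignment` solves the same problem efficiently at scale. The function names and IoU values are assumptions made for illustration.

```python
from itertools import permutations


def greedy_matches(iou, threshold=0.25):
    """Match each detection, in file order, to its best still-unused label."""
    used, pairs = set(), []
    for d, row in enumerate(iou):
        candidates = [(v, l) for l, v in enumerate(row)
                      if l not in used and v >= threshold]
        if candidates:
            _, best = max(candidates)
            used.add(best)
            pairs.append((d, best))
    return pairs


def optimal_matches(iou, threshold=0.25):
    """Find the assignment with the highest total IoU, then keep pairs >= threshold."""
    n_det = len(iou)
    n_lab = len(iou[0]) if iou else 0
    size = max(n_det, n_lab)
    # Pad to a square matrix with zeros so every permutation is a full assignment.
    padded = [[iou[d][l] if d < n_det and l < n_lab else 0.0
               for l in range(size)] for d in range(size)]
    best = max(permutations(range(size)),
               key=lambda perm: sum(padded[d][l] for d, l in enumerate(perm)))
    return [(d, l) for d, l in enumerate(best)
            if d < n_det and l < n_lab and iou[d][l] >= threshold]


# Rows are detections, columns are ground truth labels (hypothetical values).
iou = [
    [0.60, 0.50],
    [0.55, 0.00],
]
print(greedy_matches(iou))   # [(0, 0)]
print(optimal_matches(iou))  # [(0, 1), (1, 0)]
```

The greedy matcher lets the first detection claim label 0, which leaves the second detection unmatched. The optimal assignment keeps both pairs above the threshold.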

`--use-1d-iou` vs 2D IoU (default)

2D IoU (default) measures overlap in both time and frequency:

\[ \text{IoU}_{2D} = \frac{t_{\text{overlap}} \cdot f_{\text{overlap}}}{t_{\text{union}} \cdot f_{\text{union}}} \]

1D IoU measures overlap in time only:

\[ \text{IoU}_{1D} = \frac{t_{\text{overlap}}}{t_{\text{union}}} \]

Which to use?

2D IoU is more accurate because it accounts for both when and at what frequency a call occurs. Use `--use-1d-iou` only if your frequency annotations are unreliable or absent.
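
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the script's implementation: the box layout `(t_start, t_end, f_low, f_high)` and the function names are assumptions. The 2D function follows the formula exactly as written here, using a product of per-axis terms. The conventional box IoU instead divides the overlap area by `area_a + area_b - overlap_area`, and which variant the script uses is not confirmed here.

```python
def overlap_and_union(a_start, a_end, b_start, b_end):
    """Return (overlap, union) lengths of two 1D intervals."""
    overlap = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - overlap
    return overlap, union


def iou_1d(a, b):
    """Temporal IoU: t_overlap / t_union."""
    t_overlap, t_union = overlap_and_union(a[0], a[1], b[0], b[1])
    return t_overlap / t_union if t_union > 0 else 0.0


def iou_2d(a, b):
    """Time-frequency IoU as written above: (t_ov * f_ov) / (t_un * f_un)."""
    t_overlap, t_union = overlap_and_union(a[0], a[1], b[0], b[1])
    f_overlap, f_union = overlap_and_union(a[2], a[3], b[2], b[3])
    denom = t_union * f_union
    return (t_overlap * f_overlap) / denom if denom > 0 else 0.0


# Hypothetical boxes: (t_start [s], t_end [s], f_low [Hz], f_high [Hz]).
detection = (0.0, 2.0, 1000.0, 3000.0)
label = (1.0, 3.0, 2000.0, 4000.0)

print(f"1D IoU: {iou_1d(detection, label):.3f}")  # 1D IoU: 0.333
print(f"2D IoU: {iou_2d(detection, label):.3f}")  # 2D IoU: 0.111
```

With the default threshold of 0.25, this pair would count as a match under 1D IoU but not under 2D IoU, because the boxes overlap only partially in frequency.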

`--no-background`: Background Class

By default, the confusion matrix includes a background class:

Background column (last column): detections that did not match any ground truth label. In the predicted (rows) × true (columns) layout, these are False Positives.
Background row (last row): ground truth labels that were not matched by any detection. These are False Negatives.

Use `--no-background` if you only want to see the species-level confusion (correct vs wrong species label), ignoring FP/FN counts.

`--single-cls`: Single-Class Mode

Remaps all species labels in both detections and ground truth to the same class name (default: bird). Useful for evaluating whether the model finds bird calls at all, independent of species classification accuracy.

Mutually Exclusive Interpretation

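`--single-cls` and multi-species evaluation are conceptually mutually exclusive. When `--single-cls` is active and the background class is included, the confusion matrix reduces to a 2×2 table of the single class and `background`, holding the TP, FP, and FN counts. True negatives are not defined for detection matching, so the background/background cell stays at 0. With `--no-background`, only the single TP cell remains.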
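
Conceptually, the remapping amounts to overwriting the species field in every row before matching. The sketch below shows the idea on in-memory rows. The column names `filename` and `species` and the helper `collapse_to_single_class` are hypothetical, and the script's actual column names and logic may differ.

```python
def collapse_to_single_class(rows, class_name="bird", column="species"):
    """Return copies of the rows with every species label replaced by class_name."""
    return [{**row, column: class_name} for row in rows]


# Hypothetical rows; real files follow the six-column schema of the project.
detections = [
    {"filename": "rec_001", "species": "amerob"},
    {"filename": "rec_001", "species": "herthr"},
    {"filename": "rec_002", "species": "yelwar"},
]

collapsed = collapse_to_single_class(detections)
print(sorted({row["species"] for row in collapsed}))  # ['bird']
```

Applying the same remapping to both detections and labels means a match can never be a misclassification, so only detection quality (TP, FP, FN) is measured.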

Output Files

All files are written to the `--output-path` directory.

File	Description
`confusion_matrix.csv`	Raw confusion matrix as a CSV.
`confusion_matrix_detailed.csv`	Same matrix with labeled `pred_<class>` rows and `true_<class>` columns.
`confusion_matrix_normalized.png`	Heatmap visualization showing row-normalized percentages.
`confusion_matrix_raw.png`	Heatmap visualization showing raw counts.
`metadata.txt`	Analysis parameters, file paths, species list, and detection/label counts.

Reading the Confusion Matrix

The matrix uses predicted class (rows) × true class (columns) convention.

\[ \begin{array}{r|cccc} \text{Pred} \setminus \text{True} & \text{amerob} & \text{herthr} & \text{yelwar} & \text{bg} \\ \hline \text{amerob} & \mathbf{45} & 2 & 1 & 3 \\ \text{herthr} & 3 & \mathbf{38} & 0 & 2 \\ \text{yelwar} & 1 & 0 & \mathbf{52} & 1 \\ \text{background} & 4 & 2 & 1 & \mathbf{0} \\ \end{array} \]

Cell	Meaning
Diagonal cell `[i, i]`	True Positives for species `i`
Off-diagonal cell `[i, j]` (i ≠ j)	Species `i` detected but actually species `j` (misclassification)
`background` column cell `[i, bg]`	Detections of species `i` that did not match any ground truth label (False Positives)
`background` row cell `[bg, j]`	Ground truth labels for species `j` that had no matching detection (False Negatives)
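
Per-species precision and recall can be read off the example matrix above. The sketch assumes the predicted (rows) × true (columns) convention stated at the top of this section, so a row sum counts everything predicted as a class and a column sum counts every ground truth label of that class. The helper `precision_recall` is illustrative only and is not something the script is known to output.

```python
classes = ["amerob", "herthr", "yelwar", "background"]
# Rows = predicted class, columns = true class (counts from the example above).
matrix = [
    [45, 2, 1, 3],
    [3, 38, 0, 2],
    [1, 0, 52, 1],
    [4, 2, 1, 0],
]


def precision_recall(matrix, idx):
    """Return (precision, recall) for the class at position idx."""
    tp = matrix[idx][idx]
    predicted = sum(matrix[idx])              # row sum: all detections of this class
    actual = sum(row[idx] for row in matrix)  # column sum: all labels of this class
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


for idx, name in enumerate(classes[:-1]):  # skip the background class
    p, r = precision_recall(matrix, idx)
    print(f"{name}: precision={p:.3f} recall={r:.3f}")
# amerob: precision=0.882 recall=0.849
# herthr: precision=0.884 recall=0.905
# yelwar: precision=0.963 recall=0.963
```

For `amerob`, the row sum is 45 + 2 + 1 + 3 = 51 and the column sum is 45 + 3 + 1 + 4 = 53, giving a precision of 45/51 and a recall of 45/53.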

Examples

Basic confusion matrix

Command

python src/evaluation/confusion_matrix_analysis.py \
    --detections results \
    --labels data/ground_truth.csv

Expected Output

================================================================================
CONFUSION MATRIX ANALYSIS
================================================================================
Detections: results/simplified.csv
Labels: data/ground_truth.csv
IoU threshold: 0.25
IoU type: 2D (time-frequency)
Matching method: Optimal (Hungarian)
Include background: True

Loading detections from: results/simplified.csv
Loaded 23 detections

Loading ground truth labels from: data/ground_truth.csv
Loaded 31 ground truth labels

Found 3 unique species: amerob, herthr, yelwar
Building confusion matrix...

Strict 1D IoU matching

Command

python src/evaluation/confusion_matrix_analysis.py \
    --detections results \
    --labels data/ground_truth.csv \
    --iou-threshold 0.5 \
    --use-1d-iou \
    --output-path results/confusion_strict

Expected Output

IoU type: 1D (time only)
IoU threshold: 0.5
...
Saved confusion matrix results to: results/confusion_strict/

Without background class

Command

python src/evaluation/confusion_matrix_analysis.py \
    --detections results \
    --labels data/ground_truth.csv \
    --no-background \
    --output-path results/confusion_no_bg

Expected Output

Include background: False
...
Species-only matrix saved (no FP/FN rows/columns).

Binary bird detection

Command

python src/evaluation/confusion_matrix_analysis.py \
    --detections results \
    --labels data/ground_truth.csv \
    --single-cls \
    --single-cls-name bird \
    --output-path results/confusion_binary

Expected Output

Single-class mode: True (class='bird')
Found 1 unique species: bird
Building confusion matrix...

Typical Workflow

This script sits at Step 4 of the standard evaluation pipeline:

Step 1  detect_birds.py --conf 0.001 --no-merge    →  raw_detections.json
Step 2  f_beta_score_analysis.py                   →  optimal_thresholds.csv
Step 3  filter_and_merge_detections.py --conf 0.35 →  simplified.csv
Step 4  confusion_matrix_analysis.py               →  confusion_matrix_analysis/

Input File Format

Both `--detections` and `--labels` must use the six-column schema described in Ground-Truth Labels (CSV schema). Filenames are matched by stem only; for details, see Filename matching.
