
ConfusionMatrixAnalyzer

Compute a confusion matrix comparing filtered detection results against ground truth labels. Detections are matched to labels by IoU using the Hungarian algorithm (an optimal one-to-one assignment). The class then builds a per-species confusion matrix of correct classifications, misclassifications, false positives, and false negatives.


Import

from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

ConfusionMatrixAnalyzer

Constructor

ConfusionMatrixAnalyzer(
    iou_threshold=0.25,
    use_2d_iou=True,
    include_background=True,
    single_cls=False,
    single_cls_name="bird",
)
Parameter Type / Default Required? Description
iou_threshold float / 0.25 No IoU threshold for matching a detection to a ground truth label. Use the same value as in FBetaScoreAnalyzer for consistent evaluation.
use_2d_iou bool / True No If True, uses 2D IoU (time × frequency). If False, uses 1D IoU (temporal overlap only). 2D IoU is stricter because a detection must also overlap the label in frequency. Use 1D only when frequency annotations are unreliable or absent.
include_background bool / True No If True, adds a background class for unmatched detections (FP) and unmatched labels (FN). Pass False to see only species-vs-species misclassifications.
single_cls bool / False No Collapse all species into one class for binary bird-detection evaluation. Reduces the matrix to 2×2 (bird + background) when include_background=True.
single_cls_name str / 'bird' No Class name to use when single_cls=True.

Mutually Exclusive Interpretation

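single_cls=True and multi-species evaluation are mutually exclusive: every species is relabeled to single_cls_name, so misclassifications cannot occur. With include_background=True the matrix is 2×2: [bird, bird] holds TP, [bird, background] holds FP, and [background, bird] holds FN. The [background, background] cell stays empty because detection evaluation has no true negatives. With include_background=False only the 1×1 TP cell remains.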
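The relabeling that single_cls=True performs can be pictured with the hypothetical helper below (collapse_to_single_class is not part of the documented API). Each record keeps its time and frequency bounds but takes single_cls_name as its species, so every matched pair lands on the diagonal.

```python
def collapse_to_single_class(records, name="bird"):
    """Return copies of detection/label dicts with every species replaced by `name`.

    Hypothetical helper; the input dicts are left unchanged.
    """
    return [{**rec, "species": name} for rec in records]


labels = [
    {"filename": "rec1.wav", "time_start": 0.5, "time_end": 1.5, "species": "sp_a"},
    {"filename": "rec1.wav", "time_start": 2.0, "time_end": 3.0, "species": "sp_b"},
]
print([r["species"] for r in collapse_to_single_class(labels)])  # ['bird', 'bird']
```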


Methods

Method Returns Description
analyze(detections_path, labels_path, output_path=None) Tuple[np.ndarray, List[str]] Run the full analysis. Primary entry point.
load_detections_csv(detections_path) List[Dict] Load merged detections CSV.
load_labels_csv(labels_path) List[Dict] Load ground truth labels CSV.
compute_confusion_matrix(detections, labels) Tuple[np.ndarray, List[str]] Build the matrix from loaded data.
save_results(confusion_matrix, species_list, output_path, detections_path, labels_path) None Write CSV, images, and metadata.

analyze

matrix, species = analyzer.analyze(
    detections_path,
    labels_path,
    output_path=None,
)

Run the complete confusion matrix pipeline: load, match, compute, optionally save.

Parameter Type / Default Required? Description
detections_path str / — Yes Path to simplified.csv from DetectionFilter, or a results/ directory (resolves simplified.csv via .active_run).
labels_path str / — Yes Path to the ground truth labels CSV. Must use the same six-column schema as the detections file.
output_path str / None No Directory for output files. When provided, writes the confusion matrix CSV, heatmap images, and metadata.

Returns: (np.ndarray, List[str]) — the confusion matrix and ordered species list.


load_detections_csv

detections = analyzer.load_detections_csv(detections_path)

Load a simplified.csv produced by DetectionFilter. Returns a list of dicts with keys filename, time_start, time_end, freq_low_hz, freq_high_hz, species.


load_labels_csv

labels = analyzer.load_labels_csv(labels_path)

Load a ground truth labels CSV. Returns a list of dicts with the same schema as load_detections_csv().


compute_confusion_matrix

matrix, species_list = analyzer.compute_confusion_matrix(detections, labels)

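Build the confusion matrix from pre-loaded data. Detections and labels are matched one-to-one with the Hungarian algorithm, which finds a globally optimal assignment instead of matching greedily. A pair can only match when its IoU ≥ iou_threshold. Matching ignores species, so a matched pair whose species differ is counted as an off-diagonal misclassification.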

Parameter Type / Default Required? Description
detections List[Dict] / — Yes Detections from load_detections_csv().
labels List[Dict] / — Yes Labels from load_labels_csv().

Returns: (np.ndarray, List[str]) — confusion matrix and species list. Matrix shape is (N+1, N+1) when include_background=True (the last row and column are background) and (N, N) otherwise.
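The matching step can be sketched on a precomputed IoU matrix for a single file. This sketch substitutes exhaustive search over all one-to-one assignments for the Hungarian algorithm: it reaches the same maximum total, but only scales to a handful of boxes. The objective (maximize total IoU over pairs that meet the threshold) is an assumption about the real implementation, and optimal_matches is a hypothetical name.

```python
from itertools import permutations


def optimal_matches(iou, threshold):
    """One-to-one detection/label assignment maximizing total IoU.

    iou[d][g] is the IoU between detection d and label g (one file only).
    Pairs with IoU below `threshold` may not match. Exhaustive search stands
    in for the Hungarian algorithm here, so keep the inputs tiny.
    Returns a list of (detection_index, label_index) pairs.
    """
    n_det = len(iou)
    n_gt = len(iou[0]) if n_det else 0
    size = max(n_det, n_gt)  # pad to a square problem with dummy rows/columns

    def gain(d, g):
        # Dummy (padded) pairs and sub-threshold pairs contribute nothing.
        if d < n_det and g < n_gt and iou[d][g] >= threshold:
            return iou[d][g]
        return 0.0

    best_perm, best_total = (), -1.0
    for perm in permutations(range(size)):  # every one-to-one assignment
        total = sum(gain(d, g) for d, g in enumerate(perm))
        if total > best_total:
            best_perm, best_total = perm, total
    return [(d, g) for d, g in enumerate(best_perm) if gain(d, g) > 0.0]


# Greedy matching would take the 0.6 pair first and leave detection 1 unmatched;
# the global optimum matches both detections (0.5 + 0.55 = 1.05 > 0.6).
print(optimal_matches([[0.6, 0.5], [0.55, 0.0]], threshold=0.25))  # [(0, 1), (1, 0)]
```

For real workloads, scipy.optimize.linear_sum_assignment(cost, maximize=True) solves the same assignment problem in polynomial time and also accepts rectangular matrices.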


Parameter Deep-Dives

iou_threshold — IoU Matching

Determines how strictly a detection must overlap a ground truth label to count as a True Positive.

Value Strictness Typical use
0.1 Very lenient When annotations are coarse
0.25 (default) Moderate Good starting point for most datasets
0.5 Strict When annotations are precise and detailed

2D vs 1D IoU

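2D IoU (default) is the intersection-over-union of the detection and label boxes in the time-frequency plane:

\[\text{IoU}_{2D} = \frac{t_{\text{overlap}} \cdot f_{\text{overlap}}}{A_{\text{det}} + A_{\text{gt}} - t_{\text{overlap}} \cdot f_{\text{overlap}}}, \qquad A = (t_{\text{end}} - t_{\text{start}})\,(f_{\text{high}} - f_{\text{low}})\]

where \(A_{\text{det}}\) and \(A_{\text{gt}}\) are the box areas, so the denominator is the area of the union.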

1D IoU (use_2d_iou=False) measures temporal overlap only:

\[\text{IoU}_{1D} = \frac{t_{\text{overlap}}}{t_{\text{union}}}\]

Use use_2d_iou=False only when frequency annotations are unreliable or absent.
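Both IoU variants can be sketched on the dict schema returned by the loaders, assuming the standard area-based definition for 2D IoU (intersection area over union area). iou_1d and iou_2d are hypothetical helpers, not the class's internal functions.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of [a_lo, a_hi] and [b_lo, b_hi] (0 if disjoint)."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def iou_1d(det, gt):
    """Temporal IoU: overlap length / union length of the two time intervals."""
    inter = overlap(det["time_start"], det["time_end"], gt["time_start"], gt["time_end"])
    union = (det["time_end"] - det["time_start"]) + (gt["time_end"] - gt["time_start"]) - inter
    return inter / union if union > 0 else 0.0


def iou_2d(det, gt):
    """Box IoU in the time-frequency plane: intersection area / union area."""
    t = overlap(det["time_start"], det["time_end"], gt["time_start"], gt["time_end"])
    f = overlap(det["freq_low_hz"], det["freq_high_hz"], gt["freq_low_hz"], gt["freq_high_hz"])
    inter = t * f
    area_det = (det["time_end"] - det["time_start"]) * (det["freq_high_hz"] - det["freq_low_hz"])
    area_gt = (gt["time_end"] - gt["time_start"]) * (gt["freq_high_hz"] - gt["freq_low_hz"])
    union = area_det + area_gt - inter
    return inter / union if union > 0 else 0.0


# Same 1 s time overlap, but the label sits 1 kHz higher than the detection.
det = {"time_start": 0.0, "time_end": 2.0, "freq_low_hz": 1000.0, "freq_high_hz": 3000.0}
gt = {"time_start": 1.0, "time_end": 3.0, "freq_low_hz": 2000.0, "freq_high_hz": 4000.0}
print(round(iou_1d(det, gt), 3), round(iou_2d(det, gt), 3))  # 0.333 0.143
```

At the default iou_threshold=0.25 this pair would match under 1D IoU but not under 2D IoU, which is the frequency mismatch that 1D IoU ignores.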

include_background — Background Class

When include_background=True:

  • Background column (predicted species i, true background): detections that did not match any ground truth label. These are False Positives.
  • Background row (predicted background, true species j): ground truth labels not matched by any detection. These are False Negatives.

Set include_background=False to see only the species-level confusion (correct vs wrong species label), ignoring FP/FN counts.
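The bookkeeping above can be sketched as follows, assuming the documented rows = predicted, columns = true convention with the background row and column placed last. build_confusion_matrix is a hypothetical helper working on plain lists; the documented method returns an np.ndarray.

```python
def build_confusion_matrix(classes, matches, unmatched_dets, unmatched_gts):
    """Rows = predicted class, columns = true class, last index = background.

    matches: (predicted_species, true_species) for each matched pair.
    unmatched_dets: predicted species of detections with no matching label.
    unmatched_gts: true species of labels with no matching detection.
    """
    idx = {c: i for i, c in enumerate(classes)}
    bg = len(classes)
    m = [[0] * (bg + 1) for _ in range(bg + 1)]
    for pred, true in matches:
        m[idx[pred]][idx[true]] += 1  # diagonal = TP, off-diagonal = misclassification
    for pred in unmatched_dets:
        m[idx[pred]][bg] += 1  # background column: false positive
    for true in unmatched_gts:
        m[bg][idx[true]] += 1  # background row: false negative
    return m


matrix = build_confusion_matrix(
    ["sp_a", "sp_b"],
    matches=[("sp_a", "sp_a"), ("sp_a", "sp_b")],
    unmatched_dets=["sp_b"],
    unmatched_gts=["sp_a"],
)
print(matrix)  # [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
```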


Output Files

All files are written to the output_path directory.

File Description
confusion_matrix.csv Raw confusion matrix as CSV.
confusion_matrix_detailed.csv Same matrix with pred_<class> rows and true_<class> columns.
confusion_matrix_normalized.png Heatmap showing row-normalized percentages.
confusion_matrix_raw.png Heatmap showing raw counts.
metadata.txt Analysis parameters, file paths, species list, detection and label counts.
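If "row-normalized" means each row is divided by its own total (assumed here), then with rows = predicted each cell of confusion_matrix_normalized.png reads as the percentage of detections predicted as species i whose true class is j. A minimal sketch with a hypothetical row_normalize helper:

```python
def row_normalize(matrix):
    """Convert raw counts to row percentages (each non-empty row sums to 100).

    Rows with no counts are left as zeros to avoid dividing by zero.
    """
    out = []
    for row in matrix:
        total = sum(row)
        out.append([100.0 * v / total if total else 0.0 for v in row])
    return out


print(row_normalize([[1, 1, 0], [0, 0, 0], [3, 1, 0]]))
# [[50.0, 50.0, 0.0], [0.0, 0.0, 0.0], [75.0, 25.0, 0.0]]
```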

Reading the Confusion Matrix

The matrix uses predicted class (rows) × true class (columns) convention.

Cell Meaning
Diagonal cell [i, i] True Positives for species i
Off-diagonal [i, j] (i ≠ j) Predicted species i, actually species j (misclassification)
background column [i, bg] Detections predicted as species i that matched no label (False Positives)
background row [bg, j] Labels of species j with no matching detection (False Negatives)
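Per-species precision and recall follow from this layout. The sketch below assumes rows = predicted, columns = true, with the background row and column last, and counts a misclassification both as a FP for the predicted species and as a FN for the true species; FBetaScoreAnalyzer may count differently. per_class_counts is a hypothetical helper.

```python
def per_class_counts(matrix, classes):
    """Derive TP/FP/FN, precision, and recall per class from an (N+1)x(N+1)
    matrix with background last (rows = predicted, columns = true)."""
    size = len(classes) + 1
    out = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(matrix[i][j] for j in range(size)) - tp  # rest of row i
        fn = sum(matrix[j][i] for j in range(size)) - tp  # rest of column i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        out[name] = {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
    return out


stats = per_class_counts([[1, 1, 0], [0, 0, 1], [1, 0, 0]], ["sp_a", "sp_b"])
print(stats["sp_a"])  # {'tp': 1, 'fp': 1, 'fn': 1, 'precision': 0.5, 'recall': 0.5}
```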

Examples

Basic confusion matrix

from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

analyzer = ConfusionMatrixAnalyzer(iou_threshold=0.25)

matrix, species = analyzer.analyze(
    detections_path="results/simplified.csv",
    labels_path="data/ground_truth.csv",
    output_path="results/confusion_matrix",
)

print(f"Species: {species}")
print(matrix)

Strict 2D IoU matching

from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

analyzer = ConfusionMatrixAnalyzer(
    iou_threshold=0.5,
    use_2d_iou=True,
)

matrix, species = analyzer.analyze(
    detections_path="results/simplified.csv",
    labels_path="data/ground_truth.csv",
    output_path="results/confusion_strict",
)

1D IoU (temporal only)

from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

analyzer = ConfusionMatrixAnalyzer(
    iou_threshold=0.5,
    use_2d_iou=False,
)

matrix, species = analyzer.analyze(
    detections_path="results/simplified.csv",
    labels_path="data/ground_truth.csv",
    output_path="results/confusion_1d",
)

Without background class

from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

analyzer = ConfusionMatrixAnalyzer(include_background=False)

matrix, species = analyzer.analyze(
    detections_path="results/simplified.csv",
    labels_path="data/ground_truth.csv",
    output_path="results/confusion_no_bg",
)

Binary bird detection

from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

analyzer = ConfusionMatrixAnalyzer(
    single_cls=True,
    single_cls_name="bird",
)

matrix, species = analyzer.analyze(
    detections_path="results/simplified.csv",
    labels_path="data/ground_truth.csv",
    output_path="results/confusion_binary",
)

Load, compute, and inspect programmatically

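from evaluation.confusion_matrix_analysis import ConfusionMatrixAnalyzer

analyzer = ConfusionMatrixAnalyzer(iou_threshold=0.25, include_background=True)

detections = analyzer.load_detections_csv("results/simplified.csv")
labels = analyzer.load_labels_csv("data/ground_truth.csv")

matrix, species = analyzer.compute_confusion_matrix(detections, labels)

# Rows = predicted, columns = true. With include_background=True the
# background row/column is the last index of the matrix.
bg = matrix.shape[0] - 1

for i, sp in enumerate(species):
    if i == bg:
        continue  # skip the background entry if the species list includes it
    tp = matrix[i, i]
    fp = matrix[i, bg]  # predicted sp, matched no label
    fn = matrix[bg, i]  # label of sp, matched no detection
    print(f"{sp}: TP={tp}, FP={fp}, FN={fn}")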

Typical Workflow

This class sits at Step 4 of the standard evaluation pipeline:

Step 1  BirdCallDetector(conf_threshold=0.001).detect(..., no_merge=True)  →  raw_detections.json
Step 2  FBetaScoreAnalyzer().analyze_confidence_thresholds(...)             →  optimal_thresholds.csv
Step 3  DetectionFilter().filter_detections(data, conf=0.35)               →  simplified.csv
Step 4  ConfusionMatrixAnalyzer().analyze(...)                              →  confusion_matrix/

Input File Format

Both detections_path and labels_path must use the six-column schema: Filename, Start Time (s), End Time (s), Low Freq (Hz), High Freq (Hz), Species eBird Code. Filenames are matched by stem only (extension-agnostic). See Ground-Truth Labels — CSV schema for the full definition.
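A minimal sketch of parsing this schema into the dict keys that load_detections_csv() and load_labels_csv() return, and of grouping records by filename stem. read_six_column_csv and group_by_stem are hypothetical helpers; converting the numeric columns to float is an assumption about how the real loaders treat them.

```python
import csv
from pathlib import PurePath

# Six-column schema headers mapped to the documented dict keys.
COLUMNS = {
    "Filename": "filename",
    "Start Time (s)": "time_start",
    "End Time (s)": "time_end",
    "Low Freq (Hz)": "freq_low_hz",
    "High Freq (Hz)": "freq_high_hz",
    "Species eBird Code": "species",
}
NUMERIC = {"time_start", "time_end", "freq_low_hz", "freq_high_hz"}


def read_six_column_csv(fileobj):
    """Parse an open six-column CSV into dicts; numeric fields become floats."""
    rows = []
    for raw in csv.DictReader(fileobj):
        rec = {key: raw[col] for col, key in COLUMNS.items()}
        for key in NUMERIC:
            rec[key] = float(rec[key])
        rows.append(rec)
    return rows


def group_by_stem(records):
    """Group records by filename stem, so 'rec1.wav' and 'rec1.flac' share a key."""
    groups = {}
    for rec in records:
        groups.setdefault(PurePath(rec["filename"]).stem, []).append(rec)
    return groups
```

Grouping by stem is what lets a detection from rec1.wav be compared against a label written for rec1.flac.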