BirdCallDetector

Detect bird calls in audio files using a trained YOLO model. Processes WAV, FLAC, OGG, and MP3 files through the PCEN spectrogram pipeline and returns timestamped song segments with species labels and confidence scores.


Import

from inference.detect_birds import BirdCallDetector, reconstruct_songs, find_audio_files

BirdCallDetector

The main entry point for all inference workflows. Instantiate once per model, then call detect() for any number of files.

Constructor

BirdCallDetector(
    model_path,
    species_mapping,
    conf_threshold=0.001,
    nms_iou_threshold=0.7,
    song_gap_threshold=0.1,
    num_workers=1,
)

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| model_path | str / — | Yes | Path to the trained YOLO model file (.pt, .onnx, .engine, etc.). |
| species_mapping | str / — | Yes | Dataset key for class-ID-to-species mapping. Must match the mapping the model was trained with. See allowed values. |
| conf_threshold | float / 0.001 | No | Confidence threshold (0.0–1.0). Detections below this value are discarded. Use 0.2 for direct field use. Use 0.001 to retain all raw detections for evaluation. |
| nms_iou_threshold | float / 0.7 | No | IoU threshold for Non-Maximum Suppression applied per-clip and across overlapping time windows. |
| song_gap_threshold | float / 0.1 | No | Maximum temporal gap in seconds between two detections of the same species that are still merged into one continuous song segment. |
| num_workers | int / 1 | No | Number of parallel inference workers. Each worker loads its own model copy. Increase on multi-core GPU systems for batch processing. |

Memory Usage

Each additional worker loads a full copy of the model. With num_workers=4 and a 100 MB model, approximately 400 MB of model memory is allocated. Monitor memory when increasing workers significantly.
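
The estimate above is plain multiplication. The following sketch makes it explicit; `estimate_model_memory_mb` is a hypothetical helper for illustration, not part of the package, and it counts only the model copies, not activations or audio buffers.

```python
def estimate_model_memory_mb(model_size_mb: float, num_workers: int) -> float:
    """Rough lower bound on model memory: one full model copy per worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return model_size_mb * num_workers


# Figures taken from the note above: a 100 MB model and 4 workers.
print(estimate_model_memory_mb(100, 4))  # 400
```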


Methods

| Method | Returns | Description |
| --- | --- | --- |
| detect(audio_path, output_path, output_formats, no_merge) | List[Dict] | Detect in a file or directory. Main entry point. |
| detect_single_file(audio_path, progress_callback, no_merge) | List[Dict] | Detect in a single audio file. |
| detect_multiple_files(audio_paths, output_path, output_formats, no_merge) | List[Dict] | Detect across a list of audio files. |
| save_results(detections, output_path, audio_path, output_formats, no_merge) | None | Write detections to one or more output formats. |
| merge_overlapping_detections(detections, merge_mode) | List[Dict] | Merge raw clip detections. |
| load_audio(audio_path) | Tuple[np.ndarray, int] | Load an audio file, returns (signal, sample_rate). |

detect

detector.detect(
    audio_path,
    output_path=None,
    output_formats=None,
    no_merge=False,
)

Detect bird calls in an audio file or directory. Automatically routes to detect_single_file() or detect_multiple_files() based on whether audio_path is a file or a directory.

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| audio_path | str / — | Yes | Path to a single audio file (WAV, FLAC, OGG, MP3) or a directory. Directories are searched recursively. |
| output_path | str / None | No | Output directory for result files. If None, results are only returned, not written. |
| output_formats | List[str] / ['json-with-algorithm-metadata'] | No | One or more output format keys. Accepts json-with-algorithm-metadata, simplified-csv, xeno-canto-annota-json, raven-selection-table, or all. Ignored when no_merge=True. |
| no_merge | bool / False | No | Evaluation mode. Returns raw per-clip detections and writes only raw_detections.json. Use with conf_threshold=0.001 for F-beta sweep workflows. |

Returns: List[Dict] — merged song segments (or raw clip detections when no_merge=True).


detect_single_file

detector.detect_single_file(
    audio_path,
    progress_callback=None,
    no_merge=False,
)

Detect bird calls in a single audio file. Use this method directly when you need a progress_callback (e.g. in a GUI or web app).

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| audio_path | str / — | Yes | Path to the audio file. |
| progress_callback | Callable[[int, int, str], None] / None | No | Called as callback(current, total, message) after each clip. Useful for progress bars. |
| no_merge | bool / False | No | If True, return raw clip-level detections without merging. |

Returns: List[Dict] — detections for this file.
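
To make the callback contract concrete, here is a minimal, self-contained sketch. `process_clips` is a hypothetical stand-in for the per-clip inference loop, not the library's code; only the `callback(current, total, message)` signature is taken from the table above. Any callable with that signature works, including an object that keeps state for a progress bar.

```python
from typing import Callable, List, Optional

ProgressCallback = Callable[[int, int, str], None]


def process_clips(num_clips: int,
                  progress_callback: Optional[ProgressCallback] = None) -> None:
    """Hypothetical stand-in for a per-clip inference loop."""
    for i in range(1, num_clips + 1):
        # Inference on clip i would happen here.
        if progress_callback is not None:
            progress_callback(i, num_clips, f"Processed clip {i}")


class ProgressRecorder:
    """Stores progress as fractions in [0, 1], e.g. to drive a GUI progress bar."""

    def __init__(self) -> None:
        self.fractions: List[float] = []

    def __call__(self, current: int, total: int, message: str) -> None:
        self.fractions.append(current / total if total else 1.0)


recorder = ProgressRecorder()
process_clips(4, progress_callback=recorder)
print(recorder.fractions)  # [0.25, 0.5, 0.75, 1.0]
```

An instance like `recorder` could presumably be passed as `progress_callback` to `detect_single_file()` in the same way.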


save_results

detector.save_results(
    detections,
    output_path,
    audio_path=None,
    output_formats=None,
    no_merge=False,
)

Write detections to one or more output formats under output_path.

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| detections | List[Dict] / — | Yes | Detections returned by detect() or detect_single_file(). |
| output_path | str / — | Yes | Directory to write output files into. |
| audio_path | str / None | No | Source audio path, used for metadata in JSON output. |
| output_formats | List[str] / ['json-with-algorithm-metadata'] | No | One or more format keys (same values as detect()). |
| no_merge | bool / False | No | If True, writes only raw_detections.json and ignores output_formats. |

merge_overlapping_detections

detector.merge_overlapping_detections(detections, merge_mode='reconstruct')

Merge raw clip detections into song segments or deduplicate with NMS.

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| detections | List[Dict] / — | Yes | Raw per-clip detections. |
| merge_mode | str / 'reconstruct' | No | 'reconstruct' merges temporally adjacent detections into song segments (default). 'nms' removes duplicates by IoU, keeping the highest-confidence box. |

Returns: List[Dict] — merged or deduplicated detections.
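
For readers unfamiliar with the 'nms' mode, the sketch below shows greedy IoU-based suppression on time-frequency boxes. It is an illustration under assumptions, not the library's implementation: the helper names `box_iou` and `greedy_nms` are hypothetical, the box is assumed to span time_start/time_end and freq_low_hz/freq_high_hz, and suppression is assumed to apply only within the same species_id.

```python
from typing import Dict, List


def box_iou(a: Dict, b: Dict) -> float:
    """Intersection over union of two time-frequency rectangles."""
    dt = min(a["time_end"], b["time_end"]) - max(a["time_start"], b["time_start"])
    df = min(a["freq_high_hz"], b["freq_high_hz"]) - max(a["freq_low_hz"], b["freq_low_hz"])
    if dt <= 0 or df <= 0:
        return 0.0  # no overlap in time or in frequency
    inter = dt * df
    area_a = (a["time_end"] - a["time_start"]) * (a["freq_high_hz"] - a["freq_low_hz"])
    area_b = (b["time_end"] - b["time_start"]) * (b["freq_high_hz"] - b["freq_low_hz"])
    return inter / (area_a + area_b - inter)


def greedy_nms(detections: List[Dict], iou_threshold: float = 0.7) -> List[Dict]:
    """Keep the highest-confidence box among same-species boxes that overlap strongly."""
    kept: List[Dict] = []
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        overlaps_kept = any(
            det["species_id"] == k["species_id"] and box_iou(det, k) > iou_threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


boxes = [
    {"species_id": 0, "time_start": 0.0, "time_end": 2.0,
     "freq_low_hz": 1000, "freq_high_hz": 3000, "confidence": 0.9},
    {"species_id": 0, "time_start": 0.1, "time_end": 2.0,
     "freq_low_hz": 1000, "freq_high_hz": 3000, "confidence": 0.6},  # near-duplicate
    {"species_id": 0, "time_start": 5.0, "time_end": 6.0,
     "freq_low_hz": 1000, "freq_high_hz": 3000, "confidence": 0.5},
]
print([d["confidence"] for d in greedy_nms(boxes)])  # [0.9, 0.5]
```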


reconstruct_songs

Standalone version of the song-merging logic. Use this outside a BirdCallDetector instance, for example in evaluation scripts that operate on previously saved raw detections.

from inference.detect_birds import reconstruct_songs

merged = reconstruct_songs(detections, song_gap_threshold)

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| detections | List[Dict] / — | Yes | Raw detections, each with time_start, time_end, species_id, species, confidence, freq_low_hz, freq_high_hz. Optional filename key for multi-file inputs. |
| song_gap_threshold | float / — | Yes | Max gap in seconds between two detections of the same species to merge them into one song segment. |

Returns: List[Dict] — merged song segments, sorted by (filename, time_start).

Each merged segment contains:

| Field | Type | Description |
| --- | --- | --- |
| species | str | eBird species code |
| species_id | int | Model class ID |
| time_start | float | Start time in seconds |
| time_end | float | End time in seconds |
| avg_confidence | float | Running average confidence across merged clips |
| max_confidence | float | Highest confidence among merged clips |
| detections_merged | int | Number of raw clips that were merged |
| freq_low_hz | int | Lowest frequency of merged bounding boxes |
| freq_high_hz | int | Highest frequency of merged bounding boxes |
| filename | str | Source filename (multi-file inputs only) |
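
The sketch below shows one way gap-based merging could produce the fields listed above. It is an approximation written from this page's description, not the code behind `reconstruct_songs`; the function name `merge_by_gap` and the species code `"species_a"` are placeholders.

```python
from typing import Dict, List, Tuple


def merge_by_gap(detections: List[Dict], song_gap_threshold: float) -> List[Dict]:
    """Merge same-species detections whose temporal gap is within the threshold."""
    segments: List[Dict] = []
    last_segment: Dict[Tuple[str, int], Dict] = {}  # latest segment per (file, species)
    ordered = sorted(detections, key=lambda d: (d.get("filename", ""), d["time_start"]))
    for det in ordered:
        key = (det.get("filename", ""), det["species_id"])
        seg = last_segment.get(key)
        if seg is not None and det["time_start"] - seg["time_end"] <= song_gap_threshold:
            n = seg["detections_merged"]
            seg["time_end"] = max(seg["time_end"], det["time_end"])
            # Running average: fold the new confidence into the previous mean.
            seg["avg_confidence"] = (seg["avg_confidence"] * n + det["confidence"]) / (n + 1)
            seg["max_confidence"] = max(seg["max_confidence"], det["confidence"])
            seg["freq_low_hz"] = min(seg["freq_low_hz"], det["freq_low_hz"])
            seg["freq_high_hz"] = max(seg["freq_high_hz"], det["freq_high_hz"])
            seg["detections_merged"] = n + 1
        else:
            seg = {
                "species": det["species"], "species_id": det["species_id"],
                "time_start": det["time_start"], "time_end": det["time_end"],
                "avg_confidence": det["confidence"], "max_confidence": det["confidence"],
                "detections_merged": 1,
                "freq_low_hz": det["freq_low_hz"], "freq_high_hz": det["freq_high_hz"],
            }
            if "filename" in det:
                seg["filename"] = det["filename"]
            last_segment[key] = seg
            segments.append(seg)
    return sorted(segments, key=lambda s: (s.get("filename", ""), s["time_start"]))


base = {"species": "species_a", "species_id": 0}  # placeholder species code
raw = [
    {**base, "time_start": 0.0, "time_end": 1.0, "confidence": 0.75,
     "freq_low_hz": 2000, "freq_high_hz": 4000},
    {**base, "time_start": 1.05, "time_end": 2.0, "confidence": 0.5,
     "freq_low_hz": 1800, "freq_high_hz": 4200},  # gap of 0.05 s: merged
    {**base, "time_start": 5.0, "time_end": 6.0, "confidence": 0.25,
     "freq_low_hz": 2000, "freq_high_hz": 4000},  # gap of 3 s: new segment
]
songs = merge_by_gap(raw, song_gap_threshold=0.1)
print(len(songs), songs[0]["detections_merged"], songs[0]["avg_confidence"])  # 2 2 0.625
```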

find_audio_files

Discover all supported audio files in a path.

from inference.detect_birds import find_audio_files

paths = find_audio_files(audio_path)

| Parameter | Type / Default | Required? | Description |
| --- | --- | --- | --- |
| audio_path | str / — | Yes | Path to a single audio file or a directory. Directories are searched recursively. |

Returns: List[str] — sorted list of absolute file paths. Supported extensions: .wav, .flac, .ogg, .mp3.
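
The behaviour described above can be approximated with the standard library. This is a sketch, not the package's implementation: `discover_audio_files` is a hypothetical name, and matching extensions case-insensitively is an assumption this page does not confirm.

```python
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".wav", ".flac", ".ogg", ".mp3"}


def discover_audio_files(audio_path: str) -> List[str]:
    """Return sorted absolute paths of supported audio files under audio_path."""
    root = Path(audio_path)
    if root.is_file():
        candidates = [root]
    else:
        # rglob walks the directory tree recursively; a missing path yields nothing.
        candidates = [p for p in root.rglob("*") if p.is_file()]
    return sorted(
        str(p.resolve())
        for p in candidates
        if p.suffix.lower() in SUPPORTED_EXTENSIONS  # assumption: case-insensitive
    )
```

Passing a single unsupported file returns an empty list in this sketch, which lets a caller handle files and directories through one code path.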


Examples

Basic detection

from inference.detect_birds import BirdCallDetector

detector = BirdCallDetector(
    model_path="models/Hawaii.pt",
    species_mapping="Hawaii",
    conf_threshold=0.2,
)

detections = detector.detect(
    "recording.wav",
    output_path="results",
    output_formats=["simplified-csv"],
)

print(f"Found {len(detections)} song segments")

Basic detection (directory, all output formats)

from inference.detect_birds import BirdCallDetector

detector = BirdCallDetector(
    model_path="models/Western-US.pt",
    species_mapping="Western-US",
)

detections = detector.detect(
    "/path/to/audio/folder",
    output_path="results",
    output_formats=["all"],
)

Evaluation workflow (raw detections)

from inference.detect_birds import BirdCallDetector

detector = BirdCallDetector(
    model_path="models/Hawaii.pt",
    species_mapping="Hawaii",
    conf_threshold=0.001,
)

raw_detections = detector.detect(
    "data/test_audio/",
    output_path="results",
    no_merge=True,
)

This writes results/raw_detections.json. Feed it into FBetaScoreAnalyzer next.
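
The reason for running evaluation at conf_threshold=0.001 is that raw detections can then be re-filtered at any higher threshold without repeating inference. The sketch below illustrates that idea with made-up confidence values; `filter_by_confidence` is a hypothetical helper, not part of FBetaScoreAnalyzer, and it assumes each raw detection carries a `confidence` key.

```python
from typing import Dict, List


def filter_by_confidence(detections: List[Dict], threshold: float) -> List[Dict]:
    """Keep detections at or above the threshold; lower ones are discarded."""
    return [d for d in detections if d["confidence"] >= threshold]


# Illustrative values only, standing in for the contents of raw_detections.json.
raw = [{"confidence": c} for c in (0.9, 0.45, 0.2, 0.05, 0.002)]

counts = {t: len(filter_by_confidence(raw, t)) for t in (0.001, 0.1, 0.2, 0.5)}
print(counts)  # {0.001: 5, 0.1: 3, 0.2: 3, 0.5: 1}
```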

With progress callback

from inference.detect_birds import BirdCallDetector

detector = BirdCallDetector(
    model_path="models/All-In-One.pt",
    species_mapping="All-In-One",
)

def on_progress(current, total, message):
    print(f"[{current}/{total}] {message}")

detections = detector.detect_single_file(
    "long_recording.flac",
    progress_callback=on_progress,
)

Using reconstruct_songs independently

import json
from inference.detect_birds import reconstruct_songs

with open("results/raw_detections.json") as f:
    data = json.load(f)

raw = data["detections"]
gap = data["model_config"]["song_gap_threshold"]

merged = reconstruct_songs(raw, song_gap_threshold=gap)
print(f"{len(raw)} raw detections → {len(merged)} song segments")

Parallel inference

from inference.detect_birds import BirdCallDetector

detector = BirdCallDetector(
    model_path="models/All-In-One.pt",
    species_mapping="All-In-One",
    num_workers=4,
)

detections = detector.detect(
    "long_recording.wav",
    output_path="results",
    output_formats=["simplified-csv"],
)

Lossy Audio Formats

The model was trained on lossless WAV files. When processing MP3 or OGG input, detection performance may degrade, especially for faint calls and high-frequency species. Use WAV or FLAC whenever possible. If you must use MP3, ensure a bitrate of ≥ 256 kbps.
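
A batch script can flag lossy inputs before running detection. The sketch below classifies files by extension only; `is_lossy` is a hypothetical helper, and it follows this page in treating every .ogg file as lossy without inspecting the codec or bitrate.

```python
from pathlib import Path

LOSSY_EXTENSIONS = {".mp3", ".ogg"}


def is_lossy(audio_path: str) -> bool:
    """True if the file extension indicates a lossy format (MP3 or OGG)."""
    return Path(audio_path).suffix.lower() in LOSSY_EXTENSIONS


for path in ("dawn_chorus.wav", "dawn_chorus.mp3"):
    if is_lossy(path):
        print(f"Warning: {path} is lossy; detection performance may degrade")
```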