BirdCallDetector
Detect bird calls in audio files using a trained YOLO model. Processes WAV, FLAC, OGG, and MP3 files through the PCEN spectrogram pipeline and returns timestamped song segments with species labels and confidence scores.
Import¶
from inference.detect_birds import BirdCallDetector
BirdCallDetector¶
The main entry point for all inference workflows. Instantiate once per model, then call detect() for any number of files.
Constructor¶
BirdCallDetector(
    model_path,
    species_mapping,
    conf_threshold=0.001,
    nms_iou_threshold=0.7,
    song_gap_threshold=0.1,
    num_workers=1,
)
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| model_path | str / — | Yes | Path to the trained YOLO model file (.pt, .onnx, .engine, etc.). |
| species_mapping | str / — | Yes | Dataset key for class-ID-to-species mapping. Must match the mapping the model was trained with. See allowed values. |
| conf_threshold | float / 0.001 | No | Confidence threshold (0.0–1.0). Detections below this value are discarded. Use 0.2 for field use, or 0.001 to retain all raw detections for evaluation. |
| nms_iou_threshold | float / 0.7 | No | IoU threshold for Non-Maximum Suppression, applied per clip and across overlapping time windows. |
| song_gap_threshold | float / 0.1 | No | Maximum gap in seconds between two detections of the same species for them to be merged into one continuous song segment. |
| num_workers | int / 1 | No | Number of parallel inference workers. Each worker loads its own model copy. Increase on multi-core GPU systems for batch processing. |
Memory Usage
Each additional worker loads a full copy of the model. With num_workers=4 and a 100 MB model, approximately 400 MB of model memory is allocated. Monitor memory when increasing workers significantly.
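As a rough sanity check before raising num_workers, the estimate above can be computed directly. This is a minimal sketch: estimate_model_memory_mb is a hypothetical helper, not part of the library, and it counts only the model copies, not audio buffers or framework overhead.

```python
def estimate_model_memory_mb(model_size_mb: float, num_workers: int) -> float:
    """Rough lower bound: one full model copy per worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return model_size_mb * num_workers


# The example from the note above: a 100 MB model with 4 workers.
print(estimate_model_memory_mb(100, 4))  # 400
```

Actual usage will be higher than this figure, so treat it as a floor rather than a budget.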
Methods¶
| Method | Returns | Description |
|---|---|---|
| detect(audio_path, output_path, output_formats, no_merge) | List[Dict] | Detect in a file or directory. Main entry point. |
| detect_single_file(audio_path, progress_callback, no_merge) | List[Dict] | Detect in a single audio file. |
| detect_multiple_files(audio_paths, output_path, output_formats, no_merge) | List[Dict] | Detect across a list of audio files. |
| save_results(detections, output_path, audio_path, output_formats, no_merge) | None | Write detections to one or more output formats. |
| merge_overlapping_detections(detections, merge_mode) | List[Dict] | Merge raw clip detections. |
| load_audio(audio_path) | Tuple[np.ndarray, int] | Load an audio file and return (signal, sample_rate). |
detect¶
Detect bird calls in an audio file or directory. Automatically routes to detect_single_file() or detect_multiple_files() based on whether audio_path is a file or a directory.
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| audio_path | str / — | Yes | Path to a single audio file (WAV, FLAC, OGG, MP3) or a directory. Directories are searched recursively. |
| output_path | str / None | No | Output directory for result files. If None, results are only returned, not written. |
| output_formats | List[str] / ['json-with-algorithm-metadata'] | No | One or more output format keys. Accepts json-with-algorithm-metadata, simplified-csv, xeno-canto-annota-json, raven-selection-table, or all. Ignored when no_merge=True. |
| no_merge | bool / False | No | Evaluation mode. Returns raw per-clip detections and writes only raw_detections.json. Use with conf_threshold=0.001 for F-beta sweep workflows. |
Returns: List[Dict] — merged song segments (or raw clip detections when no_merge=True).
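The file-versus-directory routing described above can be pictured as a simple dispatch on the path type. The sketch below is illustrative only: route_detection is a hypothetical function that returns the name of the method that would be chosen, and raising FileNotFoundError for a missing path is an assumption, not documented behavior.

```python
import os
import tempfile


def route_detection(audio_path: str) -> str:
    """Return which method a detect()-style dispatcher would pick."""
    if os.path.isdir(audio_path):
        return "detect_multiple_files"
    if os.path.isfile(audio_path):
        return "detect_single_file"
    # Assumption: a missing path is treated as an error.
    raise FileNotFoundError(audio_path)


with tempfile.TemporaryDirectory() as tmp:
    wav = os.path.join(tmp, "recording.wav")
    open(wav, "wb").close()  # empty placeholder file, never decoded
    dir_choice = route_detection(tmp)
    file_choice = route_detection(wav)

print(dir_choice, file_choice)
```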
detect_single_file¶
Detect bird calls in a single audio file. Use this method directly when you need a progress_callback (e.g. in a GUI or web app).
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| audio_path | str / — | Yes | Path to the audio file. |
| progress_callback | Callable[[int, int, str], None] / None | No | Called as callback(current, total, message) after each clip. Useful for progress bars. |
| no_merge | bool / False | No | If True, return raw clip-level detections without merging. |
Returns: List[Dict] — detections for this file.
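For GUI or web use, the callback usually forwards progress somewhere other than stdout. The sketch below builds a callback with the documented (current, total, message) signature that appends formatted lines to a list; make_percent_callback and the simulated loop are hypothetical and stand in for a real detect_single_file() run.

```python
from typing import Callable, List


def make_percent_callback(sink: List[str]) -> Callable[[int, int, str], None]:
    """Build a callback with the (current, total, message) signature."""
    def callback(current: int, total: int, message: str) -> None:
        # Guard against total == 0 so the callback never divides by zero.
        percent = 100 * current // total if total else 0
        sink.append(f"{percent:3d}% {message}")
    return callback


lines: List[str] = []
on_progress = make_percent_callback(lines)

# Simulate three clips being processed; a real run would pass
# on_progress as progress_callback to detect_single_file().
for i in range(1, 4):
    on_progress(i, 3, f"clip {i}")

print(lines[-1])  # 100% clip 3
```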
save_results¶
detector.save_results(
    detections,
    output_path,
    audio_path=None,
    output_formats=None,
    no_merge=False,
)
Write detections to one or more output formats under output_path.
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| detections | List[Dict] / — | Yes | Detections returned by detect() or detect_single_file(). |
| output_path | str / — | Yes | Directory to write output files into. |
| audio_path | str / None | No | Source audio path, used for metadata in JSON output. |
| output_formats | List[str] / ['json-with-algorithm-metadata'] | No | One or more format keys (same values as detect()). |
| no_merge | bool / False | No | If True, writes only raw_detections.json and ignores output_formats. |
merge_overlapping_detections¶
Merge raw clip detections into song segments or deduplicate with NMS.
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| detections | List[Dict] / — | Yes | Raw per-clip detections. |
| merge_mode | str / 'reconstruct' | No | 'reconstruct' merges temporally adjacent detections into song segments (default). 'nms' removes duplicates by IoU, keeping the highest-confidence box. |
Returns: List[Dict] — merged or deduplicated detections.
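To make the 'nms' mode concrete, here is a minimal greedy NMS sketch over time-frequency boxes. It is not the library's implementation: box_iou and greedy_nms are hypothetical names, and two choices are assumptions made for illustration, namely that IoU is computed on (time, frequency) rectangles and that only detections of the same species suppress each other.

```python
from typing import Dict, List


def box_iou(a: Dict, b: Dict) -> float:
    """IoU of two boxes in the time-frequency plane."""
    dt = min(a["time_end"], b["time_end"]) - max(a["time_start"], b["time_start"])
    df = min(a["freq_high_hz"], b["freq_high_hz"]) - max(a["freq_low_hz"], b["freq_low_hz"])
    if dt <= 0 or df <= 0:
        return 0.0  # boxes do not overlap
    inter = dt * df
    area_a = (a["time_end"] - a["time_start"]) * (a["freq_high_hz"] - a["freq_low_hz"])
    area_b = (b["time_end"] - b["time_start"]) * (b["freq_high_hz"] - b["freq_low_hz"])
    return inter / (area_a + area_b - inter)


def greedy_nms(detections: List[Dict], iou_threshold: float = 0.7) -> List[Dict]:
    """Keep the highest-confidence box among heavily overlapping ones."""
    kept: List[Dict] = []
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        suppressed = any(
            det["species_id"] == k["species_id"] and box_iou(det, k) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


def box(t0, t1, conf):
    return {"species_id": 1, "time_start": t0, "time_end": t1,
            "freq_low_hz": 1000, "freq_high_hz": 3000, "confidence": conf}


# The second box overlaps the first almost entirely (IoU about 0.95);
# the third is disjoint in time.
kept = greedy_nms([box(0.0, 2.0, 0.9), box(0.1, 2.0, 0.6), box(5.0, 6.0, 0.5)])
print([d["confidence"] for d in kept])  # [0.9, 0.5]
```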
reconstruct_songs¶
Standalone version of the song-merging logic. Use this outside a BirdCallDetector instance, for example in evaluation scripts that operate on previously saved raw detections.
from inference.detect_birds import reconstruct_songs
merged = reconstruct_songs(detections, song_gap_threshold)
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| detections | List[Dict] / — | Yes | Raw detections, each with time_start, time_end, species_id, species, confidence, freq_low_hz, freq_high_hz. Optional filename key for multi-file inputs. |
| song_gap_threshold | float / — | Yes | Max gap in seconds between two detections of the same species to merge them into one song segment. |
Returns: List[Dict] — merged song segments, sorted by (filename, time_start).
Each merged segment contains:
| Field | Type | Description |
|---|---|---|
| species | str | eBird species code |
| species_id | int | Model class ID |
| time_start | float | Start time in seconds |
| time_end | float | End time in seconds |
| avg_confidence | float | Running average confidence across merged clips |
| max_confidence | float | Highest confidence among merged clips |
| detections_merged | int | Number of raw clips that were merged |
| freq_low_hz | int | Lowest frequency of merged bounding boxes |
| freq_high_hz | int | Highest frequency of merged bounding boxes |
| filename | str | Source filename (multi-file inputs only) |
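The fields above follow naturally from a gap-based merge. The sketch below shows one way such a merge could work; merge_by_gap is a hypothetical stand-in written from the field descriptions in this section, not the source of reconstruct_songs, and the species code in the sample data is illustrative.

```python
from typing import Dict, List


def merge_by_gap(detections: List[Dict], song_gap_threshold: float) -> List[Dict]:
    """Merge same-species detections whose gap is at most the threshold."""
    ordered = sorted(
        detections,
        key=lambda d: (d.get("filename", ""), d["species_id"], d["time_start"]),
    )
    merged: List[Dict] = []
    for det in ordered:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.get("filename") == det.get("filename")
            and last["species_id"] == det["species_id"]
            and det["time_start"] - last["time_end"] <= song_gap_threshold
        ):
            # Extend the open segment and update its running statistics.
            n = last["detections_merged"]
            last["avg_confidence"] = (last["avg_confidence"] * n + det["confidence"]) / (n + 1)
            last["max_confidence"] = max(last["max_confidence"], det["confidence"])
            last["detections_merged"] = n + 1
            last["time_end"] = max(last["time_end"], det["time_end"])
            last["freq_low_hz"] = min(last["freq_low_hz"], det["freq_low_hz"])
            last["freq_high_hz"] = max(last["freq_high_hz"], det["freq_high_hz"])
        else:
            segment = {
                "species": det["species"], "species_id": det["species_id"],
                "time_start": det["time_start"], "time_end": det["time_end"],
                "avg_confidence": det["confidence"], "max_confidence": det["confidence"],
                "detections_merged": 1,
                "freq_low_hz": det["freq_low_hz"], "freq_high_hz": det["freq_high_hz"],
            }
            if "filename" in det:
                segment["filename"] = det["filename"]
            merged.append(segment)
    return sorted(merged, key=lambda s: (s.get("filename", ""), s["time_start"]))


def clip(t0, t1, conf):
    return {"species": "apapan", "species_id": 0, "time_start": t0, "time_end": t1,
            "confidence": conf, "freq_low_hz": 2000, "freq_high_hz": 6000}


# Gaps: 0.05 s between the first two clips (merged), 1.0 s before the third (new segment).
segments = merge_by_gap([clip(0.0, 1.0, 0.8), clip(1.05, 2.0, 0.6), clip(3.0, 4.0, 0.9)], 0.1)
print(len(segments), segments[0]["detections_merged"])  # 2 2
```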
find_audio_files¶
Discover all supported audio files in a path.
| Parameter | Type / Default | Required? | Description |
|---|---|---|---|
| audio_path | str / — | Yes | Path to a single audio file or a directory. Directories are searched recursively. |
Returns: List[str] — sorted list of absolute file paths. Supported extensions: .wav, .flac, .ogg, .mp3.
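The discovery behavior described above can be approximated with pathlib. This is a sketch under assumptions, not the library's code: discover_audio_files is a hypothetical name, and matching extensions case-insensitively is a choice made here that the documentation does not confirm.

```python
import tempfile
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {".wav", ".flac", ".ogg", ".mp3"}


def discover_audio_files(audio_path: str) -> List[str]:
    """Return a sorted list of absolute paths to supported audio files."""
    path = Path(audio_path)
    if path.is_file():
        candidates = [path]
    else:
        # rglob walks subdirectories recursively.
        candidates = [p for p in path.rglob("*") if p.is_file()]
    return sorted(
        str(p.resolve()) for p in candidates
        if p.suffix.lower() in SUPPORTED_EXTENSIONS  # assumption: case-insensitive
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site_a").mkdir()
    (root / "site_a" / "dawn.WAV").touch()
    (root / "dusk.flac").touch()
    (root / "notes.txt").touch()  # unsupported extension, skipped
    found = discover_audio_files(tmp)
    names = [Path(f).name for f in found]

print(names)
```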
Examples¶
Basic detection¶
from inference.detect_birds import BirdCallDetector
detector = BirdCallDetector(
    model_path="models/Hawaii.pt",
    species_mapping="Hawaii",
    conf_threshold=0.2,
)
detections = detector.detect(
    "recording.wav",
    output_path="results",
    output_formats=["simplified-csv"],
)
print(f"Found {len(detections)} song segments")
Evaluation workflow (raw detections)¶
from inference.detect_birds import BirdCallDetector
detector = BirdCallDetector(
    model_path="models/Hawaii.pt",
    species_mapping="Hawaii",
    conf_threshold=0.001,
)
raw_detections = detector.detect(
    "data/test_audio/",
    output_path="results",
    no_merge=True,
)
This writes results/raw_detections.json. Feed it into FBetaScoreAnalyzer next.
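The reason for running evaluation at conf_threshold=0.001 is that any stricter threshold can then be applied afterwards without re-running inference. The sketch below illustrates this on made-up confidence values; count_by_threshold is a hypothetical helper and is not part of FBetaScoreAnalyzer.

```python
from typing import Dict, List


def count_by_threshold(detections: List[Dict], thresholds: List[float]) -> Dict[float, int]:
    """Count detections that survive each confidence threshold."""
    return {
        t: sum(1 for d in detections if d["confidence"] >= t)
        for t in thresholds
    }


# Illustrative raw detections; real entries carry the full set of fields.
raw = [{"confidence": c} for c in (0.002, 0.15, 0.4, 0.85)]
counts = count_by_threshold(raw, [0.001, 0.2, 0.5])
print(counts)  # {0.001: 4, 0.2: 2, 0.5: 1}
```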
With progress callback¶
from inference.detect_birds import BirdCallDetector
detector = BirdCallDetector(
    model_path="models/All-In-One.pt",
    species_mapping="All-In-One",
)
def on_progress(current, total, message):
    print(f"[{current}/{total}] {message}")
detections = detector.detect_single_file(
    "long_recording.flac",
    progress_callback=on_progress,
)
Using reconstruct_songs independently¶
import json
from inference.detect_birds import reconstruct_songs
with open("results/raw_detections.json") as f:
    data = json.load(f)
raw = data["detections"]
gap = data["model_config"]["song_gap_threshold"]
merged = reconstruct_songs(raw, song_gap_threshold=gap)
print(f"{len(raw)} raw detections → {len(merged)} song segments")
Parallel inference¶
from inference.detect_birds import BirdCallDetector
detector = BirdCallDetector(
    model_path="models/All-In-One.pt",
    species_mapping="All-In-One",
    num_workers=4,
)
detections = detector.detect(
    "long_recording.wav",
    output_path="results",
    output_formats=["simplified-csv"],
)
Lossy Audio Formats
The model was trained on lossless WAV files. When processing MP3 or OGG input, detection performance may degrade, especially for faint calls and high-frequency species. Use WAV or FLAC whenever possible. If you must use MP3, ensure a bitrate of ≥ 256 kbps.