
Utilities

Preprocessing, annotation parsing, settings schema, evaluation, logging, and CPU-safe unpickling.

Symbol | File | Role
Preprocessing | eso/utils/preprocessing.py | Audio loading, optional filtering, mel-spectrogram generation.
AnnotationReader | eso/utils/AnnotationReader.py | Parse SVL or compatible XML annotation files.
Config and friends | eso/utils/settings.py | Typed configuration schema. One dataclass per section of the JSON.
Evaluation | eso/utils/Evaluation.py | Sliding-window inference, bout reconstruction, comparison metrics.
plot_chromosome · setup_logger · log_tensorboard | eso/utils/logger.py | Visualisation and logging helpers.
CPU_Unpickler | eso/utils/unpickler.py | Unpickle GPU-trained tensors onto CPU.

eso.utils.preprocessing

The audio-to-spectrogram pipeline. The Preprocessing class produces two datasets per species: a preprocessed one (low-pass filtered and downsampled, used to train the baseline) and an unprocessed one (used by ESO). Audio is segmented into fixed-length windows of segment_duration seconds with a one-second overlap. Each segment is converted to a mel-spectrogram using a Hann window, with n_fft, hop_length, and n_mels taken from the constructor. Class balancing through time shifting, blending, and additive noise is also handled here.
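The windowing step can be sketched as follows. This is a minimal illustration rather than the package's implementation: the helper name `segment_bounds` is hypothetical, and it assumes that a one-second overlap means consecutive windows start `segment_duration - 1` seconds apart and that a trailing partial window is dropped.

```python
def segment_bounds(total_seconds, segment_duration, overlap=1):
    """Return (start, end) times in seconds for fixed-length windows.

    Hypothetical helper: assumes the hop between windows is
    segment_duration - overlap and that partial windows are dropped.
    """
    if segment_duration <= overlap:
        raise ValueError("segment_duration must exceed the overlap")
    hop = segment_duration - overlap
    bounds = []
    start = 0
    while start + segment_duration <= total_seconds:
        bounds.append((start, start + segment_duration))
        start += hop
    return bounds


windows = segment_bounds(total_seconds=10, segment_duration=4)
print(windows)  # [(0, 4), (3, 7), (6, 10)]
```

Each pair can then be multiplied by the sample rate to slice the waveform before it is converted to a spectrogram.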

AnnotationReader

AnnotationReader(
    path: str,
    annotation_file_name: str,
    file_type: str,
    audio_extension: str,
    positive_class: str,
)
Source code in eso/utils/AnnotationReader.py
def __init__(
    self,
    path: str,
    annotation_file_name: str,
    file_type: str,
    audio_extension: str,
    positive_class: str,
):
    """
    Initializes the AnnotationReader class.

    Parameters
    ----------
    path : str
        The path to the directory containing the annotation and audio files.
    annotation_file_name : str
        The name of the annotation file (without extension) to be read.
    file_type : str
        The type of annotation file (e.g., "svl", "xml").
    audio_extension : str
        The file extension for the associated audio files (e.g., ".wav", ".mp3").
    positive_class : str
        The label representing the positive class in classification tasks.

    Returns
    -------
    None
    """
    # The docstring must be the first statement in the function body,
    # otherwise Python treats it as a plain string expression.
    self.path = path
    self.annotation_file_name = annotation_file_name
    self.file_type = file_type
    self.audio_extension = audio_extension
    self.positive_class = positive_class

path instance-attribute

path = path

annotation_file_name instance-attribute

annotation_file_name = annotation_file_name

file_type instance-attribute

file_type = file_type

audio_extension instance-attribute

audio_extension = audio_extension

positive_class instance-attribute

positive_class = positive_class

Initializes the AnnotationReader class.

Parameters:

Name Type Description Default
path str

The path to the directory containing the annotation and audio files.

required
annotation_file_name str

The name of the annotation file (without extension) to be read.

required
file_type str

The type of annotation file (e.g., "svl", "xml").

required
audio_extension str

The file extension for the associated audio files (e.g., ".wav", ".mp3").

required
positive_class str

The label representing the positive class in classification tasks.

required

Returns:

Type Description
None

get_annotation_information

get_annotation_information(annotation_folder, sufix_file)

Extract annotation information from an .svl XML file and return a DataFrame with start times, end times, and labels for the annotations.

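This method parses an XML annotation file (.svl format) and reads the start time, end time, and label of each annotation. Frame and duration values are divided by the sample rate declared on the file's model element to obtain seconds. A label may carry a confidence after a comma (for example "LABEL,5"); labels without one are treated as confidence 10, and only annotations with confidence 10 are kept. The label "predicted" is replaced by the positive class so that model predictions can be read back as annotations.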

Parameters:

Name Type Description Default
annotation_folder str

The folder where the annotation file is located.

required
sufix_file str

The suffix to append to the base annotation file name to get the full file name.

required

Returns:

Type Description
tuple

A tuple containing:
- pd.DataFrame: A DataFrame with three columns:
    - 'Start': The start time of the annotation in seconds.
    - 'End': The end time of the annotation in seconds.
    - 'Label': The label associated with the annotation.
- str: The name of the corresponding audio file (with ".wav" extension).

Raises:

Type Description
Exception

If the annotation file does not contain valid annotation information.
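The parsing described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method itself: the `SVL` string is a made-up fragment, `read_points` is a hypothetical helper, and blank labels are simply skipped.

```python
from xml.dom import minidom

# Made-up SVL fragment for illustration; real files hold more attributes.
SVL = """<sv><data>
<model id="10" sampleRate="4800" />
<dataset id="9">
<point frame="4800" duration="9600" label="gibbon" />
<point frame="24000" duration="4800" label="gibbon,5" />
<point frame="48000" duration="2400" label="predicted" />
</dataset>
</data></sv>"""


def read_points(xml_text, positive_class):
    """Return (start_s, end_s, label) for annotations with confidence 10."""
    doc = minidom.parseString(xml_text)
    model = doc.getElementsByTagName("model")[0]
    sample_rate = int(model.getAttribute("sampleRate"))
    rows = []
    for point in doc.getElementsByTagName("point"):
        label, _, confidence = point.getAttribute("label").partition(",")
        # Labels without a confidence are treated as confidence 10
        if int(confidence or 10) != 10 or label == "":
            continue
        if label == "predicted":
            label = positive_class
        # Frames are converted to seconds with the model's sample rate
        start = float(point.getAttribute("frame")) / sample_rate
        duration = float(point.getAttribute("duration")) / sample_rate
        rows.append((start, start + duration, label))
    return rows


rows = read_points(SVL, positive_class="gibbon")
print(rows)  # [(1.0, 3.0, 'gibbon'), (10.0, 10.5, 'gibbon')]
```

The second point is dropped because its confidence is 5, and the third is relabelled from "predicted" to the positive class.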

Source code in eso/utils/AnnotationReader.py
def get_annotation_information(self, annotation_folder, sufix_file ):
    """
    Extract annotation information from an `.svl` XML file and return a DataFrame
    with start times, end times, and labels for the annotations.

    This method parses an XML annotation file (`.svl` format) to extract annotation
    details including the start time, end time, and label for each annotation.
    It processes the XML file, handles any confidence values, and adjusts labels
    accordingly (e.g., using the positive class label for predicted annotations).

    Parameters
    ----------
    annotation_folder : str
        The folder where the annotation file is located.
    sufix_file : str
        The suffix to append to the base annotation file name to get the full file name.

    Returns
    -------
    tuple
        A tuple containing:
        - pd.DataFrame: A DataFrame with three columns:
            - 'Start': The start time of the annotation in seconds.
            - 'End': The end time of the annotation in seconds.
            - 'Label': The label associated with the annotation.
        - str: The name of the corresponding audio file (with ".wav" extension).

    Raises
    ------
    Exception
        If the annotation file does not contain valid annotation information.
    """

    path = str(Path(
            self.path, annotation_folder, self.annotation_file_name + sufix_file
        ))


    xmldoc = minidom.parse(path)
    itemlist = xmldoc.getElementsByTagName("point")
    idlist = xmldoc.getElementsByTagName("model")

    start_time = []
    end_time = []
    labels = []
    audio_file_name = ""

    if len(idlist) > 0:
        for s in idlist: 
            original_sample_rate = int(s.attributes["sampleRate"].value)


    if len(itemlist) > 0:

        # Iterate over each annotation in the .svl annotation file
        for s in itemlist:
            # Convert the annotation's start frame to seconds using the
            # sample rate declared on the <model> element
            start_seconds = (
                    float(s.attributes["frame"].value) / original_sample_rate
                )

            # Get the label from the annotation file
            label = str(s.attributes["label"].value)

            # Set the default confidence to 10 (i.e. high confidence that
            # the label is correct). Annotations that do not carry a
            # confidence value are treated like normal annotations and it is
            # assumed that the annotation is correct (by the annotator).
            label_confidence = 10

            # Check if a confidence has been assigned
            if "," in label:
                # Split "LABEL,confidence" into the raw label and its confidence
                raw_label, _, confidence = label.partition(",")
                label_confidence = int(confidence)
                label = raw_label

            # If an annotation has a blank label, skip it
            # to avoid mislabelling data
            if label == "":
                continue


            #to include predictions obtained from a model
            if label == "predicted" :
                label=self.positive_class

            # Only keep annotations whose label is very confident.
            # 10 = very confident, 5 = medium, 1 = unsure; this is written
            # as "SPECIES,10" or "SPECIES,5" when annotating.
            if label_confidence == 10:
                # Get the duration from the annotation file
                annotation_duration_seconds = (
                        float(s.attributes["duration"].value) / original_sample_rate
                    )
                start_time.append(start_seconds)
                end_time.append(start_seconds + annotation_duration_seconds)
                labels.append(label)

    df_svl_gibbons = pd.DataFrame(
            {"Start": start_time, "End": end_time, "Label": labels}
        )
    return df_svl_gibbons, self.annotation_file_name + ".wav"

get_annotation_information_testing

get_annotation_information_testing()

Extract annotation information from a .svl XML file and return a DataFrame with frame, value, duration, extent, and label for each annotation.

This method parses an XML annotation file (.svl format) to extract detailed annotation information such as frame number, value, duration, extent, and label. It also extracts the sample rate, start time, and end time from the file's metadata.

Parameters:

This method takes no parameters. The file is read from the Annotations folder under path, using annotation_file_name with the .svl extension.

Returns:

Type Description
tuple

A tuple containing:
- pd.DataFrame: A DataFrame with columns:
    - 'frame': The frame number from the annotation.
    - 'value': The value associated with the annotation.
    - 'duration': The duration of the annotation.
    - 'extent': The extent of the annotation.
    - 'label': The label associated with the annotation.
- str: The sample rate read from the .svl file, returned as the raw attribute string.
- str: The start attribute of the model element in the .svl file.
- str: The end attribute of the model element in the .svl file.

Raises:

Type Description
Exception

If the annotation file is not found or if it does not contain valid annotation information.

Source code in eso/utils/AnnotationReader.py
def get_annotation_information_testing(self):
    """
    Extract annotation information from a `.svl` XML file and return a DataFrame
    with frame, value, duration, extent, and label for each annotation.

    This method parses an XML annotation file (`.svl` format) to extract detailed
    annotation information such as frame number, value, duration, extent, and label.
    It also extracts the sample rate, start time, and end time from the file's metadata.

    Parameters
    ----------
    None

    Returns
    -------
    tuple
        A tuple containing:
        - pd.DataFrame: A DataFrame with columns:
            - 'frame': The frame number from the annotation.
            - 'value': The value associated with the annotation.
            - 'duration': The duration of the annotation.
            - 'extent': The extent of the annotation.
            - 'label': The label associated with the annotation.
        - str: The sample rate read from the `.svl` file, returned as the raw attribute string.
        - str: The `start` attribute of the `<model>` element in the `.svl` file.
        - str: The `end` attribute of the `<model>` element in the `.svl` file.

    Raises
    ------
    Exception
        If the annotation file is not found or if it does not contain valid annotation information.
    """

    path = os.path.join(
            self.path, "Annotations", self.annotation_file_name + ".svl"
        )

    # Process the .svl xml file
    xmldoc = minidom.parse(path)
    itemlist = xmldoc.getElementsByTagName('point')
    idlist = xmldoc.getElementsByTagName('model')

    sampleRate = idlist.item(0).attributes['sampleRate'].value 
    start_m = idlist.item(0).attributes['start'].value
    end_m = idlist.item(0).attributes['end'].value


    values = []
    frames = []
    durations=[]
    extents=[]
    labels = []
    audio_file_name = ''

    if len(idlist) > 0:
        for s in idlist: 
            original_sample_rate = int(s.attributes["sampleRate"].value)

    if (len(itemlist) > 0):

        # Iterate over each annotation in the .svl annotation file
        for s in itemlist:

            # Read the raw attributes of the annotation; frame and duration
            # are kept in frames and are not converted to seconds
            frame = float(s.attributes['frame'].value)
            value = float(s.attributes['value'].value)
            duration = float(s.attributes['duration'].value)
            extent = float(s.attributes['extent'].value)
            label = str(s.attributes['label'].value)

            # Set the default confidence to 10 (i.e. high confidence that
            # the label is correct). Annotations that do not carry a
            # confidence value are treated like normal annotations and it is
            # assumed that the annotation is correct (by the annotator).
            label_confidence = 10

            # Check if a confidence has been assigned
            if ',' in label:
                # Split "LABEL,confidence" into the raw label and its confidence
                raw_label, _, confidence = label.partition(',')
                label_confidence = int(confidence)
                label = raw_label

            # If an annotation has a blank label, skip it
            # to avoid mislabelling data
            if label == '':
                continue

            # Only keep annotations whose label is very confident.
            # 10 = very confident, 5 = medium, 1 = unsure; this is written
            # as "SPECIES,10" or "SPECIES,5" when annotating.
            if label_confidence == 10:

                frames.append(frame)
                values.append(value)
                durations.append(duration)
                extents.append(extent)
                labels.append(label)

    df_svl_gibbons = pd.DataFrame({'frame': frames, 'value':values ,'duration': durations,
                              'extent':extents,'label':labels})
    return df_svl_gibbons, sampleRate, start_m, end_m

dataframe_to_svl

dataframe_to_svl(dataframe, sample_rate, start_m, end_m)

Convert a DataFrame of annotations to a .svl format XML string.

This method generates a .svl format XML string containing the annotations from a DataFrame. The generated XML includes metadata such as the sample rate, start time, end time, and annotation points (frame, value, duration, extent, and label).

Parameters:

Name Type Description Default
dataframe DataFrame

A DataFrame containing the annotation information. The method reads the columns 'frame', 'value', 'duration', and 'label'. An 'extent' column may be present, but the method writes a fixed extent of 1500 for every point.

required
sample_rate int

The sample rate of the audio associated with the annotations.

required
start_m str

The start time (in seconds) of the annotation period.

required
end_m str

The end time (in seconds) of the annotation period.

required

Returns:

Type Description
str

A string containing the XML in .svl format, representing the annotations along with metadata.

Notes

The function generates an XML document that includes:
- <model>: metadata about the annotation model, including sample rate, start time, and end time.
- <dataset>: contains <point> elements that represent individual annotations.
- <display>: defines the display settings for the annotation in the software.
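The layout listed above can be sketched with the standard library. This is an illustrative stand-in, not the method's implementation: `points_to_svl` is a hypothetical helper, it uses `xml.etree.ElementTree` and a list of dicts in place of a DataFrame, it nests the dataset inside the data element as an assumption, and it writes only a subset of the attributes.

```python
import xml.etree.ElementTree as ET


def points_to_svl(rows, sample_rate, start, end):
    """Build a small SVL-like XML string from a list of row dicts.

    Illustrative stand-in: element names follow the layout described
    above, other details are simplified.
    """
    sv = ET.Element("sv")
    data = ET.SubElement(sv, "data")
    ET.SubElement(data, "model", id="10", sampleRate=str(sample_rate),
                  start=str(start), end=str(end), dataset="9")
    dataset = ET.SubElement(data, "dataset", id="9", dimensions="2")
    for row in rows:
        # One <point> per annotation; frame and duration are whole frames
        ET.SubElement(dataset, "point",
                      frame=str(int(row["frame"])),
                      value=str(row["value"]),
                      duration=str(int(row["duration"])),
                      label=row["label"])
    display = ET.SubElement(sv, "display")
    ET.SubElement(display, "layer", id="2", type="boxes", model="10")
    return ET.tostring(sv, encoding="unicode")


xml_text = points_to_svl(
    [{"frame": 15360, "value": 3136.87, "duration": 1724416, "label": "Cape Robin"}],
    sample_rate=48000, start=0, end=2000000,
)
print(xml_text)
```

One reason to build the tree with ElementTree in a sketch like this is that attribute values such as labels are escaped automatically when the document is serialised.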

Source code in eso/utils/AnnotationReader.py
def dataframe_to_svl(self, dataframe, sample_rate, start_m, end_m):
    """
    Convert a DataFrame of annotations to a `.svl` format XML string.

    This method generates a `.svl` format XML string containing the annotations
    from a DataFrame. The generated XML includes metadata such as the sample rate,
    start time, end time, and annotation points (frame, value, duration, extent, and label).

    Parameters
    ----------
    dataframe : pd.DataFrame
        A DataFrame containing the annotation information. The DataFrame should have 
        the following columns: 'frame', 'value', 'duration', 'extent', and 'label'.
    sample_rate : int
        The sample rate of the audio associated with the annotations.
    start_m : str
        The start time (in seconds) of the annotation period.
    end_m : str
        The end time (in seconds) of the annotation period.

    Returns
    -------
    str
        A string containing the XML in `.svl` format, representing the annotations
        along with metadata.

    Notes
    -----
    The function generates an XML document that includes:
    - `<model>`: metadata about the annotation model, including sample rate, start time, and end time.
    - `<dataset>`: contains `<point>` elements that represent individual annotations.
    - `<display>`: defines the display settings for the annotation in the software.
    """
    doc, tag, text = Doc().tagtext()
    doc.asis('<?xml version="1.0" encoding="UTF-8"?>')
    doc.asis('<!DOCTYPE sonic-visualiser>')

    with tag('sv'):
        with tag('data'):

            model_string = '<model id="10" name="" sampleRate="{}" start="{}" end="{}" type="sparse" dimensions="2" resolution="1" notifyOnAdd="true" dataset="9" subtype="box" minimum="600" maximum="{}" units="Hz" />'.format(sample_rate, 
                                                                    start_m,
                                                                    end_m,
                                                                    1000)
            doc.asis(model_string)

        with tag('dataset', id='9', dimensions='2'):

            # Read dataframe or other data structure and add the values here
            # These are added as "point" elements, for example:
            # '<point frame="15360" value="3136.87" duration="1724416" extent="2139.22" label="Cape Robin" />'
            for index, row in dataframe.iterrows():

                point  = '<point frame="{}" value="{}" duration="{}" extent="{}" label="{}" />'.format(
                    int(row['frame']), 
                    row['value'],
                    int(row['duration']),
                    1500,
                    row['label'])

                # add the point
                doc.asis(point)
        with tag('display'):

            display_string = '<layer id="2" type="boxes" name="Boxes" model="10"  verticalScale="0"  colourName="White" colour="#ffffff" darkBackground="true" />'
            doc.asis(display_string)

    result = indent(
        doc.getvalue(),
        indentation = ' '*2,
        newline = '\r\n'
    )

    return result

Preprocessing

Preprocessing(
    species_folder: str,
    sample_rate: int,
    lowpass_cutoff: int,
    downsample_rate: int,
    nyquist_rate: int,
    segment_duration: int,
    positive_class: str,
    negative_class: str,
    nb_negative_class: int,
    n_fft: int,
    hop_length: int,
    n_mels: int,
    f_min: int,
    f_max: int,
    file_type: str,
    audio_extension: str,
    apply_preprocessing: bool = True,
)

Initialize the Preprocessing object.

Parameters:

Name Type Description Default
species_folder str

Path to the species folder containing audio and annotation data.

required
sample_rate int

The sample rate for unprocessed audio files.

required
lowpass_cutoff int

The cutoff frequency for the low-pass filter.

required
downsample_rate int

The rate at which to downsample the audio.

required
nyquist_rate int

The Nyquist rate, half of the sampling rate.

required
segment_duration int

Duration of each audio segment in seconds.

required
positive_class str

Label representing the positive class in the dataset.

required
negative_class str

Label representing the negative class in the dataset.

required
nb_negative_class int

Number of negative class samples.

required
n_fft int

The length of the FFT window for spectrograms.

required
hop_length int

The hop length for generating spectrograms.

required
n_mels int

The number of mel bands to use in the spectrogram.

required
f_min int

The minimum frequency for the mel filter bank.

required
f_max int

The maximum frequency for the mel filter bank.

required
file_type str

The type of annotation files to process (e.g., '.svl').

required
audio_extension str

The file extension for the audio files (e.g., '.wav').

required
apply_preprocessing bool

Whether to apply preprocessing steps like filtering and downsampling. Default is True.

True

Returns:

Type Description
None
Source code in eso/utils/preprocessing.py
def __init__(
    self,
    species_folder : str,
    sample_rate: int,
    lowpass_cutoff : int,
    downsample_rate : int,
    nyquist_rate : int,
    segment_duration : int,
    positive_class : str,
    negative_class : str,
    nb_negative_class : int,
    n_fft : int,
    hop_length : int,
    n_mels : int,
    f_min : int,
    f_max : int,
    file_type : str,
    audio_extension : str,
    apply_preprocessing: bool=True,

) -> None:
    """
    Initialize the Preprocessing object.

    Parameters
    ----------
    species_folder : str
        Path to the species folder containing audio and annotation data.
    sample_rate : int
        The sample rate for unprocessed audio files.
    lowpass_cutoff : int
        The cutoff frequency for the low-pass filter.
    downsample_rate : int
        The rate at which to downsample the audio.
    nyquist_rate : int
        The Nyquist rate, half of the sampling rate.
    segment_duration : int
        Duration of each audio segment in seconds.
    positive_class : str
        Label representing the positive class in the dataset.
    negative_class : str
        Label representing the negative class in the dataset.
    nb_negative_class : int
        Number of negative class samples.
    n_fft : int
        The length of the FFT window for spectrograms.
    hop_length : int
        The hop length for generating spectrograms.
    n_mels : int
        The number of mel bands to use in the spectrogram.
    f_min : int
        The minimum frequency for the mel filter bank.
    f_max : int
        The maximum frequency for the mel filter bank.
    file_type : str
        The type of annotation files to process (e.g., '.svl').
    audio_extension : str
        The file extension for the audio files (e.g., '.wav').
    apply_preprocessing : bool, optional
        Whether to apply preprocessing steps like filtering and downsampling. Default is True.

    Returns
    -------
    None
    """
    self.sample_rate_unpreprocessed=sample_rate
    self.species_folder = species_folder
    self.lowpass_cutoff = lowpass_cutoff
    self.downsample_rate = downsample_rate
    self.nyquist_rate = nyquist_rate
    self.segment_duration = segment_duration
    self.positive_class = positive_class
    self.negative_class = negative_class
    self.nb_negative_class = nb_negative_class
    self.audio_path = Path(self.species_folder, "Audio")
    self.annotations_path = Path(self.species_folder, "Annotations")
    self.saved_data_path = Path(self.species_folder, "SavedData")
    self.training_files = Path(self.species_folder, "DataFiles", "TrainingFiles.txt")      
    self.n_mels = n_mels
    self.f_min = f_min
    self.f_max = f_max
    self.file_type = file_type
    self.audio_extension = audio_extension
    self.apply_preprocessing = apply_preprocessing
    self.n_fft = n_fft
    self.hop_length = hop_length

sample_rate_unpreprocessed instance-attribute

sample_rate_unpreprocessed = sample_rate

species_folder instance-attribute

species_folder = species_folder

lowpass_cutoff instance-attribute

lowpass_cutoff = lowpass_cutoff

downsample_rate instance-attribute

downsample_rate = downsample_rate

nyquist_rate instance-attribute

nyquist_rate = nyquist_rate

segment_duration instance-attribute

segment_duration = segment_duration

positive_class instance-attribute

positive_class = positive_class

negative_class instance-attribute

negative_class = negative_class

nb_negative_class instance-attribute

nb_negative_class = nb_negative_class

audio_path instance-attribute

audio_path = Path(species_folder, 'Audio')

annotations_path instance-attribute

annotations_path = Path(species_folder, 'Annotations')

saved_data_path instance-attribute

saved_data_path = Path(species_folder, 'SavedData')

training_files instance-attribute

training_files = Path(species_folder, 'DataFiles', 'TrainingFiles.txt')

n_mels instance-attribute

n_mels = n_mels

f_min instance-attribute

f_min = f_min

f_max instance-attribute

f_max = f_max

file_type instance-attribute

file_type = file_type

audio_extension instance-attribute

audio_extension = audio_extension

apply_preprocessing instance-attribute

apply_preprocessing = apply_preprocessing

n_fft instance-attribute

n_fft = n_fft

hop_length instance-attribute

hop_length = hop_length

read_audio_file

read_audio_file(file_name)

Load an audio file and return its waveform and sample rate.

Parameters:

Name Type Description Default
file_name str

Path to the audio file, including the extension (e.g., "audio1.wav"). The value is passed to librosa.load as given; it is not joined with audio_path.

required

Returns:

Type Description
tuple

A tuple containing:
- np.ndarray: The audio waveform (amplitude values).
- int: The sampling rate of the audio file.

Source code in eso/utils/preprocessing.py
def read_audio_file(self, file_name):
    """
    Load an audio file and return its waveform and sample rate.

    Parameters
    ----------
    file_name : str
        Name of the audio file including the extension (e.g., "audio1.wav").

    Returns
    -------
    tuple
        A tuple containing:
        - np.ndarray: The audio waveform (amplitude values).
        - int: The sampling rate of the audio file.
    """
    # Get the path to the file
    audio_folder = Path(file_name)

    # Read the amplitudes and sample rate
    audio_amps, audio_sample_rate = librosa.load(audio_folder, sr=None)

    return audio_amps, audio_sample_rate

butter_lowpass_filter

butter_lowpass_filter(data, cutoff_freq, nyq_freq, order=4)

Apply a Butterworth low-pass filter to the input signal.

This method filters the input signal using a zero-phase Butterworth low-pass filter designed with the specified cutoff and Nyquist frequencies.

Parameters:

Name Type Description Default
data ndarray

The input signal (1D array) to be filtered.

required
cutoff_freq float

The cutoff frequency of the low-pass filter (in Hz).

required
nyq_freq float

The Nyquist frequency (typically half the sampling rate).

required
order int

The order of the Butterworth filter. Default is 4.

4

Returns:

Type Description
ndarray

The filtered signal with the same shape as the input.
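The filter design helper called by this method is not shown on this page. A minimal sketch of how such a design step could look with SciPy follows; the function name `butter_lowpass` and the normalisation of the cutoff by the Nyquist frequency are assumptions, and the test tones are made up.

```python
import numpy as np
from scipy import signal


def butter_lowpass(cutoff_freq, nyq_freq, order=4):
    """Design a low-pass Butterworth filter.

    Assumption: the cutoff is normalised by the Nyquist frequency, which
    is what scipy.signal.butter expects when no fs argument is given.
    """
    normal_cutoff = float(cutoff_freq) / float(nyq_freq)
    b, a = signal.butter(order, normal_cutoff, btype="lowpass")
    return b, a


sample_rate = 4800
t = np.arange(sample_rate) / sample_rate  # one second of audio
low = np.sin(2 * np.pi * 100 * t)    # 100 Hz tone, below the cutoff
high = np.sin(2 * np.pi * 2000 * t)  # 2000 Hz tone, above the cutoff

b, a = butter_lowpass(cutoff_freq=500, nyq_freq=sample_rate / 2)
# filtfilt runs the filter forwards and backwards, giving zero phase shift
filtered = signal.filtfilt(b, a, low + high)

# Root-mean-square difference between the output and the low tone alone
residual = np.sqrt(np.mean((filtered - low) ** 2))
print(filtered.shape, residual)
```

The output keeps the shape of the input, and the residual stays small because the 2000 Hz component is strongly attenuated while the 100 Hz component passes almost unchanged.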

Source code in eso/utils/preprocessing.py
def butter_lowpass_filter(self, data, cutoff_freq, nyq_freq, order=4):
    """
    Apply a Butterworth low-pass filter to the input signal.

    This method filters the input signal using a zero-phase Butterworth low-pass
    filter designed with the specified cutoff and Nyquist frequencies.

    Parameters
    ----------
    data : np.ndarray
        The input signal (1D array) to be filtered.
    cutoff_freq : float
        The cutoff frequency of the low-pass filter (in Hz).
    nyq_freq : float
        The Nyquist frequency (typically half the sampling rate).
    order : int, optional
        The order of the Butterworth filter. Default is 4.

    Returns
    -------
    np.ndarray
        The filtered signal with the same shape as the input.
    """ 
    # Source: https://github.com/guillaume-chevalier/filtering-stft-and-laplace-transform
    b, a = self._butter_lowpass(cutoff_freq, nyq_freq, order=order)
    y = signal.filtfilt(b, a, data)
    return y

downsample_file

downsample_file(amplitudes, original_sr, new_sample_rate)

Downsample an audio waveform to a specified sample rate.

This function resamples the input audio from the original sample rate to a new, lower sample rate using the 'kaiser_fast' resampling method.

Parameters:

Name Type Description Default
amplitudes ndarray

The raw audio waveform (1D NumPy array of amplitude values).

required
original_sr int

The original sampling rate of the audio signal (in Hz).

required
new_sample_rate int

The desired sampling rate to downsample the audio to (in Hz).

required

Returns:

Type Description
tuple

A tuple containing:
- np.ndarray: The downsampled audio waveform.
- int: The new sampling rate (same as new_sample_rate).
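The size of the returned waveform follows from the ratio of the two rates. The arithmetic can be sketched as below; `resampled_length` is a hypothetical helper, and rounding up matches how librosa.resample sizes its output to the best of my understanding.

```python
import math


def resampled_length(n_samples, original_sr, new_sample_rate):
    """Expected number of samples after resampling (rounded up)."""
    return math.ceil(n_samples * new_sample_rate / original_sr)


# Ten seconds of audio at 48 kHz downsampled to 4.8 kHz
print(resampled_length(480_000, 48_000, 4_800))  # 48000
```

The duration in seconds is unchanged by resampling; only the number of samples per second drops.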

Source code in eso/utils/preprocessing.py
def downsample_file(self, amplitudes, original_sr, new_sample_rate):
    """
    Downsample an audio waveform to a specified sample rate.

    This function resamples the input audio from the original sample rate
    to a new, lower sample rate using the 'kaiser_fast' resampling method.

    Parameters
    ----------
    amplitudes : np.ndarray
        The raw audio waveform (1D NumPy array of amplitude values).
    original_sr : int
        The original sampling rate of the audio signal (in Hz).
    new_sample_rate : int
        The desired sampling rate to downsample the audio to (in Hz).

    Returns
    -------
    tuple
        A tuple containing:
        - np.ndarray: The downsampled audio waveform.
        - int: The new sampling rate (same as `new_sample_rate`).
    """
    return (
        librosa.resample(
            amplitudes,
            orig_sr=original_sr,
            target_sr=new_sample_rate,
            res_type="kaiser_fast",
        ),
        new_sample_rate,
    )

convert_single_to_image

convert_single_to_image(audio, sample_rate)

Convert an audio waveform into a normalized mel-spectrogram image.

This function computes the mel-spectrogram of a raw audio signal, converts it to decibels, standardizes it to zero mean and unit variance, and then min-max scales it to the range 0 to 1. If apply_preprocessing is True, the configured f_min and f_max bound the mel filter bank; otherwise fixed bounds of 0 Hz and 5000 Hz are used.

Parameters:

Name Type Description Default
audio ndarray

The raw audio waveform (1D NumPy array of amplitude values).

required
sample_rate int

The sampling rate of the audio signal (in Hz).

required

Returns:

Type Description
ndarray

A 2D NumPy array representing the normalized mel-spectrogram image.
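The normalization step can be sketched on its own with NumPy. This is an illustration under stated assumptions: `normalise_spectrogram` is a hypothetical helper, the input values are made up, and a small epsilon is added to the final division so that a constant input yields zeros rather than a division by zero.

```python
import numpy as np


def normalise_spectrogram(spec_db, eps=1e-8):
    """Standardise a dB spectrogram, then min-max scale it to [0, 1]."""
    spec_norm = (spec_db - spec_db.mean()) / (spec_db.std() + eps)
    spec_min, spec_max = spec_norm.min(), spec_norm.max()
    # Assumption: eps is also added here so a constant input gives zeros
    return (spec_norm - spec_min) / (spec_max - spec_min + eps)


# Made-up dB values standing in for a dB-scaled mel-spectrogram
spec_db = np.array([[-80.0, -40.0], [-20.0, 0.0]])
scaled = normalise_spectrogram(spec_db)
print(scaled.min(), scaled.max())
```

Min-max scaling is unaffected by a shift and a positive rescaling of its input, so apart from the epsilon terms the result matches min-max scaling the dB values directly.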

Source code in eso/utils/preprocessing.py
def convert_single_to_image(self, audio, sample_rate):
    """
    Convert an audio waveform into a normalized mel-spectrogram image.

    This function computes the mel-spectrogram from a raw audio signal and 
    applies normalization to scale the spectrogram values between 0 and 1.
    If preprocessing is enabled, user-defined frequency limits are used;
    otherwise, default frequency bounds are applied.

    Parameters
    ----------
    audio : np.ndarray
        The raw audio waveform (1D NumPy array of amplitude values).
    sample_rate : int
        The sampling rate of the audio signal (in Hz).

    Returns
    -------
    np.ndarray
        A 2D NumPy array representing the normalized mel-spectrogram image.
    """
    if not self.apply_preprocessing:
        f_min = 0
        f_max = 5000
    else:
        f_min = self.f_min
        f_max = self.f_max

    S = librosa.feature.melspectrogram(
        y=audio,
        sr=sample_rate,
        n_fft=self.n_fft,
        hop_length=self.hop_length,
        n_mels=self.n_mels,
        fmin=f_min,
        fmax=f_max,
    )


    # Convert the power spectrogram to decibels
    image = librosa.power_to_db(S)

    # Standardise to zero mean and unit variance; eps guards against
    # division by zero when the spectrogram is constant
    mean = image.mean()
    std = image.std()
    eps = 1e-8
    spec_norm = (image - mean) / (std + eps)

    # Min-max scale the standardised spectrogram to the range [0, 1]
    spec_min, spec_max = spec_norm.min(), spec_norm.max()
    spec_scaled = (spec_norm - spec_min) / (spec_max - spec_min)

    return spec_scaled

save_data_to_pickle

save_data_to_pickle(X, Y)

Save the input data and labels to pickle files.

This function saves the spectrogram data (X) and their corresponding labels (Y) into separate pickle files (X.pkl and Y.pkl) in the directory specified by self.saved_data_path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | any | The data to be saved (e.g., spectrograms). Must be pickle-serializable. | required |
| Y | any | The corresponding labels for X. Must also be pickle-serializable. | required |

Returns:

| Type | Description |
| --- | --- |
| None | |
Source code in eso/utils/preprocessing.py
def save_data_to_pickle(self, X, Y):
    """
    Save the input data and labels to pickle files.

    This function saves the spectrogram data (`X`) and their corresponding
    labels (`Y`) into separate pickle files (`X.pkl` and `Y.pkl`) in the directory 
    specified by `self.saved_data_path`.

    Parameters
    ----------
    X : any
        The data to be saved (e.g., spectrograms). Must be pickle-serializable.
    Y : any
        The corresponding labels for `X`. Must also be pickle-serializable.

    Returns
    -------
    None
    """
    # Context managers close the files even if pickling fails
    with open(Path(self.saved_data_path, "X.pkl"), "wb") as outfile:
        pickle.dump(X, outfile, protocol=4)

    with open(Path(self.saved_data_path, "Y.pkl"), "wb") as outfile:
        pickle.dump(Y, outfile, protocol=4)

load_data_from_pickle

load_data_from_pickle()

Load the data and labels from pickle files.

This function loads spectrogram data (X) and their corresponding labels (Y) from pickle files (X.pkl and Y.pkl) located in the directory specified by self.saved_data_path.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| X | any | The loaded data (e.g., spectrograms), as previously saved using save_data_to_pickle. |
| Y | any | The corresponding labels for X. |

Source code in eso/utils/preprocessing.py
def load_data_from_pickle(self):
    """
    Load the data and labels from pickle files.

    This function loads spectrogram data (`X`) and their corresponding
    labels (`Y`) from pickle files (`X.pkl` and `Y.pkl`) located in the directory 
    specified by `self.saved_data_path`.

    Returns
    -------
    X : any
        The loaded data (e.g., spectrograms), as previously saved using `save_data_to_pickle`.
    Y : any
        The corresponding labels for `X`.
    """
    # Context managers close the files even if unpickling fails
    with open(Path(self.saved_data_path, "X.pkl"), "rb") as infile:
        X = pickle.load(infile)

    with open(Path(self.saved_data_path, "Y.pkl"), "rb") as infile:
        Y = pickle.load(infile)

    return X, Y
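A save and load round trip can be sketched outside the class as follows. The helper names `save_pair` and `load_pair` and the temporary folder are assumptions made for illustration; only the file names `X.pkl` and `Y.pkl` and the pickle protocol come from the methods above.

```python
import pickle
import tempfile
from pathlib import Path


def save_pair(folder, X, Y):
    """Write X and Y to X.pkl and Y.pkl inside `folder`."""
    for name, obj in (("X.pkl", X), ("Y.pkl", Y)):
        with open(Path(folder, name), "wb") as outfile:
            pickle.dump(obj, outfile, protocol=4)


def load_pair(folder):
    """Read X.pkl and Y.pkl back from `folder`."""
    loaded = []
    for name in ("X.pkl", "Y.pkl"):
        with open(Path(folder, name), "rb") as infile:
            loaded.append(pickle.load(infile))
    return tuple(loaded)


with tempfile.TemporaryDirectory() as tmp:
    save_pair(tmp, [[0.0, 0.5], [1.0, 0.25]], [1, 0])
    X, Y = load_pair(tmp)
```

Pickle files can execute arbitrary code when loaded, so only load files that were produced by a trusted run.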

create_dataset

create_dataset(annotation_folder, sufix_file, file_names=None, augmentation=False)

Create the dataset of audio segments and labels for machine learning.

This function reads audio files and their corresponding annotation files, applies preprocessing (optional low-pass filtering and downsampling), extracts labeled audio segments, and optionally augments the data to balance class distributions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| annotation_folder | str or Path | Path to the folder containing the .svl annotation files. | required |
| sufix_file | str | Suffix to append to the annotation filenames for retrieval. | required |
| file_names | str or Path | Path to a CSV file containing a list of filenames to process (without extensions). If None, uses self.training_files. | None |
| augmentation | bool | Whether to perform data augmentation to balance the dataset. | False |

Returns:

| Type | Description |
| --- | --- |
| tuple of np.ndarray | X_calls: ndarray of shape (n_samples, ...). Array of preprocessed and optionally augmented audio segments, typically converted into spectrogram images. Y_calls: ndarray of shape (n_samples,). Corresponding class labels for each segment (binary or multi-class). |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the file_names CSV is missing or empty. |

Notes
  • Annotations are expected in .svl format, created with Sonic Visualiser, using the "boxes area" annotation layer.
  • Each annotation provides a labeled time segment which is then transformed into a training example.
  • Augmentation methods include time shifting, noise addition, and mixing with negative samples to improve dataset balance.
Source code in eso/utils/preprocessing.py
def create_dataset(self, annotation_folder, sufix_file, file_names=None, augmentation=False):
    """
    Create the dataset of audio segments and labels for machine learning.

    This function reads audio files and their corresponding annotation files,
    applies preprocessing (optional low-pass filtering and downsampling),
    extracts labeled audio segments, and optionally augments the data to
    balance class distributions.

    Parameters
    ----------
    annotation_folder : str or Path
        Path to the folder containing the `.svl` annotation files.
    sufix_file : str
        Suffix to append to the annotation filenames for retrieval.
    file_names : str or Path, optional
        Path to a CSV file containing a list of filenames to process (without extensions).
        If None, uses `self.training_files`.
    augmentation : bool, optional
        Whether to perform data augmentation to balance the dataset.

    Returns
    -------
    tuple of np.ndarray
        - `X_calls` : ndarray of shape (n_samples, ...)
            Array of preprocessed and optionally augmented audio segments,
            typically converted into spectrogram images.
        - `Y_calls` : ndarray of shape (n_samples,)
            Corresponding class labels for each segment (binary or multi-class).

    Raises
    ------
    ValueError
        If the `file_names` CSV is missing or empty.

    Notes
    -----
    - Annotations are expected in `.svl` format, created with Sonic Visualiser,
    using the "boxes area" annotation layer.
    - Each annotation provides a labeled time segment which is then transformed
    into a training example.
    - Augmentation methods include time shifting, noise addition, and mixing
    with negative samples to improve dataset balance.
    """

    if file_names is None:
        file_names = self.training_files
    # Keep track of how many calls were found in the annotation files
    total_calls = 0

    # Initialise lists to store the X and Y values
    X_calls = []
    Y_calls = []

    # Read all names of the files
    try:
        files = pd.read_csv(file_names, header=None)
    except Exception as exc:
        # Chain the original error so the root cause stays visible
        raise ValueError(
            f"Error loading filenames from {file_names}. "
            "Check that the file exists and is not empty."
        ) from exc
    # Iterate over each annotation file
    for file in files.values:
        file = file[0]

        file_name_no_extension = file

        reader = AnnotationReader(
            self.species_folder,
            file,
            self.file_type,
            self.audio_extension,
            self.positive_class,
        )
        # Check if the audio file exists before processing
        audio_file = Path(self.audio_path, file_name_no_extension + self.audio_extension)
        if audio_file.is_file():

            # Read audio file
            audio_amps, original_sample_rate = self.read_audio_file(str(audio_file))

            if self.apply_preprocessing:
                # Low pass filter
                filtered = self.butter_lowpass_filter(
                    audio_amps, self.lowpass_cutoff, self.nyquist_rate
                )
                # Downsample
                amplitudes, sample_rate = self.downsample_file(
                    filtered, original_sample_rate, self.downsample_rate
                )
                del filtered

            else:

                # Resample only when the file is not already at the target rate
                if original_sample_rate != self.sample_rate_unpreprocessed:
                    amplitudes, sample_rate = self.downsample_file(
                        audio_amps, original_sample_rate, self.sample_rate_unpreprocessed
                    )
                else:
                    amplitudes, sample_rate = audio_amps, original_sample_rate

            del audio_amps
            df, audio_file_name = reader.get_annotation_information(annotation_folder, sufix_file)


            for index, row in df.iterrows():
                start_seconds = int(round(row["Start"]))
                end_seconds = int(round(row["End"]))
                label = row["Label"]
                annotation_duration_seconds = end_seconds - start_seconds

                # Extract augmented audio segments and corresponding binary labels
                X_data, y_data = self._getXY(
                    amplitudes,
                    sample_rate,
                    start_seconds,
                    annotation_duration_seconds,
                    label
                )

                # Append the segments and labels
                X_calls.extend(X_data)
                Y_calls.extend(y_data)



    if augmentation:
        # Augment the dataset to balance the classes
        X_calls, Y_calls = self._augment_dataset(X_calls, Y_calls)

    # NOTE: sample_rate is only bound if at least one listed audio file was found above
    X_calls = self._convert_all_to_image(X_calls, sample_rate)

    # Convert to numpy arrays
    X_calls, Y_calls = np.asarray(X_calls), np.asarray(Y_calls)

    return X_calls, Y_calls
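Before segments are extracted, each annotation's `Start` and `End` are rounded to whole seconds. The sketch below isolates that arithmetic; the function name `annotation_bounds` is hypothetical, and it only mirrors the `int(round(...))` calls in the loop above.

```python
def annotation_bounds(start, end):
    """Round an annotation to whole seconds and return (start, duration)."""
    start_seconds = int(round(start))
    end_seconds = int(round(end))
    return start_seconds, end_seconds - start_seconds


print(annotation_bounds(1.4, 5.6))  # (1, 5)
print(annotation_bounds(2.5, 3.5))  # (2, 2): ties round to the nearest even integer
```

Two consequences are worth keeping in mind when annotating: values ending in .5 round to the nearest even second, and an annotation that starts and ends within the same rounded second (for example 3.1 to 3.4) ends up with a duration of 0.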

shuffle_files_names

shuffle_files_names(train_size=0.8, test_size=0.1, validation_size=0.1)

Shuffle audio file names and split them into training, testing, and validation sets.

This method scans the Audio folder inside the species directory for all files with the specified audio extension. It then randomly shuffles and splits the file names into training, testing, and validation sets according to the specified proportions. The resulting file names (without extensions) are saved as text files (train.txt, test.txt, validation.txt) inside the DataFiles subdirectory of the species folder.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| train_size | float | Proportion of files to use for training. | 0.8 |
| test_size | float | Proportion of files to use for testing. | 0.1 |
| validation_size | float | Nominal proportion of files to use for validation. It is not used in the computation: the validation set receives whatever files remain after the training and testing splits. | 0.1 |

Raises:

| Type | Description |
| --- | --- |
| Exception | If no audio files are found in the specified audio directory. |

Notes
  • The sum of train_size, test_size, and validation_size should be 1.0.
  • Output files are saved as plain text, with one file name (without extension) per line.
  • The audio extension is read from self.audio_extension, and the species folder from self.species_folder.
Source code in eso/utils/preprocessing.py
def shuffle_files_names(self, train_size=0.8, test_size=0.1, validation_size=0.1):
    """
    Shuffle audio file names and split them into training, testing, and validation sets.

    This method scans the `Audio` folder inside the species directory for all
    files with the specified audio extension. It then randomly shuffles and splits
    the file names into training, testing, and validation sets according to the 
    specified proportions. The resulting file names (without extensions) are saved
    as text files (`train.txt`, `test.txt`, `validation.txt`) inside the `DataFiles`
    subdirectory of the species folder.

    Parameters
    ----------
    train_size : float, optional
        Proportion of files to use for training. Default is 0.8.
    test_size : float, optional
        Proportion of files to use for testing. Default is 0.1.
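    validation_size : float, optional
        Nominal proportion of files to use for validation. Default is 0.1.
        It is not used in the computation: the validation set receives
        whatever files remain after the training and testing splits.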

    Raises
    ------
    Exception
        If no audio files are found in the specified audio directory.

    Notes
    -----
    - The sum of `train_size`, `test_size`, and `validation_size` should be 1.0.
    - Output files are saved as plain text, with one file name (without extension) per line.
    - The audio extension is read from `self.audio_extension`, and the species folder
    from `self.species_folder`.
    """        
    # Get all file names in Audio folder
    path = Path(self.species_folder, "Audio", f"*{self.audio_extension}")
    files = glob(str(path))

    if len(files) == 0:
        raise Exception(
            f"No audio files found in {self.species_folder}/Audio. "
            "Please check the audio_extension setting in the settings file."
        )
    # Shuffle the files
    np.random.shuffle(files)

    # Split the files into train, test, validation
    train_split = int(np.floor(len(files) * train_size))
    test_split = int(np.floor(len(files) * test_size))

    train_files = files[:train_split]
    test_files = files[train_split : train_split + test_split]
    # Use the rest for validation
    validation_files = files[train_split + test_split :]

    # Keep only the file names, without directory or extension
    train_files = [Path(file).stem for file in train_files]
    test_files = [Path(file).stem for file in test_files]
    validation_files = [Path(file).stem for file in validation_files]

    # Create the folders
    os.makedirs(Path(self.species_folder, "DataFiles"), exist_ok=True)

    # Save the files as .txt
    with open(Path(self.species_folder, "DataFiles", "train.txt"), "w") as f:
        f.write("\n".join(train_files))
    with open(Path(self.species_folder, "DataFiles", "test.txt"), "w") as f:
        f.write("\n".join(test_files))

    with open(Path(self.species_folder, "DataFiles", "validation.txt"), "w") as f:
        f.write("\n".join(validation_files))
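The split arithmetic is easier to see on a small list. This is a minimal sketch using only the standard library; `split_names` and its `seed` parameter are assumptions for illustration (the method above shuffles with `np.random.shuffle` and takes no seed).

```python
import math
import random


def split_names(names, train_size=0.8, test_size=0.1, seed=None):
    """Shuffle `names` and split them into (train, test, validation) lists."""
    names = list(names)
    random.Random(seed).shuffle(names)
    train_end = math.floor(len(names) * train_size)
    test_end = train_end + math.floor(len(names) * test_size)
    # Whatever is left after the train and test slices goes to validation
    return names[:train_end], names[train_end:test_end], names[test_end:]


all_names = [f"rec_{i:02d}" for i in range(7)]
train, test, validation = split_names(all_names, seed=0)
print(len(train), len(test), len(validation))
```

With seven files the counts are 5, 0 and 2: both proportions are floored, so a small collection can end up with an empty test set while validation absorbs the remainder.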

check_distribution

check_distribution(Y)
Source code in eso/utils/preprocessing.py
def check_distribution(self, Y):
    """Return a dict mapping each unique label in `Y` to its number of occurrences."""
    unique, counts = np.unique(Y, return_counts=True)
    return dict(zip(unique, counts))
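For plain Python label lists, the same label-to-count mapping can be obtained with the standard library. This is a sketch of an equivalent, not the method itself; `label_distribution` and the example labels are hypothetical.

```python
from collections import Counter


def label_distribution(labels):
    """Map each distinct label to the number of times it occurs."""
    return dict(Counter(labels))


distribution = label_distribution(["gibbon", "noise", "noise", "gibbon", "noise"])
```

A strongly skewed result is the usual signal that the `augmentation` option of `create_dataset` is worth enabling.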

eso.utils.AnnotationReader

Parses Sonic Visualiser SVL files and equivalent XML annotation formats into a DataFrame with Start, End, and Label columns (times in seconds), returned together with the name of the matching audio file. The output is consumed by Preprocessing to mark presence and absence segments for training.

AnnotationReader

AnnotationReader(
    path: str,
    annotation_file_name: str,
    file_type: str,
    audio_extension: str,
    positive_class: str,
)
Source code in eso/utils/AnnotationReader.py
def __init__(
    self,
    path: str,
    annotation_file_name: str,
    file_type: str,
    audio_extension: str,
    positive_class: str,
):
    """
    Initializes the AnnotationReader class.

    Parameters
    ----------
    path : str
        The path to the directory containing the annotation and audio files.
    annotation_file_name : str
        The name of the annotation file (without extension) to be read.
    file_type : str
        The type of annotation file (e.g., "svl", "xml").
    audio_extension : str
        The file extension for the associated audio files (e.g., ".wav", ".mp3").
    positive_class : str
        The label representing the positive class in classification tasks.

    Returns
    -------
    None
    """
    # The docstring must come first to be picked up as the method's documentation
    self.path = path
    self.annotation_file_name = annotation_file_name
    self.file_type = file_type
    self.audio_extension = audio_extension
    self.positive_class = positive_class

path instance-attribute

path = path

annotation_file_name instance-attribute

annotation_file_name = annotation_file_name

file_type instance-attribute

file_type = file_type

audio_extension instance-attribute

audio_extension = audio_extension

positive_class instance-attribute

positive_class = positive_class

Initializes the AnnotationReader class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | The path to the directory containing the annotation and audio files. | required |
| annotation_file_name | str | The name of the annotation file (without extension) to be read. | required |
| file_type | str | The type of annotation file (e.g., "svl", "xml"). | required |
| audio_extension | str | The file extension for the associated audio files (e.g., ".wav", ".mp3"). | required |
| positive_class | str | The label representing the positive class in classification tasks. | required |

Returns:

| Type | Description |
| --- | --- |
| None | |

get_annotation_information

get_annotation_information(annotation_folder, sufix_file)

Extract annotation information from an .svl XML file and return a DataFrame with start times, end times, and labels for the annotations.

This method parses an XML annotation file (.svl format) to extract the start time, end time, and label of each annotation. A label may carry a confidence suffix (for example "LABEL,10"); only annotations with confidence 10, or with no suffix at all, are kept. The label "predicted" is replaced by the positive class label.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| annotation_folder | str | The folder where the annotation file is located. | required |
| sufix_file | str | The suffix to append to the base annotation file name to get the full file name. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple | A tuple of (pd.DataFrame, str). The DataFrame has three columns: 'Start' (the start time of the annotation in seconds), 'End' (the end time of the annotation in seconds), and 'Label' (the label associated with the annotation). The str is the name of the corresponding audio file (with ".wav" extension). |

Raises:

| Type | Description |
| --- | --- |
| Exception | If the annotation file does not contain valid annotation information. |

Source code in eso/utils/AnnotationReader.py
def get_annotation_information(self, annotation_folder, sufix_file ):
    """
    Extract annotation information from an `.svl` XML file and return a DataFrame
    with start times, end times, and labels for the annotations.

    This method parses an XML annotation file (`.svl` format) to extract the
    start time, end time, and label of each annotation. A label may carry a
    confidence suffix (for example "LABEL,10"); only annotations with
    confidence 10, or with no suffix at all, are kept. The label "predicted"
    is replaced by the positive class label.

    Parameters
    ----------
    annotation_folder : str
        The folder where the annotation file is located.
    sufix_file : str
        The suffix to append to the base annotation file name to get the full file name.

    Returns
    -------
    tuple
        A tuple containing:
        - pd.DataFrame: A DataFrame with three columns:
            - 'Start': The start time of the annotation in seconds.
            - 'End': The end time of the annotation in seconds.
            - 'Label': The label associated with the annotation.
        - str: The name of the corresponding audio file (with ".wav" extension).

    Raises
    ------
    Exception
        If the annotation file does not contain valid annotation information.
    """

    path = str(Path(
            self.path, annotation_folder, self.annotation_file_name + sufix_file
        ))


    xmldoc = minidom.parse(path)
    itemlist = xmldoc.getElementsByTagName("point")
    idlist = xmldoc.getElementsByTagName("model")

    start_time = []
    end_time = []
    labels = []
    audio_file_name = ""

    if len(idlist) > 0:
        for s in idlist: 
            original_sample_rate = int(s.attributes["sampleRate"].value)


    if len(itemlist) > 0:

        # Iterate over each annotation in the .svl file (annotation file)
        for s in itemlist:
            # Get the start time in seconds by dividing the start frame
            # by the sample rate stored in the annotation file
            start_seconds = (
                float(s.attributes["frame"].value) / original_sample_rate
            )

            # Get the label from the annotation file
            label = str(s.attributes["label"].value)

            # Set the default confidence to 10 (i.e. high confidence that
            # the label is correct). Annotations that do not have the idea
            # of 'confidence' are treated like normal annotations and it is
            # assumed that the annotation is correct (by the annotator).
            label_confidence = 10

            # Check if a confidence has been assigned ("LABEL,CONFIDENCE")
            if "," in label:
                # Split the raw label from the confidence value
                label, confidence_string = label.split(",", 1)
                label_confidence = int(confidence_string)

            # If an annotation has a blank label then skip it
            # to avoid mislabelling data
            if label == "":
                continue

            # Include predictions obtained from a model
            if label == "predicted":
                label = self.positive_class

            # Only consider annotations whose label is very confident:
            # 10 = very confident, 5 = medium, 1 = unsure. This is written
            # as "SPECIES,10" or "SPECIES,5" when annotating.
            if label_confidence == 10:
                # Get the duration from the annotation file
                annotation_duration_seconds = (
                        float(s.attributes["duration"].value) / original_sample_rate
                    )
                start_time.append(start_seconds)
                end_time.append(start_seconds + annotation_duration_seconds)
                labels.append(label)

    df_svl_gibbons = pd.DataFrame(
            {"Start": start_time, "End": end_time, "Label": labels}
        )
    return df_svl_gibbons, self.annotation_file_name + ".wav"
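The parsing rules can be tried on an inline document. The sketch below is a simplified stand-in, not the method itself: `SAMPLE_SVL` is a made-up, stripped-down SVL fragment, `read_points` is a hypothetical helper, and it returns plain tuples instead of a DataFrame so that it needs only the standard library.

```python
from xml.dom import minidom

SAMPLE_SVL = """<sv><data>
<model id="1" sampleRate="44100" />
<dataset id="0">
<point frame="88200" duration="44100" label="gibbon,10" />
<point frame="176400" duration="22050" label="gibbon,5" />
<point frame="220500" duration="44100" label="predicted" />
</dataset></data></sv>"""


def read_points(xml_text, positive_class):
    """Return (start_s, end_s, label) for every high-confidence point."""
    doc = minidom.parseString(xml_text)
    sample_rate = int(doc.getElementsByTagName("model")[0].getAttribute("sampleRate"))
    rows = []
    for point in doc.getElementsByTagName("point"):
        label, _, confidence = point.getAttribute("label").partition(",")
        if label == "" or (confidence and int(confidence) != 10):
            continue  # skip blank labels and low-confidence annotations
        if label == "predicted":
            label = positive_class
        # Frames and durations are sample counts; divide by the rate for seconds
        start = float(point.getAttribute("frame")) / sample_rate
        end = start + float(point.getAttribute("duration")) / sample_rate
        rows.append((start, end, label))
    return rows


rows = read_points(SAMPLE_SVL, positive_class="gibbon")
```

Here the first point is kept as a 2.0 s to 3.0 s annotation, the second is dropped because its confidence is 5, and the third is kept with its label rewritten to the positive class.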

get_annotation_information_testing

get_annotation_information_testing()

Extract annotation information from a .svl XML file and return a DataFrame with frame, value, duration, extent, and label for each annotation.

This method parses an XML annotation file (.svl format) to extract detailed annotation information such as frame number, value, duration, extent, and label. It also extracts the sample rate, start time, and end time from the file's metadata.

Parameters:

None.

Returns:

| Type | Description |
| --- | --- |
| tuple | A tuple of (pd.DataFrame, int, str, str). The DataFrame has the columns 'frame' (the frame number from the annotation), 'value' (the value associated with the annotation), 'duration' (the duration of the annotation), 'extent' (the extent of the annotation), and 'label' (the label associated with the annotation). The int is the sample rate extracted from the .svl file. The two str values are the start time and the end time of the annotation in the .svl file. |

Raises:

| Type | Description |
| --- | --- |
| Exception | If the annotation file is not found or if it does not contain valid annotation information. |

Source code in eso/utils/AnnotationReader.py
def get_annotation_information_testing(self):
    """
    Extract annotation information from a `.svl` XML file and return a DataFrame
    with frame, value, duration, extent, and label for each annotation.

    This method parses an XML annotation file (`.svl` format) to extract detailed
    annotation information such as frame number, value, duration, extent, and label.
    It also extracts the sample rate, start time, and end time from the file's metadata.

    Parameters
    ----------
    None

    Returns
    -------
    tuple
        A tuple containing:
        - pd.DataFrame: A DataFrame with columns:
            - 'frame': The frame number from the annotation.
            - 'value': The value associated with the annotation.
            - 'duration': The duration of the annotation.
            - 'extent': The extent of the annotation.
            - 'label': The label associated with the annotation.
        - int: The sample rate extracted from the `.svl` file.
        - str: The start time of the annotation in the `.svl` file.
        - str: The end time of the annotation in the `.svl` file.

    Raises
    ------
    Exception
        If the annotation file is not found or if it does not contain valid annotation information.
    """

    path = os.path.join(
            self.path, "Annotations", self.annotation_file_name + ".svl"
        )

    # Process the .svl xml file
    xmldoc = minidom.parse(path)
    itemlist = xmldoc.getElementsByTagName('point')
    idlist = xmldoc.getElementsByTagName('model')

    # Convert to int so the returned sample rate matches the documented type
    sampleRate = int(idlist.item(0).attributes['sampleRate'].value)
    start_m = idlist.item(0).attributes['start'].value
    end_m = idlist.item(0).attributes['end'].value


    values = []
    frames = []
    durations=[]
    extents=[]
    labels = []
    audio_file_name = ''

    if len(idlist) > 0:
        for s in idlist: 
            original_sample_rate = int(s.attributes["sampleRate"].value)

    if (len(itemlist) > 0):

        # Iterate over each annotation in the .svl file (annotation file)
        for s in itemlist:

            # Read the raw attributes of the annotation; frame and duration
            # are expressed in samples
            frame = float(s.attributes['frame'].value)
            value = float(s.attributes['value'].value)
            duration = float(s.attributes['duration'].value)
            extent = float(s.attributes['extent'].value)
            label = str(s.attributes['label'].value)

            # Set the default confidence to 10 (i.e. high confidence that
            # the label is correct). Annotations that do not have the idea
            # of 'confidence' are treated like normal annotations and it is
            # assumed that the annotation is correct (by the annotator).
            label_confidence = 10

            # Check if a confidence has been assigned ("LABEL,CONFIDENCE")
            if ',' in label:
                # Split the raw label from the confidence value
                label, confidence_string = label.split(',', 1)
                label_confidence = int(confidence_string)

            # If an annotation has a blank label then skip it
            # to avoid mislabelling data
            if label == '':
                continue

            # Only consider annotations whose label is very confident:
            # 10 = very confident, 5 = medium, 1 = unsure. This is written
            # as "SPECIES,10" or "SPECIES,5" when annotating.
            if label_confidence == 10:

                frames.append(frame)
                values.append(value)
                durations.append(duration)
                extents.append(extent)
                labels.append(label)

    df_svl_gibbons = pd.DataFrame({'frame': frames, 'value':values ,'duration': durations,
                              'extent':extents,'label':labels})
    return df_svl_gibbons, sampleRate, start_m, end_m

dataframe_to_svl

dataframe_to_svl(dataframe, sample_rate, start_m, end_m)

Convert a DataFrame of annotations to a .svl format XML string.

This method generates a .svl format XML string containing the annotations from a DataFrame. The generated XML includes metadata such as the sample rate, start time, end time, and annotation points (frame, value, duration, extent, and label).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataframe | DataFrame | A DataFrame containing the annotation information. The DataFrame should have the following columns: 'frame', 'value', 'duration', and 'label'. An 'extent' column may be present but is ignored: every point is written with a fixed extent of 1500. | required |
| sample_rate | int | The sample rate of the audio associated with the annotations. | required |
| start_m | str | The start time (in seconds) of the annotation period. | required |
| end_m | str | The end time (in seconds) of the annotation period. | required |

Returns:

| Type | Description |
| --- | --- |
| str | A string containing the XML in .svl format, representing the annotations along with metadata. |

Notes

The function generates an XML document that includes:
  • <model>: metadata about the annotation model, including sample rate, start time, and end time.
  • <dataset>: contains <point> elements that represent individual annotations.
  • <display>: defines the display settings for the annotation in the software.

Source code in eso/utils/AnnotationReader.py
def dataframe_to_svl(self, dataframe, sample_rate, start_m, end_m):
    """
    Convert a DataFrame of annotations to a `.svl` format XML string.

    This method generates a `.svl` format XML string containing the annotations
    from a DataFrame. The generated XML includes metadata such as the sample rate,
    start time, end time, and annotation points (frame, value, duration, extent, and label).

    Parameters
    ----------
    dataframe : pd.DataFrame
        A DataFrame containing the annotation information. The DataFrame should have
        the following columns: 'frame', 'value', 'duration', and 'label'. An 'extent'
        column may be present but is ignored: every point is written with a fixed
        extent of 1500.
    sample_rate : int
        The sample rate of the audio associated with the annotations.
    start_m : str
        The start time (in seconds) of the annotation period.
    end_m : str
        The end time (in seconds) of the annotation period.

    Returns
    -------
    str
        A string containing the XML in `.svl` format, representing the annotations
        along with metadata.

    Notes
    -----
    The function generates an XML document that includes:
    - `<model>`: metadata about the annotation model, including sample rate, start time, and end time.
    - `<dataset>`: contains `<point>` elements that represent individual annotations.
    - `<display>`: defines the display settings for the annotation in the software.
    """
    doc, tag, text = Doc().tagtext()
    doc.asis('<?xml version="1.0" encoding="UTF-8"?>')
    doc.asis('<!DOCTYPE sonic-visualiser>')

    with tag('sv'):
        with tag('data'):

            # Build the <model> element; minimum and maximum are fixed values
            model_string = (
                '<model id="10" name="" sampleRate="{}" start="{}" end="{}" '
                'type="sparse" dimensions="2" resolution="1" notifyOnAdd="true" '
                'dataset="9" subtype="box" minimum="600" maximum="{}" units="Hz" />'
            ).format(sample_rate, start_m, end_m, 1000)
            doc.asis(model_string)

        with tag('dataset', id='9', dimensions='2'):

            # Read dataframe or other data structure and add the values here
            # These are added as "point" elements, for example:
            # '<point frame="15360" value="3136.87" duration="1724416" extent="2139.22" label="Cape Robin" />'
            for index, row in dataframe.iterrows():

                point  = '<point frame="{}" value="{}" duration="{}" extent="{}" label="{}" />'.format(
                    int(row['frame']), 
                    row['value'],
                    int(row['duration']),
                    1500,
                    row['label'])

                # add the point
                doc.asis(point)
        with tag('display'):

            display_string = '<layer id="2" type="boxes" name="Boxes" model="10"  verticalScale="0"  colourName="White" colour="#ffffff" darkBackground="true" />'
            doc.asis(display_string)

    result = indent(
        doc.getvalue(),
        indentation = ' '*2,
        newline = '\r\n'
    )

    return result
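The listing above interpolates each label into the `<point>` string without escaping, so a label containing `&` or a double quote would produce malformed XML. One possible way to guard against that, sketched with the standard library only, is shown below; `point_element` is a hypothetical helper and not part of `AnnotationReader`.

```python
from xml.sax.saxutils import quoteattr


def point_element(frame, value, duration, extent, label):
    """Format one <point> element, escaping the label for use as an attribute."""
    return (
        f'<point frame="{int(frame)}" value="{value}" '
        f'duration="{int(duration)}" extent="{extent}" label={quoteattr(label)} />'
    )


plain = point_element(15360, 3136.87, 1724416, 1500, "Cape Robin")
escaped = point_element(0, 700.0, 44100, 1500, "call & echo")
```

`quoteattr` adds the surrounding quotes itself, which is why the f-string has no quotes around the label placeholder.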

eso.utils.settings

The typed configuration schema. The JSON passed to ESO(settings_path=...) is validated against these dataclasses. Each top-level section of the file maps to one class. Unknown fields raise a ValueError at load time.

For a narrative walk-through of every field with recommended values from the paper, see Configuration.

BaseConfig dataclass

BaseConfig()

dict

dict()
Source code in eso/utils/settings.py
def dict(self):
    return asdict(self)
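How a dataclass schema can reject unknown keys is sketched below. This illustrates the general technique under stated assumptions and is not necessarily how `eso` loads its settings: `load_section` is a hypothetical helper, and only `BaseConfig.dict` and `AlgorithmConfig.max_generations` are taken from this page.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class BaseConfig:
    def dict(self):
        return asdict(self)


@dataclass
class AlgorithmConfig(BaseConfig):
    max_generations: int = 100


def load_section(cls, raw):
    """Build `cls` from a dict, rejecting keys that are not declared fields."""
    known = {f.name for f in fields(cls)}
    unknown = sorted(set(raw) - known)
    if unknown:
        raise ValueError(f"Unknown fields for {cls.__name__}: {unknown}")
    return cls(**raw)


config = load_section(AlgorithmConfig, {"max_generations": 250})
```

Omitted keys fall back to the dataclass defaults, so `load_section(AlgorithmConfig, {})` yields `max_generations == 100`, while a misspelt key such as `max_generation` raises `ValueError`.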

AlgorithmConfig dataclass

AlgorithmConfig(max_generations: int = 100)

Bases: BaseConfig

max_generations class-attribute instance-attribute

max_generations: int = 100

GeneticOperatorConfig dataclass

GeneticOperatorConfig(
    mutation_rate: float = 0.1,
    crossover_rate: float = 0.8,
    reproduction_rate: float = 0.1,
    mutation_height_range: int = 5,
    mutation_position_range: int = 20,
)

Bases: BaseConfig

mutation_rate class-attribute instance-attribute

mutation_rate: float = 0.1

crossover_rate class-attribute instance-attribute

crossover_rate: float = 0.8

reproduction_rate class-attribute instance-attribute

reproduction_rate: float = 0.1

mutation_height_range class-attribute instance-attribute

mutation_height_range: int = 5

mutation_position_range class-attribute instance-attribute

mutation_position_range: int = 20
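
The three default rates sum to 1.0, which suggests they may be read as a probability split between operators. The sketch below assumes that interpretation. `pick_operator` is a hypothetical helper and is not taken from the ESO source, which may apply the rates differently.

```python
import math

# Default rates copied from the GeneticOperatorConfig signature above.
RATES = {"mutation": 0.1, "crossover": 0.8, "reproduction": 0.1}


def pick_operator(rates, u):
    """Map a uniform draw u in [0, 1) onto an operator via cumulative rates."""
    if not math.isclose(sum(rates.values()), 1.0):
        raise ValueError("rates are expected to sum to 1.0 in this sketch")
    cumulative = 0.0
    for name, rate in rates.items():
        cumulative += rate
        if u < cumulative:
            return name
    return name  # guards against floating-point round-off when u is close to 1
```

In practice `u` would come from `random.random()`. Passing it explicitly keeps the sketch deterministic.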

SelectionOperatorConfig dataclass

SelectionOperatorConfig(tournament_size: int = 10)

Bases: BaseConfig

tournament_size class-attribute instance-attribute

tournament_size: int = 10

DataConfig dataclass

DataConfig(
    force_recreate_dataset: bool = False,
    keep_in_memory: bool = False,
    species_folder: str = "",
    train_size: float = 0.8,
    test_size: float = 0.2,
    reshuffle: bool = False,
    positive_class: str = "",
    negative_class: str = "",
)

Bases: BaseConfig

force_recreate_dataset class-attribute instance-attribute

force_recreate_dataset: bool = False

keep_in_memory class-attribute instance-attribute

keep_in_memory: bool = False

species_folder class-attribute instance-attribute

species_folder: str = ''

train_size class-attribute instance-attribute

train_size: float = 0.8

test_size class-attribute instance-attribute

test_size: float = 0.2

reshuffle class-attribute instance-attribute

reshuffle: bool = False

positive_class class-attribute instance-attribute

positive_class: str = ''

negative_class class-attribute instance-attribute

negative_class: str = ''

PreprocessingConfig dataclass

PreprocessingConfig(
    sample_rate: int = 32000,
    lowpass_cutoff: int = 2000,
    downsample_rate: int = 4800,
    nyquist_rate: int = 2400,
    segment_duration: int = 4,
    nb_negative_class: int = 20,
    file_type: str = "svl",
    audio_extension: str = ".wav",
    n_fft: int = 1024,
    hop_length: int = 256,
    n_mels: int = 128,
    f_min: int = 4000,
    f_max: int = 9000,
)

Bases: BaseConfig

sample_rate class-attribute instance-attribute

sample_rate: int = 32000

lowpass_cutoff class-attribute instance-attribute

lowpass_cutoff: int = 2000

downsample_rate class-attribute instance-attribute

downsample_rate: int = 4800

nyquist_rate class-attribute instance-attribute

nyquist_rate: int = 2400

segment_duration class-attribute instance-attribute

segment_duration: int = 4

nb_negative_class class-attribute instance-attribute

nb_negative_class: int = 20

file_type class-attribute instance-attribute

file_type: str = 'svl'

audio_extension class-attribute instance-attribute

audio_extension: str = '.wav'

n_fft class-attribute instance-attribute

n_fft: int = 1024

hop_length class-attribute instance-attribute

hop_length: int = 256

n_mels class-attribute instance-attribute

n_mels: int = 128

f_min class-attribute instance-attribute

f_min: int = 4000

f_max class-attribute instance-attribute

f_max: int = 9000
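
As a worked example, the defaults above determine the spectrogram shape for one segment. The sketch assumes a centred STFT, which is librosa's default and yields `1 + n_samples // hop_length` frames. It also assumes the same `hop_length` is used for the downsampled audio, which may not hold in every configuration. `frames_per_segment` is a hypothetical helper.

```python
# Defaults copied from the PreprocessingConfig signature above.
SAMPLE_RATE = 32000
DOWNSAMPLE_RATE = 4800
SEGMENT_DURATION = 4
HOP_LENGTH = 256
N_MELS = 128


def frames_per_segment(sample_rate, segment_duration, hop_length):
    """Frame count for one segment, assuming a centred STFT (librosa's default)."""
    return 1 + (sample_rate * segment_duration) // hop_length


# Shape is (mel bands, time frames).
raw_shape = (N_MELS, frames_per_segment(SAMPLE_RATE, SEGMENT_DURATION, HOP_LENGTH))
down_shape = (N_MELS, frames_per_segment(DOWNSAMPLE_RATE, SEGMENT_DURATION, HOP_LENGTH))

# The default nyquist_rate is half of the default downsample_rate.
nyquist = DOWNSAMPLE_RATE // 2
```

With these values a 4-second segment at 32000 Hz holds 128000 samples and a segment at 4800 Hz holds 19200 samples.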

PopulationConfig dataclass

PopulationConfig(pop_size: int = 10)

Bases: BaseConfig

pop_size class-attribute instance-attribute

pop_size: int = 10

GeneConfig dataclass

GeneConfig(
    min_position: int = 0,
    max_position: int = -1,
    min_height: int = 4,
    max_height: int = 16,
    band_position: int = None,
    band_height: int = None,
    spec_height: int = None,
    minimum_gene_height: int = None,
)

Bases: BaseConfig

min_position class-attribute instance-attribute

min_position: int = 0

max_position class-attribute instance-attribute

max_position: int = -1

min_height class-attribute instance-attribute

min_height: int = 4

max_height class-attribute instance-attribute

max_height: int = 16

band_position class-attribute instance-attribute

band_position: int = None

band_height class-attribute instance-attribute

band_height: int = None

spec_height class-attribute instance-attribute

spec_height: int = None

minimum_gene_height class-attribute instance-attribute

minimum_gene_height: int = None

ChromosomeConfig dataclass

ChromosomeConfig(
    num_genes: int = None,
    min_num_genes: int = 3,
    max_num_genes: int = 10,
    lambda_1: float = 0.5,
    lambda_2: float = 0.5,
    stack: bool = False,
    baseline_parameters: float = None,
    baseline_metric: int = None,
)

Bases: BaseConfig

num_genes class-attribute instance-attribute

num_genes: int = None

min_num_genes class-attribute instance-attribute

min_num_genes: int = 3

max_num_genes class-attribute instance-attribute

max_num_genes: int = 10

lambda_1 class-attribute instance-attribute

lambda_1: float = 0.5

lambda_2 class-attribute instance-attribute

lambda_2: float = 0.5

stack class-attribute instance-attribute

stack: bool = False

baseline_parameters class-attribute instance-attribute

baseline_parameters: float = None

baseline_metric class-attribute instance-attribute

baseline_metric: int = None

ModelConfig dataclass

ModelConfig(
    optimizer_name: str = "adam",
    loss_function_name: str = "cross_entropy",
    num_epochs: int = 1,
    batch_size: int = 128,
    learning_rate: float = 0.001,
    shuffle: bool = True,
    metric: str = "f1",
)

Bases: BaseConfig

optimizer_name class-attribute instance-attribute

optimizer_name: str = 'adam'

loss_function_name class-attribute instance-attribute

loss_function_name: str = 'cross_entropy'

num_epochs class-attribute instance-attribute

num_epochs: int = 1

batch_size class-attribute instance-attribute

batch_size: int = 128

learning_rate class-attribute instance-attribute

learning_rate: float = 0.001

shuffle class-attribute instance-attribute

shuffle: bool = True

metric class-attribute instance-attribute

metric: str = 'f1'

ArchitectureConfig dataclass

ArchitectureConfig(
    conv_layers: int = 1,
    conv_filters: int = 8,
    dropout_rate: float = 0.5,
    conv_kernel: int = 8,
    max_pooling_size: int = 4,
    fc_units: int = 32,
    fc_layers: int = 2,
    conv_padding: str = None,
    stride_maxpool: int = None,
)

Bases: BaseConfig

conv_layers class-attribute instance-attribute

conv_layers: int = 1

conv_filters class-attribute instance-attribute

conv_filters: int = 8

dropout_rate class-attribute instance-attribute

dropout_rate: float = 0.5

conv_kernel class-attribute instance-attribute

conv_kernel: int = 8

max_pooling_size class-attribute instance-attribute

max_pooling_size: int = 4

fc_units class-attribute instance-attribute

fc_units: int = 32

fc_layers class-attribute instance-attribute

fc_layers: int = 2

conv_padding class-attribute instance-attribute

conv_padding: str = None

stride_maxpool class-attribute instance-attribute

stride_maxpool: int = None

Config dataclass

Config(
    _input: str = None,
    algorithm: AlgorithmConfig = AlgorithmConfig(),
    genetic_operator: GeneticOperatorConfig = GeneticOperatorConfig(),
    selection_operator: SelectionOperatorConfig = SelectionOperatorConfig(),
    data: DataConfig = DataConfig(),
    preprocessing: PreprocessingConfig = PreprocessingConfig(),
    population: PopulationConfig = PopulationConfig(),
    gene: GeneConfig = GeneConfig(),
    chromosome: ChromosomeConfig = ChromosomeConfig(),
    model: ModelConfig = ModelConfig(),
    cnn_architecture: ArchitectureConfig = ArchitectureConfig(),
)

Bases: BaseConfig

algorithm class-attribute instance-attribute

algorithm: AlgorithmConfig = field(default_factory=AlgorithmConfig)

genetic_operator class-attribute instance-attribute

genetic_operator: GeneticOperatorConfig = field(default_factory=GeneticOperatorConfig)

selection_operator class-attribute instance-attribute

selection_operator: SelectionOperatorConfig = field(
    default_factory=SelectionOperatorConfig
)

data class-attribute instance-attribute

data: DataConfig = field(default_factory=DataConfig)

preprocessing class-attribute instance-attribute

preprocessing: PreprocessingConfig = field(default_factory=PreprocessingConfig)

population class-attribute instance-attribute

population: PopulationConfig = field(default_factory=PopulationConfig)

gene class-attribute instance-attribute

gene: GeneConfig = field(default_factory=GeneConfig)

chromosome class-attribute instance-attribute

chromosome: ChromosomeConfig = field(default_factory=ChromosomeConfig)

model class-attribute instance-attribute

model: ModelConfig = field(default_factory=ModelConfig)

cnn_architecture class-attribute instance-attribute

cnn_architecture: ArchitectureConfig = field(default_factory=ArchitectureConfig)

get_params

get_params()
Source code in eso/utils/settings.py
def get_params(self):
    params = {}
    for key, value in asdict(self).items():
        if key == "_input":
            # params["settings"] = value
            continue
        for sub_key, sub_value in value.items():
            params[f"{key}_{sub_key}"] = sub_value
    return params
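
To show the shape of the dictionary returned by `get_params`, the sketch below applies the same flattening logic to a reduced configuration. `ModelSection`, `PopulationSection`, and `MiniConfig` are hypothetical stand-ins with only a few fields, and `"settings.json"` is a placeholder path.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ModelSection:  # hypothetical stand-in for ModelConfig
    num_epochs: int = 1
    metric: str = "f1"


@dataclass
class PopulationSection:  # hypothetical stand-in for PopulationConfig
    pop_size: int = 10


@dataclass
class MiniConfig:  # hypothetical stand-in for Config
    _input: str = None
    model: ModelSection = field(default_factory=ModelSection)
    population: PopulationSection = field(default_factory=PopulationSection)

    def get_params(self):
        # Same flattening logic as the get_params source shown above.
        params = {}
        for key, value in asdict(self).items():
            if key == "_input":
                continue
            for sub_key, sub_value in value.items():
                params[f"{key}_{sub_key}"] = sub_value
        return params


flat = MiniConfig(_input="settings.json").get_params()
```

Each key is the section name and the field name joined by an underscore, and `_input` is skipped.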

eso.utils.Evaluation

Reproduces the evaluation protocol described in the paper. The class slides a window over each test audio file, applies the model (baseline or ESO chromosome) per window, groups consecutive positive predictions into calling bouts, and computes true positives, false positives, false negatives, and true negatives using a 25 percent overlap rule (10 percent for the Thyolo Alethe dataset). It also measures FLOPs via fvcore, RAM usage via psutil, and energy via CodeCarbon.
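
The bout reconstruction and overlap rule can be sketched as follows. This is an illustration under stated assumptions, not the `Evaluation` class itself. The function names are hypothetical, the window hop and length are free parameters, and overlap is measured relative to the annotated bout, which may differ from the definition used in the paper.

```python
def group_bouts(predictions, hop_seconds, window_seconds):
    """Merge runs of consecutive positive windows into (start, end) bouts in seconds."""
    bouts = []
    run_start = None
    for i, positive in enumerate(predictions):
        if positive and run_start is None:
            run_start = i
        elif not positive and run_start is not None:
            bouts.append((run_start * hop_seconds, (i - 1) * hop_seconds + window_seconds))
            run_start = None
    if run_start is not None:
        last = len(predictions) - 1
        bouts.append((run_start * hop_seconds, last * hop_seconds + window_seconds))
    return bouts


def overlap_fraction(predicted, annotated):
    """Fraction of the annotated bout covered by the predicted bout."""
    overlap = min(predicted[1], annotated[1]) - max(predicted[0], annotated[0])
    length = annotated[1] - annotated[0]
    return max(0.0, overlap) / length


def is_true_positive(predicted, annotated, threshold=0.25):
    """Apply the overlap rule; pass threshold=0.10 for the 10 percent variant."""
    return overlap_fraction(predicted, annotated) >= threshold
```

A predicted bout that matches no annotation under this rule would count as a false positive, and an annotation matched by no prediction as a false negative.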

Preprocessing

Preprocessing(
    species_folder: str,
    sample_rate: int,
    lowpass_cutoff: int,
    downsample_rate: int,
    nyquist_rate: int,
    segment_duration: int,
    positive_class: str,
    negative_class: str,
    nb_negative_class: int,
    n_fft: int,
    hop_length: int,
    n_mels: int,
    f_min: int,
    f_max: int,
    file_type: str,
    audio_extension: str,
    apply_preprocessing: bool = True,
)

Initialize the Preprocessing object.

Parameters:

Name Type Description Default
species_folder str

Path to the species folder containing audio and annotation data.

required
sample_rate int

The sample rate for unprocessed audio files.

required
lowpass_cutoff int

The cutoff frequency for the low-pass filter.

required
downsample_rate int

The rate at which to downsample the audio.

required
nyquist_rate int

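The Nyquist frequency passed to the low-pass filter, half of a sampling rate. In PreprocessingConfig the default (2400) is half of the default downsample_rate (4800).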

required
segment_duration int

Duration of each audio segment in seconds.

required
positive_class str

Label representing the positive class in the dataset.

required
negative_class str

Label representing the negative class in the dataset.

required
nb_negative_class int

Number of negative class samples.

required
n_fft int

The length of the FFT window for spectrograms.

required
hop_length int

The hop length for generating spectrograms.

required
n_mels int

The number of mel bands to use in the spectrogram.

required
f_min int

The minimum frequency for the mel filter bank.

required
f_max int

The maximum frequency for the mel filter bank.

required
file_type str

The type of annotation files to process (e.g., 'svl', the default in PreprocessingConfig).

required
audio_extension str

The file extension for the audio files (e.g., '.wav').

required
apply_preprocessing bool

Whether to apply preprocessing steps like filtering and downsampling. Default is True.

True

Returns:

Type Description
None
Source code in eso/utils/preprocessing.py
def __init__(
    self,
    species_folder : str,
    sample_rate: int,
    lowpass_cutoff : int,
    downsample_rate : int,
    nyquist_rate : int,
    segment_duration : int,
    positive_class : str,
    negative_class : str,
    nb_negative_class : int,
    n_fft : int,
    hop_length : int,
    n_mels : int,
    f_min : int,
    f_max : int,
    file_type : str,
    audio_extension : str,
    apply_preprocessing: bool=True,

) -> None:
    """
    Initialize the Preprocessing object.

    Parameters
    ----------
    species_folder : str
        Path to the species folder containing audio and annotation data.
    sample_rate : int
        The sample rate for unprocessed audio files.
    lowpass_cutoff : int
        The cutoff frequency for the low-pass filter.
    downsample_rate : int
        The rate at which to downsample the audio.
    nyquist_rate : int
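        The Nyquist frequency passed to the low-pass filter, half of a sampling rate.
        In PreprocessingConfig the default (2400) is half of the default downsample_rate (4800).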
    segment_duration : int
        Duration of each audio segment in seconds.
    positive_class : str
        Label representing the positive class in the dataset.
    negative_class : str
        Label representing the negative class in the dataset.
    nb_negative_class : int
        Number of negative class samples.
    n_fft : int
        The length of the FFT window for spectrograms.
    hop_length : int
        The hop length for generating spectrograms.
    n_mels : int
        The number of mel bands to use in the spectrogram.
    f_min : int
        The minimum frequency for the mel filter bank.
    f_max : int
        The maximum frequency for the mel filter bank.
    file_type : str
        The type of annotation files to process (e.g., 'svl', the default in PreprocessingConfig).
    audio_extension : str
        The file extension for the audio files (e.g., '.wav').
    apply_preprocessing : bool, optional
        Whether to apply preprocessing steps like filtering and downsampling. Default is True.

    Returns
    -------
    None
    """
    self.sample_rate_unpreprocessed=sample_rate
    self.species_folder = species_folder
    self.lowpass_cutoff = lowpass_cutoff
    self.downsample_rate = downsample_rate
    self.nyquist_rate = nyquist_rate
    self.segment_duration = segment_duration
    self.positive_class = positive_class
    self.negative_class = negative_class
    self.nb_negative_class = nb_negative_class
    self.audio_path = Path(self.species_folder, "Audio")
    self.annotations_path = Path(self.species_folder, "Annotations")
    self.saved_data_path = Path(self.species_folder, "SavedData")
    self.training_files = Path(self.species_folder, "DataFiles", "TrainingFiles.txt")      
    self.n_mels = n_mels
    self.f_min = f_min
    self.f_max = f_max
    self.file_type = file_type
    self.audio_extension = audio_extension
    self.apply_preprocessing = apply_preprocessing
    self.n_fft = n_fft
    self.hop_length = hop_length

sample_rate_unpreprocessed instance-attribute

sample_rate_unpreprocessed = sample_rate

species_folder instance-attribute

species_folder = species_folder

lowpass_cutoff instance-attribute

lowpass_cutoff = lowpass_cutoff

downsample_rate instance-attribute

downsample_rate = downsample_rate

nyquist_rate instance-attribute

nyquist_rate = nyquist_rate

segment_duration instance-attribute

segment_duration = segment_duration

positive_class instance-attribute

positive_class = positive_class

negative_class instance-attribute

negative_class = negative_class

nb_negative_class instance-attribute

nb_negative_class = nb_negative_class

audio_path instance-attribute

audio_path = Path(species_folder, 'Audio')

annotations_path instance-attribute

annotations_path = Path(species_folder, 'Annotations')

saved_data_path instance-attribute

saved_data_path = Path(species_folder, 'SavedData')

training_files instance-attribute

training_files = Path(species_folder, 'DataFiles', 'TrainingFiles.txt')

n_mels instance-attribute

n_mels = n_mels

f_min instance-attribute

f_min = f_min

f_max instance-attribute

f_max = f_max

file_type instance-attribute

file_type = file_type

audio_extension instance-attribute

audio_extension = audio_extension

apply_preprocessing instance-attribute

apply_preprocessing = apply_preprocessing

n_fft instance-attribute

n_fft = n_fft

hop_length instance-attribute

hop_length = hop_length

read_audio_file

read_audio_file(file_name)

Load an audio file and return its waveform and sample rate.

Parameters:

Name Type Description Default
file_name str

Path to the audio file, including the extension (e.g., "audio1.wav"). The value is passed to librosa.load unchanged.

required

Returns:

Type Description
tuple

A tuple containing:
  • np.ndarray: The audio waveform (amplitude values).
  • int: The sampling rate of the audio file.

Source code in eso/utils/preprocessing.py
def read_audio_file(self, file_name):
    """
    Load an audio file and return its waveform and sample rate.

    Parameters
    ----------
    file_name : str
        Path to the audio file, including the extension (e.g., "audio1.wav").
        The value is passed to librosa.load unchanged.

    Returns
    -------
    tuple
        A tuple containing:
        - np.ndarray: The audio waveform (amplitude values).
        - int: The sampling rate of the audio file.
    """
    # Wrap the given path so librosa receives a Path object
    audio_path = Path(file_name)

    # Read the amplitudes and the file's native sample rate (sr=None disables resampling)
    audio_amps, audio_sample_rate = librosa.load(audio_path, sr=None)

    return audio_amps, audio_sample_rate

butter_lowpass_filter

butter_lowpass_filter(data, cutoff_freq, nyq_freq, order=4)

Apply a Butterworth low-pass filter to the input signal.

This method filters the input signal using a zero-phase Butterworth low-pass filter designed with the specified cutoff and Nyquist frequencies.

Parameters:

Name Type Description Default
data ndarray

The input signal (1D array) to be filtered.

required
cutoff_freq float

The cutoff frequency of the low-pass filter (in Hz).

required
nyq_freq float

The Nyquist frequency (typically half the sampling rate).

required
order int

The order of the Butterworth filter. Default is 4.

4

Returns:

Type Description
ndarray

The filtered signal with the same shape as the input.

Source code in eso/utils/preprocessing.py
def butter_lowpass_filter(self, data, cutoff_freq, nyq_freq, order=4):
    """
    Apply a Butterworth low-pass filter to the input signal.

    This method filters the input signal using a zero-phase Butterworth low-pass
    filter designed with the specified cutoff and Nyquist frequencies.

    Parameters
    ----------
    data : np.ndarray
        The input signal (1D array) to be filtered.
    cutoff_freq : float
        The cutoff frequency of the low-pass filter (in Hz).
    nyq_freq : float
        The Nyquist frequency (typically half the sampling rate).
    order : int, optional
        The order of the Butterworth filter. Default is 4.

    Returns
    -------
    np.ndarray
        The filtered signal with the same shape as the input.
    """ 
    # Source: https://github.com/guillaume-chevalier/filtering-stft-and-laplace-transform
    b, a = self._butter_lowpass(cutoff_freq, nyq_freq, order=order)
    y = signal.filtfilt(b, a, data)
    return y
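
The private helper `_butter_lowpass` is not shown on this page. A common pattern, and an assumption here, is to pass the ratio `cutoff_freq / nyq_freq` to `scipy.signal.butter`, which for digital filters expects a critical frequency normalised so that 1 corresponds to the Nyquist frequency. The hypothetical helper below sketches that ratio and the range check it implies.

```python
def normalised_cutoff(cutoff_freq, nyq_freq):
    """Return cutoff / Nyquist, the normalised form used for digital filter design."""
    ratio = float(cutoff_freq) / float(nyq_freq)
    if not 0.0 < ratio < 1.0:
        raise ValueError("cutoff_freq must lie strictly between 0 and nyq_freq")
    return ratio


# Defaults from PreprocessingConfig: lowpass_cutoff=2000, nyquist_rate=2400.
default_ratio = normalised_cutoff(2000, 2400)
```

With the defaults the ratio is about 0.83, so the cutoff sits at roughly five sixths of whatever Nyquist frequency the filter is designed against.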

downsample_file

downsample_file(amplitudes, original_sr, new_sample_rate)

Downsample an audio waveform to a specified sample rate.

This function resamples the input audio from the original sample rate to a new, lower sample rate using the 'kaiser_fast' resampling method.

Parameters:

Name Type Description Default
amplitudes ndarray

The raw audio waveform (1D NumPy array of amplitude values).

required
original_sr int

The original sampling rate of the audio signal (in Hz).

required
new_sample_rate int

The desired sampling rate to downsample the audio to (in Hz).

required

Returns:

Type Description
tuple

A tuple containing:
  • np.ndarray: The downsampled audio waveform.
  • int: The new sampling rate (same as new_sample_rate).

Source code in eso/utils/preprocessing.py
def downsample_file(self, amplitudes, original_sr, new_sample_rate):
    """
    Downsample an audio waveform to a specified sample rate.

    This function resamples the input audio from the original sample rate
    to a new, lower sample rate using the 'kaiser_fast' resampling method.

    Parameters
    ----------
    amplitudes : np.ndarray
        The raw audio waveform (1D NumPy array of amplitude values).
    original_sr : int
        The original sampling rate of the audio signal (in Hz).
    new_sample_rate : int
        The desired sampling rate to downsample the audio to (in Hz).

    Returns
    -------
    tuple
        A tuple containing:
        - np.ndarray: The downsampled audio waveform.
        - int: The new sampling rate (same as `new_sample_rate`).
    """
    return (
        librosa.resample(
            amplitudes,
            orig_sr=original_sr,
            target_sr=new_sample_rate,
            res_type="kaiser_fast",
        ),
        new_sample_rate,
    )
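
As a worked example of what downsampling does to segment length, the sketch below computes the expected number of output samples. It assumes the resampler rounds the scaled length up, which is my understanding of `librosa.resample`. `resampled_length` is a hypothetical helper and does not resample any audio.

```python
def resampled_length(n_samples, original_sr, new_sample_rate):
    """Expected output length after resampling, rounding up (integer ceiling)."""
    return (n_samples * new_sample_rate + original_sr - 1) // original_sr


# A 4-second segment at the default sample_rate, downsampled to the default downsample_rate.
four_seconds = resampled_length(4 * 32000, 32000, 4800)
```

The duration in seconds is preserved: 19200 samples at 4800 Hz is still four seconds of audio.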

convert_single_to_image

convert_single_to_image(audio, sample_rate)

Convert an audio waveform into a normalized mel-spectrogram image.

This function computes the mel-spectrogram from a raw audio signal and applies normalization to scale the spectrogram values between 0 and 1. If preprocessing is enabled, user-defined frequency limits are used; otherwise, default frequency bounds are applied.

Parameters:

Name Type Description Default
audio ndarray

The raw audio waveform (1D NumPy array of amplitude values).

required
sample_rate int

The sampling rate of the audio signal (in Hz).

required

Returns:

Type Description
ndarray

A 2D NumPy array representing the normalized mel-spectrogram image.

Source code in eso/utils/preprocessing.py
def convert_single_to_image(self, audio, sample_rate):
    """
    Convert an audio waveform into a normalized mel-spectrogram image.

    This function computes the mel-spectrogram from a raw audio signal and 
    applies normalization to scale the spectrogram values between 0 and 1.
    If preprocessing is enabled, user-defined frequency limits are used;
    otherwise, default frequency bounds are applied.

    Parameters
    ----------
    audio : np.ndarray
        The raw audio waveform (1D NumPy array of amplitude values).
    sample_rate : int
        The sampling rate of the audio signal (in Hz).

    Returns
    -------
    np.ndarray
        A 2D NumPy array representing the normalized mel-spectrogram image.
    """
    if not self.apply_preprocessing:
        f_min = 0
        f_max = 5000
    else:
        f_min = self.f_min
        f_max = self.f_max

    S = librosa.feature.melspectrogram(
        y=audio,
        sr=sample_rate,
        n_fft=self.n_fft,
        hop_length=self.hop_length,
        n_mels=self.n_mels,
        fmin=f_min,
        fmax=f_max,
    )


    # Convert the power spectrogram to decibels
    image = librosa.core.power_to_db(S)

    # Standardise to zero mean and unit variance
    mean = image.flatten().mean()
    std = image.flatten().std()
    eps = 1e-8
    spec_norm = (image - mean) / (std + eps)

    # Min-max scale to the [0, 1] range
    spec_min, spec_max = spec_norm.min(), spec_norm.max()
    spec_scaled = (spec_norm - spec_min) / (spec_max - spec_min)

    return spec_scaled
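
The final scaling step can be illustrated without NumPy. The sketch below applies min-max scaling to a small 2D list of decibel values. `min_max_scale` is a hypothetical helper, and the guard for a constant input is an addition of this sketch: the function above divides by `spec_max - spec_min` without such a check.

```python
def min_max_scale(rows, eps=1e-8):
    """Scale a 2D list of dB values to [0, 1]; a flat input maps to all zeros."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span < eps:
        return [[0.0 for _ in row] for row in rows]
    return [[(v - lo) / span for v in row] for row in rows]


scaled = min_max_scale([[-80.0, -40.0], [-60.0, 0.0]])
```

Min-max scaling is unchanged by a positive affine transform of its input, so standardising to zero mean and unit variance beforehand yields the same values up to floating-point rounding.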

save_data_to_pickle

save_data_to_pickle(X, Y)

Save the input data and labels to pickle files.

This function saves the spectrogram data (X) and their corresponding labels (Y) into separate pickle files (X.pkl and Y.pkl) in the directory specified by self.saved_data_path.

Parameters:

Name Type Description Default
X any

The data to be saved (e.g., spectrograms). Must be pickle-serializable.

required
Y any

The corresponding labels for X. Must also be pickle-serializable.

required

Returns:

Type Description
None
Source code in eso/utils/preprocessing.py
def save_data_to_pickle(self, X, Y):
    """
    Save the input data and labels to pickle files.

    This function saves the spectrogram data (`X`) and their corresponding
    labels (`Y`) into separate pickle files (`X.pkl` and `Y.pkl`) in the directory 
    specified by `self.saved_data_path`.

    Parameters
    ----------
    X : any
        The data to be saved (e.g., spectrograms). Must be pickle-serializable.
    Y : any
        The corresponding labels for `X`. Must also be pickle-serializable.

    Returns
    -------
    None
    """
    # Context managers close each file even if pickling fails
    with open(Path(self.saved_data_path, "X.pkl"), "wb") as outfile:
        pickle.dump(X, outfile, protocol=4)

    with open(Path(self.saved_data_path, "Y.pkl"), "wb") as outfile:
        pickle.dump(Y, outfile, protocol=4)

load_data_from_pickle

load_data_from_pickle()

Load the data and labels from pickle files.

This function loads spectrogram data (X) and their corresponding labels (Y) from pickle files (X.pkl and Y.pkl) located in the directory specified by self.saved_data_path.

Returns:

Name Type Description
X any

The loaded data (e.g., spectrograms), as previously saved using save_data_to_pickle.

Y any

The corresponding labels for X.

Source code in eso/utils/preprocessing.py
def load_data_from_pickle(self):
    """
    Load the data and labels from pickle files.

    This function loads spectrogram data (`X`) and their corresponding
    labels (`Y`) from pickle files (`X.pkl` and `Y.pkl`) located in the directory 
    specified by `self.saved_data_path`.

    Returns
    -------
    X : any
        The loaded data (e.g., spectrograms), as previously saved using `save_data_to_pickle`.
    Y : any
        The corresponding labels for `X`.
    """
    # Context managers close each file even if unpickling fails
    with open(Path(self.saved_data_path, "X.pkl"), "rb") as infile:
        X = pickle.load(infile)

    with open(Path(self.saved_data_path, "Y.pkl"), "rb") as infile:
        Y = pickle.load(infile)

    return X, Y
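
A round trip through the two pickle files can be sketched with standalone functions. `save_pair` and `load_pair` are hypothetical equivalents of `save_data_to_pickle` and `load_data_from_pickle` that take the folder as an argument instead of reading `self.saved_data_path`, and a temporary directory stands in for the `SavedData` folder.

```python
import pickle
import tempfile
from pathlib import Path


def save_pair(folder, X, Y):
    """Write X and Y to X.pkl and Y.pkl inside folder."""
    for name, obj in (("X.pkl", X), ("Y.pkl", Y)):
        with open(Path(folder, name), "wb") as outfile:
            pickle.dump(obj, outfile, protocol=4)


def load_pair(folder):
    """Read X.pkl and Y.pkl from folder and return them as (X, Y)."""
    loaded = []
    for name in ("X.pkl", "Y.pkl"):
        with open(Path(folder, name), "rb") as infile:
            loaded.append(pickle.load(infile))
    return tuple(loaded)


with tempfile.TemporaryDirectory() as tmp:
    save_pair(tmp, [[0.0, 1.0], [0.5, 0.25]], [1, 0])
    X, Y = load_pair(tmp)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.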

create_dataset

create_dataset(annotation_folder, sufix_file, file_names=None, augmentation=False)

Create the dataset of audio segments and labels for machine learning.

This function reads audio files and their corresponding annotation files, applies preprocessing (optional low-pass filtering and downsampling), extracts labeled audio segments, and optionally augments the data to balance class distributions.

Parameters:

Name Type Description Default
annotation_folder str or Path

Path to the folder containing the .svl annotation files.

required
sufix_file str

Suffix to append to the annotation filenames for retrieval.

required
file_names str or Path

Path to a CSV file containing a list of filenames to process (without extensions). If None, uses self.training_files.

None
augmentation bool

Whether to perform data augmentation to balance the dataset.

False

Returns:

Type Description
tuple of np.ndarray
  • X_calls : ndarray of shape (n_samples, ...) Array of preprocessed and optionally augmented audio segments, typically converted into spectrogram images.
  • Y_calls : ndarray of shape (n_samples,) Corresponding class labels for each segment (binary or multi-class).

Raises:

Type Description
ValueError

If the file_names CSV is missing or empty.

Notes
  • Annotations are expected in .svl format, created with Sonic Visualiser, using the "boxes area" annotation layer.
  • Each annotation provides a labeled time segment which is then transformed into a training example.
  • Augmentation methods include time shifting, noise addition, and mixing with negative samples to improve dataset balance.
Source code in eso/utils/preprocessing.py
def create_dataset(self, annotation_folder, sufix_file, file_names=None, augmentation=False):
    """
    Create the dataset of audio segments and labels for machine learning.

    This function reads audio files and their corresponding annotation files,
    applies preprocessing (optional low-pass filtering and downsampling),
    extracts labeled audio segments, and optionally augments the data to
    balance class distributions.

    Parameters
    ----------
    annotation_folder : str or Path
        Path to the folder containing the `.svl` annotation files.
    sufix_file : str
        Suffix to append to the annotation filenames for retrieval.
    file_names : str or Path, optional
        Path to a CSV file containing a list of filenames to process (without extensions).
        If None, uses `self.training_files`.
    augmentation : bool, optional
        Whether to perform data augmentation to balance the dataset.

    Returns
    -------
    tuple of np.ndarray
        - `X_calls` : ndarray of shape (n_samples, ...)
            Array of preprocessed and optionally augmented audio segments,
            typically converted into spectrogram images.
        - `Y_calls` : ndarray of shape (n_samples,)
            Corresponding class labels for each segment (binary or multi-class).

    Raises
    ------
    ValueError
        If the `file_names` CSV is missing or empty.

    Notes
    -----
    - Annotations are expected in `.svl` format, created with Sonic Visualiser,
    using the "boxes area" annotation layer.
    - Each annotation provides a labeled time segment which is then transformed
    into a training example.
    - Augmentation methods include time shifting, noise addition, and mixing
    with negative samples to improve dataset balance.
    """

    if file_names is None:
        file_names = self.training_files
    # Keep track of how many calls were found in the annotation files
    total_calls = 0

    # Initialise lists to store the X and Y values
    X_calls = []
    Y_calls = []

    # Read all names of the files
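    try:
        files = pd.read_csv(file_names, header=None)
    except Exception as exc:
        # Chain the original exception so the underlying cause stays visible
        raise ValueError(
            f"Error loading filenames from {file_names}. Check that the file exists and is not empty."
        ) from exc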
    # Iterate over each annotation file
    for file in files.values:
        file = file[0]

        file_name_no_extension = file

        reader = AnnotationReader(
            self.species_folder,
            file,
            self.file_type,
            self.audio_extension,
            self.positive_class,
        )
        # Check if the audio file exists before processing
        if str(
            Path(self.audio_path, file_name_no_extension + self.audio_extension)
        ) in glob(str(self.audio_path / f"*{self.audio_extension}")):

            # Read audio file
            audio_amps, original_sample_rate = self.read_audio_file(
                str(
                    Path(
                        self.audio_path,
                        file_name_no_extension + self.audio_extension,
                    )
                )
            )

            if self.apply_preprocessing:
                # Low pass filter
                filtered = self.butter_lowpass_filter(
                    audio_amps, self.lowpass_cutoff, self.nyquist_rate
                )
                # Downsample
                amplitudes, sample_rate = self.downsample_file(
                    filtered, original_sample_rate, self.downsample_rate
                )
                del filtered

            else:

                # Resample only when the file's rate differs from the configured rate
                if original_sample_rate != self.sample_rate_unpreprocessed:
                    amplitudes, sample_rate = self.downsample_file(
                        audio_amps, original_sample_rate, self.sample_rate_unpreprocessed
                    )
                else:
                    amplitudes, sample_rate = audio_amps, original_sample_rate

            del audio_amps
            df, audio_file_name = reader.get_annotation_information(annotation_folder, sufix_file)


            for index, row in df.iterrows():
                start_seconds = int(round(row["Start"]))
                end_seconds = int(round(row["End"]))
                label = row["Label"]
                annotation_duration_seconds = end_seconds - start_seconds

                # Extract augmented audio segments and corresponding binary labels
                X_data, y_data = self._getXY(
                    amplitudes,
                    sample_rate,
                    start_seconds,
                    annotation_duration_seconds,
                    label
                )

                # Append the segments and labels
                X_calls.extend(X_data)
                Y_calls.extend(y_data)



    if augmentation:
        # Augment the dataset to obtain a balanced dataset
        X_calls, Y_calls = self._augment_dataset(X_calls, Y_calls)


    X_calls = self._convert_all_to_image(X_calls, sample_rate)

    # Convert to numpy arrays
    X_calls, Y_calls = np.asarray(X_calls), np.asarray(Y_calls)

    return X_calls, Y_calls
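
The private helper `_getXY` is not shown on this page. As an illustration of how an annotation could be cut into fixed-length windows, the sketch below computes sample index ranges from the same quantities `create_dataset` passes in. `segment_bounds` and its `step_seconds` parameter are hypothetical: the actual stride and any handling of short annotations are defined inside `_getXY`.

```python
def segment_bounds(start_seconds, annotation_duration_seconds, segment_duration,
                   sample_rate, step_seconds):
    """Return (start_sample, end_sample) pairs for windows that fit in the annotation."""
    bounds = []
    offset = 0
    while offset + segment_duration <= annotation_duration_seconds:
        begin = (start_seconds + offset) * sample_rate
        bounds.append((begin, begin + segment_duration * sample_rate))
        offset += step_seconds
    return bounds


# A 6-second annotation starting at 10 s, cut into 4-second windows at 4800 Hz.
windows = segment_bounds(10, 6, 4, 4800, step_seconds=1)
```

In this sketch an annotation shorter than `segment_duration` produces no windows.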

shuffle_files_names

shuffle_files_names(train_size=0.8, test_size=0.1, validation_size=0.1)

Shuffle audio file names and split them into training, testing, and validation sets.

This method scans the Audio folder inside the species directory for all files with the specified audio extension. It then randomly shuffles and splits the file names into training, testing, and validation sets according to the specified proportions. The resulting file names (without extensions) are saved as text files (train.txt, test.txt, validation.txt) inside the DataFiles subdirectory of the species folder.

Parameters:

Name Type Description Default
train_size float

Proportion of files to use for training. Default is 0.8.

0.8
test_size float

Proportion of files to use for testing. Default is 0.1.

0.1
validation_size float

Proportion of files to use for validation. Default is 0.1.

0.1

Raises:

Type Description
Exception

If no audio files are found in the specified audio directory.

Notes
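  • The sum of train_size, test_size, and validation_size should be 1.0. The validation set receives whatever files remain after the training and test splits, so validation_size is not used in the computation itself.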
  • Output files are saved as plain text, with one file name (without extension) per line.
  • The audio extension is read from self.audio_extension, and the species folder from self.species_folder.
Source code in eso/utils/preprocessing.py
def shuffle_files_names(self, train_size=0.8, test_size=0.1, validation_size=0.1):
    """
    Shuffle audio file names and split them into training, testing, and validation sets.

    This method scans the `Audio` folder inside the species directory for all
    files with the specified audio extension. It then randomly shuffles and splits
    the file names into training, testing, and validation sets according to the 
    specified proportions. The resulting file names (without extensions) are saved
    as text files (`train.txt`, `test.txt`, `validation.txt`) inside the `DataFiles`
    subdirectory of the species folder.

    Parameters
    ----------
    train_size : float, optional
        Proportion of files to use for training. Default is 0.8.
    test_size : float, optional
        Proportion of files to use for testing. Default is 0.1.
    validation_size : float, optional
        Proportion of files to use for validation. Default is 0.1.

    Raises
    ------
    Exception
        If no audio files are found in the specified audio directory.

    Notes
    -----
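    - The sum of `train_size`, `test_size`, and `validation_size` should be 1.0.
      The validation set receives whatever files remain after the training and
      test splits, so `validation_size` is not used in the computation itself.
    - Output files are saved as plain text, with one file name (without extension) per line.
    - The audio extension is read from `self.audio_extension`, and the species folder
      from `self.species_folder`.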
    """        
    # Get all file names in Audio folder
    path = Path(self.species_folder, "Audio", f"*{self.audio_extension}")
    files = glob(str(path))

    if len(files) == 0:
        raise Exception(
            f"No audio files found in {self.species_folder}/Audio. "
            "Please check the audio_extension setting in the settings file."
        )
    # Shuffle the files
    np.random.shuffle(files)

    train_samples = int(np.floor(len(files) * train_size))
    test_samples = int(np.floor(len(files) * test_size))

    # Split the files into train, test, validation
    train_split = train_samples
    test_split = test_samples

    train_files = files[:train_split]
    test_files = files[train_split : train_split + test_split]
    # Use the rest for validation
    validation_files = files[train_split + test_split :]

    # Only get the file names
    train_files = [os.path.basename(file) for file in train_files]
    test_files = [os.path.basename(file) for file in test_files]
    validation_files = [os.path.basename(file) for file in validation_files]

    # Remove the file extension
    train_files = [os.path.splitext(file)[0] for file in train_files]
    test_files = [os.path.splitext(file)[0] for file in test_files]
    validation_files = [os.path.splitext(file)[0] for file in validation_files]

    # Create the folders
    os.makedirs(Path(self.species_folder, "DataFiles"), exist_ok=True)

    # Save the files as .txt
    with open(Path(self.species_folder, "DataFiles", "train.txt"), "w") as f:
        f.write("\n".join(train_files))
    with open(os.path.join(self.species_folder, "DataFiles", "test.txt"), "w") as f:
        f.write("\n".join(test_files))

    with open(Path(self.species_folder, "DataFiles", "validation.txt"), "w") as f:
        f.write("\n".join(validation_files))
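To make the split arithmetic concrete, the sketch below reproduces only the counting logic from the source above: the training and test counts are floored, and validation takes the remainder. The function name `split_counts` is hypothetical and not part of the package, and nothing is read from or written to disk.

```python
import math


def split_counts(n_files, train_size=0.8, test_size=0.1):
    """Return (train, test, validation) counts for n_files.

    Hypothetical helper mirroring the floor-then-remainder arithmetic
    used by shuffle_files_names; it does not touch the file system.
    """
    train = int(math.floor(n_files * train_size))
    test = int(math.floor(n_files * test_size))
    # Validation receives every file not assigned to train or test
    validation = n_files - train - test
    return train, test, validation


print(split_counts(25))
```

With 25 files and the default proportions this gives 20 training, 2 test, and 3 validation files, so on small datasets the validation share can end up larger than the requested validation_size.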

check_distribution

check_distribution(Y)
Source code in eso/utils/preprocessing.py
def check_distribution(self, Y):
    unique, counts = np.unique(Y, return_counts=True)
    original_distribution = dict(zip(unique, counts))
    return original_distribution
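As a rough illustration of what check_distribution returns, the standalone sketch below applies the same np.unique call to a small label list. The conversion to plain Python values with tolist() is an addition for readable output and is not in the original method.

```python
import numpy as np


def check_distribution(Y):
    """Standalone copy of the counting logic, for illustration only."""
    unique, counts = np.unique(Y, return_counts=True)
    # Convert NumPy scalars to plain Python values for readable output
    return dict(zip(unique.tolist(), counts.tolist()))


labels = [0, 1, 1, 0, 1]
distribution = check_distribution(labels)
print(distribution)
```

A strongly skewed result is the situation the augmentation step is meant to correct.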

AnnotationReader

AnnotationReader(
    path: str,
    annotation_file_name: str,
    file_type: str,
    audio_extension: str,
    positive_class: str,
)
Source code in eso/utils/AnnotationReader.py
def __init__(
    self,
    path: str,
    annotation_file_name: str,
    file_type: str,
    audio_extension: str,
    positive_class: str,
):
    """
    Initializes the AnnotationReader class.

    Parameters
    ----------
    path : str
        The path to the directory containing the annotation and audio files.
    annotation_file_name : str
        The name of the annotation file (without extension) to be read.
    file_type : str
        The type of annotation file (e.g., "svl", "xml").
    audio_extension : str
        The file extension for the associated audio files (e.g., ".wav", ".mp3").
    positive_class : str
        The label representing the positive class in classification tasks.

    Returns
    -------
    None
    """
    # The docstring must be the first statement to be picked up as documentation
    self.path = path
    self.annotation_file_name = annotation_file_name
    self.file_type = file_type
    self.audio_extension = audio_extension
    self.positive_class = positive_class

path instance-attribute

path = path

annotation_file_name instance-attribute

annotation_file_name = annotation_file_name

file_type instance-attribute

file_type = file_type

audio_extension instance-attribute

audio_extension = audio_extension

positive_class instance-attribute

positive_class = positive_class

Initializes the AnnotationReader class.

Parameters:

Name Type Description Default
path str

The path to the directory containing the annotation and audio files.

required
annotation_file_name str

The name of the annotation file (without extension) to be read.

required
file_type str

The type of annotation file (e.g., "svl", "xml").

required
audio_extension str

The file extension for the associated audio files (e.g., ".wav", ".mp3").

required
positive_class str

The label representing the positive class in classification tasks.

required

Returns:

Type Description
None

get_annotation_information

get_annotation_information(annotation_folder, sufix_file)

Extract annotation information from an .svl XML file and return a DataFrame with start times, end times, and labels for the annotations.

This method parses an XML annotation file (.svl format) to extract annotation details including the start time, end time, and label for each annotation. It processes the XML file, handles any confidence values, and adjusts labels accordingly (e.g., using the positive class label for predicted annotations).

Parameters:

Name Type Description Default
annotation_folder str

The folder where the annotation file is located.

required
sufix_file str

The suffix to append to the base annotation file name to get the full file name.

required

Returns:

Type Description
tuple

A tuple containing:
  • pd.DataFrame: A DataFrame with three columns:
      • 'Start': The start time of the annotation in seconds.
      • 'End': The end time of the annotation in seconds.
      • 'Label': The label associated with the annotation.
  • str: The name of the corresponding audio file (with ".wav" extension).

Raises:

Type Description
Exception

If the annotation file does not contain valid annotation information.

Source code in eso/utils/AnnotationReader.py
def get_annotation_information(self, annotation_folder, sufix_file ):
    """
    Extract annotation information from an `.svl` XML file and return a DataFrame
    with start times, end times, and labels for the annotations.

    This method parses an XML annotation file (`.svl` format) to extract annotation
    details including the start time, end time, and label for each annotation.
    It processes the XML file, handles any confidence values, and adjusts labels
    accordingly (e.g., using the positive class label for predicted annotations).

    Parameters
    ----------
    annotation_folder : str
        The folder where the annotation file is located.
    sufix_file : str
        The suffix to append to the base annotation file name to get the full file name.

    Returns
    -------
    tuple
        A tuple containing:
        - pd.DataFrame: A DataFrame with three columns:
            - 'Start': The start time of the annotation in seconds.
            - 'End': The end time of the annotation in seconds.
            - 'Label': The label associated with the annotation.
        - str: The name of the corresponding audio file (with ".wav" extension).

    Raises
    ------
    Exception
        If the annotation file does not contain valid annotation information.
    """

    path = str(Path(
            self.path, annotation_folder, self.annotation_file_name + sufix_file
        ))


    xmldoc = minidom.parse(path)
    itemlist = xmldoc.getElementsByTagName("point")
    idlist = xmldoc.getElementsByTagName("model")

    start_time = []
    end_time = []
    labels = []
    audio_file_name = ""

    if len(idlist) > 0:
        for s in idlist: 
            original_sample_rate = int(s.attributes["sampleRate"].value)


    if len(itemlist) > 0:

        # Iterate over each annotation in the .svl annotation file
        for s in itemlist:
            # Convert the starting frame to seconds using the sample rate
            # read from the file's <model> element
            start_seconds = (
                    float(s.attributes["frame"].value) / original_sample_rate
                )

            # Get the label from the annotation file
            label = str(s.attributes["label"].value)

            # Set the default confidence to 10 (i.e. high confidence that
            # the label is correct). Annotations that carry no confidence
            # value are treated like normal annotations, and the annotator's
            # label is assumed to be correct.
            label_confidence = 10

            # Check if a confidence has been assigned
            if "," in label:
                # Split "LABEL,CONFIDENCE" into the raw label and its confidence
                raw_label, confidence_string = label.split(",", 1)

                # Extract confidence value
                label_confidence = int(confidence_string)

                # Set the label to the raw label
                label = raw_label

            # If an annotation has a blank label then skip it
            # to avoid mislabelling data
            if label == "":
                continue


            # Include predictions obtained from a model
            if label == "predicted":
                label = self.positive_class

            # Only keep annotations labelled with high confidence
            # (10 = very confident, 5 = medium, 1 = unsure). The confidence is
            # appended to the label after a comma, e.g. "SPECIES,10" or "SPECIES,5".
            if label_confidence == 10:
                # Get the duration from the annotation file
                annotation_duration_seconds = (
                        float(s.attributes["duration"].value) / original_sample_rate
                    )
                start_time.append(start_seconds)
                end_time.append(start_seconds + annotation_duration_seconds)
                labels.append(label)

    df_svl_gibbons = pd.DataFrame(
            {"Start": start_time, "End": end_time, "Label": labels}
        )
    return df_svl_gibbons, self.annotation_file_name + ".wav"
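The parsing steps above can be tried on a small in-memory document. The sketch below is a simplified stand-in, not the package's method: the SVL fragment, the label "gibbon", and the function name `parse_points` are all made up for illustration, and it returns plain tuples rather than a DataFrame. It shows the frame-to-seconds conversion and the "LABEL,CONFIDENCE" convention, keeping only confidence 10.

```python
from xml.dom import minidom

# Hypothetical, minimal SVL fragment: one <model> and two <point> elements
SVL_TEXT = """<sv><data>
<model sampleRate="48000" />
<dataset>
<point frame="96000" duration="192000" label="gibbon,10" />
<point frame="480000" duration="48000" label="gibbon,5" />
</dataset>
</data></sv>"""


def parse_points(svl_text):
    """Return (start_s, end_s, label) tuples for confident annotations."""
    doc = minidom.parseString(svl_text)
    model = doc.getElementsByTagName("model")[0]
    sample_rate = int(model.getAttribute("sampleRate"))
    rows = []
    for point in doc.getElementsByTagName("point"):
        label = point.getAttribute("label")
        confidence = 10  # labels without a confidence are trusted
        if "," in label:
            label, confidence_text = label.split(",", 1)
            confidence = int(confidence_text)
        # Skip blank labels and anything below full confidence
        if label == "" or confidence != 10:
            continue
        # Frames and durations are stored in samples; convert to seconds
        start = float(point.getAttribute("frame")) / sample_rate
        duration = float(point.getAttribute("duration")) / sample_rate
        rows.append((start, start + duration, label))
    return rows


rows = parse_points(SVL_TEXT)
print(rows)
```

Only the first point survives here, because the second carries a confidence of 5.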

get_annotation_information_testing

get_annotation_information_testing()

Extract annotation information from a .svl XML file and return a DataFrame with frame, value, duration, extent, and label for each annotation.

This method parses an XML annotation file (.svl format) to extract detailed annotation information such as frame number, value, duration, extent, and label. It also extracts the sample rate, start time, and end time from the file's metadata.

This method takes no parameters.

Returns:

Type Description
tuple

A tuple containing:
  • pd.DataFrame: A DataFrame with columns:
      • 'frame': The frame number from the annotation.
      • 'value': The value associated with the annotation.
      • 'duration': The duration of the annotation.
      • 'extent': The extent of the annotation.
      • 'label': The label associated with the annotation.
  • int: The sample rate extracted from the .svl file.
  • str: The start time of the annotation in the .svl file.
  • str: The end time of the annotation in the .svl file.

Raises:

Type Description
Exception

If the annotation file is not found or if it does not contain valid annotation information.

Source code in eso/utils/AnnotationReader.py
def get_annotation_information_testing(self):
    """
    Extract annotation information from a `.svl` XML file and return a DataFrame
    with frame, value, duration, extent, and label for each annotation.

    This method parses an XML annotation file (`.svl` format) to extract detailed
    annotation information such as frame number, value, duration, extent, and label.
    It also extracts the sample rate, start time, and end time from the file's metadata.

    Parameters
    ----------
    None

    Returns
    -------
    tuple
        A tuple containing:
        - pd.DataFrame: A DataFrame with columns:
            - 'frame': The frame number from the annotation.
            - 'value': The value associated with the annotation.
            - 'duration': The duration of the annotation.
            - 'extent': The extent of the annotation.
            - 'label': The label associated with the annotation.
        - int: The sample rate extracted from the `.svl` file.
        - str: The start time of the annotation in the `.svl` file.
        - str: The end time of the annotation in the `.svl` file.

    Raises
    ------
    Exception
        If the annotation file is not found or if it does not contain valid annotation information.
    """

    path = os.path.join(
            self.path, "Annotations", self.annotation_file_name + ".svl"
        )

    # Process the .svl xml file
    xmldoc = minidom.parse(path)
    itemlist = xmldoc.getElementsByTagName('point')
    idlist = xmldoc.getElementsByTagName('model')

    # Cast to int so the returned sample rate matches the documented type
    sampleRate = int(idlist.item(0).attributes['sampleRate'].value)
    start_m = idlist.item(0).attributes['start'].value
    end_m = idlist.item(0).attributes['end'].value


    values = []
    frames = []
    durations=[]
    extents=[]
    labels = []
    audio_file_name = ''

    if len(idlist) > 0:
        for s in idlist: 
            original_sample_rate = int(s.attributes["sampleRate"].value)

    if (len(itemlist) > 0):

        # Iterate over each annotation in the .svl annotation file
        for s in itemlist:

            # Read the raw attributes of the annotation point
            frame = float(s.attributes['frame'].value)
            value = float(s.attributes['value'].value)
            duration = float(s.attributes['duration'].value)
            extent = float(s.attributes['extent'].value)
            label = str(s.attributes['label'].value)

            # Set the default confidence to 10 (i.e. high confidence that
            # the label is correct). Annotations that carry no confidence
            # value are treated like normal annotations, and the annotator's
            # label is assumed to be correct.
            label_confidence = 10

            # Check if a confidence has been assigned
            if ',' in label:

                # Split "LABEL,CONFIDENCE" into the raw label and its confidence
                raw_label, confidence_string = label.split(',', 1)

                # Extract confidence value
                label_confidence = int(confidence_string)

                # Set the label to the raw label
                label = raw_label


            # If an annotation has a blank label then skip it
            # to avoid mislabelling data
            if label == '':
                continue

            # Only keep annotations labelled with high confidence
            # (10 = very confident, 5 = medium, 1 = unsure). The confidence is
            # appended to the label after a comma, e.g. "SPECIES,10" or "SPECIES,5".
            if label_confidence == 10:

                frames.append(frame)
                values.append(value)
                durations.append(duration)
                extents.append(extent)
                labels.append(label)

    df_svl_gibbons = pd.DataFrame({'frame': frames, 'value':values ,'duration': durations,
                              'extent':extents,'label':labels})
    return df_svl_gibbons, sampleRate, start_m, end_m

dataframe_to_svl

dataframe_to_svl(dataframe, sample_rate, start_m, end_m)

Convert a DataFrame of annotations to a .svl format XML string.

This method generates a .svl format XML string containing the annotations from a DataFrame. The generated XML includes metadata such as the sample rate, start time, end time, and annotation points (frame, value, duration, extent, and label).

Parameters:

Name Type Description Default
dataframe DataFrame

A DataFrame containing the annotation information. The DataFrame should have the following columns: 'frame', 'value', 'duration', 'extent', and 'label'.

required
sample_rate int

The sample rate of the audio associated with the annotations.

required
start_m str

The start time (in seconds) of the annotation period.

required
end_m str

The end time (in seconds) of the annotation period.

required

Returns:

Type Description
str

A string containing the XML in .svl format, representing the annotations along with metadata.

Notes

The function generates an XML document that includes:
  • <model>: metadata about the annotation model, including sample rate, start time, and end time.
  • <dataset>: contains <point> elements that represent individual annotations.
  • <display>: defines the display settings for the annotation in the software.

Source code in eso/utils/AnnotationReader.py
def dataframe_to_svl(self, dataframe, sample_rate, start_m, end_m):
    """
    Convert a DataFrame of annotations to a `.svl` format XML string.

    This method generates a `.svl` format XML string containing the annotations
    from a DataFrame. The generated XML includes metadata such as the sample rate,
    start time, end time, and annotation points (frame, value, duration, extent, and label).

    Parameters
    ----------
    dataframe : pd.DataFrame
        A DataFrame containing the annotation information. The DataFrame should have 
        the following columns: 'frame', 'value', 'duration', 'extent', and 'label'.
    sample_rate : int
        The sample rate of the audio associated with the annotations.
    start_m : str
        The start time (in seconds) of the annotation period.
    end_m : str
        The end time (in seconds) of the annotation period.

    Returns
    -------
    str
        A string containing the XML in `.svl` format, representing the annotations
        along with metadata.

    Notes
    -----
    The function generates an XML document that includes:
    - `<model>`: metadata about the annotation model, including sample rate, start time, and end time.
    - `<dataset>`: contains `<point>` elements that represent individual annotations.
    - `<display>`: defines the display settings for the annotation in the software.
    """
    doc, tag, text = Doc().tagtext()
    doc.asis('<?xml version="1.0" encoding="UTF-8"?>')
    doc.asis('<!DOCTYPE sonic-visualiser>')

    with tag('sv'):
        with tag('data'):

            model_string = '<model id="10" name="" sampleRate="{}" start="{}" end="{}" type="sparse" dimensions="2" resolution="1" notifyOnAdd="true" dataset="9" subtype="box" minimum="600" maximum="{}" units="Hz" />'.format(sample_rate, 
                                                                    start_m,
                                                                    end_m,
                                                                    1000)
            doc.asis(model_string)

        with tag('dataset', id='9', dimensions='2'):

            # Read dataframe or other data structure and add the values here
            # These are added as "point" elements, for example:
            # '<point frame="15360" value="3136.87" duration="1724416" extent="2139.22" label="Cape Robin" />'
            for index, row in dataframe.iterrows():

                point  = '<point frame="{}" value="{}" duration="{}" extent="{}" label="{}" />'.format(
                    int(row['frame']), 
                    row['value'],
                    int(row['duration']),
                    1500,
                    row['label'])

                # add the point
                doc.asis(point)
        with tag('display'):

            display_string = '<layer id="2" type="boxes" name="Boxes" model="10"  verticalScale="0"  colourName="White" colour="#ffffff" darkBackground="true" />'
            doc.asis(display_string)

    result = indent(
        doc.getvalue(),
        indentation = ' '*2,
        newline = '\r\n'
    )

    return result
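The source above builds each `<point>` element with plain string formatting, so a label containing `&`, `<`, or a quote would presumably yield malformed XML. A minimal sketch of one way to guard against that, using the standard library's `quoteattr`, is shown below. The helper name `point_element` is hypothetical and is not part of AnnotationReader.

```python
from xml.sax.saxutils import quoteattr


def point_element(frame, value, duration, extent, label):
    """Build one <point> element with safely quoted attribute values."""
    # quoteattr escapes special characters and adds the surrounding quotes
    return "<point frame={} value={} duration={} extent={} label={} />".format(
        quoteattr(str(int(frame))),
        quoteattr(str(value)),
        quoteattr(str(int(duration))),
        quoteattr(str(extent)),
        quoteattr(str(label)),
    )


element = point_element(15360, 3136.87, 1724416, 1500, "Cape Robin & chat")
print(element)
```

The ampersand in the label is emitted as `&amp;`, so the element can be parsed back by an XML reader.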

CPU_Unpickler

Bases: Unpickler

find_class

find_class(module, name)
Source code in eso/utils/Evaluation.py
def find_class(self, module, name):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    if module == "torch.storage" and name == "_load_from_bytes":
        return lambda b: torch.load(io.BytesIO(b), map_location=device)
    else:
        return super().find_class(module, name)
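The mechanism CPU_Unpickler relies on is the `find_class` hook of `pickle.Unpickler`, which is called for every global the pickle stream refers to. The sketch below illustrates that hook with the standard library only, so it runs without torch. The names `RedirectingUnpickler` and `safe_sqrt` are invented for the example, and `math.sqrt` stands in for `torch.storage._load_from_bytes`.

```python
import io
import math
import pickle


def safe_sqrt(x):
    """Stand-in replacement: square root that clamps negatives to zero."""
    return math.sqrt(max(x, 0.0))


class RedirectingUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that swaps one global for another on load."""

    def find_class(self, module, name):
        # Intercept a single (module, name) pair, as CPU_Unpickler does
        # for ("torch.storage", "_load_from_bytes")
        if module == "math" and name == "sqrt":
            return safe_sqrt
        # Everything else is resolved the normal way
        return super().find_class(module, name)


payload = pickle.dumps(math.sqrt)  # pickled by reference to math.sqrt
restored = RedirectingUnpickler(io.BytesIO(payload)).load()
print(restored is safe_sqrt)
```

In the source shown above, the replacement callable is a `torch.load` wrapper whose `map_location` is the CUDA device when one is available and the CPU otherwise.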

Evaluation

Evaluation(
    species_folder: str,
    settings,
    overlap=0.25,
    nb_to_group=2,
    threshold=0.8,
    chromosome=None,
    apply_preprocessing: bool = True,
    force_calc_spectrograms: bool = False,
    logger=None,
    log_path=None,
    log_level=0,
    save_folder: str = "Predictions",
)
Source code in eso/utils/Evaluation.py
def __init__(
    self,
    species_folder: str,
    settings,
    overlap=0.25,
    nb_to_group=2,
    threshold=0.8,
    chromosome=None,
    apply_preprocessing: bool = True,
    force_calc_spectrograms: bool = False,
    logger=None,
    log_path=None,
    log_level=0,
    save_folder: str = "Predictions",
) -> None:


    if logger is None:
        self.logger = setup_logger(
            logger=logger, log_path=log_path, log_level=log_level)
    else:
        self.logger = logger


    self.species_folder = species_folder
    __preprocessing_name = "preprocessed" if apply_preprocessing else "unpreprocessed"
    self.saved_data_folder = Path(species_folder, "SavedData", __preprocessing_name)


    self.apply_preprocessing_flag = apply_preprocessing
    self.config = settings
    self.segment_duration=self.config.preprocessing.dict()["segment_duration"]
    self.positive_class = self.config.data.dict()["positive_class"]
    self.negative_class = self.config.data.dict()["negative_class"]

    self.overlap=overlap
    self.nb_to_group=nb_to_group
    self.threshold=threshold
    self.sampling_rate_origin=self.config.preprocessing.sample_rate

    self.chromosome = chromosome
    self.force_calc_spectrograms = force_calc_spectrograms


    if self.chromosome is None:
        self.save_folder_predictions = save_folder + "_baseline"
        self.save_folder_spectrograms =  "Saved_spectrograms_baseline"
    else:
        self.save_folder_predictions = save_folder + "_chromosome"
        self.save_folder_spectrograms =  "Saved_spectrograms_chromosome"

    self.save_results = Path(self.species_folder, self.save_folder_predictions)
    self.save_spectrograms_path=Path(self.species_folder, self.save_folder_spectrograms)

    self.prep = Preprocessing(
        **self.config.preprocessing.dict(),
        positive_class=self.positive_class,
        negative_class=self.negative_class,
        apply_preprocessing=self.apply_preprocessing_flag,
        species_folder=self.species_folder,
    )

logger instance-attribute

logger = setup_logger(logger=logger, log_path=log_path, log_level=log_level)

species_folder instance-attribute

species_folder = species_folder

saved_data_folder instance-attribute

saved_data_folder = Path(species_folder, 'SavedData', __preprocessing_name)

apply_preprocessing_flag instance-attribute

apply_preprocessing_flag = apply_preprocessing

config instance-attribute

config = settings

segment_duration instance-attribute

segment_duration = config.preprocessing.dict()['segment_duration']

positive_class instance-attribute

positive_class = config.data.dict()['positive_class']

negative_class instance-attribute

negative_class = config.data.dict()['negative_class']

overlap instance-attribute

overlap = overlap

nb_to_group instance-attribute

nb_to_group = nb_to_group

threshold instance-attribute

threshold = threshold

sampling_rate_origin instance-attribute

sampling_rate_origin = config.preprocessing.sample_rate

chromosome instance-attribute

chromosome = chromosome

force_calc_spectrograms instance-attribute

force_calc_spectrograms = force_calc_spectrograms

save_folder_predictions instance-attribute

save_folder_predictions = save_folder + '_baseline'

save_folder_spectrograms instance-attribute

save_folder_spectrograms = 'Saved_spectrograms_baseline'

save_results instance-attribute

save_results = Path(species_folder, save_folder_predictions)

save_spectrograms_path instance-attribute

save_spectrograms_path = Path(species_folder, save_folder_spectrograms)

prep instance-attribute

prep = Preprocessing(
    **(config.preprocessing.dict()),
    positive_class=positive_class,
    negative_class=negative_class,
    apply_preprocessing=apply_preprocessing_flag,
    species_folder=species_folder
)

prediction_files

prediction_files(model, data_type='test')
Source code in eso/utils/Evaluation.py
def prediction_files(self, model, data_type = "test"):


    test_path = Path(self.species_folder, "DataFiles", data_type + ".txt")
    #test_path = Path(self.species_folder, "DataFiles", "test.txt")
    file_names = pd.read_csv(test_path, header=None)

    for file in file_names.values:
        file = file[0]

        self.logger.info(f"Processing file: {file}")
        self._process_one_file(file, model, verbose=True)

comparison_predictions_annotations

comparison_predictions_annotations(folder, data_type='test')
Source code in eso/utils/Evaluation.py
def comparison_predictions_annotations(self, folder, data_type="test"):
    self.logger.info("comparing prediction and annotation")
    test_path = Path(self.species_folder, "DataFiles", data_type + ".txt")
    #test_path = Path(self.species_folder, "DataFiles", "test.txt")
    file_names = pd.read_csv(test_path, header=None)



    predictions = []
    annotations = []

    # check if corrected annotations for the testing files have been done
    if os.path.exists(Path(self.species_folder, "Annotations_corrected")):
        self.logger.info(
            "the corrected annotations of the testing dataset have already been created "
        )

    else:
        self.logger.info(
            "Need to modify the annotations of the testing dataset to allow a correct evaluation of the model "
        )
        self._repair_svl(
            file_names,
            self.prep.file_type,
            self.prep.audio_extension,
            annotation_folder="Annotations",
            sufix_file=".svl",
        )
    for file in file_names.values:
        file = file[0]

        reader = AnnotationReader(
            self.species_folder,
            file,
            self.prep.file_type,
            self.prep.audio_extension,
            self.positive_class,
        )

        svl = reader.get_annotation_information(
            annotation_folder="Annotations_corrected", sufix_file="_repaired.svl"
        )[0]
        svl["Overlap"] = 0.0
        svl["Cat"] = "TN"
        svl.loc[svl.Label == self.positive_class, "Cat"] = "FN"
        svl["Index"] = np.nan
        svl["Nb overlap"] = 0
        svl["Name"] = file

        if os.path.exists(
            Path(self.species_folder, folder, file + "_predictions.svl")
        ):
            self.logger.info(f"Found Prediction: {file} ")
            predict = reader.get_annotation_information(
                annotation_folder=folder, sufix_file="_predictions.svl"
            )[0]

            predict["Overlap"] = 0.0
            predict["Cat"] = "FP"
            predict["Index"] = np.nan
            predict["Nb overlap"] = 0
            predict["Name"] = file

            # compare predictions vs annotations
            if svl[svl.Label == self.positive_class].shape[0] != 0:
                for index, row in predict.iterrows():
                    idx = np.abs(
                        np.asarray(
                            svl[svl.Label == self.positive_class]["Start"]
                        )
                        - row.iloc[0]
                    ).argmin()  # get the closest window
                    lap = self._overlap(
                        row.iloc[0],
                        row.iloc[1],
                        svl[svl.Label == self.positive_class].iloc[idx, 0],
                        svl[svl.Label == self.positive_class].iloc[idx, 1],
                    )  # check overlap

                    if lap > self.overlap * self.segment_duration :
                        predict.loc[index, "Overlap"] = deepcopy(lap)
                        predict.loc[index, "Cat"] = "TP"
                        predict.loc[index, "Index"] = idx
                    else:
                        predict.loc[index, "Overlap"] = deepcopy(lap)

                for index, row in predict.iterrows():
                    w = 0
                    for idx_svl, row_svl in svl[
                        svl.Label == self.positive_class
                    ].iterrows():
                        lap = self._overlap(
                            row.iloc[0],
                            row.iloc[1],
                            row_svl.iloc[0],
                            row_svl.iloc[1],
                        )
                        if lap > self.overlap * self.segment_duration :
                            w += 1
                    predict.loc[index, "Nb overlap"] = w
            else:
                self.logger.info("No positive class in the annotation file")
            predictions.append(predict)

            # compare annotations vs predictions
            for index, row in svl.iterrows():
                idx = np.abs(
                    np.asarray(predict["Start"]) - row.iloc[0]
                ).argmin()  # get the closest window
                lap = self._overlap(
                    row.iloc[0],
                    row.iloc[1],
                    predict.iloc[idx, 0],
                    predict.iloc[idx, 1],
                )  # check overlap

                if (lap > self.overlap * self.segment_duration) & (
                    svl.loc[index, "Label"] == self.positive_class
                ):
                    svl.loc[index, "Overlap"] = deepcopy(lap)
                    svl.loc[index, "Index"] = idx
                    svl.loc[index, "Cat"] = "TP"
                elif (lap > self.overlap * self.segment_duration) & (
                    svl.loc[index, "Label"] == self.negative_class
                ):
                    svl.loc[index, "Overlap"] = deepcopy(lap)
                    svl.loc[index, "Index"] = idx
                    svl.loc[index, "Cat"] = "FP"
                else:
                    svl.loc[index, "Overlap"] = deepcopy(lap)

            # Print File and FP TP FN
            self.logger.info("-------------")
            self.logger.info(file)
            self.logger.info(f"FP : {predict[predict.Cat == 'FP'].shape[0]}")
            self.logger.info(f"TP : {svl[svl.Cat == 'TP'].shape[0]} ")
            self.logger.info(f"FN : {svl[svl.Cat == 'FN'].shape[0]} ")
            self.logger.info("-------------")

            for index, row in svl.iterrows():
                w = 0
                for idx_pred, row_pred in predict.iterrows():
                    lap = self._overlap(
                        row.iloc[0], row.iloc[1], row_pred.iloc[0], row_pred.iloc[1]
                    )
                    if lap > self.overlap * self.segment_duration:
                        w += 1
                svl.loc[index, "Nb overlap"] = w

        annotations.append(svl)

    Predictions = pd.DataFrame(np.concatenate(predictions, axis=0))
    Predictions.columns = predict.columns
    Predictions.Index = Predictions.Index.astype(float)

    Annotations = pd.DataFrame(np.concatenate(annotations, axis=0))
    Annotations.columns = svl.columns
    Annotations.Index = Annotations.Index.astype(float)

    return Predictions, Annotations
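
The matching rule used throughout the listing above, `lap > self.overlap * self.segment_duration`, can be illustrated in isolation. The sketch below is not the library's `_overlap` method, which is not shown here; it assumes that `_overlap` returns the shared duration of two time intervals in seconds. The names `interval_overlap` and `is_match` and the default values are hypothetical.

```python
def interval_overlap(start_a, end_a, start_b, end_b):
    """Length of the intersection of two intervals; 0.0 when they are disjoint."""
    return max(0.0, min(end_a, end_b) - max(start_a, start_b))


def is_match(annotation, window, overlap=0.5, segment_duration=4.0):
    """Apply the threshold test from the listing: lap > overlap * segment_duration.

    `overlap` and `segment_duration` stand in for self.overlap and
    self.segment_duration; the defaults here are illustrative only.
    """
    lap = interval_overlap(annotation[0], annotation[1], window[0], window[1])
    return lap > overlap * segment_duration


annotation = (10.0, 14.0)  # annotated call, in seconds
window = (11.0, 15.0)      # predicted window, in seconds

print(interval_overlap(*annotation, *window))  # 3.0
print(is_match(annotation, window))            # True, since 3.0 > 0.5 * 4.0
```

With these illustrative values, a window must share more than two seconds with an annotation to count as overlapping it.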

testing_score

testing_score(Annotations, Predictions)
Source code in eso/utils/Evaluation.py
def testing_score(self, Annotations, Predictions):
    cat, count = np.unique(Predictions["Cat"], return_counts=True)
    cat_a, count_a = np.unique(Annotations["Cat"], return_counts=True)

    FP = count[cat == "FP"][0] if len(count[cat == "FP"]) > 0 else 0
    TP = count_a[cat_a == "TP"][0] if len(count_a[cat_a == "TP"]) > 0 else 0
    FN = count_a[cat_a == "FN"][0] if len(count_a[cat_a == "FN"]) > 0 else 0
    TN = count_a[cat_a == "TN"][0] if len(count_a[cat_a == "TN"]) > 0 else 0

    F_score = TP / (TP + ((FN + FP) / 2))
    Accuracy = (TP + TN) / (TP + TN + FP + FN)
    confusion = np.array([[TP, FP], [FN, TN]])

    self.logger.info(
        f"Number of calls to detect: {Annotations[Annotations.Label == self.positive_class].shape[0]}")
    self.logger.info(f"False positives: {FP}")
    self.logger.info(f"True positives: {TP}")
    self.logger.info(f"False negatives: {FN}")
    self.logger.info(f"F1-score: {F_score}")
    self.logger.info(f"Accuracy: {Accuracy}")

    return F_score, Accuracy, confusion
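
As a worked example of the two formulas in `testing_score`, the sketch below computes the F1-score and accuracy from raw counts. The function name `f1_and_accuracy` and the counts are made up for illustration, and the zero-denominator guards are an addition of this sketch, not part of `testing_score`.

```python
def f1_and_accuracy(tp, fp, fn, tn):
    """Return (F1, accuracy) using the same expressions as testing_score."""
    # F1 = TP / (TP + (FN + FP) / 2), equivalent to 2*TP / (2*TP + FP + FN).
    f1_denominator = tp + (fn + fp) / 2
    f1 = tp / f1_denominator if f1_denominator > 0 else 0.0  # guard added here

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total > 0 else 0.0  # guard added here
    return f1, accuracy


# Illustrative counts only.
f1, accuracy = f1_and_accuracy(tp=40, fp=10, fn=10, tn=140)
print(f1)        # 0.8
print(accuracy)  # 0.9
```

Here F1 is 40 / (40 + 10) = 0.8 and accuracy is 180 / 200 = 0.9.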

run

run(model, data_type='test', test_type='simple')
Source code in eso/utils/Evaluation.py
def run(self, model, data_type="test", test_type="simple"):
    if test_type == "simple":
        return self._presegmented_dataset_run(model, data_type=data_type)
    else:
        return self._entire_files_run(model, data_type=data_type)

eso.utils.logger

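Visualisation and logging. plot_chromosome draws each selected band as a horizontal span on an axis of height image_height, in the style of Figure 4 in the paper, and saves the figure when results_path is given. setup_logger returns an existing logger unchanged; otherwise it configures Python's standard logging with an optional console handler and, when log_path is set, a file handler. setup_tensorboard creates a SummaryWriter in a timestamped subdirectory, and log_tensorboard pushes the best chromosome's fitness, band count, metric, trainable-parameter count, and band layout to TensorBoard.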

plot_chromosome

plot_chromosome(
    chromosome, image_height, title, results_path=None, name="current_best_chromosome"
)
Source code in eso/utils/logger.py
def plot_chromosome(
    chromosome, image_height, title, results_path=None, name="current_best_chromosome"
):
    plt.figure(figsize=(4.5, 4.5))
    for gene in chromosome.get_genes():
        position = gene.get_band_position()
        height = gene.get_band_height()

        # Create a horizontal span
        plt.axhspan(position, position + height, alpha=0.5)
        plt.ylim(0, image_height)

    plt.gca().invert_yaxis()
    rounded_fitness = round(chromosome.get_fitness(), 4)
    rounded_metric = round(chromosome.get_metric(), 4)
    plt.title("Fitness: " + str(rounded_fitness))
    plt.suptitle(
        title
        + ": "
        + str(rounded_metric)
        + ";Parameters:"
        + str(chromosome.get_trainable_parameters())
    )
    plt.tight_layout()
    if results_path is not None:
        plt.gcf().savefig(os.path.join(results_path, f"{name}.png"))
    return plt.gcf()

log_tensorboard

log_tensorboard(
    best_chromosome,
    epoch,
    writer,
    tensorboard_log_dir,
    image_height,
    metric_name,
    results_path=None,
)
Source code in eso/utils/logger.py
def log_tensorboard(
    best_chromosome,
    epoch,
    writer,
    tensorboard_log_dir,
    image_height,
    metric_name,
    results_path=None,
):
    if tensorboard_log_dir is None:
        return

    if metric_name == "f1":
        suptitle_name = "F1-Score"
    else:
        suptitle_name = metric_name.capitalize()

    best_chromosome_fitness = best_chromosome.get_fitness()
    writer.add_scalar("Best Chromosome Fitness", best_chromosome_fitness, epoch)

    writer.add_scalar(
        "Best Chromosome Number of Bands", best_chromosome.num_genes, epoch
    )
    writer.add_scalar(
        f"Best Chromosome {suptitle_name}", best_chromosome.get_metric(), epoch
    )
    writer.add_scalar(
        "Best Chromosome Trainable Parameters",
        best_chromosome.get_trainable_parameters(),
        epoch,
    )
    # Create image
    figure = plot_chromosome(best_chromosome, image_height, suptitle_name, results_path)
    writer.add_figure("Best Chromosome", figure, epoch)
    plt.close()

setup_tensorboard

setup_tensorboard(tensorboard_log_dir, logger)
Source code in eso/utils/logger.py
def setup_tensorboard(tensorboard_log_dir, logger):
    if tensorboard_log_dir is not None:
        tensorboard_log_dir = os.path.join(
            tensorboard_log_dir, datetime.now().strftime("%Y%m%d-%H%M%S")
        )
        logger.debug(f"Logging training to {tensorboard_log_dir}")
        os.makedirs(tensorboard_log_dir, exist_ok=True)
        writer = SummaryWriter(tensorboard_log_dir)
        return writer
    else:
        return None

setup_logger

setup_logger(logger, log_path, log_level, name=None, add_stream_handler=True)
Source code in eso/utils/logger.py
def setup_logger(logger, log_path, log_level, name=None, add_stream_handler=True):
    if logger is not None:
        return logger
    else:
        import logging

        if name is None:
            name = __name__
        logger = logging.getLogger(name)
        logger.setLevel(log_level)

        # Drop existing handlers so repeated calls do not duplicate output.
        for handler in logger.handlers[:]:
            logger.removeHandler(handler)
        if add_stream_handler:
            logger.addHandler(logging.StreamHandler())
        if log_path is not None:
            os.makedirs(log_path, exist_ok=True)
            logger.addHandler(
                logging.FileHandler(os.path.join(log_path, f"{name}.log"))
            )
    return logger

eso.utils.unpickler

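CPU_Unpickler is a pickle.Unpickler subclass that intercepts torch.storage._load_from_bytes and reloads the tensor bytes with torch.load and map_location set to the available device: CUDA when present, CPU otherwise. Use it when loading a chromosome saved on a CUDA host onto a CPU-only machine for inspection or inference.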

CPU_Unpickler

Bases: Unpickler

find_class

find_class(module, name)
Source code in eso/utils/unpickler.py
def find_class(self, module, name):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    if module == "torch.storage" and name == "_load_from_bytes":
        return lambda b: torch.load(io.BytesIO(b), map_location=device)
    else:
        return super().find_class(module, name)
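
The `find_class` hook is what makes this redirection possible: the unpickler calls it for every global the pickle stream references, so a subclass can substitute a different callable for one specific `(module, name)` pair. The sketch below shows the same interception pattern using only the standard library, so it runs without torch. `RemappingUnpickler` and the choice to remap `collections.OrderedDict` to `dict` are hypothetical and serve only to illustrate the mechanism.

```python
import io
import pickle
from collections import OrderedDict


class RemappingUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that swaps one referenced global for another."""

    def find_class(self, module, name):
        # Intercept a single (module, name) pair, in the same way that
        # CPU_Unpickler intercepts ("torch.storage", "_load_from_bytes"),
        # and defer every other lookup to the default behaviour.
        if module == "collections" and name == "OrderedDict":
            return dict
        return super().find_class(module, name)


payload = pickle.dumps(OrderedDict(a=1, b=2))
restored = RemappingUnpickler(io.BytesIO(payload)).load()

print(type(restored).__name__)  # dict
print(restored)                 # {'a': 1, 'b': 2}
```

CPU_Unpickler is used the same way: wrap an open binary file in the unpickler and call `load()`.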