Pyannote

Speaker Diarization API

Pyannote is a state-of-the-art speaker diarization model that determines who spoke when in an audio file. It can detect the number of speakers automatically, or you can set an exact count (num_speakers) or a range (min_speakers and max_speakers).

Endpoint

POST https://api-gpuse.maatrics.com/v1/pyannote/diarize

Parameters

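| Name | Type | Required | Description |
|------|------|----------|-------------|
| url | string | Yes | URL of the audio file to process |
| webhook | string | No | URL that receives a notification when the job completes |
| num_speakers | integer | No | Exact number of speakers, if known; makes min_speakers and max_speakers redundant |
| min_speakers | integer | No | Lower bound on the number of speakers |
| max_speakers | integer | No | Upper bound on the number of speakers |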
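The constraints implied by this table can be checked on the client before a request is sent: url is required, the speaker counts are positive integers, and min_speakers should not exceed max_speakers. The sketch below uses a hypothetical helper, `build_diarize_payload`, that is not part of any published client. It also assumes that an exact num_speakers and a min/max range should not be sent together, since the exact count makes the range redundant; how the API itself handles both being present is not documented here.

```python
from typing import Optional


def build_diarize_payload(
    url: str,
    webhook: Optional[str] = None,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> dict:
    """Build a JSON body for POST /v1/pyannote/diarize (hypothetical helper).

    Optional fields left as None are omitted, so the service can fall back
    to detecting the number of speakers on its own.
    """
    if not url:
        raise ValueError("url is required")

    counts = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    for name, value in counts.items():
        # Reject bools explicitly: bool is a subclass of int in Python.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value < 1
        ):
            raise ValueError(f"{name} must be a positive integer")

    # Assumption: an exact count and a min/max range are mutually exclusive.
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        raise ValueError("pass num_speakers or min_speakers/max_speakers, not both")
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("min_speakers must not exceed max_speakers")

    payload = {"url": url}
    if webhook is not None:
        payload["webhook"] = webhook
    # Add only the speaker-count fields that were actually supplied.
    payload.update({k: v for k, v in counts.items() if v is not None})
    return payload


print(build_diarize_payload("https://example.com/meeting.wav",
                            webhook="https://your-server.com/webhook",
                            min_speakers=1, max_speakers=5))
# {'url': 'https://example.com/meeting.wav', 'webhook': 'https://your-server.com/webhook', 'min_speakers': 1, 'max_speakers': 5}
```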

Request Example

```bash
curl -X POST "https://api-gpuse.maatrics.com/v1/pyannote/diarize" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/meeting.wav",
    "webhook": "https://your-server.com/webhook",
    "min_speakers": 1,
    "max_speakers": 5
  }'
```
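The same request can be made from Python with only the standard library. This is a minimal sketch: `make_diarize_request` and `submit_job` are hypothetical names, and the response is assumed to be the JSON job descriptor shown under Response. Building the request separately from sending it makes it easy to inspect before any network call is made.

```python
import json
import urllib.request

API_URL = "https://api-gpuse.maatrics.com/v1/pyannote/diarize"


def make_diarize_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build, without sending, the POST request that the curl example makes."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_job(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON job descriptor in the response.

    urlopen raises urllib.error.HTTPError when the server returns an error status.
    """
    request = make_diarize_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `submit_job("YOUR_API_KEY", {"url": "https://example.com/meeting.wav"})` would return a dict with `job_id` and `status` keys if the service responds as documented.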

Response

Initial response when the job is created:

```json
{
  "job_id": "5d2aee8b-c35b-4fdc-af7d-3309b19b7420",
  "status": "processing",
  "created_at": "2024-01-15T10:30:00Z",
  "estimated_duration": "2 minutes"
}
```
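The result arrives either at the webhook or by polling, but this page does not document a status endpoint. The sketch below therefore takes the lookup as a caller-supplied function, `fetch_status` (hypothetical), and polls until the job leaves the "processing" state. It assumes, from the examples on this page only, that "processing" is the sole in-progress status; any other status, including failure states not listed here, ends the loop, so check for "completed" before reading `result`.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job's status is no longer "processing".

    fetch_status is a hypothetical, caller-supplied function that returns the
    job JSON for job_id; this page does not document a status endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job.get("status") != "processing":
            return job  # finished: "completed" or some other terminal status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still processing after {timeout} s")
        sleep(interval)  # wait before asking again
```

Passing `sleep` explicitly lets the loop be exercised without real delays.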

Completed Result

Response when the job has completed (delivered via webhook or polling):

```json
{
  "job_id": "5d2aee8b-c35b-4fdc-af7d-3309b19b7420",
  "status": "completed",
  "result": {
    "segments": [
      {
        "speaker": "SPEAKER_00",
        "start": 0.0,
        "end": 2.5,
        "text": null
      },
      {
        "speaker": "SPEAKER_01",
        "start": 2.7,
        "end": 5.2,
        "text": null
      },
      {
        "speaker": "SPEAKER_00",
        "start": 5.5,
        "end": 8.1,
        "text": null
      }
    ],
    "num_speakers": 2,
    "duration_seconds": 120.5
  },
  "cost": 0.012,
  "processing_time": 45.2
}
```
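Each segment is one speaker turn with start and end times in seconds. The text field is null in this example, which is consistent with diarization labelling speakers rather than transcribing speech. A common next step is totalling speaking time per speaker; the sketch below does that with a hypothetical helper, `speaker_totals`, applied to the example result. Where segments overlap, the overlapping time counts toward each speaker involved.

```python
import json
from collections import defaultdict


def speaker_totals(job: dict) -> dict:
    """Sum segment durations (seconds) per speaker label for a completed job."""
    if job.get("status") != "completed":
        raise ValueError(f"job not completed (status={job.get('status')!r})")
    totals = defaultdict(float)
    for segment in job["result"]["segments"]:
        # Each segment is one turn: a speaker label plus start/end in seconds.
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)


example = json.loads("""
{"job_id": "5d2aee8b-c35b-4fdc-af7d-3309b19b7420", "status": "completed",
 "result": {"segments": [
   {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5, "text": null},
   {"speaker": "SPEAKER_01", "start": 2.7, "end": 5.2, "text": null},
   {"speaker": "SPEAKER_00", "start": 5.5, "end": 8.1, "text": null}],
  "num_speakers": 2, "duration_seconds": 120.5},
 "cost": 0.012, "processing_time": 45.2}
""")

for speaker, seconds in sorted(speaker_totals(example).items()):
    print(f"{speaker}: {seconds:.1f} s")
# SPEAKER_00: 5.1 s
# SPEAKER_01: 2.5 s
```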

Pricing

$0.006 per minute of audio

Billed per second of audio, which works out to $0.0001 per second. Minimum charge: 1 second.
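The pricing rule reduces to simple arithmetic: $0.006 per minute is $0.0001 per second, with anything shorter than one second billed as one second. The sketch below (`estimate_cost` is a hypothetical helper) uses Decimal to avoid floating-point rounding in money values. This page does not say whether fractional seconds are billed pro rata or rounded up, so both are offered behind a flag. For the 120.5-second file in the completed-result example, the pro rata figure is $0.01205, which matches the reported cost of 0.012 when rounded to three decimal places.

```python
import math
from decimal import Decimal

PRICE_PER_MINUTE = Decimal("0.006")          # USD per minute of audio
PRICE_PER_SECOND = PRICE_PER_MINUTE / 60     # exactly 0.0001 USD
MIN_BILLABLE_SECONDS = Decimal(1)            # minimum charge: 1 second


def estimate_cost(duration_seconds: float, round_up_partial_seconds: bool = False) -> Decimal:
    """Estimate the charge in USD for audio of the given length.

    Assumption: partial seconds are billed pro rata unless
    round_up_partial_seconds is True; the page does not say which applies.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    seconds = Decimal(str(duration_seconds))  # str() avoids binary float artefacts
    if round_up_partial_seconds:
        seconds = Decimal(math.ceil(seconds))
    # Apply the 1-second minimum, then the per-second rate.
    return max(seconds, MIN_BILLABLE_SECONDS) * PRICE_PER_SECOND


print(estimate_cost(120.5))        # 0.01205
print(estimate_cost(120.5, True))  # 0.0121
print(estimate_cost(0.4))          # 0.0001
```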