Topic Extraction API Tutorial

This tutorial will walk through how to use the Topic Extraction API for meeting analytics.

The endpoint for the Topic Extraction API is https://seaword.seasalt.ai/topics.

Multilingual Support

The Topic Extraction API currently supports the following languages: English, Traditional Chinese, Simplified Chinese, Indonesian, Javanese, Malay, Tagalog, and Vietnamese. By default, the Topic Extraction API will extract topics in English.

All API calls for each language include a language code in the URL.

The language codes for supported languages are the following:

Language

Code

English

en-XX

Traditional Chinese

zh-TW

Simplified Chinese

zh-CN

Indonesian

id-XX

Javanese

jv-XX

Malay

ms-XX

Tagalog

tl-XX

Vietnamese

vi-XX

Callback URLs

The Topic Extraction API uses callback URLs in some of its endpoints. These endpoints can take longer to execute, and the callback URL allows the process to run in the background and return the answer to you when it is finished.

For testing these endpoints, two resources that are available to provide testing URLs as your callback URL are webhook.site and ngrok.

In this tutorial, we will use a webhook.site URL in the examples.

API Usage

The Topic Extraction API supports topic extraction with and without a callback link. The method with a callback link is intended for very long input that will take longer to complete. The method without is intended for short input where timeout is not a concern.

POST /extract_no_callback

The no-callback process is called with the following POST request:

POST https://seaword.seasalt.ai/topics/{lang_code}/extract_no_callback?access_token={api_key}

The required request body for no-callback topic extraction is in the following format:

{
    "data": [
        {
        "transcription": "",
        "summarization": "The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)."
        }
    ]
}

You can populate either the "summarization" field or the "transcription" field. If the "transcription" field is populated, the topic extraction will use that first.

The reason for using summarization is to ensure well-formedness. Speech-to-text may result in noisy input, so we strive to provide a sanitized input to the Topic Extraction API as much as possible.

To summarize transcriptions, the Summarization API may be used:

Summarization Tutorial —>

A successful response will have the following output:

{
    "topics": [
        {
            "tag_name": "coronavirus pandemic"
        },
        {
            "tag_name": "coronavirus disease 2019"
        },
        {
            "tag_name": "ongoing global pandemic"
        },
        {
            "tag_name": "coronavirus disease"
        }
    ]
}

POST /extract

The callback process is called with the following POST request:

POST https://seaword.seasalt.ai/topics/{lang_code}/extract?access_token={api_key}

The required request body for topic extraction with callback is in the following format:

{
    "data": [
        {
        "transcription": "",
        "summarization": "The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)."
        }
    ],
    "url": "https://webhook.site/3c95b604-4243-4d15-b5ce-a79e18dec3e7"
}

The url field is where callback URL is specified. This URL is used when the API is ready to send its output to the person who submitted a topic extraction request.

A successful response will return the following message:

{'message': 'Topic extraction process started.'}

The output of topic extraction is sent to the callback URL where you will be able to use it.

When using the extract endpoint, we do not have to wait for topic extraction results when we send a request. We let the API process the input without having to worry about the request timing out. Once all the data is processed, the API sends its own request to the callback URL that it was given in the original request.

POST /extract_live

In cases where we want to get live results and process a transcript piece by piece, we can use the /extract_live endpoint. Instead of an entire meeting transcript, this endpoint accepts a single transcription segment in the following format:

{
    "meeting_id": Text,
    "sequence": int,
    "account_id": Text,
    "transcription": Text,
    "callback_url": HttpUrl
}

The "meeting_id" uniquely identifies the meeting, while the "sequence" identifies the segment. The "account_id" uniquely identifies the SeaMeet account.

The topic extraction process will be performed on the segment and the resulting topics for each segment will be cached. If the topic extraction system predicts a unique topic for the transcript (ie. it predicts a topic for the first time on a specific meeting transcript), the API will send a callback containing the new topics in the following format:

{
    "meeting_id": Text,
    "sequence": int,
    "account_id": Text,
    "topics": List[Keywords]
}