xlingualNER API Tutorial

This tutorial will walk through the usage of the xlingualNER API.


The purpose of the xlingualNER system is to tag sentences of various languages with our fine-grained NER labels. The result this system, will be a file with the word on the left, and the tag on the right, space seperated. In the future, we hope this system will support more languages and also have increased accuracy.

The current endpoint for the xlingualNER is https://seaword.seasalt.ai/ner/.

See technical documentation and try out the API with the Swagger docs <https://seaword.seasalt.ai/ner/docs>.

Current models for xlingualNER can be found in the Azure fileshare at /mnt/models/nlp/xlingual-ner/. model_final is our final model, and the model_baseline is trained on all English and Chinese Ontonotes.


The xlingualNER pipeline consists of three parts: NERtagger, a datetime, misc outputs (ip, email address, url). The input is a sentence of any supported language, and an output example is listed further below. Our model uses NLTK’s word tokenizer by default, and uses Jieba for ZH.

The labels are as follows (copied from annotation instructions document):


Identifies the first names, last names, family names, unique nicknames of people and fictional characters. Generational markers (Jr., IV) are included in the extent, while personal and occupational titles such as Ms., Dr., President, Secretary are NOT included.

  • Example: Dr. [TITLE] Bob Smith, Sr. [PERSON] is Chinese-American [NATIONALITY]

  • Example: Mitt Romney[PERSON] is a politician


Adjectival forms of named religions (e.g. Christian, Muslim, Jewish), heritage, tribes, political groups(e.g. Democratic, Republicans)

  • Example: Ramadhan[EVENT] is a holy month for Muslim[NORP] communities

  • Example: I love South American[NORP] cuisines


Man-made infrastructures such as streets, bridges, highways, buildings, monuments, airports, parks.

  • Example: When I go to New York City[CITY], I shop on 5th Avenue[FAC].

  • Example: I will be flying from YVR Airport [FAC]

  • Example: I drove through the I-95 [FAC] to Maine[STATE_OR_PROVINCE]

  • Example: I went to Baxter State Park[FAC]


Describes government agencies, political agencies, educational institutions, sport teams, and musical groups. Names of hospitals, libraries, and museums are marked as ORG except when those are referred to in a locative way (expressing location).

  • Example: United States Congress[ORG] had a meeting

  • Example:The United Nations [ORG] meeting took place this weekend [DATE]

  • Example: the Supreme Court Justice[ORG]

  • Example: the White House[ORG]


Describes names of companies

  • Example: They recently opened the first[ORDINAL] Paul Bakery[ORG_COMP] in Canada[COUNTRY]


Includes job and position titles such as Professor, Doctor, as well as royal family titles

  • Example: The Queen[TITLE] of England[COUNTRY] loves ice cream

  • Example: I am a queen who loves ice cream → no named-entity


Religious beliefs and concepts like Judaism, Hinduism, Buddhism, Islam. However, it does not include religious groups.

  • Example: The spread of Islam[RELIGION] in Indonesia[COUNTRY] did not come directly from Arabic[NATIONALITY] country. Source

  • Example: Christianity[RELIGION] is a religion, while Christians[NORP] are religious groups


Named place such as mountain, river, region, continent, street name. IMPORTANT NOTE about LOC VS FAC labels: LOC are named-entities that indicate location but not a specific public man-made facility. LOC is more general and is used more often.

  • Example: Montenegro[COUNTRY] is located in Eastern Europe[LOC].

  • Example: East of France[LOC]

  • Example: South Boston[LOC]

  • Example: I will meet you at Chinatown[LOC]


City names, towns

  • Example: The first Starbucks[ORG_COMP] store was opened in Seattle[CITY].

  • Example: Company based in Bloomfield Township[CITY] , Oakland County[CITY] , Michigan[STATE_OR_PROVINCE].


Includes names of state or province

  • Example: New Hampshire[STATE_OR_PROVINCE] is the perfect outdoor playground


A zip code for Canadian or American zipcodes

  • Example: V1D6P3 or V1D 6P3 or 79645


Includes names of countries

  • Example: Bali[CITY] is located in Indonesia[COUNTRY]


Any members of state, country, city

  • Example: Many Canadians[NATIONALITY] want to become Parisians[NATIONALITY].


Anything that would be capitalized in English, even if it’s not in your language. (EG: Grammy Awards)

  • Example: They are using the latest Wi-Fi[MISC] technology.


Named products by companies

  • Example: I want to buy the latest Sony Camera[PRODUCT] by Sony[ORG_COMP]

  • Example: I got a new iPod[PRODUCT] for my birthday

  • Example: I do not want to buy cameras produced by Fuji[ORG_COMP]


Any popular/repeating events

  • Example: He attended The Presidential Election[EVENT]


Name of a play, movie, song, painting, or book

  • Example: He stole The Mona Lisa[WORK_OF_ART] last week from Louvre Museum[FAC].


Legal entity names

  • Example: They passed the American Constitution[LAW] in 1867[DATE]


Any language name

  • Example: I speak English[LANGUAGE], Bahasa Indonesia[LANGUAGE], and French[LANGUAGE]


Includes day, month, or year. Note that indirect references, such as “today”, “yesterday”, and “3 weeks from now” should also be tagged.

  • Example: He was born on January 3, 2002[DATE].

  • Example: On Monday[DATE], he was stuck in traffic for 2 hours[TIME]

  • Example: Last week[DATE], I worked for 8 hours[TIME]

  • Example: I love the 1940s[DATE] movies

  • Example: In the fall of 2008, I moved to Australia [COUNTRY]


Specific hour / minute / seconds.

  • Example: I have a meeting today[DATE] at 9:30am[TIME]

  • Negative example: On Wednesday[DATE], he was stuck in traffic for 2 hours[DURATION]


General time span

  • Example: I have a piano lesson every Wednesday[SET] from 2pm to 3pm[DURATION]


Specific repeating time span

  • Example: Every Tuesday[SET] he plays tennis


Anything with percentage

  • Example: He made around 75 percent[PERCENT] in profit.


Related to money and currencies

  • Example: He borrowed 17,000 British pounds[MONEY]

  • Another example: The Indonesian rupiah[MONEY] depreciated 50%[PERCENT] against the U.S. dollar[MONEY]


Number that doesn’t have a measurement

  • Example: Scrabble[PRODUCT] is a game played by 2[CARDINAL] , 3[CARDINAL] or 4[CARDINAL] people

  • Example: about half[ORDINAL] of the class received good marks

  • Example: four[ORDINAL] of my family members love skiing

  • Negative example: He bought one kg[QUANTITY] of oranges


Number and must have units of measurement (as of distance, weight)

  • Example: Sharks grow up to 20 feet [QUANTITY]

  • Example: Twenty[CARDINAL] 20 feet[QUANTITY] sharks

  • Example: I walked 4 miles [QUANTITY] to my home and carried one kg[QUANTITY]of oranges


Numbers that contain some sort of order

  • Example: He was first[ORDINAL] to arrive.

  • Another example: He was the millionth[ORDINAL] person to win.

  • Another example: Kamala Harris[PERSON] is the first[ORDINAL] African American[NATIONALITY] to hold the office of Attorney General[TITLE] in the state’s history.


Website url

  • Example: For more information, please visit our website at JD.com [URL]


Legal term

  • Example: He was charged with manslaughter[CRIMINAL_CHARGE] last week[TIME]

The last 3 tags were confusing so we removed them from our Label Studio annotations. They are no longer in the gold data, only the silver data that we trained on.





The following english sentence is submitted to the system:

    "text": "In the World War(1914–1918) the First Çanakkale was a battle area. Millions died in this war."

The system outputs three key elements which are a list of named entity tags at a token level, datetime grounding and miscellaneous information such as Email, URL and IP-address:

    "tags": [
            "In": "O"
            "the": "B-EVENT"
            "World": "I-EVENT"
            "War": "I-EVENT"
            "(": "O"
            "1914–1918": "B-DATE"
            ")": "O"
            "the": "B-LOC"
            "First": "I-LOC"
            "Çanakkale": "I-LOC"
            "was": "O"
            "a": "O"
            "battle": "O"
            "area": "O"
            ".": "O"
            "Millions": "B-CARDINAL"
            "died": "O"
            "in": "O"
            "this": "O"
            "war": "O"
            ".": "O"
    "datetime": [
            "start": 17,
            "end": 20,
            "resolution": {
                "values": [
                        "timex": "1914",
                        "type": "daterange",
                        "start": "1914-01-01",
                        "end": "1915-01-01"
            "text": "1914",
            "type_name": "datetimeV2.daterange"
            "start": 22,
            "end": 25,
            "resolution": {
                "values": [
                        "timex": "1918",
                        "type": "daterange",
                        "start": "1918-01-01",
                        "end": "1919-01-01"
            "text": "1918",
            "type_name": "datetimeV2.daterange"
    "misc": {
        "Email": [],
        "URL": [],
        "IP-address": []

The final tagging result from the system is the tags field. The datetime field shows the datetime grounding with start date and end date as well as the corresponding span of the input sentence. The misc field shows information about Email, URL and IP-address.

Language Support

The Cross-Lingual Named Entity Recognition API currently supports 9 languages. The input language following a convention of {lang_code}-{country_code} is specified via a query parameter when calling the demo endpoint.

While we intent to add more language support in the future, the following language codes are currently supported




en-XX, en-US, en-GB, en-AU, en-SG

Traditional Chinese


Simplifed Chinese
















API usage

POST /extract

To tag named-entities of a sentence, send a POST request to the /extract endpoint. Additionally, the language must be specified as a query parameter in the URL.

POST https://seaword.seasalt.ai/ner/{lang_code}/extract?access_token={api_key}

The endpoint tags a list of named entities calculates datetime grounding and extracts Email, URL and IP-address from the input sentences. The required request body for Named Entity Recognition requires only the following fields:

    "text": "string"

Once the Named Entity tagging has been performed on the full sentence, you will get the following result:

    "tags": "List",
    "datetime": "List",
    "misc": "Dict"

In this result the tags field represents the final tagged named entities, datetime is a list of datetime grounding that appears in the input sentence, misc is a dictionary contains Email, URL and IP-address extracted from the input sentence.