Sanskrit WordNet API

Overview

The core of the Sanskrit WordNet consists of four basic data types: lemmas, synsets, relations, and semfields. From one point of view, lemmas 'possess' synsets, in that words have different referential senses corresponding to discrete concepts. From another, synsets 'include' lemmas, in that a concept can be referred to by different words. This mirrors the distinction between semasiological and onomasiological approaches in structural linguistics. Relations are of two basic kinds, semantic and lexical, and represent linkages of various sorts (see below) between synsets or lemmas. The kinds of relations that can exist between two items depend on the part of speech of the 'source' item. Semfields gather many semantically related synsets under general conceptual domains, independent of their parts of speech. The WordNet API permits programmatic access to all four data types.

The API is accessed through URLs appended to the WordNet's base API address, https://sanskritwordnet.chs.harvard.edu/api. Typically, the API will return a list of results, which consist of nested dictionary-like mapping objects.
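As a minimal sketch of how a client might call the API, the following Python helper joins a path onto the base address and decodes the response. It assumes the responses are JSON (the description above says only that results are nested dictionary-like objects); the names `api_url` and `fetch` are illustrative and not part of the WordNet itself.

```python
import json
from urllib.request import urlopen

# Base API address given in the documentation.
BASE = "https://sanskritwordnet.chs.harvard.edu/api"


def api_url(path):
    """Join a relative path such as 'index/v/' onto the base API address."""
    return f"{BASE}/{path.lstrip('/')}"


def fetch(path, timeout=30):
    """Request a path and decode the body, assuming the server returns JSON."""
    with urlopen(api_url(path), timeout=timeout) as response:
        return json.load(response)
```

Under these assumptions, `fetch("index/v/")` would perform a live request and return the decoded list of verb entries.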

Usage

Index

A complete list of all lemmas presently included in the WordNet, ordered alphabetically and by part of speech, is available through the index. /index returns a (long) list of items with morphological information and a unique resource identification number (URI) keyed to the Linking Latin for disambiguation. It can be filtered by part of speech ('n', 'v', 'a', 'r') or by morphological class (e.g., 'v1spia--1-' for only first conjugation active verbs). /index/*/, without any morphological specification, is equivalent to /index.

https://sanskritwordnet.chs.harvard.edu/api/index/ # complete index
https://sanskritwordnet.chs.harvard.edu/api/index/v/ # only verbs
https://sanskritwordnet.chs.harvard.edu/api/index/*/n-s---mn2-/ # only masculine nouns of the second declension
https://sanskritwordnet.chs.harvard.edu/api/index/n/n-p---nn2-/ # neuter _pluralia tantum_ of the second declension
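The filtering rules for the index can be sketched as a small URL builder. This is only an illustration: `index_url` is a hypothetical helper name, and the trailing slash simply follows the example URLs above.

```python
BASE = "https://sanskritwordnet.chs.harvard.edu/api"

# Part-of-speech codes listed in the text; "*" stands for "any".
POS_CODES = {"n", "v", "a", "r", "*"}


def index_url(pos=None, morpho=None):
    """Build an /index URL, optionally filtered by part of speech and morphology."""
    if pos is None and morpho is None:
        return f"{BASE}/index/"
    # A morphological tag without a part of speech needs "*" as a placeholder.
    pos = pos or "*"
    if pos not in POS_CODES:
        raise ValueError(f"unknown part of speech: {pos!r}")
    url = f"{BASE}/index/{pos}/"
    if morpho is not None:
        url += f"{morpho}/"
    return url
```

For instance, `index_url(morpho="n-s---mn2-")` reproduces the third example URL above.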

Lemmas

Detailed information about individual lemmas is available by appending /lemmas to the base API address and then providing filtering arguments that specify the relevant headword and, optionally, the part of speech and morphological tag. If a morphological tag is provided without a part of speech, * must be given in the part-of-speech position.

https://sanskritwordnet.chs.harvard.edu/api/lemmas/virtus/n/ # /virtus would also be acceptable
https://sanskritwordnet.chs.harvard.edu/api/lemmas/dico/v/ # returns two items
https://sanskritwordnet.chs.harvard.edu/api/lemmas/dico/v/v1spia--3-/ # disambiguates from the first conjugation verb
https://sanskritwordnet.chs.harvard.edu/api/lemmas/furor # returns __furor, -ari__ and __furor, -oris__
https://sanskritwordnet.chs.harvard.edu/api/lemmas/furor/*/n-s---mn3-/ # only the noun of this form

For complete disambiguation, it is also possible to access a specific lemma using its URI: /lemmas?uri=.

To see the meanings (synsets) presently assigned to a word, /synsets should be appended to any lemma query.

https://sanskritwordnet.chs.harvard.edu/api/lemmas/sicula/n/synsets

Similarly, a word's lexical relations can be obtained by appending /relations, while its semantic relations are obtainable via its synsets, using /synsets/relations.
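A rough sketch of how lemma queries and their /synsets, /relations, and /synsets/relations variants could be assembled follows. The helper name `lemma_url` and its `view` parameter are assumptions made for illustration. The trailing slash after the filters follows most of the examples above, and percent-encoding the headword is a precaution for non-ASCII lemmas, not a documented requirement.

```python
from urllib.parse import quote

BASE = "https://sanskritwordnet.chs.harvard.edu/api"


def lemma_url(lemma, pos=None, morpho=None, view=None):
    """Build a /lemmas URL; `view` may be 'synsets', 'relations' or 'synsets/relations'."""
    parts = [quote(lemma)]  # percent-encode the headword (assumed safe for the server)
    if morpho is not None and pos is None:
        pos = "*"  # placeholder required when only a morphological tag is given
    if pos is not None:
        parts.append(pos)
    if morpho is not None:
        parts.append(morpho)
    url = f"{BASE}/lemmas/" + "/".join(parts) + "/"
    if view is not None:
        url += view
    return url
```

With this sketch, `lemma_url("sicula", "n", view="synsets")` yields the synset query shown above, and `view="synsets/relations"` targets the semantic relations reached through a word's synsets.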

Synsets

Detailed information about a particular sense (synset) in the WordNet is available using /synsets followed by the part of speech and relevant offset identification number. To obtain information about the lemmas belonging to a particular synset, append /lemmas. Alternatively, the semantic relations pertaining to a synset are available at /relations.

https://sanskritwordnet.chs.harvard.edu/api/synsets/n/03316977/ # 'a protective structure or device (usually metal)'
https://sanskritwordnet.chs.harvard.edu/api/synsets/v/01207150/lemmas
https://sanskritwordnet.chs.harvard.edu/api/synsets/a/01918843/relations
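The synset addressing scheme can be sketched in the same way. `synset_url` is a hypothetical helper. It takes the offset as a string so that leading zeros, as in '03316977', are preserved exactly.

```python
BASE = "https://sanskritwordnet.chs.harvard.edu/api"

# Sub-resources described in the text for a synset record.
SYNSET_VIEWS = {None, "lemmas", "relations"}


def synset_url(pos, offset, view=None):
    """Build a /synsets URL from a part of speech and an offset identifier."""
    if view not in SYNSET_VIEWS:
        raise ValueError(f"unsupported view: {view!r}")
    # The offset is used verbatim; pass it as a string to keep leading zeros.
    url = f"{BASE}/synsets/{pos}/{offset}/"
    return url if view is None else url + view
```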

Semfields

Semfields represent very large conceptual domains encompassing many synsets. Presently the Sanskrit WordNet takes advantage of the Dewey Decimal Classification System as a topic index, in order to provide an appropriate degree of conceptual granularity and hierarchy. To access a semfield record in the WordNet, you will need its DDCS code. E.g., '630' is 'Agriculture' in the hundreds division and 'Agriculture & Related Technologies' in the tens division.

https://sanskritwordnet.chs.harvard.edu/api/semfields/630

This listing describes the hierarchical (superordinate and subordinate) relations of the semfield in question. /synsets instead indicates the specific synsets within these domains, and /lemmas resolves each of these synsets to a list of lemmas.
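As a hedged illustration, the semfield queries might be built as below. The text does not show the full URLs for the /synsets and /lemmas variants, so appending them directly to the semfield address is an assumption, as is the helper name `semfield_url`.

```python
BASE = "https://sanskritwordnet.chs.harvard.edu/api"

# Sub-resources mentioned in the text; their exact URL placement is assumed.
SEMFIELD_VIEWS = {None, "synsets", "lemmas"}


def semfield_url(code, view=None):
    """Build a /semfields URL from a DDCS code such as '630'."""
    if view not in SEMFIELD_VIEWS:
        raise ValueError(f"unsupported view: {view!r}")
    url = f"{BASE}/semfields/{code}"
    # Assumption: the view is appended as one more path segment.
    return url if view is None else f"{url}/{view}"
```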

Lemmatization

The Sanskrit WordNet provides a lemmatization service at /lemmatize, using the morphological information in the database.

https://sanskritwordnet.chs.harvard.edu/lemmatize/reginarum

Results will consist of a list of possible lemmas for the given form, along with the relevant morphological analyses. A part-of-speech filter can be applied by appending /n, /v, /a, /r or /p.
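A lemmatization request can be sketched as follows. Note that the example URL for this service sits directly under the site root and not under /api, and the sketch follows that. `lemmatize_url` is an illustrative name, and percent-encoding the form is a precaution for non-ASCII input, not a documented requirement.

```python
from urllib.parse import quote

# The lemmatizer example lives at the site root, not under /api.
ROOT = "https://sanskritwordnet.chs.harvard.edu"

# Part-of-speech filters listed in the text.
LEMMATIZE_POS = {"n", "v", "a", "r", "p"}


def lemmatize_url(form, pos=None):
    """Build a /lemmatize URL for an inflected form, with an optional POS filter."""
    url = f"{ROOT}/lemmatize/{quote(form)}"
    if pos is not None:
        if pos not in LEMMATIZE_POS:
            raise ValueError(f"unknown part of speech: {pos!r}")
        url += f"/{pos}"
    return url
```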

Translation

Additionally, the API offers a translation service to translate some words from English, French, Italian, Spanish and even Hebrew into Sanskrit. The source language must be given as an ISO 639 code, and a part of speech can be optionally provided.

https://sanskritwordnet.chs.harvard.edu/translate/en/war # English
https://sanskritwordnet.chs.harvard.edu/translate/es/guerra # Spanish
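The translation URLs follow a simple language-then-word pattern, which the sketch below reproduces. `translate_url` is a hypothetical helper. The optional part-of-speech filter is left out because the text does not show how it is written in the URL.

```python
from urllib.parse import quote

# As in the examples, the translation service sits at the site root.
ROOT = "https://sanskritwordnet.chs.harvard.edu"


def translate_url(lang, word):
    """Build a /translate URL from an ISO 639 language code and a source word."""
    # Percent-encode the word so that non-ASCII input stays a valid URL.
    return f"{ROOT}/translate/{lang}/{quote(word)}"
```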

Sentiment Analysis

The API can perform sentiment analysis of individual strings via HTTP POST requests to https://sanskritwordnet.chs.harvard.edu/sentiment. The payload should consist of a JSON object containing at least a value for 'text'. Optionally, 'weighting' can be used to specify a weighting method, with possible values of 'average', 'harmonic' or 'geometric'. A further option is to include, under 'excluded', a list of lists of the form ["lemma", "morpho", "uri"] designating lemmas to exclude from the analysis.

            
{ "text": "cor meum, spes mea, mel meum, suavitudo, cibus, gaudium" }  # Plaut. Bacch. 18
{ "text": "antiqua comoedia grandis et elegans et venusta", "excluded": [["antiquo", "v1spia--1-", ""], ["grandio", "v1spia--4-", ""]]}  # Quint. IO. 10.1.65
{ "text": "bella es, novimus, et puella, verum est", "excluded": [["bellum", "n-s---nn2-", ""], ["bello", "v1spia--1-", ""], ["ver", "n-s---mn3-", ""]]}  # Mart. 1.64.1
{ "text": "hic manebimus optime" }  # Liv. AUC. 5.55
{ "text": "tu mihi sola places", "excluded": [["placo", "v1spia--1-", ""]]}  # Ov. Ars 1.42
{ "text": "odiosus mihi es" }  # Plaut. Ps. 30
{ "text": "pedicabo ego vos et irrumabo" }  # Cat. Carm. 16.1
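A minimal sketch of sending such a payload from Python is given below. The function names are illustrative. The `Content-Type` header and the JSON-encoded response are assumptions, since the text specifies only the endpoint, the HTTP method, and the payload keys.

```python
import json
from urllib.request import Request, urlopen

SENTIMENT_URL = "https://sanskritwordnet.chs.harvard.edu/sentiment"

# Weighting methods named in the text.
WEIGHTINGS = {"average", "harmonic", "geometric"}


def build_sentiment_request(text, weighting=None, excluded=None):
    """Prepare (but do not send) a POST request carrying the JSON payload."""
    payload = {"text": text}
    if weighting is not None:
        if weighting not in WEIGHTINGS:
            raise ValueError(f"unknown weighting: {weighting!r}")
        payload["weighting"] = weighting
    if excluded:
        # Each entry has the form [lemma, morpho, uri].
        payload["excluded"] = [list(item) for item in excluded]
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the server expects a JSON content type.
    headers = {"Content-Type": "application/json"}
    return Request(SENTIMENT_URL, data=body, headers=headers, method="POST")


def analyze(text, **options):
    """Send the request and decode the reply, assuming a JSON response body."""
    with urlopen(build_sentiment_request(text, **options), timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload easy to inspect before any network traffic occurs.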

Searching

Finally, the API provides a mechanism for searching for partial lemmas and for synsets or semfields by their English glosses.

https://sanskritwordnet.chs.harvard.edu/api/lemmas?search=bula # All words containing the string 'bula'
https://sanskritwordnet.chs.harvard.edu/api/synsets?search=mythology # Any synset with the string 'mythology' in its gloss
https://sanskritwordnet.chs.harvard.edu/api/semfields?search=military # Any semfield with 'military' in its label
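The search queries above share one pattern, a `search` query parameter on the relevant collection, which can be sketched like this. `search_url` is a hypothetical helper. It uses `urlencode` so that terms with spaces or non-ASCII characters are escaped.

```python
from urllib.parse import urlencode

BASE = "https://sanskritwordnet.chs.harvard.edu/api"

# Collections shown as searchable in the examples.
SEARCHABLE = {"lemmas", "synsets", "semfields"}


def search_url(kind, term):
    """Build a search URL for lemmas, synsets or semfields."""
    if kind not in SEARCHABLE:
        raise ValueError(f"cannot search {kind!r}")
    return f"{BASE}/{kind}?{urlencode({'search': term})}"
```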