The core of the Sanskrit WordNet consists of four basic data types: lemmas, synsets, relations, and semfields. From one point of view, lemmas 'possess' synsets, in that words have different referential senses corresponding to discrete concepts. From another, synsets 'include' lemmas, in that a concept can be referred to by different words. This is similar to the distinction between onomasiology and semasiology in structural linguistics. Relations are of two basic kinds, semantic and lexical, and represent linkages of various sorts (see below) between synsets or lemmas. The kinds of relations that can exist between two items depend on the part of speech of the 'source' item. Semfields gather together many different semantically related synsets under general conceptual domains, independent of their parts of speech. The WordNet API permits programmatic access to all four data types.
The API is accessed through URLs appended to the WordNet's base API address, https://sanskritwordnet.chs.harvard.edu/api. Typically, the API will return a list of results, which consist of nested dictionary-like mapping objects.
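As a minimal sketch of how a client might address the API, the Python snippet below joins path segments onto the base address and decodes a response. It assumes that responses are encoded as JSON, which the description above implies but does not state; `api_url` and `fetch` are hypothetical helper names, not part of the service.

```python
import json
from urllib.request import urlopen

BASE = "https://sanskritwordnet.chs.harvard.edu/api"

def api_url(*segments):
    """Join path segments onto the base API address."""
    return "/".join([BASE, *(str(s).strip("/") for s in segments)])

def fetch(*segments):
    """Request an endpoint and decode the body (assumed to be JSON)."""
    with urlopen(api_url(*segments)) as response:
        return json.load(response)

# api_url("index", "v") builds the address of the verb index;
# fetch("index", "v") would request it over the network.
```

If the JSON assumption holds, `fetch` would return ordinary Python lists and dictionaries that can be traversed directly.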
A complete list of all lemmas presently included in the WordNet, ordered alphabetically and by part of speech, is available through the index. /index returns a (long) list of items with morphological information and a unique resource identification number (URI) keyed to the Linking Latin project for disambiguation. It can be filtered by part of speech ('n', 'v', 'a', 'r') or by morphological class (e.g., 'v1spia--1-' for only first conjugation active verbs). /index/*/, without any morphological specification, is equivalent to /index.
https://sanskritwordnet.chs.harvard.edu/api/index/ # complete index
https://sanskritwordnet.chs.harvard.edu/api/index/v/ # only verbs
https://sanskritwordnet.chs.harvard.edu/api/index/*/n-s---mn2-/ # only masculine nouns of the second declension
https://sanskritwordnet.chs.harvard.edu/api/index/n/n-p---nn2-/ # neuter _pluralia tantum_ of the second declension
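The filtering rules for the index can be sketched as a small URL builder. The function name `index_url` and its keyword arguments are illustrative only; the sole behaviour taken from the description is that `*` stands in for the part of speech when only a morphological class is given.

```python
BASE = "https://sanskritwordnet.chs.harvard.edu/api"
PARTS_OF_SPEECH = {"n", "v", "a", "r"}

def index_url(pos=None, morpho=None):
    """Build an index URL, optionally filtered by part of speech and/or morphology."""
    if pos is not None and pos not in PARTS_OF_SPEECH:
        raise ValueError(f"unknown part of speech: {pos!r}")
    segments = [BASE, "index"]
    if pos is not None or morpho is not None:
        # '*' is the wildcard used when no part of speech is specified
        segments.append(pos or "*")
    if morpho is not None:
        segments.append(morpho)
    return "/".join(segments) + "/"

print(index_url())                     # complete index
print(index_url("v"))                  # only verbs
print(index_url(morpho="n-s---mn2-"))  # morphological class, any part of speech
```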
Detailed information about individual lemmas is available by appending /lemmas to the base API address and then providing filtering arguments that specify the relevant headword and, optionally, the part of speech and morphological tag. If a morphological tag is provided without a part of speech, * must be given in place of the part of speech.
https://sanskritwordnet.chs.harvard.edu/api/lemmas/virtus/n/ # /virtus would also be acceptable
https://sanskritwordnet.chs.harvard.edu/api/lemmas/dico/v/ # returns two items
https://sanskritwordnet.chs.harvard.edu/api/lemmas/dico/v/v1spia--3-/ # disambiguates from the first conjugation verb
https://sanskritwordnet.chs.harvard.edu/api/lemmas/furor # returns __furor, -ari__ and __furor, -oris__
https://sanskritwordnet.chs.harvard.edu/api/lemmas/furor/*/n-s---mn3-/ # only the noun of this form
For complete disambiguation, it is also possible to access a specific lemma using its URI: /lemmas?uri=.
To see the meanings (synsets) presently assigned to a word, /synsets should be appended to any lemma query.
https://sanskritwordnet.chs.harvard.edu/api/lemmas/sicula/n/synsets
Similarly, a word's lexical relations can be obtained by appending /relations, while its semantic relations are obtainable via its synsets, using /synsets/relations.
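The lemma queries described here follow a regular pattern: headword, optional part of speech, optional morphological tag, and an optional trailing view. A hedged sketch of that pattern follows; `lemma_url` and its `view` argument are hypothetical names, and whether a view may follow a bare headword with no part of speech is an assumption based on the statement that /synsets can be appended to any lemma query.

```python
from urllib.parse import quote

BASE = "https://sanskritwordnet.chs.harvard.edu/api"
VIEWS = {"synsets", "relations", "synsets/relations"}

def lemma_url(lemma, pos=None, morpho=None, view=None):
    """Build a lemma query URL from a headword and optional filters."""
    if view is not None and view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    if morpho is not None and pos is None:
        pos = "*"  # a tag without a part of speech requires the wildcard
    segments = [BASE, "lemmas", quote(lemma, safe="")]
    segments += [s for s in (pos, morpho, view) if s is not None]
    return "/".join(segments)

print(lemma_url("sicula", "n", view="synsets"))
print(lemma_url("furor", morpho="n-s---mn3-"))
```

The headword is percent-encoded so that lemmas containing non-ASCII characters still yield a valid URL.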
Detailed information about a particular sense (synset) in the WordNet is available using /synsets followed by the part of speech and relevant offset identification number. To obtain information about the lemmas belonging to a particular synset, append /lemmas. Alternatively, the semantic relations pertaining to a synset are available at /relations.
https://sanskritwordnet.chs.harvard.edu/api/synsets/n/03316977/ # 'a protective structure or device (usually metal)'
https://sanskritwordnet.chs.harvard.edu/api/synsets/v/01207150/lemmas
https://sanskritwordnet.chs.harvard.edu/api/synsets/a/01918843/relations
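A synset address is fully determined by its part of speech and offset. The sketch below builds such addresses; `synset_url` is a hypothetical helper, and padding the offset to eight digits is an assumption drawn only from the examples shown here, all of which use eight-digit offsets.

```python
BASE = "https://sanskritwordnet.chs.harvard.edu/api"

def synset_url(pos, offset, view=None):
    """Build a synset URL; view may be 'lemmas' or 'relations'."""
    if view not in (None, "lemmas", "relations"):
        raise ValueError(f"unknown view: {view!r}")
    # Assumption: offsets are written as eight digits with leading zeros.
    segments = [BASE, "synsets", pos, str(offset).zfill(8)]
    if view is not None:
        segments.append(view)
    return "/".join(segments)

print(synset_url("n", 3316977))
print(synset_url("v", "01207150", "lemmas"))
```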
Semfields represent very large conceptual domains encompassing many synsets. Presently the Sanskrit WordNet takes advantage of the Dewey Decimal Classification System as a topic index, in order to provide an appropriate degree of conceptual granularity and hierarchy. To access a semfield record in the WordNet, you will need its DDCS code. E.g., '630' is 'Agriculture' in the hundreds division and 'Agriculture & Related Technologies' in the tens division.
https://sanskritwordnet.chs.harvard.edu/api/semfields/630
This listing describes the hierarchical (superordinate and subordinate) relations of the semfield in question. /synsets instead indicates the specific synsets within these domains, and /lemmas resolves each of these synsets to a list of lemmas.
The Sanskrit WordNet provides a lemmatization service at /lemmatize, using the morphological information in the database.
https://sanskritwordnet.chs.harvard.edu/lemmatize/reginarum
Results will consist of a list of possible lemmas for this form, along with relevant morphological analyses. A part-of-speech filter can be applied by appending /n, /v, /a, /r or /p.
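A lemmatization request can be sketched in the same way. Note that the example address for this service sits directly under the site root rather than under /api, and the helper below follows that example; `lemmatize_url` is a hypothetical name.

```python
from urllib.parse import quote

SITE = "https://sanskritwordnet.chs.harvard.edu"
LEMMATIZE_FILTERS = {"n", "v", "a", "r", "p"}

def lemmatize_url(form, pos=None):
    """Build a lemmatization URL for an inflected form, with an optional filter."""
    if pos is not None and pos not in LEMMATIZE_FILTERS:
        raise ValueError(f"unknown part-of-speech filter: {pos!r}")
    segments = [SITE, "lemmatize", quote(form, safe="")]
    if pos is not None:
        segments.append(pos)
    return "/".join(segments)

print(lemmatize_url("reginarum"))
print(lemmatize_url("reginarum", "n"))
```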
Additionally, the API offers a translation service to translate some words from English, French, Italian, Spanish and even Hebrew into Sanskrit. The source language must be given as an ISO 639 code, and a part of speech can be optionally provided.
https://sanskritwordnet.chs.harvard.edu/translate/en/war # English
https://sanskritwordnet.chs.harvard.edu/translate/es/guerra # Spanish
A sentiment analysis service is available at https://sanskritwordnet.chs.harvard.edu/sentiment. The payload should consist of a JSON object containing at least a value for 'text'. Optionally, 'weighting' can be used to specify a weighting method, with possible values of 'average', 'harmonic' or 'geometric'. A further option is to include a list of lists of the form ["lemma", "morpho", "uri"] designating lemmas to exclude from the analysis.
{ "text": "cor meum, spes mea, mel meum, suavitudo, cibus, gaudium" } # Plaut. Bacch. 18
{ "text": "antiqua comoedia grandis et elegans et venusta", "excluded": [["antiquo", "v1spia--1-", ""], ["grandio", "v1spia--4-", ""]]} # Quint. IO. 10.1.65
{ "text": "bella es, novimus, et puella, verum est", "excluded": [["bellum", "n-s---nn2-", ""], ["bello", "v1spia--1-", ""], ["ver", "n-s---mn3-", ""]]} # Mart. 1.64.1
{ "text": "hic manebimus optime" } # Liv. AUC. 5.55
{ "text": "tu mihi sola places", "excluded": [["placo", "v1spia--1-", ""]]} # Ov. Ars 1.42
{ "text": "odiosus mihi es" } # Plaut. Ps. 30
{ "text": "pedicabo ego vos et irrumabo" } # Cat. Carm. 16.1
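Payloads of this shape are easy to get subtly wrong by hand (mismatched quotes, a misspelled weighting), so it may help to generate them. The sketch below only builds and validates the JSON body; it does not send it, since the HTTP method and headers the service expects are not specified here. `sentiment_payload` is a hypothetical helper, and the key names 'text', 'weighting' and 'excluded' are taken from the description and examples above.

```python
import json

WEIGHTINGS = {"average", "harmonic", "geometric"}

def sentiment_payload(text, weighting=None, excluded=None):
    """Serialize a payload for the sentiment service as a JSON string."""
    if not text:
        raise ValueError("'text' is required")
    payload = {"text": text}
    if weighting is not None:
        if weighting not in WEIGHTINGS:
            raise ValueError(f"unknown weighting: {weighting!r}")
        payload["weighting"] = weighting
    if excluded:
        for item in excluded:
            if len(item) != 3:
                raise ValueError("each exclusion must be [lemma, morpho, uri]")
        payload["excluded"] = [list(item) for item in excluded]
    return json.dumps(payload, ensure_ascii=False)

print(sentiment_payload(
    "tu mihi sola places",
    excluded=[("placo", "v1spia--1-", "")],
))
```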
Finally, the API provides a mechanism for searching for partial lemmas and for synsets or semfields by their English glosses.
https://sanskritwordnet.chs.harvard.edu/api/lemmas?search=bula # All words containing the string 'bula'
https://sanskritwordnet.chs.harvard.edu/api/synsets?search=mythology # Any synset with the string 'mythology' in its gloss
https://sanskritwordnet.chs.harvard.edu/api/semfields?search=military # Any semfield with 'military' in its label
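Search terms containing spaces or non-ASCII characters need to be percent-encoded before they are placed in the query string. A minimal sketch using the standard library follows; `search_url` is a hypothetical helper name, and only the three endpoints shown above are accepted.

```python
from urllib.parse import urlencode

BASE = "https://sanskritwordnet.chs.harvard.edu/api"
SEARCHABLE = {"lemmas", "synsets", "semfields"}

def search_url(kind, term):
    """Build a search URL for lemmas, synsets or semfields."""
    if kind not in SEARCHABLE:
        raise ValueError(f"cannot search {kind!r}")
    # urlencode escapes the term so it is safe inside a query string
    return f"{BASE}/{kind}?{urlencode({'search': term})}"

print(search_url("lemmas", "bula"))
print(search_url("semfields", "military"))
```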