Interface NCModelView

    • Method Detail

      • getId

        String getId()
        Gets unique, immutable ID of this model.

        Note that model IDs are immutable while name and version can be changed freely. Changing a model's ID is equivalent to creating a completely new model. Model IDs (unlike name and version) are not exposed to the end user and only serve a technical purpose. The ID's maximum length is 32 characters.

        JSON
        If using JSON/YAML model presentation this is set by id property:

         {
              "id": "my.model.id"
         }
         
        Returns:
        Unique, immutable ID of this model.
      • getName

        String getName()
        Gets descriptive name of this model. Name's max length is 64 characters.

        JSON
        If using JSON/YAML model presentation this is set by name property:

         {
              "name": "My Model"
         }
         
        Returns:
        Descriptive name for this model.
      • getVersion

        String getVersion()
        Gets the version of this model using semantic versioning. Version's max length is 16 characters.

        JSON
        If using JSON/YAML model presentation this is set by version property:

         {
              "version": "1.0.0"
         }
         
        Returns:
        A version compatible with the Semantic Versioning specification (www.semver.org).
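        The three limits above (ID: 32, name: 64, version: 16 characters) and the semver requirement can be checked before deployment. The sketch below is not NLPCraft's own validation: the class ModelIdentityCheck is hypothetical, treating empty values as invalid is an extra assumption, and the regular expression is a simplified form of the grammar published at www.semver.org.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical pre-deployment check for the documented ID, name and version limits. */
final class ModelIdentityCheck {
    // Simplified MAJOR.MINOR.PATCH[-pre-release][+build] pattern; the full
    // semver.org grammar is stricter about pre-release identifiers.
    private static final Pattern SEMVER = Pattern.compile(
        "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(-[0-9A-Za-z.-]+)?(\\+[0-9A-Za-z.-]+)?$");

    /** Returns a list of problems; an empty list means all three values pass. */
    static List<String> validate(String id, String name, String version) {
        List<String> errors = new ArrayList<>();
        if (id == null || id.isEmpty() || id.length() > 32)
            errors.add("id must be 1..32 characters");
        if (name == null || name.isEmpty() || name.length() > 64)
            errors.add("name must be 1..64 characters");
        if (version == null || version.length() > 16 || !SEMVER.matcher(version).matches())
            errors.add("version must be a semver string of at most 16 characters");
        return errors;
    }
}
```

For example, validate("my.model.id", "My Model", "1.0.0") returns an empty list, while a version of "1.0" is reported because it lacks the patch component.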
      • getDescription

        default String getDescription()
        Gets optional short model description. This can be displayed by the management tools.

        JSON
        If using JSON/YAML model presentation this is set by description property:

         {
              "description": "Model description..."
         }
         
        Returns:
        Optional short model description.
      • getOrigin

        default String getOrigin()
        Gets the origin of this model, such as the name of the class, a file path or a URL.
        Returns:
        Origin of this model, such as the name of the class, a file path or a URL.
      • getMaxUnknownWords

        default int getMaxUnknownWords()
        Gets the maximum number of unknown words allowed before the user input is automatically rejected. An unknown word is a word that is not part of the Princeton WordNet database. If you expect very formalized and well-defined input without uncommon slang and abbreviations, you can set this to a small number like one or two. However, in most cases we recommend leaving it at the default or setting it to a larger number like five or more.

        Default
        If not provided by the model the default value DFLT_MAX_UNKNOWN_WORDS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxUnknownWords property:

         {
              "maxUnknownWords": 2
         }
         
        Returns:
        Maximum number of unknown words until automatic rejection.
      • getMaxFreeWords

        default int getMaxFreeWords()
        Gets the maximum number of free words allowed before the user input is automatically rejected. A free word is a known word that is not part of any recognized token. In other words, it is a word that is present in the user input but won't be used to understand its meaning. Setting this to a non-zero value risks misunderstanding the user input, while setting it to zero often makes the understanding logic too rigid. In most cases we recommend setting it to between one and three. If you expect the user input to contain many noisy idioms, slang or colloquialisms, you can set it to a larger number.

        Default
        If not provided by the model the default value DFLT_MAX_FREE_WORDS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxFreeWords property:

         {
              "maxFreeWords": 2
         }
         
        Returns:
        Maximum number of free words until automatic rejection.
      • getMaxSuspiciousWords

        default int getMaxSuspiciousWords()
        Gets the maximum number of suspicious words allowed before the user input is automatically rejected. A suspicious word is a word, defined by the model, that should not appear in valid user input under any circumstances. A typical example of suspicious words would be "sex" or "porn" when processing queries about children's books. In most cases this should be set to zero (the default) to automatically reject any user input containing such suspicious words.

        Default
        If not provided by the model the default value DFLT_MAX_SUSPICIOUS_WORDS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxSuspiciousWords property:

         {
              "maxSuspiciousWords": 2
         }
         
        Returns:
        Maximum number of suspicious words until automatic rejection.
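        The three rejection limits above can be pictured as simple comparisons of per-request counts against the configured maximums. In this hypothetical sketch (WordLimitCheck is not an NLPCraft class), input is rejected only when a count exceeds its limit, which agrees with the note that a limit of zero rejects any suspicious word; how the counts themselves are computed is left out.

```java
/** Hypothetical comparison of per-request word counts against the model's limits. */
final class WordLimitCheck {
    /** Returns a rejection reason, or null if the input passes all three checks. */
    static String check(int unknown, int free, int suspicious,
                        int maxUnknown, int maxFree, int maxSuspicious) {
        // Reject only when a count is strictly greater than its configured maximum.
        if (unknown > maxUnknown) return "too many unknown words";
        if (free > maxFree) return "too many free words";
        if (suspicious > maxSuspicious) return "too many suspicious words";
        return null;
    }
}
```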
      • getMinWords

        default int getMinWords()
        Gets minimum word count (including stopwords) below which user input will be automatically rejected as too short. In almost all cases this value should be greater than or equal to one.

        Default
        If not provided by the model the default value DFLT_MIN_WORDS will be used.

        JSON
        If using JSON/YAML model presentation this is set by minWords property:

         {
              "minWords": 2
         }
         
        Returns:
        Minimum word count (including stopwords) below which user input will be automatically rejected as too short.
      • getMaxWords

        default int getMaxWords()
        Gets maximum word count (including stopwords) above which user input will be automatically rejected as too long. In almost all cases this value should be greater than or equal to one.

        Default
        If not provided by the model the default value DFLT_MAX_WORDS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxWords property:

         {
              "maxWords": 50
         }
         
        Returns:
        Maximum word count (including stopwords) above which user input will be automatically rejected as too long.
      • getMinTokens

        default int getMinTokens()
        Gets minimum number of all tokens (system and user defined) below which user input will be automatically rejected as too short. In almost all cases this value should be greater than or equal to one.

        Default
        If not provided by the model the default value DFLT_MIN_TOKENS will be used.

        JSON
        If using JSON/YAML model presentation this is set by minTokens property:

         {
              "minTokens": 1
         }
         
        Returns:
        Minimum number of all tokens.
      • getMaxTokens

        default int getMaxTokens()
        Gets the maximum number of all tokens (system and user defined) above which user input will be automatically rejected as too long. Note that sentences with a large number of tokens can result in significant processing delays and substantial memory consumption.

        Default
        If not provided by the model the default value DFLT_MAX_TOKENS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxTokens property:

         {
              "maxTokens": 100
         }
         
        Returns:
        Maximum number of all tokens.
      • getMinNonStopwords

        default int getMinNonStopwords()
        Gets the minimum word count (excluding stopwords) below which user input will be automatically rejected as an ambiguous sentence.

        Default
        If not provided by the model the default value DFLT_MIN_NON_STOPWORDS will be used.

        JSON
        If using JSON/YAML model presentation this is set by minNonStopwords property:

         {
              "minNonStopwords": 2
         }
         
        Returns:
        Minimum word count (excluding stopwords) below which user input will be automatically rejected as ambiguous.
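        The word-count limits above (minWords, maxWords and minNonStopwords) can be illustrated with a hypothetical check. WordCountCheck and its naive lowercase whitespace tokenization are assumptions for illustration; NLPCraft's NLP pipeline tokenizes and detects stopwords differently.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical sketch of the minWords / maxWords / minNonStopwords checks. */
final class WordCountCheck {
    /** Returns a rejection reason, or null if the input passes all checks. */
    static String check(String input, Set<String> stopwords,
                        int minWords, int maxWords, int minNonStopwords) {
        String trimmed = input.trim();
        // Naive tokenization: lowercase and split on whitespace (counts stopwords too).
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.toLowerCase().split("\\s+");
        // Same words with stopwords removed.
        long nonStopwords = Arrays.stream(words).filter(w -> !stopwords.contains(w)).count();

        if (words.length < minWords) return "too short";
        if (words.length > maxWords) return "too long";
        if (nonStopwords < minNonStopwords) return "ambiguous";
        return null;
    }
}
```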
      • isNonEnglishAllowed

        default boolean isNonEnglishAllowed()
        Whether or not to allow non-English language in user input. Currently, only English is supported. However, a model can choose whether or not to automatically reject user input that is detected as non-English. Note that the current algorithm only works reliably on longer user input (10+ words). On short sentences it will often produce an incorrect result.

        Default
        If not provided by the model the default value DFLT_IS_NON_ENGLISH_ALLOWED will be used.

        JSON
        If using JSON/YAML model presentation this is set by nonEnglishAllowed property:

         {
              "nonEnglishAllowed": false
         }
         
        Returns:
        Whether or not to allow non-English language in user input.
      • isNotLatinCharsetAllowed

        default boolean isNotLatinCharsetAllowed()
        Whether or not to allow a non-Latin charset in user input. Currently, only the Latin charset is supported. However, a model can choose whether or not to automatically reject user input with characters outside of the Latin charset. If false, such user input will be automatically rejected.

        Default
        If not provided by the model the default value DFLT_IS_NOT_LATIN_CHARSET_ALLOWED will be used.

        JSON
        If using JSON/YAML model presentation this is set by nonLatinCharsetAllowed property:

         {
              "nonLatinCharsetAllowed": false
         }
         
        Returns:
        Whether or not to allow non-Latin charset in user input.
      • isSwearWordsAllowed

        default boolean isSwearWordsAllowed()
        Whether or not to allow known English swear words in user input. If false, user input with detected known English swear words will be automatically rejected.

        Default
        If not provided by the model the default value DFLT_IS_SWEAR_WORDS_ALLOWED will be used.

        JSON
        If using JSON/YAML model presentation this is set by swearWordsAllowed property:

         {
              "swearWordsAllowed": false
         }
         
        Returns:
        Whether or not to allow known swear words in user input.
      • isNoNounsAllowed

        default boolean isNoNounsAllowed()
        Whether or not to allow user input without a single noun. If false, such user input will be automatically rejected. Typically, for strict command- or query-oriented models this should be set to false, as any command or query should have at least one noun subject. However, for conversational models this can be set to true to allow for small talk and one-liners.

        Default
        If not provided by the model the default value DFLT_IS_NO_NOUNS_ALLOWED will be used.

        JSON
        If using JSON/YAML model presentation this is set by noNounsAllowed property:

         {
              "noNounsAllowed": false
         }
         
        Returns:
        Whether or not to allow user input without a single noun.
      • isPermutateSynonyms

        default boolean isPermutateSynonyms()
        Whether or not to permutate multi-word synonyms. Automatic multi-word synonym permutations greatly increase the total number of synonyms in the system and allow for better multi-word synonym detection. For example, if permutation is allowed, the synonym "a b c" will be automatically converted into a sequence of synonyms: "a b c", "b a c", "a c b". This property is closely related to isSparse(), and the two are typically changed together. Note that individual model elements can override this property using the NCElement.isPermutateSynonyms() method.

        Default
        If not provided by the model the default value DFLT_IS_PERMUTATE_SYNONYMS will be used.

        JSON
        If using JSON/YAML model presentation this is set by permutateSynonyms property:

         {
              "permutateSynonyms": true
         }
         
        Returns:
        Whether or not to permutate multi-word synonyms.
        See Also:
        NCElement.isPermutateSynonyms(), NCElement.isSparse(), isSparse()
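        The example above lists "a b c", "b a c" and "a c b", which is what swapping each pair of adjacent words produces (the full set of six orderings is not listed). The sketch below reproduces that example under the assumption that only adjacent swaps are generated; SynonymPermutations is a hypothetical helper, not NLPCraft's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical adjacent-swap permutation of a multi-word synonym. */
final class SynonymPermutations {
    static List<String> permute(String synonym) {
        String[] words = synonym.trim().split("\\s+");
        // LinkedHashSet keeps the original first and drops duplicate variants.
        Set<String> out = new LinkedHashSet<>();
        out.add(String.join(" ", words));
        for (int i = 0; i + 1 < words.length; i++) {
            String[] copy = words.clone();
            // Swap the neighbours at positions i and i + 1.
            String tmp = copy[i];
            copy[i] = copy[i + 1];
            copy[i + 1] = tmp;
            out.add(String.join(" ", copy));
        }
        return new ArrayList<>(out);
    }
}
```

With this sketch, permute("a b c") returns [a b c, b a c, a c b], matching the example in the description.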
      • isDupSynonymsAllowed

        default boolean isDupSynonymsAllowed()
        Whether or not duplicate synonyms are allowed. If true, the model will pick a random model element when multiple elements are found due to duplicate synonyms. If false, the model will print an error message and will not be deployed.

        Default
        If not provided by the model the default value DFLT_IS_DUP_SYNONYMS_ALLOWED will be used.

        JSON
        If using JSON/YAML model presentation this is set by dupSynonymsAllowed property:

         {
              "dupSynonymsAllowed": true
         }
         
        Returns:
        Whether or not to allow duplicate synonyms.
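        A hypothetical sketch of how duplicate synonyms might be detected and handled: DupSynonymCheck maps each normalized synonym to the IDs of the elements that use it, and validate() throws when duplicates exist and dupSynonymsAllowed is false. Lower-casing synonyms before comparing them is an assumption, and the random pick among elements at match time is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical duplicate-synonym detection across model elements. */
final class DupSynonymCheck {
    /** Maps every synonym used by more than one element to the IDs of those elements. */
    static Map<String, Set<String>> findDuplicates(Map<String, List<String>> synonymsByElement) {
        Map<String, Set<String>> owners = new TreeMap<>();
        synonymsByElement.forEach((elemId, syns) -> {
            for (String s : syns)
                owners.computeIfAbsent(s.trim().toLowerCase(), k -> new TreeSet<>()).add(elemId);
        });
        // Keep only synonyms owned by two or more elements.
        owners.values().removeIf(ids -> ids.size() < 2);
        return owners;
    }

    /** Mirrors the documented behaviour: fail deployment unless duplicates are allowed. */
    static void validate(Map<String, List<String>> synonymsByElement, boolean dupAllowed) {
        Map<String, Set<String>> dups = findDuplicates(synonymsByElement);
        if (!dups.isEmpty() && !dupAllowed)
            throw new IllegalStateException("Duplicate synonyms: " + dups);
    }
}
```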
      • getMaxTotalSynonyms

        default int getMaxTotalSynonyms()
        Gets the total number of synonyms allowed per model. The model won't deploy if the total number of synonyms exceeds this number.

        Default
        If not provided by the model the default value DFLT_MAX_TOTAL_SYNONYMS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxTotalSynonyms property:

         {
              "maxTotalSynonyms": 10000
         }
         
        Returns:
        Total number of synonyms allowed per model.
        See Also:
        getMaxElementSynonyms()
      • isNoUserTokensAllowed

        default boolean isNoUserTokensAllowed()
        Whether or not to allow user input with no user token detected. If false, such user input will be automatically rejected. Note that this property only applies to user-defined tokens (i.e. model elements). Even if there are no user-defined tokens, the user input may still contain system tokens such as nlpcraft:city or nlpcraft:date. In many cases models should be built to allow user input without user tokens. However, set it to false if the presence of at least one user token is mandatory.

        Default
        If not provided by the model the default value DFLT_IS_NO_USER_TOKENS_ALLOWED will be used.

        JSON
        If using JSON/YAML model presentation this is set by noUserTokensAllowed property:

         {
              "noUserTokensAllowed": false
         }
         
        Returns:
        Whether or not to allow the user input with no user token detected.
      • isSparse

        default boolean isSparse()
        Whether or not this model's elements allow non-stopword gaps in their multi-word synonyms. This property is closely related to isPermutateSynonyms(), and the two are typically changed together. Note that individual model elements can override this property using the NCElement.isSparse() method.

        Default
        If not provided by the model the default value DFLT_IS_SPARSE will be used.

        JSON
        If using JSON/YAML model presentation this is set by sparse property:

         {
              "sparse": true
         }
         
        Returns:
        Optional multi-word synonym sparsity model property.
        See Also:
        NCElement.isSparse(), NCElement.isPermutateSynonyms(), isPermutateSynonyms()
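        The difference between sparse and non-sparse multi-word synonyms can be sketched as contiguous versus in-order matching. In this hypothetical SparseMatch helper, stopwords are dropped in both modes and word order is preserved (reordering is the job of synonym permutation); both choices are assumptions, not NLPCraft's matching algorithm.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical illustration of sparse vs. non-sparse synonym matching. */
final class SparseMatch {
    static boolean matches(List<String> input, List<String> synonym,
                           Set<String> stopwords, boolean sparse) {
        // Assumption: stopwords never block a match, in either mode.
        List<String> words = input.stream()
            .filter(w -> !stopwords.contains(w))
            .collect(Collectors.toList());
        if (!sparse) {
            // Non-sparse: synonym words must be adjacent once stopwords are removed.
            return Collections.indexOfSubList(words, synonym) >= 0;
        }
        // Sparse: synonym words must appear in order; other words may sit in between.
        int next = 0;
        for (String w : words)
            if (next < synonym.size() && w.equals(synonym.get(next))) next++;
        return next == synonym.size();
    }
}
```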
      • getMetadata

        default Map<String,Object> getMetadata()
        Gets optional user defined model metadata that can be set by the developer and accessed later. By default, it returns an empty map. Note that this metadata is mutable and can be changed at runtime by the model's code.

        JSON
        If using JSON/YAML model presentation this is set by metadata property:

         {
              "metadata": {
                  "str": "val1",
                  "num": 100,
                  "bool": false
              }
         }
         
        Specified by:
        getMetadata in interface NCMetadata
        Returns:
        Optional user defined model metadata. By default, returns an empty map. Never returns null.
        See Also:
        NCMetadata.meta(String), NCMetadata.metaOpt(String), NCMetadata.meta(String, Object)
      • getAdditionalStopWords

        default Set<String> getAdditionalStopWords()
        Gets an optional list of stopwords to add to the built-in ones.

        A stopword is an individual word (i.e. a sequence of characters excluding whitespace) that contributes no semantic meaning to the sentence. For example, 'the', 'wow', or 'hm' provide no semantic meaning to the sentence and can be safely excluded from semantic analysis.

        NLPCraft comes with a carefully selected list of English stopwords which should be sufficient for the majority of use cases. However, you can add additional stopwords to this list. A typical use of user-defined stopwords is jargon filler (parasite) words that are specific to the model's domain.

        JSON
        If using JSON/YAML model presentation this is set by additionalStopwords property:

         {
              "additionalStopwords": [
                  "stopword1",
                  "stopword2"
              ]
         }
         
        Returns:
        Potentially empty list of additional stopwords.
      • getExcludedStopWords

        default Set<String> getExcludedStopWords()
        Gets an optional list of stopwords to exclude from the built-in list of stopwords.

        Just like you can add additional stopwords via getAdditionalStopWords(), you can exclude certain words from the list of stopwords. This can be useful in rare cases when a default built-in stopword has a specific meaning in your model. In order to process such words, you need to exclude them from the list of stopwords.

        JSON
        If using JSON/YAML model presentation this is set by excludedStopwords property:

         {
              "excludedStopwords": [
                  "excludedStopword1",
                  "excludedStopword2"
              ]
         }
         
        Returns:
        Potentially empty list of excluded stopwords.
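        The effective stopword list combines the built-in list with getAdditionalStopWords() and getExcludedStopWords(). The hypothetical EffectiveStopwords helper below assumes that an exclusion wins when the same word appears in both the additional and the excluded sets.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical composition of built-in, additional and excluded stopwords. */
final class EffectiveStopwords {
    static Set<String> compute(Set<String> builtIn, Set<String> additional, Set<String> excluded) {
        Set<String> result = new TreeSet<>(builtIn);
        result.addAll(additional);   // model-specific filler words
        result.removeAll(excluded);  // built-in words that carry meaning for this model
        return result;
    }
}
```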
      • getSuspiciousWords

        default Set<String> getSuspiciousWords()
        Gets an optional list of suspicious words. A suspicious word is a word that generally should not appear in a user sentence when used with this model. For example, if a particular model is for children-oriented book search, the words "sex" and "porn" should probably NOT appear in the user input and can be automatically rejected when added here and the model's maxSuspiciousWords property (see getMaxSuspiciousWords()) is set to zero.

        Note that by setting the model's maxSuspiciousWords property to a non-zero value you can adjust the sensitivity of the suspicious words auto-rejection logic.

        JSON
        If using JSON/YAML model presentation this is set by suspiciousWords property:

         {
              "suspiciousWords": [
                  "sex",
                  "porn"
              ]
         }
         
        Returns:
        Potentially empty list of suspicious words in their lemma form.
      • getMacros

        default Map<String,String> getMacros()
        Gets an optional map of macros to be used in this model. Macros and option groups are instrumental in defining model's elements. See NCElement for documentation on macros.

        JSON
        If using JSON/YAML model presentation this is set by macros property:

         {
              "macros": [
                  {
                      "name": "<OF>",
                      "macro": "{of|for|per}"
                  },
                  {
                      "name": "<CUR>",
                      "macro": "{current|present|moment|now}"
                  }
              ]
         }
         
        Returns:
        Potentially empty map of macros.
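        Macro substitution and option groups such as {<OF>|_} (see the synonyms under getElements()) can be illustrated by a small expander. MacroExpander is hypothetical and covers only plain macro substitution and nested {a|b|_} groups, where _ stands for an empty option as in the examples on this page; it does not model the rest of NLPCraft's synonym syntax, and it assumes macro values do not reference other macros.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical macro substitution plus {a|b|_} option-group expansion. */
final class MacroExpander {
    static List<String> expand(String synonym, Map<String, String> macros) {
        String s = synonym;
        for (Map.Entry<String, String> m : macros.entrySet())
            s = s.replace(m.getKey(), m.getValue());
        return expandGroups(s);
    }

    private static List<String> expandGroups(String s) {
        int open = s.indexOf('{');
        if (open < 0) return List.of(s.trim().replaceAll("\\s+", " "));
        // Find the brace closing the first group, splitting options on top-level '|'.
        int depth = 0, close = -1, start = open + 1;
        List<String> options = new ArrayList<>();
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '{') depth++;
            else if (c == '}' && --depth == 0) { close = i; break; }
            else if (c == '|' && depth == 1) { options.add(s.substring(start, i)); start = i + 1; }
        }
        if (close < 0) throw new IllegalArgumentException("Unbalanced braces: " + s);
        options.add(s.substring(start, close));
        String head = s.substring(0, open), tail = s.substring(close + 1);
        List<String> out = new ArrayList<>();
        for (String opt : options)
            // '_' means "nothing here"; nested groups are expanded by the recursion.
            out.addAll(expandGroups(head + (opt.trim().equals("_") ? "" : opt) + tail));
        return out;
    }
}
```

With the <OF> macro from the JSON above, expand("history {<OF>|_} weather", ...) yields "history of weather", "history for weather", "history per weather" and "history weather".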
      • getParsers

        default List<NCCustomParser> getParsers()
        Gets optional user-defined model element parsers for custom NER implementations. Note that the order of the parsers is important as they will be invoked in the same order they are returned.

        By default, the data model detects its elements by their synonyms, regexp or IDL expressions. However, in some cases these methods are not expressive enough. In such cases, a user-defined parser can be defined for the model that would allow the user to define their own NER logic to detect the model elements in the user input programmatically. Note that a single parser can detect any number of model elements.

        JSON
        If using JSON/YAML model presentation this is set by parsers property, which is an array with every element being the fully qualified name of a class implementing the NCCustomParser interface:

         {
              "parsers": [
                  "my.package.Parser1",
                  "my.package.Parser2"
              ]
         }
         
        Returns:
        Custom user parsers for model elements or empty list if not used (default). Never returns null.
      • getElements

        default Set<NCElement> getElements()
        Gets a set of model elements or named entities. Model can have zero or more user defined elements.

        An element is the main building block of the model. Data model element defines a named entity that will be automatically recognized in the user input. See also getParsers() method on how to provide programmatic named entity recognizer (NER) implementations.

        Note that unless model elements are loaded dynamically it is highly recommended to declare model elements in the external JSON/YAML model configuration (under elements property):

         {
              "elements": [
                 {
                     "id": "wt:hist",
                     "synonyms": [
                         "{<WEATHER>|_} <HISTORY>",
                         "<HISTORY> {<OF>|_} <WEATHER>"
                     ],
                     "description": "Past weather conditions."
                 }
              ]
         }
         
        Returns:
        Set of model elements, potentially empty.
        See Also:
        getParsers()
      • getEnabledBuiltInTokens

        default Set<String> getEnabledBuiltInTokens()
        Gets a set of IDs for built-in named entities (tokens) that should be enabled and detected for this model. Unless the model requests (i.e. enables) the built-in tokens in this method, the NLP subsystem will not attempt to detect them. Explicitly enabling only the needed tokens significantly improves the overall performance by avoiding unnecessary token detection. Note that you don't have to specify your own user elements here as they are always enabled.

        Default
        The following built-in tokens are enabled by default implementation of this method:

        • nlpcraft:date
        • nlpcraft:continent
        • nlpcraft:subcontinent
        • nlpcraft:country
        • nlpcraft:metro
        • nlpcraft:region
        • nlpcraft:city
        • nlpcraft:num
        • nlpcraft:coordinate
        • nlpcraft:relation
        • nlpcraft:sort
        • nlpcraft:limit
        Note that this method can return an empty list if the data model doesn't need any built-in tokens for its logic. See NCToken for the list of all supported built-in tokens.

        JSON
        If using JSON/YAML model presentation this is set by enabledBuiltInTokens property:

         {
              "enabledBuiltInTokens": [
                  "google:person",
                  "google:location",
                  "stanford:money"
              ]
         }
         
        Returns:
        Set of built-in tokens, potentially empty but never null, that should be enabled and detected for this model.
      • getAbstractTokens

        default Set<String> getAbstractTokens()
        Gets a set of named entity (token) IDs that will be considered abstract tokens. An abstract token is only detected when it is either a constituent part of some other non-abstract token or referenced by built-in tokens. In other words, an abstract token will not be detected in a standalone unreferenced position. By default (unless returned by this method), all named entities are considered to be non-abstract.

        Declaring tokens as abstract is important to minimize the number of parsing variants automatically generated as permutations of all possible parsing compositions. For example, if it is known that a particular named entity will only be used as a constituent part of some other token, declaring such a named entity as abstract can significantly reduce the number of parsing variants, leading to better performance and often to simpler intent definitions and callback logic.

        Returns:
        Set of abstract token IDs. Can be empty but never null.
      • getMaxElementSynonyms

        default int getMaxElementSynonyms()
        Gets the maximum number of unique synonyms per model element after which either a warning or an error will be triggered. Note that there is no technical limit on how many synonyms a model element can have apart from memory consumption and performance considerations. However, in cases where synonyms are auto-generated (e.g. from a database) this property can serve as a courtesy notification that a model element has too many synonyms. Also, in general, too many synonyms can potentially lead to performance degradation.

        Default
        If not provided by the model the default value DFLT_MAX_ELEMENT_SYNONYMS will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxSynonymThreshold property:

         {
              "maxSynonymThreshold": 1000
         }
         
        Returns:
        Maximum number of unique synonyms per model element after which either warning or error will be triggered.
        See Also:
        isMaxSynonymsThresholdError(), getMaxTotalSynonyms()
      • isMaxSynonymsThresholdError

        default boolean isMaxSynonymsThresholdError()
        Whether exceeding getMaxElementSynonyms() will throw an exception (true) or only trigger a warning log (false). Note that throwing an exception will prevent the data probe from starting.

        Default
        If not provided by the model the default value DFLT_MAX_SYNONYMS_THRESHOLD_ERROR will be used.

        JSON
        If using JSON/YAML model presentation this is set by maxSynonymThresholdError property:

         {
              "maxSynonymThresholdError": true
         }
         
        Returns:
        Whether exceeding getMaxElementSynonyms() will throw an exception (true) or only trigger a warning log (false).
        See Also:
        getMaxElementSynonyms()
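        The interplay of getMaxElementSynonyms() and isMaxSynonymsThresholdError() can be sketched as follows. SynonymThreshold is a hypothetical helper that uses java.util.logging for the warning; NLPCraft's own logging and exception types may differ.

```java
import java.util.logging.Logger;

/** Hypothetical per-element synonym threshold check: warn or fail. */
final class SynonymThreshold {
    private static final Logger LOG = Logger.getLogger(SynonymThreshold.class.getName());

    /** Returns true if within the limit; otherwise throws (thresholdError) or warns and returns false. */
    static boolean check(String elementId, int synonymCount, int maxElementSynonyms, boolean thresholdError) {
        if (synonymCount <= maxElementSynonyms) return true;
        String msg = "Element '" + elementId + "' has " + synonymCount
            + " synonyms (max " + maxElementSynonyms + ")";
        if (thresholdError) throw new IllegalStateException(msg); // would stop the probe from starting
        LOG.warning(msg);
        return false;
    }
}
```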
      • getConversationTimeout

        default long getConversationTimeout()
        Gets timeout in ms after which the unused conversation element is automatically "forgotten".

        Just like in a normal human conversation, if we talk about, say, "Chicago", and then don't mention it for a certain period of time during the rest of the dialog, the conversation participants subconsciously "forget" about it and exclude it from the conversation context. In other words, the term "Chicago" is no longer in the conversation's short-term memory.

        Note that conversation timeout and depth combined define the expiration policy for conversation management. These two properties allow fine-tuning for different types of dialogs. For example, setting a longer timeout and a smaller depth mimics a slow-moving but topic-focused conversation. Alternatively, setting a shorter timeout and a larger depth better supports a fast-moving, wide-ranging conversation that may cover multiple topics.

        Default
        If not provided by the model the default value DFLT_CONV_TIMEOUT_MS will be used.

        JSON
        If using JSON/YAML model presentation this is set by conversationTimeout property:

         {
              "conversationTimeout": 300000
         }
         
        Returns:
        Timeout in ms after which the unused conversation element is automatically "forgotten".
        See Also:
        getConversationDepth()
      • getConversationDepth

        default int getConversationDepth()
        Gets maximum number of requests after which the unused conversation element is automatically "forgotten".

        Just like in a normal human conversation, if we talk about, say, "Chicago", and then don't mention it for a certain number of utterances during the rest of the dialog, the conversation participants subconsciously "forget" about it and exclude it from the conversation context. In other words, the term "Chicago" is no longer in the conversation's short-term memory.

        Note that conversation timeout and depth combined define the expiration policy for conversation management. These two properties allow fine-tuning for different types of dialogs. For example, setting a longer timeout and a smaller depth mimics a slow-moving but topic-focused conversation. Alternatively, setting a shorter timeout and a larger depth better supports a fast-moving, wide-ranging conversation that may cover multiple topics.

        Default
        If not provided by the model the default value DFLT_CONV_DEPTH will be used.

        JSON
        If using JSON/YAML model presentation this is set by conversationDepth property:

         {
              "conversationDepth": 5
         }
         
        Returns:
        Maximum number of requests after which the unused conversation element is automatically "forgotten".
        See Also:
        getConversationTimeout()
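        The combined expiration policy can be sketched as a predicate over an element's last mention. ConversationExpiry is hypothetical, and it assumes an element is forgotten as soon as either the timeout or the depth is exceeded; the exact boundary handling in NLPCraft may differ.

```java
/** Hypothetical expiration check combining conversation timeout and depth. */
final class ConversationExpiry {
    static boolean isForgotten(long lastMentionMs, int lastMentionRequest,
                               long nowMs, int currentRequest,
                               long timeoutMs, int depth) {
        // Too much wall-clock time since the element was last mentioned.
        boolean timedOut = nowMs - lastMentionMs > timeoutMs;
        // Too many requests since the element was last mentioned.
        boolean tooDeep = currentRequest - lastMentionRequest > depth;
        return timedOut || tooDeep;
    }
}
```

With the JSON values above (300000 ms, depth 5), an element mentioned in request 1 survives a request 3 sent one minute later, but is forgotten by request 7 regardless of elapsed time.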
      • getRestrictedCombinations

        default Map<String,Set<String>> getRestrictedCombinations()
        Gets an optional map of restricted named entity combinations (linkage). The returned map is a map of entity ID to a set of other entity IDs, with each key-value pair defining the restricted combination. Restricting certain entities from being linked (or referenced) by some other entities helps reduce "wasteful" parsing variant generation. For example, if we know that the entity with ID "adjective" cannot be sorted, we can restrict it from being linked with the nlpcraft:limit and nlpcraft:sort entities to reduce the number of parsing variants being generated.

        Only the following built-in entities can be restricted (i.e., to be the keys in the returned map):

        • nlpcraft:limit
        • nlpcraft:sort
        • nlpcraft:relation
        Note that an entity cannot be restricted to itself (an entity ID cannot appear both as a key and as part of that key's value set).

        JSON
        If using JSON/YAML model presentation this is set by restrictedCombinations property:

         {
              "restrictedCombinations": {
                  "nlpcraft:limit": ["adjective"],
                  "nlpcraft:sort": ["adjective"]
              }
         }
         
        Returns:
        Optional map of restricted named entity combinations. Can be empty but never null.
        See Also:
        NCVariant
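        The two rules above (only nlpcraft:limit, nlpcraft:sort and nlpcraft:relation may be keys, and no entity may be restricted to itself) can be checked with a hypothetical validator. RestrictedCombinationsCheck is illustrative only and is not NLPCraft's deployment logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validation of a restrictedCombinations map. */
final class RestrictedCombinationsCheck {
    // The only built-in entities documented as restrictable keys.
    private static final Set<String> RESTRICTABLE =
        Set.of("nlpcraft:limit", "nlpcraft:sort", "nlpcraft:relation");

    /** Returns a list of violations; an empty list means the map is valid. */
    static List<String> validate(Map<String, Set<String>> restricted) {
        List<String> errors = new ArrayList<>();
        restricted.forEach((key, ids) -> {
            if (!RESTRICTABLE.contains(key))
                errors.add("Cannot restrict '" + key + "'");
            if (ids.contains(key))
                errors.add("'" + key + "' cannot be restricted to itself");
        });
        return errors;
    }
}
```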