This example demonstrates a relatively complete implementation of natural language interface to a SQL database. This is a non-trivial example, spanning over a thousand lines of code, and it can act as a basis for production-ready implementation.
Note that a significant part of the implementation is not directly related to NLPCraft but rather deals with SQL statement construction. We specifically decided not to use any 3rd party libraries for it to show what can be done "from scratch".
Many of modern existing natural language-to-SQL implementations use variations of deep learning approach where you first train the neural network on a pre-created training set and then get to use the trained network to provide (infer) probabilistic answers for the new input sentences. Although the latest natural language-to-SQL attempts to implement this approach are getting into 90% percentile of accuracy - they remain largely unusable for the vast swath of enterprise applications where non-deterministic nature of such systems renders them practically useless. The apparent problem with non-deterministic systems like that is that the user never knows whether a given answer is correct or incorrect. In these use cases users can't tolerate the fact that in 5-10% of the cases the answer will be invalid. Try that for your HR or revenue reporting system, prescriptive analytics systems, and so on...
It's important to note, however, that for many non-mission-critical systems such non-determinism does not pose a significant problem. We happily accept such imprecision when asking for direction on our mobile devices, unlock our phones using face or fingerprint recognition, when performing sentiment analysis or trying to detect faces of our friends across thousands of photographs. Cost of retries, as well as the cost of initial errors, is insignificant in these cases. The same cost, however, in many business applications can be too significant to tolerate.
As you may have learned by now, NLPCraft uses a fully deterministic approach in an attempt to match the user input against a defined set of intents. If it finds the matching intent - it guarantees that match for a given intent. If no matching intent can be found - it returns the negative result without any ambiguity. In other words, if the answer is given, it is deterministically guaranteed to be correct. Another positive side-effect of this approach is the fact that such matching logic is traceable, i.e. the user can see why given user input was matched against a certain intent (and not any other). Such traceability of the comprehension logic (or explainability vs "black box" approach from deep learning techniques) is often critical for many real-life business applications.
On the flip side, one of the downsides of such an approach in general is the need to have a detailed, domain-specific model for each SQL database (when dealing with SQL databases). Building such a model can be a non-trivial and time consuming experience. That is where NLPCraft brings a lot of built-in tooling to simplify this task dramatically.
Due to size of this example the entire source code for it can be found on GitHub.
We are going to be building natural language interface against slightly modified "Northwind" sample database from Microsoft SQL Server:
You can find SQL script creating this database and populating it with the sample data at
db/northwind.sql file in the root of the example.
In our example we are using H2 database instance. For convenience, example provides a command line application
db/SqlServer.scala that automatically starts local H2 database instance with default configuration (localhost on port 9092) and initializes it using
Creating data model manually for the existing SQL schema can be a daunting task. NLPCraft provides the tool that can scan SQL schema and create working stub of such model automatically. All you have to do then is to add any necessary modifications to it.
SQL Model Generation is a Java-based utility that takes JDBC configuration, reads database schema using it and creates initial JSON or YAML stub for the data model. This stub then can be used as is or be extended further.
Examples specifically comes with two pre-generated files:
sql_model_init.yaml- the initial file that was generated straight out of SQL Model Generator (see above for the instructions).
Implementation mainly consists of these files:
SqlModel.scala- the code behind the data model that loads YAML-defined model and defines all intents.
db/SqlBuilder.scala- the main utility that takes object model provided by SQL Model Generation and builds a SQL query to execute. Note that we elected to build this functionality from scratch to illustrate how it can be done. You are free, of course, to use many of the existing libraries to help achieve this goal.
Assuming that project is located in
~/SqlModel folder - let's go to into this directory and run the Maven build:
$ cd ~/SqlModel $ mvn clean package
At this stage we have our project built and we are ready to start testing.
Run the following command to start local REST server, if it hasn't been started already, from the NLPCraft installation directory:
$ bin/nlpcraft.sh start-server
bin/nlpcraft.sh help --cmd=start-serverto get a full help on this command.
Part of the test framework, the auto-validator class NCTestAutoModelValidator takes one or more model IDs (or class names) and performs validation. Validation consists of starting an embedded probe with a given model, scanning for @NCIntentSample and @NCIntentSampleRef annotations and their corresponding callback methods, submitting each sample input sentences from these annotations and checking that resulting intent matches the intent the sample was attached to. Note that auto-testing does not require any additional code to be written - the class gathers all required information from the model itself.
As always, you can launch model auto-validator as any other Java class but we'll use NLPCraft CLI to do it more conveniently:
$ bin/nlpcraft.sh test-model --cp=~/sql/target/classes --mdls=demo.SqlModel
bin/nlpcraft.sh help --cmd=test-modelto get a full help on this command.
retest-modelcommand in REPL mode to re-run the last model test avoiding the retyping of all required parameters.
Typical development cycle consists of:
All of these operations can be performed from NLPCraft CLI in REPL mode or from any IDE.
NOTE: you don't need to restart REST server every time - it only needs to be started once.
You've created SQL model, started the REST server and tested this model using the built-in test framework.