Monday, March 11, 2024

Natural Language to SQL generation - Part I

Oracle Autonomous database enable you to query data using Natural language, by combining generative AI large language models (LLM) with Oracle SQL empowers you to describe what we want (declarative intent) and let the database generate the SQL query relevant to your schema.
 
Enabling this feature is a new package DBMS_CLOUD_AI available in autonomous database, that enables the use of LLM for generating SQL’s from natural language prompts, this package provides the access to the user specified LLM along with the knowledge of user accessible database metadata. This enables producing runnable SQL queries applicable to the schema.  In this blogpost we will see how to get started with Select AI in the autonomous database.
 
Sign into your autonomous database instance with admin privileged account and add your user account to the ACL list and grant access to DBMS_CLOUD_AI package.
 
admin@ATP19C> grant connect,resource to rajesh identified by "*************";
 
Grant succeeded.
 
admin@ATP19C> grant execute on dbms_cloud_ai to rajesh;
 
Grant succeeded.
 
admin@ATP19C> begin
  2    dbms_network_acl_admin.append_host_ace(
  3        host => 'api.cohere.ai',
  4        ace  => xs$ace_type(privilege_list => xs$name_list('http'),
  5                            principal_name => 'RAJESH',
  6                            principal_type => xs_acl.ptype_db));
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
admin@ATP19C>
 
for this setup I will be using the Cohere as “AI Provider”. Although openAI an Azure openAI also currently supported in Autonomous database.
 
To enable Cohere to generate SQL from natural language prompts, obtain API keys from Cohere free Trial account (https://dashboard.cohere.com/) and click on the Dashboard, and click the API keys on the left navigation.
 
 
Then create a database credential to your AI provider account, in case of Cohere, the password is uniquely generated token for API usage.



 
admin@ATP19C> conn rajesh/"*************"@atp19c_vpn
Connected.
rajesh@ATP19C>
rajesh@ATP19C> begin
  2     dbms_cloud.create_credential(
  3             credential_name => 'MY_AI_CRED',
  4             username => <your_signin_user_account_for_cohere>,
  5             password => <your_uniquely_generated_token_for_API_usage>);
  6  end;
  7  /
 
PL/SQL procedure successfully completed.
 
Then, create your AI profile, with the list of schemas to consider and optionally any tables.
 
rajesh@ATP19C> begin
  2   dbms_cloud_ai.create_profile( 'MY_AI_PROFILE',
  3              '{"provider": "cohere",
  4             "credential_name": "MY_AI_CRED",
  5             "object_list": [
  6                                               {"owner":"SH","name":"PROMOTIONS"},
  7                                               {"owner":"SH","name":"PRODUCTS"},
  8                                               {"owner":"SH","name":"CUSTOMERS"},
  9                                               {"owner":"SH","name":"TIMES"},
 10                                               {"owner":"SH","name":"COUNTRIES"},
 11                                               {"owner":"SH","name":"COSTS"},
 12                                               {"owner":"SH","name":"CHANNELS"},
 13                                               {"owner":"SH","name":"SALES"},
 14                                               {"owner":"SH","name":"SUPPLEMENTARY_DEMOGRAPHICS"}
 15                                             ]       }' );
 16  end;
 17  /
 
PL/SQL procedure successfully completed.
 
Finally, we need to set our AI profile for each database session.
 
rajesh@ATP19C> exec dbms_cloud_ai.set_profile('MY_AI_PROFILE');
 
PL/SQL procedure successfully completed.
 
Let’s look at few examples, we will start simply. How many customers are there ? using the keyword AI we run the SQL commands, we see the result set as well as  SQL query generated by our LLM adding the keyword “showsql” after the “Select AI”
 
rajesh@ATP19C> Select AI how many customers are there;
 
CUSTOMER_COUNT
--------------
         55500
 
rajesh@ATP19C> Select AI showsql how many customers are there;
 
RESPONSE
---------------------------------------------------
SELECT COUNT(*) AS customer_count
FROM SH.CUSTOMERS
 
The next query highlights using the LLM broader knowledge. Find the customers that are in India.
 
rajesh@ATP19C> Select AI how many customers are there in the country India;
 
CUSTOMER_COUNT
--------------
             0
 
rajesh@ATP19C> Select AI showsql how many customers are there in the country India;
 
RESPONSE
---------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT COUNT(*) AS customer_count
FROM SH.CUSTOMERS c
JOIN SH.COUNTRIES country ON c.COUNTRY_ID = country.COUNTRY_ID
WHERE country.COUNTRY_NAME = 'India'
 
 
You can use the chat option to interact with the LLM as you would a chatbot with a single message. In the following example, a developer may need to create a table. By describing the table you want, it generates a CREATE TABLE statement you can use directly or modify to suite your needs
 
 
rajesh@ATP19C> Select AI chat show the create table statement for a table called agents with agent name and age;
 
RESPONSE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here is the SQL code to create a table called `Agents` with columns for agent name and age:
```sql
CREATE TABLE Agents (
    agent_name VARCHAR(255) NOT NULL,
    age INT NOT NULL
);
```
 
This code creates a table with a `VARCHAR` column for the agent's name and an `INT` column for the agent's age. The `NOT NULL` constraint prevents from inserting null values in these columns.
 
You can add more columns as needed, such as an ID column for uniqueness and indexing for faster lookup.
 
Once the table is created, you can add data to it using SQL insert statements or other data manipulation techniques.
 
Even a complex sql like this
 
rajesh@ATP19C> Select AI what is Top 3 selling products;
 
PROD_NAME                                          SUM(AMOUNT_SOLD)       RANK
-------------------------------------------------- ---------------- ----------
Envoy Ambassador                                         15011642.5          1
Mini DV Camcorder with 3.5" Swivel LCD                    8314815.4          2
17" LCD w/built-in HDTV Tuner                            7189171.77          3
 
rajesh@ATP19C> Select AI showsql what is Top 3 selling products;
 
RESPONSE
---------------------------------------------------------------------------------
SELECT p.prod_name, SUM(amount_sold), RANK() OVER (ORDER BY SUM(amount_sold) DESC) as rank
FROM sh.sales, sh.products p
WHERE sales.prod_id = p.prod_id
GROUP BY p.prod_name
ORDER BY rank
FETCH FIRST 3 ROWS ONLY
 
You can also use ‘chat’ to understand what a query is doing, as shown below. 
 
rajesh@ATP19C> select AI chat
  2  with rws as (
  3  select amount_sold , prod_category, cust_gender
  4  from sh.sales
  5  join sh.products using (prod_id)
  6  join sh.customers using (cust_id)
  7  group by prod_category, cust_gender )
  8  select *
  9  from rws
 10  pivot( count(*) for cust_gender in ('F','M') )
 11  /
 
RESPONSE
-----------------------------------------------------------------------------------
 
Sorry, unfortunately a valid SELECT statement could not be generated for your natural language prompt. Here is some more information to help you further:
 
Here is the output after executing the SQL query on the given table and subquery:
```sql
with rws as (
    select amount_sold, prod_category, cust_gender
    from sh.sales
    join sh.products using (prod_id)
    join sh.customers using (cust_id)
    group by prod_category, cust_gender
)
select *
from rws
pivot(count(*) for cust_gender in ('F', 'M'))
```
| ProdCategory | AmountSold | CustGender | SalesCount |
| ------------ | ----------- | ----------- | ----------- |
| Food        | 1245.78     | M         | 262 |
| Food        | 3054.68     | M         | 390 |
| Food        | 1434.34     | M         | 337 |
| Food        | 738.28      | M         | 165 |
| Total Foods | 6263.88     | M         | 1030 |
| Dresses     | 4699.66     | F         | 704 |
| Dresses     | 3789.84     | F         | 557 |
| Dresses     | 497.40      | F         | 83 |
| Total Dresses| 8972.86     | F         | 1344 |
| Total       | 15226.74    | M/F        | 2237 |
 
The pivot table summarizes the amount sold and the number of sales according to the product category and customer gender. The amounts are aggregated with the `count` function and categorized into 'M' and 'F' for male and female customer genders.  The `total` row is a compilation of the aggregated am
ounts and gender counts.
 
Are there any other outputs you would like to know? Feel free to ask and I'll be more than happy to assist you!
 
 
The ability to generate SQL queries using natural language to get useful results sounds amazing, you can get the ease of use without SQL expertise, since this feature is integrated with database because it is SQL it is readily available with analytics and applications.
 
Finally for the stateless environment like Database actions SQL worksheet in Autonomous database, we can use the DBMS_CLOUD_AI.GENERATE functions in our sql query, since this allows to specify the AI profile directly for stateless invocations.
 
rajesh@ATP19C> col response for a80
rajesh@ATP19C> select dbms_cloud_ai.generate(prompt=>'how many customers are there',
  2             profile_name=>'MY_AI_PROFILE',
  3             action=>'showsql')  as response
  4  from dual;
 
RESPONSE
--------------------------------------------------------------------------------
SELECT COUNT(*) AS customer_count
FROM SH.CUSTOMERS


No comments:

Post a Comment