Using natural language to query the data is an easy
way to answer business questions. Which we have covered the basics in the previous blog post, in this post we will see about how can this work on my data when
tables and column names are meaningless? It is possible when using Autonomous
database in place. There is no magic, if the tables and column names are not
descriptive, then we can help the LLM interpret the meaning of tables and
columns using the build-in database feature called “Comments”. Yes comments are
descriptive notes about tables and column purpose or usage. And better the
comment, the more likely the LLM will know how to use the table or column to
generate the right query.
Let’s take an example, my database has three tables.
their table and column names are meaningless.
rajesh@ATP19C> create
table t1(
2 c1 number,
3 c2 varchar2(2),
4 c3 varchar2(40),
5 c4 varchar2(30),
6 c5 number,
7 c6 varchar2(20),
8 c7 number,
9 c8 varchar2(11),
10 c9 number,
11 c10 varchar2(40) );
rajesh@ATP19C> create table t2(
2 c1 number,
3 c2 varchar2(20),
4 c3 varchar2(40),
5 c4 varchar2(1),
6 c5 number,
7 c6 varchar2(20),
8 c7 varchar2(40),
9 c8 varchar2(10),
10 c9 varchar2(30),
11 c10 number ,
12 c11 varchar2(40),
13 c12 number,
14 c13 number,
15 c14 varchar2(25),
16 c15 varchar2(30),
17 c16 number,
18 c17 varchar2(50),
19 c18 varchar2(14),
20 c19 number,
21 c20 number,
22 c21 Date,
23 c22 Date,
24 c23 varchar2(1) );
rajesh@ATP19C> create table t3(
2 c1 number,
3 c2 number,
4 c3 date,
5 c4 number,
6 c5 number,
7 c6 number,
8 c7 number );
There is a zero chance, that a Natural language query
will know that these tables represent, countries , customers and sales
Information, we can fix that ambiguity by adding database comments.
rajesh@ATP19C> set
feedback off
rajesh@ATP19C> insert /*+ append */ into t1 select * from sh.countries;
rajesh@ATP19C> insert /*+ append */ into t2 select * from sh.customers;
rajesh@ATP19C> insert /*+ append */ into t3 select * from sh.sales;
rajesh@ATP19C> commit;
rajesh@ATP19C>
rajesh@ATP19C> comment on table t1 is 'contains country name, region , subregion and ISO code';
rajesh@ATP19C> comment on column t1.c1 is 'country ids. use this column to join with other tables';
rajesh@ATP19C> comment on column t1.c2 is 'country ISO code';
rajesh@ATP19C> comment on column t1.c3 is 'country name';
rajesh@ATP19C> comment on column t1.c4 is 'country subregion name';
rajesh@ATP19C> comment on column t1.c5 is 'country subregion ids';
rajesh@ATP19C> comment on column t1.c6 is 'country region name';
rajesh@ATP19C> comment on column t1.c7 is 'country region ids';
rajesh@ATP19C>
rajesh@ATP19C>
rajesh@ATP19C> comment on table t2 is 'contains customer information like name, gender, date of birth, marital status, postal address';
rajesh@ATP19C> comment on column t2.c1 is 'customer ids. primary key column use this column to join with other tables';
rajesh@ATP19C> comment on column t2.c2 is 'first name of the customer';
rajesh@ATP19C> comment on column t2.c3 is 'last name of the customer';
rajesh@ATP19C> comment on column t2.c4 is 'customer gender';
rajesh@ATP19C> comment on column t2.c5 is 'customer year of birth';
rajesh@ATP19C> comment on column t2.c6 is 'customer marital status';
rajesh@ATP19C> comment on column t2.c7 is 'customer street address';
rajesh@ATP19C> comment on column t2.c8 is 'postal code of the customer';
rajesh@ATP19C> comment on column t2.c9 is 'city where the customer lives';
rajesh@ATP19C> comment on column t2.c11 is 'customer geography: state or province';
rajesh@ATP19C> comment on column t2.c13 is 'foreign key to the countries table';
rajesh@ATP19C> comment on column t2.c14 is 'customer main phone number';
rajesh@ATP19C> comment on column t2.c15 is 'customer income level';
rajesh@ATP19C> comment on column t2.c16 is 'customer credit limit';
rajesh@ATP19C> comment on column t2.c17 is 'customer email id';
rajesh@ATP19C>
rajesh@ATP19C> comment on table t3 is 'contains sales information about each customers';
rajesh@ATP19C> comment on column t3.c2 is 'foreign key to the customer table';
rajesh@ATP19C> comment on column t3.c6 is 'product quantity sold with the transaction';
rajesh@ATP19C> comment on column t3.c7 is 'invoiced amount to the customer';
rajesh@ATP19C>
rajesh@ATP19C>
that’s it, those meaning less tables and column names
can be understood by LLM using Select AI. This Select AI “profile” encapsulates
the information needed to interact with LLM. It includes details like AI
provider, the model to use, the source tables used for natural language queries
and whether comments should be passed to the model for SQL generation etc.
rajesh@ATP19C> begin
2 dbms_cloud_ai.create_profile( 'MY_AI_PROFILE',
3 '{"provider": "cohere",
4 "credential_name": "MY_AI_CRED",
5 "comments":"true",
6 "object_list": [
7 {"owner":"RAJESH","name":"T1"},
8 {"owner":"RAJESH","name":"T2"},
9 {"owner":"RAJESH","name":"T3"}
10 ] }' );
11 end;
12 /
PL/SQL procedure successfully
completed.
Now we can start asking questions using natural
language against the complex schema, even though the tables and column names
are meaningless, the LLM is able to identify the tables and column names
through comments and generate queries.
rajesh@ATP19C> exec
dbms_cloud_ai.set_profile('MY_AI_PROFILE');
PL/SQL procedure successfully
completed.
rajesh@ATP19C> Select AI
showsql how many customers are there ;
RESPONSE
--------------------------------------------------------------------------------
SELECT COUNT(*) AS customer_count
FROM RAJESH.T2
rajesh@ATP19C> Select AI
showsql what are our total sales;
RESPONSE
--------------------------------------------------------------------------------
SELECT SUM(T3.C7) AS TotalSales
FROM T1 T1 JOIN T2 T2 ON T1.C1 = T2.C13 JOIN T3 ON T2.C1 = T3.C2
rajesh@ATP19C> Select AI
showsql list total sales by gender;
RESPONSE
--------------------------------------------------------------------------------
SELECT t3.c4, SUM(t3.c6)
FROM rajesh.t3 JOIN rajesh.t2 ON rajesh.t3.c1 = rajesh.t2.c1
GROUP BY t3.c4
ORDER BY SUM(t3.c6) DESC
rajesh@ATP19C> Select AI
showsql list customers born in the year 1980;
RESPONSE
--------------------------------------------------------------------------------
SELECT t2.C16 AS customer_credit_limit, t2.C4 AS customer_gender, t2.C6 AS custo
mer_marital_status, t2.C11 AS customer_province, t2.C9 AS customer_city, t2.C15
AS customer_income_level, t2.C5 AS customer_year_of_birth, t2.C17 AS customer_em
ail, t2.C3 AS customer_last_name, t2.C2 AS customer_first_name
FROM rajesh.t2
WHERE t2.C5 = 1980
Properly describing your data will help you use
natural language to get answers. Comments not only help an LLM successfully
formulate queries, they also help you understand your data as
well, starting with Oracle database 23c, not just comments even Annotations
can help LLM to get the best out of data models, stay tuned.
2 c1 number,
3 c2 varchar2(2),
4 c3 varchar2(40),
5 c4 varchar2(30),
6 c5 number,
7 c6 varchar2(20),
8 c7 number,
9 c8 varchar2(11),
10 c9 number,
11 c10 varchar2(40) );
rajesh@ATP19C> create table t2(
2 c1 number,
3 c2 varchar2(20),
4 c3 varchar2(40),
5 c4 varchar2(1),
6 c5 number,
7 c6 varchar2(20),
8 c7 varchar2(40),
9 c8 varchar2(10),
10 c9 varchar2(30),
11 c10 number ,
12 c11 varchar2(40),
13 c12 number,
14 c13 number,
15 c14 varchar2(25),
16 c15 varchar2(30),
17 c16 number,
18 c17 varchar2(50),
19 c18 varchar2(14),
20 c19 number,
21 c20 number,
22 c21 Date,
23 c22 Date,
24 c23 varchar2(1) );
rajesh@ATP19C> create table t3(
2 c1 number,
3 c2 number,
4 c3 date,
5 c4 number,
6 c5 number,
7 c6 number,
8 c7 number );
rajesh@ATP19C> insert /*+ append */ into t1 select * from sh.countries;
rajesh@ATP19C> insert /*+ append */ into t2 select * from sh.customers;
rajesh@ATP19C> insert /*+ append */ into t3 select * from sh.sales;
rajesh@ATP19C> commit;
rajesh@ATP19C>
rajesh@ATP19C> comment on table t1 is 'contains country name, region , subregion and ISO code';
rajesh@ATP19C> comment on column t1.c1 is 'country ids. use this column to join with other tables';
rajesh@ATP19C> comment on column t1.c2 is 'country ISO code';
rajesh@ATP19C> comment on column t1.c3 is 'country name';
rajesh@ATP19C> comment on column t1.c4 is 'country subregion name';
rajesh@ATP19C> comment on column t1.c5 is 'country subregion ids';
rajesh@ATP19C> comment on column t1.c6 is 'country region name';
rajesh@ATP19C> comment on column t1.c7 is 'country region ids';
rajesh@ATP19C>
rajesh@ATP19C>
rajesh@ATP19C> comment on table t2 is 'contains customer information like name, gender, date of birth, marital status, postal address';
rajesh@ATP19C> comment on column t2.c1 is 'customer ids. primary key column use this column to join with other tables';
rajesh@ATP19C> comment on column t2.c2 is 'first name of the customer';
rajesh@ATP19C> comment on column t2.c3 is 'last name of the customer';
rajesh@ATP19C> comment on column t2.c4 is 'customer gender';
rajesh@ATP19C> comment on column t2.c5 is 'customer year of birth';
rajesh@ATP19C> comment on column t2.c6 is 'customer marital status';
rajesh@ATP19C> comment on column t2.c7 is 'customer street address';
rajesh@ATP19C> comment on column t2.c8 is 'postal code of the customer';
rajesh@ATP19C> comment on column t2.c9 is 'city where the customer lives';
rajesh@ATP19C> comment on column t2.c11 is 'customer geography: state or province';
rajesh@ATP19C> comment on column t2.c13 is 'foreign key to the countries table';
rajesh@ATP19C> comment on column t2.c14 is 'customer main phone number';
rajesh@ATP19C> comment on column t2.c15 is 'customer income level';
rajesh@ATP19C> comment on column t2.c16 is 'customer credit limit';
rajesh@ATP19C> comment on column t2.c17 is 'customer email id';
rajesh@ATP19C>
rajesh@ATP19C> comment on table t3 is 'contains sales information about each customers';
rajesh@ATP19C> comment on column t3.c2 is 'foreign key to the customer table';
rajesh@ATP19C> comment on column t3.c6 is 'product quantity sold with the transaction';
rajesh@ATP19C> comment on column t3.c7 is 'invoiced amount to the customer';
rajesh@ATP19C>
rajesh@ATP19C>
2 dbms_cloud_ai.create_profile( 'MY_AI_PROFILE',
3 '{"provider": "cohere",
4 "credential_name": "MY_AI_CRED",
5 "comments":"true",
6 "object_list": [
7 {"owner":"RAJESH","name":"T1"},
8 {"owner":"RAJESH","name":"T2"},
9 {"owner":"RAJESH","name":"T3"}
10 ] }' );
11 end;
12 /
--------------------------------------------------------------------------------
SELECT COUNT(*) AS customer_count
FROM RAJESH.T2
--------------------------------------------------------------------------------
SELECT SUM(T3.C7) AS TotalSales
FROM T1 T1 JOIN T2 T2 ON T1.C1 = T2.C13 JOIN T3 ON T2.C1 = T3.C2
--------------------------------------------------------------------------------
SELECT t3.c4, SUM(t3.c6)
FROM rajesh.t3 JOIN rajesh.t2 ON rajesh.t3.c1 = rajesh.t2.c1
GROUP BY t3.c4
ORDER BY SUM(t3.c6) DESC
--------------------------------------------------------------------------------
SELECT t2.C16 AS customer_credit_limit, t2.C4 AS customer_gender, t2.C6 AS custo
mer_marital_status, t2.C11 AS customer_province, t2.C9 AS customer_city, t2.C15
AS customer_income_level, t2.C5 AS customer_year_of_birth, t2.C17 AS customer_em
ail, t2.C3 AS customer_last_name, t2.C2 AS customer_first_name
FROM rajesh.t2
WHERE t2.C5 = 1980