It's been a while since I wrote an article. With the new year, I need to write some technical content to keep my momentum going. Recently, I encountered an interesting problem, so let's briefly discuss Prepared Statements in MySQL.
Introduction#
gorm is a commonly used ORM when developing with Go. Recently, while using gORM, I encountered an interesting issue. Let me describe the problem briefly.
When using the Raw
method of gORM for SQL queries, I constructed a SQL statement like this:
select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)
When passing in parameters, it returned Error
: sql: expected 0 arguments, got 1. However, other queries like the following executed normally:
select * from demo where name = ?
At first, I thought it was an issue with the SQL concatenation module in gORM
, but after looking at the code, I found an interesting logic. gORM does not have any logic for concatenating Raw SQL
; it directly calls the standard library database/sql
in Golang for SQL processing, which in turn calls the corresponding database driver implementation. Let's take a look at the logic regarding Query in database/sql
.
func (db *DB) queryDC(ctx, txctx context.Context, dc *driverConn, releaseConn func(error), query string, args []interface{}) (*Rows, error) {
queryerCtx, ok := dc.ci.(driver.QueryerContext)
var queryer driver.Queryer
if !ok {
queryer, ok = dc.ci.(driver.Queryer)
}
if ok {
var nvdargs []driver.NamedValue
var rowsi driver.Rows
var err error
withLock(dc, func() {
nvdargs, err = driverArgsConnLocked(dc.ci, nil, args)
if err != nil {
return
}
rowsi, err = ctxDriverQuery(ctx, queryerCtx, queryer, query, nvdargs)
})
if err != driver.ErrSkip {
if err != nil {
releaseConn(err)
return nil, err
}
// Note: ownership of dc passes to the *Rows, to be freed
// with releaseConn.
rows := &Rows{
dc: dc,
releaseConn: releaseConn,
rowsi: rowsi,
}
rows.initContextClose(ctx, txctx)
return rows, nil
}
}
var si driver.Stmt
var err error
withLock(dc, func() {
// The interesting part
si, err = ctxDriverPrepare(ctx, dc.ci, query)
})
if err != nil {
releaseConn(err)
return nil, err
}
ds := &driverStmt{Locker: dc, si: si}
rowsi, err := rowsiFromStatement(ctx, dc.ci, ds, args...)
if err != nil {
ds.Close()
releaseConn(err)
return nil, err
}
// Note: ownership of ci passes to the *Rows, to be freed
// with releaseConn.
rows := &Rows{
dc: dc,
releaseConn: releaseConn,
rowsi: rowsi,
closeStmt: ds,
}
rows.initContextClose(ctx, txctx)
return rows, nil
}
}
When executing the QueryDC logic in database/sql
, it calls the ctxDriverPrepare
method for SQL Query preprocessing. Let's take a look at this logic.
func ctxDriverPrepare(ctx context.Context, ci driver.Conn, query string) (driver.Stmt, error) {
if ciCtx, is := ci.(driver.ConnPrepareContext); is {
return ciCtx.PrepareContext(ctx, query)
}
si, err := ci.Prepare(query)
if err == nil {
select {
default:
case <-ctx.Done():
si.Close()
return nil, ctx.Err()
}
}
return si, err
}
In this, ctxDriverPrepare
calls ci.Prepare(query)
to execute the corresponding SQL Driver implementation's Prepare
or PrepareContext
method for SQL preprocessing. In go-mysql-driver, the corresponding implementation looks like this:
func (mc *mysqlConn) PrepareContext(ctx context.Context, query string) (driver.Stmt, error) {
if err := mc.watchCancel(ctx); err != nil {
return nil, err;
}
stmt, err := mc.Prepare(query)
mc.finish()
if err != nil {
return nil, err;
}
select {
default:
case <-ctx.Done():
stmt.Close()
return nil, ctx.Err();
}
return stmt, nil;
}
This logic in go-mysql-driver
initiates a prepared statement
request to MySQL, retrieves the corresponding Stmt
, and returns it.
The stmt
contains information such as the number of parameters and stmt name
. Here, the SQL will parse the ?
parameter placeholder and inform the client of the number of parameters needed.
The problem lies here. Let's revisit the previous SQL:
select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)
Here, I used Full Text Match supported in MySQL 5.7. The string +?
to be matched will be parsed by MySQL as a string to be queried, rather than as a placeholder. Therefore, the returned stmt
indicates that the number of parameters needed is 0, and database/sql
will match the parameters we pass in with the required number of parameters. If they do not match, it will throw an Error
.
Now that we've identified the problem, what exactly is a Prepared Statement
, and why do we need it?
Prepared Statement#
What is a Prepared Statement?#
The general content has been discussed earlier, so let's review: A Prepared Statement
is a mechanism in MySQL (similar mechanisms exist in others like PGSQL) for preprocessing SQL, separating SQL from query data to ensure the robustness of the program.
According to the official MySQL introduction, Prepared Statements have the following benefits:
- Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as
WHERE
for queries and deletes,SET
for updates, andVALUES
for inserts.- Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.
In short:
- Improved performance by avoiding the overhead of repeated SQL parsing.
- Prevention of SQL injection.
MySQL's Prepared Statement
can be used in two ways: one is using the binary Prepared Protocol
(which is not the focus of today's article; I will write another article about some binary protocols in MySQL), and the other is using SQL for processing.
In Prepared Statement
, there are three commands:
PREPARE
to create aPrepared Statement
.EXECUTE
to execute aPrepared Statement
.DEALLOCATE PREPARE
to destroy aPrepared Statement
.
It is important to note that Prepared Statements
have session limitations; generally, a Prepared Statement
only lives within the session it was created. When the connection is closed or in other cases when the session becomes invalid, the Prepared Statement
is automatically destroyed.
Next, let's experiment with it.
How to Use Prepared Statement#
First, let's create a test table.
create table if not exists `user`
(
`id` bigint(20) not null auto_increment,
`name` varchar(255) not null,
primary key (`id`)
) engine = InnoDB
charset = 'utf8mb4';
Then insert data.
insert into user (`name`) values ('abc');
Now, let's query using the traditional method.
select *
from user
where name = 'abc';
Now, let's use Prepared Statement
.
First, use the Prepared
keyword to create a statement
.
set @s = 'select * from user where name=?';
PREPARE demo1 from @s;
Then use the Execute
keyword to execute the Statement
.
set @a = 'abc';
EXECUTE demo1 using @a;
It's quite simple, right?
Why Use Prepared Statement?#
One important reason is to avoid SQL Injection Attacks
. The question is, why can Prepared Statements
prevent SQL injection?
It's quite simple; we separate Query
and Data
.
Using the previous table as an example, without manually handling SQL and parameters, we often use string concatenation, which can construct illegal SQL using SQL syntax. For example, in Python:
b = "'abc';drop table user"
a = f"select * from user where name={b}"
This code will generate the following SQL:
select * from user where name='abc';drop table user
Well, that's a disaster.
Now, let's see how it works with Prepared Statement
.
set @a = '\'abc\';drop table user';
EXECUTE demo1 using @a;
The final executed statement is:
select * from user where name='\'abc\';drop table user'
Since we structurally separated the Query from Query Params, no matter what we input, it will be treated as part of the Query Params, thus avoiding the risk of injection.
Advantages and Disadvantages of Prepared Statement#
The benefits are obvious:
- The database caches
Prepared Statements
, eliminating the overhead of repeated SQL processing on the client side. - Prevention of
SQL Injection Attacks
. - Clear semantics.
However, there are also several drawbacks:
- The binary protocol of
Prepared Statements
may have compatibility issues with client applications; some language clients may not support the binary protocol forPrepared Statements
. - Because there are two communications with the database, in cases of intensive SQL queries, there may be I/O bottlenecks.
So, it really depends on the scenario for trade-offs.
Final Thoughts#
I wrote this article on the plane as a new beginning for the new year. I hope to write more articles, regulate my schedule, and take good care of my girlfriend. By the way, through some recent experiences (like parsing Binlog), I suddenly realized that MySQL is a treasure trove. I will write a few articles about the pitfalls and interesting aspects of Binlog
and Protocol
in MySQL (yes, Flag ++, please don't rush me for the articles (run away)).
Alright, that's it for tonight. The plane is about to land, so I need to close my laptop (run away).