A Brief Discussion on Prepared Statements in SQL

It's been a while since I wrote an article. With the new year, I need to write some technical content to keep my momentum going. Recently, I encountered an interesting problem, so let's briefly discuss Prepared Statements in MySQL.

Introduction#

gorm is a commonly used ORM when developing with Go. Recently, while using gORM, I encountered an interesting issue. Let me describe the problem briefly.

When using the Raw method of gORM for SQL queries, I constructed a SQL statement like this:

select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)

When passing in parameters, it returned Error: sql: expected 0 arguments, got 1. However, other queries like the following executed normally:

select * from demo where name = ?

At first, I thought it was an issue with the SQL concatenation module in gORM, but after looking at the code, I found an interesting logic. gORM does not have any logic for concatenating Raw SQL; it directly calls the standard library database/sql in Golang for SQL processing, which in turn calls the corresponding database driver implementation. Let's take a look at the logic regarding Query in database/sql.

func (db *DB) queryDC(ctx, txctx context.Context, dc *driverConn, releaseConn func(error), query string, args []interface{}) (*Rows, error) {
	queryerCtx, ok := dc.ci.(driver.QueryerContext)
	var queryer driver.Queryer
	if !ok {
		queryer, ok = dc.ci.(driver.Queryer)
	}
	if ok {
		var nvdargs []driver.NamedValue
		var rowsi driver.Rows
		var err error
		withLock(dc, func() {
			nvdargs, err = driverArgsConnLocked(dc.ci, nil, args)
			if err != nil {
				return
			}
			rowsi, err = ctxDriverQuery(ctx, queryerCtx, queryer, query, nvdargs)
		})
		if err != driver.ErrSkip {
			if err != nil {
				releaseConn(err)
				return nil, err
			}
			// Note: ownership of dc passes to the *Rows, to be freed
			// with releaseConn.
			rows := &Rows{
				dc:          dc,
				releaseConn: releaseConn,
				rowsi:       rowsi,
			}
			rows.initContextClose(ctx, txctx)
			return rows, nil
		}
	}

	var si driver.Stmt
	var err error
	withLock(dc, func() {
        // The interesting part
		si, err = ctxDriverPrepare(ctx, dc.ci, query)
	})
	if err != nil {
		releaseConn(err)
		return nil, err
	}

	ds := &driverStmt{Locker: dc, si: si}
	rowsi, err := rowsiFromStatement(ctx, dc.ci, ds, args...)
	if err != nil {
		ds.Close()
		releaseConn(err)
		return nil, err
	}

	// Note: ownership of ci passes to the *Rows, to be freed
	// with releaseConn.
	rows := &Rows{
		dc:          dc,
		releaseConn: releaseConn,
		rowsi:       rowsi,
		closeStmt:   ds,
	}
	rows.initContextClose(ctx, txctx)
	return rows, nil
}
}

When executing the QueryDC logic in database/sql, it calls the ctxDriverPrepare method for SQL Query preprocessing. Let's take a look at this logic.

func ctxDriverPrepare(ctx context.Context, ci driver.Conn, query string) (driver.Stmt, error) {
	if ciCtx, is := ci.(driver.ConnPrepareContext); is {
		return ciCtx.PrepareContext(ctx, query)
	}
	si, err := ci.Prepare(query)
	if err == nil {
		select {
		default:
		case <-ctx.Done():
			si.Close()
			return nil, ctx.Err()
		}
	}
	return si, err
}

In this, ctxDriverPrepare calls ci.Prepare(query) to execute the corresponding SQL Driver implementation's Prepare or PrepareContext method for SQL preprocessing. In go-mysql-driver, the corresponding implementation looks like this:

func (mc *mysqlConn) PrepareContext(ctx context.Context, query string) (driver.Stmt, error) {
	if err := mc.watchCancel(ctx); err != nil {
		return nil, err;
	}

	stmt, err := mc.Prepare(query)
	mc.finish()
	if err != nil {
		return nil, err;
	}

	select {
	default:
	case <-ctx.Done():
		stmt.Close()
		return nil, ctx.Err();
	}
	return stmt, nil;
}

This logic in go-mysql-driver initiates a prepared statement request to MySQL, retrieves the corresponding Stmt, and returns it.

The stmt contains information such as the number of parameters and stmt name. Here, the SQL will parse the ? parameter placeholder and inform the client of the number of parameters needed.

The problem lies here. Let's revisit the previous SQL:

select * from demo where match(name) AGAINST('+?' IN BOOLEAN MODE)

Here, I used Full Text Match supported in MySQL 5.7. The string +? to be matched will be parsed by MySQL as a string to be queried, rather than as a placeholder. Therefore, the returned stmt indicates that the number of parameters needed is 0, and database/sql will match the parameters we pass in with the required number of parameters. If they do not match, it will throw an Error.

Now that we've identified the problem, what exactly is a Prepared Statement, and why do we need it?

Prepared Statement#

What is a Prepared Statement?#

The general content has been discussed earlier, so let's review: A Prepared Statement is a mechanism in MySQL (similar mechanisms exist in others like PGSQL) for preprocessing SQL, separating SQL from query data to ensure the robustness of the program.

According to the official MySQL introduction, Prepared Statements have the following benefits:

Less overhead for parsing the statement each time it is executed. Typically, database applications process large volumes of almost-identical statements, with only changes to literal or variable values in clauses such as WHERE for queries and deletes, SET for updates, and VALUES for inserts.

Protection against SQL injection attacks. The parameter values can contain unescaped SQL quote and delimiter characters.

In short:

Improved performance by avoiding the overhead of repeated SQL parsing.
Prevention of SQL injection.

MySQL's Prepared Statement can be used in two ways: one is using the binary Prepared Protocol (which is not the focus of today's article; I will write another article about some binary protocols in MySQL), and the other is using SQL for processing.

In Prepared Statement, there are three commands:

PREPARE to create a Prepared Statement.
EXECUTE to execute a Prepared Statement.
DEALLOCATE PREPARE to destroy a Prepared Statement.

It is important to note that Prepared Statements have session limitations; generally, a Prepared Statement only lives within the session it was created. When the connection is closed or in other cases when the session becomes invalid, the Prepared Statement is automatically destroyed.

Next, let's experiment with it.

How to Use Prepared Statement#

First, let's create a test table.

create table if not exists `user`
(
    `id`   bigint(20)   not null auto_increment,
    `name` varchar(255) not null,
    primary key (`id`)
) engine = InnoDB
  charset = 'utf8mb4';

Then insert data.

insert into user (`name`) values ('abc');

Now, let's query using the traditional method.

select *
from user
where name = 'abc';

Now, let's use Prepared Statement.

First, use the Prepared keyword to create a statement.

set @s = 'select * from user where name=?';

PREPARE demo1 from @s;

Then use the Execute keyword to execute the Statement.

set @a = 'abc';

EXECUTE demo1 using @a;

It's quite simple, right?

Why Use Prepared Statement?#

One important reason is to avoid SQL Injection Attacks. The question is, why can Prepared Statements prevent SQL injection?

It's quite simple; we separate Query and Data.

Using the previous table as an example, without manually handling SQL and parameters, we often use string concatenation, which can construct illegal SQL using SQL syntax. For example, in Python:

b = "'abc';drop table user"
a = f"select * from user where name={b}"

This code will generate the following SQL:

select * from user where name='abc';drop table user

Well, that's a disaster.

Now, let's see how it works with Prepared Statement.

set @a = '\'abc\';drop table user';

EXECUTE demo1 using @a;

The final executed statement is:

select * from user where name='\'abc\';drop table user'

Since we structurally separated the Query from Query Params, no matter what we input, it will be treated as part of the Query Params, thus avoiding the risk of injection.

Advantages and Disadvantages of Prepared Statement#

The benefits are obvious:

The database caches Prepared Statements, eliminating the overhead of repeated SQL processing on the client side.
Prevention of SQL Injection Attacks.
Clear semantics.

However, there are also several drawbacks:

The binary protocol of Prepared Statements may have compatibility issues with client applications; some language clients may not support the binary protocol for Prepared Statements.
Because there are two communications with the database, in cases of intensive SQL queries, there may be I/O bottlenecks.

So, it really depends on the scenario for trade-offs.

Final Thoughts#

I wrote this article on the plane as a new beginning for the new year. I hope to write more articles, regulate my schedule, and take good care of my girlfriend. By the way, through some recent experiences (like parsing Binlog), I suddenly realized that MySQL is a treasure trove. I will write a few articles about the pitfalls and interesting aspects of Binlog and Protocol in MySQL (yes, Flag ++, please don't rush me for the articles (run away)).

Alright, that's it for tonight. The plane is about to land, so I need to close my laptop (run away).