Thursday, January 10, 2008

Back 2 basics : JDBC Re-visited

We deal with numerous high level frameworks and abstractions in our day to day jobs, an sometimes we forget the basic basics. The little things that matter much. I found myself curling up with a think book only to spend hours, gain little and forget soon. So I started jotting down the stuff that actually make sense and pointers to brush the memory. No fluff ... just stuff.

Types of drivers
  • Type 1 : JDBC:ODBC bridge. These are to be used as a temporary solution to connect to databases and are inherently un-scalable and non-performant. they may also be feature limited.
  • Type 2 : Native API. These Drivers depend on native libraries to communicate with the DB. They are basically a thin Java wrapper around the Native code using JNI. They perform better than Type 1 but the portability is a problem as the native library may not be available for certain platforms.
  • Type3 : Net Protocol, they are fully Java drivers that expose the DB operations in a net friendly way. They also perform well and the portability is not much of an issue since there is no client side installation.
  • Type4 : thin drivers. These are 100% Java drivers and communicate with the DB using the DB's native protocol. They perform the best nowadays and being 100% Java they are very portable.
Choosing a driver depends on the situation the applications scalability requirements, cost and performance expectations. as a rule of thumb, Type 4 drivers are the default choice. as nowadays most DB vendors provide one for free and they perform really well.

DriverManager and Connections

You can register a driver by
  • Creating an instance of the Driver (impl) class , and using DriverManager.registerDriver()
  • Setting the property jdbc.drivers (pass -Djdbc.drivers=)
  • Loading the driver class (Class.forName()).
The last one is the most poular as this lets you keep the Driver's name as an externalized Property(eg : in a Properties file). For any DB operation you need a Connection to the DB. This can be got by simply calling DriverManager.getConnection(url, username, password).
The URL is the JDBC URL to the DB, and the user name an password have to be valid to get a connection.

Connections are limited and are heavy weight objects. Creating a connection requires several round trips to the DB and is very expensive, hence they have to be reused as much as possible so we have Connection Pooling. Always close connections. If you see unexpected SQLExceptions that complain about not having a connection, then try to trace out if you have connection leaks.

DataSources have the capability for Connection Pooling and is usually configured in a vendor specific way. DataSources are mostly accessed though a naming service like JNDI.


Application1 / app server
create a datasource using the vendors wway of doing it.
bind the DataSource to a JNDI context under a name.
Application 2 / client app
lookup the DataSource from the JNDI using the name used to bind it.
call getConnection on the DS.

Statements and ResultSets

There are 3 flavors of Statement objects and they are basically a pipeline to send an SQL statements to the DB and get results back in a ResultSet object. All the statement objects re obtained fro the Connection object.

Statement is the basic type and represents a dynamic SQL statement. Its the parent of PreparedStatement. each time you execute a Statement, the DB has to do a hard parse on the SQL and execute it. This can be poor for frequently executing SQLs. On the other hand the creation overhead for a Statement is considerably smaller than PreparedStatement.
PreparedStatement represents a pre compiled SQL statement. These statements are created with placeholders denoted by '?' where you can substitute values. the first time the PreparedStatement is executed the DB precompiles it and a hard parse is not necessary to subsequent executions. this improves the performance for frequently executed SQLs abut the creation or “preparing” overhead is more than a Statement. Some database driver like Oracle driver can use numbered placeholders than the generic '?'
The '?' are given values and they are indexed starting from 1

Callable statements are used to execute stored procedures and the they have the capability to return output parameters for the stored procedure. The in parameters are set using the setXX methods and the out parameters are registered using the registerOutParameter method. as with other types of Statements the indexes for the placeholder '?' starts at 1. INOUT parameters are handled by using the same index to set the IN parameter and registering the OUT parameter.

Remember that ResultSets are connected to their Statement objects which are connected to their Connection Objects and the ResultSet represents an open cursor in the database, so failure to close the ResultSet and the statements will lead to the DB eventually running out of cursors. Also you cannot have more than one open cursor associated with the same Connection (some drivers will throw SQLExceptions) , If you need to re use a Statement object, save the result set in a vector or a list first.

All these classes will have the execute, executeUpdate and executeQuery methods
executeUpdate is used to send a DML like INSERT, UPDATE,ALTER or EXEC. They do not return database rows, and hence the return type for this method is an int, that gives the number of rows that were affected by the SQL. in case of ALTER or EXEC , the return will always be 0, since the SQL will not affect any rows.

executeQuery is used for executing a Select statement and the selected rows are returned in a ResultSet.
The execute method is generic and can be used for both types of SQL. But execute is slower than the other two. Also execute returns a boolean , true if a ResultSet was returned and false if an update count is waiting. use getResultSet or getUpdateCount to get the relevant info. Generally its better to use the specific methods, as they perform better and require lesser coding.

The fetch size and Fetch direction can be provided to a statement to optimize the query, but this is db dependent, and the Driver may even choose to disregard and make decisions on its own.
a ResultSet is always pointing to “before the first row”. the first row is obtained by calling next() on the ResultSet . the call to next returns false if a row is not available
JDBC 2.0 adds the ability of random access to ResultSet . Its controlled by setting ResultSet.TYPE_FORWARD_ONLY (forward only iteration) ResultSet.TYPE_SCROLL_INSENSITIVE (is a snapshot of the results) ResultSet.TYPE_SCROLL_SENSITIVE.(is sensitive to the back end changes to data)
JDBC 2.0 defines updatable rows. to update a row, the ResultSet should be scrollable and the underlying SQL refers a single table. This is referred to as ResultSet concurrency.
ResultSet.CONCUR_READ_ONLY – read only ResultSets .
ResultSet.CONCUR_UPDATABLE – updatable ResultSets
Inserting rows are a 4 stage process that inludes a staging row.
move to the staging row ( rs.moveToInsertRow() )
update the staging row ( updateXXX )
insert the staging row ( rs.insertRow )
move to the new Row (rs.moveToCurrentRow)
JDBC 3.0 adds ResultSet holdability that determine if the DB cursor is held even after a transaction has committed or if the cursor is closed on a commit. these can be specified when a connection is created.
ResultSet.HOLD_CURSORS_OVER_COMMIT – holds cursors and the ResultSet is usable even after the commit.
ResultSet.CLOSE_CURSORS_AT_COMMIT – closes the cursorr at a commit.
extraction of data from the ResultSet is using the getXXX methods and you can either specify the column indexes(starting at 1) or column names(case in-sensitive)
The ResultSetMetaData object gives details about the table and columns, like the column count , the column names and the tale name. Always remember the resource utilization in jdbc. Result sets cache data locally and they may re-query the db if it needs to.

Transactions and Batch Processing
The most primitive way to control transactions is to use the setAutoCommit method. enabling auto commits forces a database commit after each operation. Its usually best to set this to false and deal with commits , though its more coding. It gives more control and often better performance. But remember to invoke the commit method once the db operation is done and rollback if an exception was caught.
Batch processing is essentially like transactions, but they are limited to a single Statement. use the method Statement.addBatch() and executeBatch()

RowSets are a new spec and they can be seen as ResultSets that can be used like JavaBeans. Some kinds of RowSets can be disconnected, ie, they cache all the data and they need not be backed by a live connection to the database, making them ideal to encapsulate data and pass them around.

Q&A

  • How to find the number of rows returned?
if the ResultSet is scrollable, move to the last row and call getRow, which returns the row number.
In most cases a better soln will be to issue a second SQL with a count(*).
ResultSet navigation does not seem to work.
if using JDBC 2.0 navigation methods , make sure that the driver supports the methods.

  • Why is it unsafe to pass Resultset objects around?
the ResultSet objects are backed by the Statement and Connection objects. if they go out of scope or get closed, the ResultSet cannot get access the data., except for the data it cached. Also since they represent open cursors on the database, passing the ResultSet objects around causes the database to run out of cursors.

  • What are invalid cursor state error messages?
you get these when the ResultSet is out of a valid range., like trying to get data from a ResultSet before calling next(), or when the ResultSet has scrolled beyond last row.

  • How to handle nulls?
database nulls are different from the Java 'null' value. DB nulls for an INTEGER column may be converted to 0. To properly handle this, get the data with the getXXX method and then invoke the wasNull() method on the ResultSet. The method works only after the data has been extracted from the ResultSet.

  • How to handle BLOBS and CLOBS ?
BLOB requires java.sql.Blob but CLOB is provided by java.sql.Clob. You get data as raw bytes, with a BLOB, and characters in a CLOB
the new MySQL and Oracle 10 drivers provide the getString() method that can return the whole CLOB .

  • What is the best collection class to store ResultSet data ?
ArrayList or LinkedList.
ArrayList has a constant 'average' cost for appending to the end of the list as the backing array may sometimes require re-allocation. Linked lists have the constant cost.
insertions in random locations are better with LinkedLists as the cost is constant. ArrayLists require to shift the following elements.
ArrayLists incur space overhead in the form of reserve space at the end of the array. Linked Lists incur even more space overhead for Entry objects(this is per element).
Depending on the usage, a Map may be a relevant too.