This domain is for sale. Contact us with an offer, if interested.

Fix Database Connection Pool Issue

Project Description

We have a Java Servlet and JSP based content management system for websites. For unknown reason, it may occasionally run out of database connections. We need help with understanding when and why it happens, and with a fix to eliminate this problem.

When it happens, website visitors cannot use a shopping cart / buy anything from websites. Therefore, the problem is critical. Restarting the application in Tomcat Manager does help to release and reacquire database connection pool, until next time the problem occurs again.

Completion Notes

Let's first try to evaluate our deployment environment.

<Resource auth="Container" driverClassName="org.firebirdsql.jdbc.FBDriver" maxActive="100" maxIdle="30" maxWait="10000" ... >

When the problem occurs, it manifests itself with the following error "Wrapped Cannot get database connection." when a website visitor tries to access a certain page in a website, such as a store page.

Looking at Logs

If we study web server logs, we can find something like below around the time the problem occurs:

SEVERE: ProductServlet: Cannot get database connection
org.apache.tomcat.dbcp.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object

Additional analysis of the logs also shows some "Client Abort" exceptions prior to the moment when we can no longer obtain a database connection from the pool. These exceptions occur when client terminates the communication socket before the application sends any response back to it. In other words, there is nobody listening for response any longer, and the request terminates abnormally via Java exception mechanism.

If we try to count such client abort exceptions prior to the event, we see about 100, which is our database connection pool size, see the "maxActive" value in the Resource definition above.

We can guess that the app is having one database connection leak when client abort occurs. Also, the connection pool resource is defined in a way that abandoned connections are not returned to the pool. So, in theory, we may be having one primary problem (a resource leak), and a secondary problem (abandoned connections are not being recovered).

If our theory is correct, then locating and fixing a resource leak would definitely help. Also, configuring connection pool to recover abandoned connections may additionally help if our resource leak fix is unsuccessful, or if there are similar leaks in other scenarios, not only client aborts.

Looking at Code

Let's now examine the code.

Let's look at the area of the code where the problem is manifested when user tries to browse a shop page. We see a processRequest function with multiple nested try - catch - finally sections. Looking at this code carefully, we see that it is structured as so:

try {
  Connection conn = getConnection();
  try {
  } finally {
catch (SQLException x) {

One potential concern is what happens if client aborts immediately after the getConnection() call but before entering the nested try block? With very accurate timing, we may have a connection leak here, it seems. However, for this to occur the abort must happen at a very narrow point in time.

Let us see what else we can do to locate the problem.

Logging Abandoned Connections

Perhaps a better way to find the problem is to log an abandoned connection info.

It turns out that we can do it be redefining or connection pool resource as so:

<Resource auth="Container"
driverClassName="org.firebirdsql.jdbc.FBDriver" logAbandoned="true"
maxActive="100" maxIdle="30" maxWait="10000"
name="jdbc/db" password="xxxxx" removeAbandoned="true"
removeAbandonedTimeout="300" type="javax.sql.DataSource"
url="jdbc:firebirdsql:/idb/xxxxx?lc_ctype=UTF8" username="xxxx"/>

Notice that we use logAbandoned, removeAbandoned, removeAbandonedTimeout, and factory attributes. The presence of the factory attribute is important if you want to see anything about connection pool connections it the logs.

A simple test for was specifying maxIdle="3" (which is less than minIdle default value), and then restarting tomcat. In this case, a warning was printed in our catalina.out file:

Sep 04, 2015 7:48:36 PM org.apache.tomcat.jdbc.pool.ConnectionPool init
WARNING: maxIdle is smaller than minIdle, setting maxIdle to: 10

So, it looks like at least it logs something at this point, but we had to use the factory attribute. As to why you need it, I am not exactly sure. Now let's wait and see if we can get a log message about abandoned connection.

Problem Located

When a connection abandonment occurs, we'll see a message in catalina.out with stack trace and line numbers, something like this:

Sep 04, 2015 6:39:31 PM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
WARNING: Connection has been abandoned PooledConnection[org.firebirdsql.jdbc.FBConnection@127b900]:java.lang.Exception
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(
        at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(
        at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(
        at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(

Now we see whether the actual problem is. It looks like the shopping cart code does not close connections sometimes and needs to be refactored so that every openConnection() call has a corresponding close().

Connection Leak Fix

To fix the issue, we refactored all code fragments that were not properly closing a database connection.

In addition to that, the database connection pool resource definition was changed to use the "logAbandoned", "removeAbandoned", "removeAbandonedTimeout", and "factory" attributes.

Note: another tool one can use to diagnose connection pools is JConsole. The idea is that you set it up for use on the server with a problematic app, and when the problem occurs, you can use the tool to see how many connections are in use. That was our next thing to try in case we would not catch the problem earlier with the "logAbandoned" and "factory" resource attributes. Fortunately, we did see a log message (once), which immediately explained the problem.