Page 1 of 2

DatabaseManager and connection pooling

Posted: 21 Apr 2009, 16:46
by icepick
Hi,
We have a zone-level extension and while it has been working fine, we have been seeing some database-related issues in the last few days.

Earlier today, our extension could not execute any queries to the database, with the following error in the SFS logs:

DbManager could not retrive a connection. org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool exhausted

Restarting smartfox fixed that, but I'm wondering why it occurred and what we can do to prevent it from happening again.

Currently we are executing queries directly through the DatabaseManager.executeQuery(sql) call, and are not using getConnection() at all. Since our extension is not (at this point) multithreaded, is there an advantage to using getConnection()? Does executeQuery() use the connection pool behind the scenes?

Not sure if this is related or not, but we have been getting quite a few socket timeouts on our database connection lately. I had initially thought that this was just a networking problem, but now I'm not sure if it's related to the above problem. The end of a traceback is pasted below.

Thanks.

Code: Select all

INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at it.gotoandplay.smartfoxserver.extensions.PythonExtension.handleRequest(PythonExtension.java:203)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at it.gotoandplay.smartfoxserver.controllers.ExtensionHandler.processEvent(ExtensionHandler.java:539)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at it.gotoandplay.smartfoxserver.controllers.ExtensionHandler.run(ExtensionHandler.java:344)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at java.lang.Thread.run(Unknown Source)
INFO   | jvm 1    | 2009/04/21 06:17:06 | Caused by: java.net.SocketTimeoutException: Read timed out
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at java.net.SocketInputStream.socketRead0(Native Method)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at java.net.SocketInputStream.read(Unknown Source)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:135)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:104)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:259)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1182)
INFO   | jvm 1    | 2009/04/21 06:17:06 | 	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:194)

Posted: 21 Apr 2009, 20:36
by icepick
Regarding the socket timeouts, I should also mention that while our database and smartfox are not on the same server, they are in the same data center (ping time is <1ms). It works without timing out 99% of the time.

Posted: 23 Apr 2009, 06:50
by Lapo
DbManager could not retrive a connection. org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool exhausted

Restarting smartfox fixed that, but I'm wondering why it occurred and what we can do to prevent it from happening again.
You can simply configure your DatabaseManager to allow more connections in the pool and let it grow (with the GROW option) when it's running out of connections.

This one indicates the max. number of active connections

Code: Select all

<MaxActive>10</MaxActive>
You are likely to have this in your config:

Code: Select all

<OnExhaustedPool>fail</OnExhaustedPool>
which will throw an exception when connections are all taken.

Check the docs for all the details:
http://www.smartfoxserver.com/docs/docP ... ection.htm

Posted: 23 Apr 2009, 14:45
by icepick
Lapo wrote:
DbManager could not retrive a connection. org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool exhausted

Restarting smartfox fixed that, but I'm wondering why it occurred and what we can do to prevent it from happening again.
You can simply configure your DatabaseManager to allow more connections in the pool and let it grow (with the GROW option) when it's running out of connections.

This one indicates the max. number of active connections

Code: Select all

<MaxActive>10</MaxActive>
You are likely to have this in your config:

Code: Select all

<OnExhaustedPool>fail</OnExhaustedPool>
which will throw an exception when connections are all taken.

Check the docs for all the details:
http://www.smartfoxserver.com/docs/docP ... ection.htm
Thanks Lapo, I did see that when I was going through the docs. I will change that setting and see if that helps. Do you think that would help with the socket timeouts, or am I likely dealing with a networking issue there?

Posted: 23 Apr 2009, 14:54
by Lapo
Yep, that will help a lot

Posted: 23 Apr 2009, 16:30
by icepick
Lapo wrote:Yep, that will help a lot
Thanks Lapo, I will try that. Your customer support is excellent, by the way!

Posted: 23 Apr 2009, 22:59
by icepick
The timeouts are persisting, although I haven't seen any "pool exhausted" messages since. Anything else you can think of to minimize or help debug the timeouts?

It may be a firewall issue as well, I am looking into that.

Posted: 24 Apr 2009, 05:42
by Lapo
You probably have an extension scalability problem due to the many database accesses.
Extensions are by default handled by a single thread, but high traffic on the database usually requires that you adjust the ExtensionHandler to work with a larger pool of threads.

http://www.smartfoxserver.com/docs/docP ... Safety.htm

Posted: 24 Apr 2009, 23:58
by icepick
Lapo wrote:You probably have an extension scalability problem due to the many database accesses.
Extensions are by default handled by a single thread, but high traffic on the database usually requires that you adjust the ExtensionHandler to work with a larger pool of threads.

http://www.smartfoxserver.com/docs/docP ... Safety.htm
I am undergoing some work to minimize database queries, although we are not doing a ton right now.

I'll check out the threading stuff as well; thanks.

Posted: 23 Jun 2009, 04:12
by icepick
Bringing this thread back from the dead... Lately these timeouts have become more frequent, although they are extremely unpredictable.

I added more logging statements to the extension and the database queries are hanging the extension for up to 14 minutes at a time! Although this happens infrequently. Checking the database server's logs, all that shows up is:

LOG: could not receive data from client: Connection reset by peer
LOG: unexpected EOF on client connection

There is a firewall between these servers that could be messing with the TCP connection. But what I really want to know is, is there a way to set a timeout parameter for database queries from a smartfoxserver extension? As this extension is currently single-threaded, these timeouts are hanging the entire extension.

Database: postgresql 8.3 (on a different nearby server)
Database Adapter: postgresql jdbc4 jar

Posted: 23 Jun 2009, 05:34
by Lapo
The problem is that the extension is single threaded and scalability is lost.
Since database calls can hold the extension thread for a long time (ms or even seconds) when compared to other calls (nanoseconds) you will to resize the pool of threads used by the ExtensionHandler stage of the server.

(Using the <ExtHandlerThreads> parameter in the config.xml)

Posted: 23 Jun 2009, 13:42
by icepick
Lapo wrote:The problem is that the extension is single threaded and scalability is lost.
Since database calls can hold the extension thread for a long time (ms or even seconds) when compared to other calls (nanoseconds) you will to resize the pool of threads used by the ExtensionHandler stage of the server.

(Using the <ExtHandlerThreads> parameter in the config.xml)
Hi Lapo,
Yes, the extension being single-threaded does hurt scalability, but even if it was multithreaded, it's still a problem for one of the threads to hang for minutes at a time for a database query, since at least one user will still be waiting for that query to go through. Is there no way to set a client-side timeout for a query (other than setting up a timer thread)?

Posted: 23 Jun 2009, 14:10
by Lapo
Again, I think we're not looking at the right problem.
Why would a query take so long? 1 minute? whoa :shock:
I think the problem here is the database response time not the timeout.

Is the database overloaded or are you using monster queries? :)

Posted: 23 Jun 2009, 15:19
by icepick
Lapo wrote:Again, I think we're not looking at the right problem.
Why would a query take so long? 1 minute? whoa :shock:
I think the problem here is the database response time not the timeout.

Is the database overloaded or are you using monster queries? :)
I'm pretty sure the holdup is networking/firewall related. Most of the time the query is done in less than 5ms, including network time. But it seems that if no queries are done for about an hour or so, the next query will take at least a few minutes, which is totally unacceptable. From the database's side, they see a misbehaving client (see the EOF log message posted earlier). It's not the database locking up; I've never had a problem with the web app that is also using it.

Posted: 23 Jun 2009, 15:32
by Lapo
But it seems that if no queries are done for about an hour or so, the next query will take at least a few minutes, which is totally unacceptable
Ok I see. In this case it is possible that the connection between SFS and the DB becomes stale.

In order to avoid this you can simply keep it alive from time to time by simply sending a call to the database, maybe using an inexpensive query like "SELECT 1". It can be done like once every 20 - 30 minutes.

Cheers