OutOfMemory Error after sendExtResponse

hamish
Posts: 6
Joined: 13 Dec 2010, 16:49
Location: UK

Post by hamish »

Hello, we are dealing with a problem in the middleware that forces us to restart the service.

There are usually around 300 users online at any given time, especially on weekends. We recently added a feature that increased network traffic considerably, since users can now be in more than one room at a time.

It seems to be a memory-related problem, since we are getting errors like this:

Code:

02 Dec 2012 21:03:27,663 WARN  v2.controllers.ExtensionController - 
java.lang.OutOfMemoryError
	java.util.zip.Deflater.init(Native Method)
	java.util.zip.Deflater.<init>(Unknown Source)
	java.util.zip.Deflater.<init>(Unknown Source)
	com.smartfoxserver.v2.protocol.binary.DefaultPacketCompressor.compress(DefaultPacketCompressor.java:38)
	com.smartfoxserver.v2.protocol.binary.BinaryIoHandler.handleWrite(BinaryIoHandler.java:99)
	com.smartfoxserver.v2.protocol.SFSIoHandler.onDataWrite(SFSIoHandler.java:326)
	com.smartfoxserver.v2.protocol.SFSProtocolCodec.onPacketWrite(SFSProtocolCodec.java:158)
	com.smartfoxserver.bitswarm.core.BitSwarmEngine.writeToSocket(BitSwarmEngine.java:392)
	com.smartfoxserver.bitswarm.core.BitSwarmEngine.write(BitSwarmEngine.java:386)
	com.smartfoxserver.bitswarm.io.Response.write(Response.java:70)
	com.smartfoxserver.v2.api.response.SFSResponseApi.sendExtResponse(SFSResponseApi.java:86)
	com.smartfoxserver.v2.api.SFSApi.sendExtensionResponse(SFSApi.java:1321)
	com.xxxxxxxxx.server.room.JoinRoomClass.joinRoom(JoinRoomClass.java:573)
	com.xxxxxxxxx.server.room.JoinRoomClass.handleClientRequest(JoinRoomClass.java:643)
	com.smartfoxserver.v2.extensions.SFSExtension.handleClientRequest(SFSExtension.java:192)
	com.smartfoxserver.v2.controllers.ExtensionController.processRequest(ExtensionController.java:133)
	com.smartfoxserver.bitswarm.controllers.AbstractController.run(AbstractController.java:96)
	java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	java.lang.Thread.run(Unknown Source)

In the dev environment I can reproduce an OutOfMemoryError by limiting the heap space, but then the error is:

Code:

03 Dec 2012 18:04:35,406 WARN  v2.controllers.ExtensionController - 
java.lang.OutOfMemoryError: Java heap space

The log says the error happens when the server is trying to send an extension response. In fact, it is always the same line of code that triggers it. Afterwards, players who are already in a game can keep playing, but the server doesn't accept new login requests.
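
A note on the two traces above: the production error carries no "Java heap space" message and is thrown from Deflater.init, which is a native method. In the JDK, a Deflater keeps its zlib state outside the Java heap until end() is called or the object is garbage collected, so an error of this shape may point at native memory rather than the heap. The sketch below only illustrates that release pattern with the standard java.util.zip classes. It is not SFS2X code, and compressPacket and decompressPacket are hypothetical helper names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DeflaterSketch {

    // Hypothetical helper; NOT the SFS2X DefaultPacketCompressor.
    static byte[] compressPacket(byte[] data) {
        Deflater deflater = new Deflater(); // allocates native zlib state
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // free the native memory now instead of waiting for GC
        }
    }

    // Hypothetical counterpart, used here only to check the round trip.
    static byte[] decompressPacket(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && !inflater.finished() && inflater.needsInput()) {
                    throw new IllegalStateException("truncated input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt input", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[4096]; // highly compressible dummy payload
        byte[] packed = compressPacket(payload);
        System.out.println(payload.length + " bytes -> " + packed.length + " bytes");
    }
}
```

If native memory is the limit, raising the Java heap would not help and could make things worse, because the heap and native allocations compete for the same process memory.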

We've got long queues for Extension requests and for outgoing messages. My first thought was that the server was overloaded, since it has just a single-core CPU and accepts only 50 database connections. There is probably a bottleneck that backs up the queue, which then triggers the memory error. The VM max memory is 1064 MB, which makes it hard for me to believe that we are saturating the memory. On the other hand, I reckon that if the system cannot keep up with the requests and they keep arriving, the queue could grow dangerously.
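
The queue concern can be illustrated with plain java.util.concurrent. This is a minimal sketch, not the SFS2X request queue: the class name, the worker and capacity numbers, and the latch standing in for a slow database call are all assumptions made for the example. It shows that when every worker is blocked, a bounded queue fills up and further requests are rejected instead of piling up in memory.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedQueueSketch {

    // Submits `requests` tasks that all block, and returns how many were rejected.
    static int submitBurst(int workers, int queueCapacity, int requests) {
        CountDownLatch release = new CountDownLatch(1); // stands in for a slow database call
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded: cannot grow without limit
                new ThreadPoolExecutor.AbortPolicy());     // reject when workers and queue are full
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // every worker stays stuck here
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // unblock the workers so the pool can drain
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 workers + 10 queue slots accept 12 tasks; the rest are rejected.
        System.out.println("rejected: " + submitBurst(2, 10, 100));
    }
}
```

With an unbounded queue the same burst would be accepted in full and held in memory until the workers caught up, which is the growth scenario described above.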

We increased the number of Extension threads to 1000 to be sure they could handle the requests, but that is probably not a good idea on a single-core CPU; I don't know whether it is pointless. Maybe we should just upgrade the CPU.


Having said all this, I have two questions:

What happens when SmartFoxServer hits a memory error like this? Is there any documentation I could read about it? We use a Singleton object to store the game info; I've read that we should be using SFSExtension instead to keep shared data accessible. Maybe this means some players are playing in a different instance of the application, and that's why those who were already playing can continue after the failure.

Any idea why the memory error is raised after a sendExtensionResponse? Does it sound like a tuning problem?

Needless to say, any help would be more than welcome. :)
Lapo
Site Admin
Posts: 23438
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: OutOfMemory Error after sendExtResponse

Post by Lapo »

The OutOfMemoryError is a non-recoverable error; it means that the whole server is dead :(

"We increased the number of Extension threads to 1000 to be sure they could handle the requests, but that is probably not a good idea on a single-core CPU; I don't know whether it is pointless. Maybe we should just upgrade the CPU."
Sorry, this is a very bad idea. It will just degrade performance and won't solve the problem.

If you can reproduce the problem with one line of code, please let us know what you are doing.

thanks
Lapo
--
gotoAndPlay()
...addicted to flash games
Cel
Posts: 13
Joined: 03 Dec 2012, 15:31

Re: OutOfMemory Error after sendExtResponse

Post by Cel »

Thanks Lapo!

We decreased the number of threads and the OutOfMemoryError didn't appear again, but the server still fails. There are peaks of network traffic during which the whole system seems to freeze. We have seen 3.2 Mb/s. Then CPU usage goes to almost 100% for a while and the system doesn't recover. It starts to disconnect everyone like this:

Code:

06 Dec 2012 22:04:53,746 INFO  bitswarm.sessions.DefaultSessionManager - Session created: { Id: 376, Type: DEFAULT, Logged: No, IP: 89.241.59.9:11403 } on Server port: 9933 <---> 11403
06 Dec 2012 22:04:53,763 INFO  bitswarm.sessions.DefaultSessionManager - Session removed: { Id: 375, Type: DEFAULT, Logged: No, IP: 196.192.15.123:1030 }
06 Dec 2012 22:04:53,869 INFO  bitswarm.sessions.DefaultSessionManager - Session removed: { Id: 376, Type: DEFAULT, Logged: No, IP: 89.241.59.9:11403 }
06 Dec 2012 22:04:53,926 INFO  bitswarm.sessions.DefaultSessionManager - Session created: { Id: 377, Type: DEFAULT, Logged: No, IP: 90.217.166.111:52273 } on Server port: 9933 <---> 52273
06 Dec 2012 22:04:53,997 INFO  bitswarm.sessions.DefaultSessionManager - Session created: { Id: 378, Type: DEFAULT, Logged: No, IP: 89.241.59.9:11404 } on Server port: 9933 <---> 11404
06 Dec 2012 22:04:54,062 INFO  bitswarm.sessions.DefaultSessionManager - Session removed: { Id: 377, Type: DEFAULT, Logged: No, IP: 90.217.166.111:52273 }
06 Dec 2012 22:04:54,086 INFO  bitswarm.sessions.DefaultSessionManager - Session created: { Id: 379, Type: DEFAULT, Logged: No, IP: 196.192.15.123:2168 } on Server port: 9933 <---> 2168
06 Dec 2012 22:04:54,121 INFO  bitswarm.sessions.DefaultSessionManager - Session removed: { Id: 318, Type: DEFAULT, Logged: No, IP: 2.98.218.35:50985 }
06 Dec 2012 22:04:54,190 INFO  bitswarm.sessions.DefaultSessionManager - Session created: { Id: 380, Type: DEFAULT, Logged: No, IP: 90.217.166.111:52274 } on Server port: 9933 <---> 52274
06 Dec 2012 22:04:54,237 INFO  bitswarm.sessions.DefaultSessionManager - Session created: { Id: 381, Type: DEFAULT, Logged: No, IP: 176.249.25.31:49905 } on Server port: 9933 <---> 49905
06 Dec 2012 22:04:54,378 INFO  bitswarm.sessions.DefaultSessionManager - Session removed: { Id: 381, Type: DEFAULT, Logged: No, IP: 176.249.25.31:49905 }

Is it possible that the CPU is overwhelmed by writing the responses? Could there be a bottleneck in the database if some requests take a long time? If every thread were stuck in database access, what would happen, theoretically? I think the system would start to accumulate requests in the queue until it started to discard them, which is what it does before crashing. Back when we had 1000 threads, that could have led to the memory error. Now we are trying to find out how much traffic each extension generates, so we can improve the way we send messages.
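
One way to measure traffic per extension command is a small thread-safe counter that is updated right before each response is sent. The sketch below is an assumption about how such a meter could look, not something from SFS2X: TrafficMeter, record and totalFor are hypothetical names, and the caller is assumed to know the serialized size of the response and the number of recipients.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class TrafficMeter {

    // One counter per command name; LongAdder keeps updates cheap under contention.
    private final Map<String, LongAdder> bytesByCommand = new ConcurrentHashMap<>();

    // Records one outgoing response: payload size multiplied by the number of recipients.
    void record(String command, int payloadBytes, int recipients) {
        bytesByCommand
                .computeIfAbsent(command, key -> new LongAdder())
                .add((long) payloadBytes * recipients);
    }

    // Total bytes recorded so far for a command, or 0 if it was never seen.
    long totalFor(String command) {
        LongAdder adder = bytesByCommand.get(command);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        TrafficMeter meter = new TrafficMeter();
        meter.record("joinRoom", 1200, 30); // hypothetical command name and sizes
        meter.record("joinRoom", 800, 10);
        System.out.println("joinRoom bytes: " + meter.totalFor("joinRoom"));
    }
}
```

Multiplying by the recipient count matters here because users can now be in several rooms at once, so a single event may be sent to many more sessions than before.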

Any ideas?

Thank you very much!
Lapo
Site Admin
Posts: 23438
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: OutOfMemory Error after sendExtResponse

Post by Lapo »

CPU usage at 100% seems to indicate that the machine is maxed out. What kind of hardware do you use?
Are you running other services on the same machine, such as HTTP, a database, etc.?

"Is it possible that the CPU is overwhelmed by writing the responses?"
Very unlikely.

"If every thread were stuck in database access, what would happen, theoretically? I think the system would start to accumulate requests in the queue until it started to discard them,"
Yes, if this is happening you can easily see it in the AdminTool > Dashboard > System Queues status.

"which is what it does before crashing. Back when we had 1000 threads, that could have led to the memory error. Now we are trying to find out how much traffic each extension generates, so we can improve the way we send messages."
Every extension? How many are there?

Advice: threads are expensive, and using a high number for safety is counterproductive. Normally, even under high traffic, 50 to 100 threads are all you should need.
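
To put a rough number on "expensive": each Java thread reserves its own stack outside the heap, with the size controlled by the -Xss JVM option. The arithmetic below assumes 1 MB per thread purely for illustration. The real default depends on the JVM and platform, and reserved stack space is not necessarily all committed, so treat the result as an upper-bound estimate. ThreadCostSketch and reservedStackBytes are hypothetical names.

```java
final class ThreadCostSketch {

    // Upper-bound estimate of stack space reserved by a pool of threads.
    static long reservedStackBytes(int threads, long stackBytesPerThread) {
        return threads * stackBytesPerThread;
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024L; // ASSUMED stack size per thread; check your JVM's -Xss
        for (int threads : new int[] {50, 100, 1000}) {
            long mb = reservedStackBytes(threads, oneMb) / oneMb;
            System.out.println(threads + " threads -> up to " + mb + " MB of stack space");
        }
    }
}
```

Under that assumption, 1000 threads could reserve about as much address space as the 1064 MB heap mentioned earlier in the thread, on top of the scheduling overhead on a single core.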

thanks
Lapo