
Ghost Rooms Causing Join Room Failures

Posted: 11 Aug 2009, 14:53
by jjduran
Hello,

Our game has grown rapidly over the past few months, and we're nearly ready to upgrade to the unlimited license, as we have several hundred concurrent users at this point. But along with more users come more problems.

We've encountered a recurring problem regarding joining a room. In our system, battle rooms (game rooms) are created and destroyed rapidly.

After adding several traces, I've discovered that some rooms are "getting stuck" and are never destroyed. The even more interesting thing is that these rooms DO NOT display within the SmartFox AdminTool.

These "Ghost Rooms" are leading to problems because when players in our system attempt to search for a battle, they are often matched to these invisible rooms that seemingly no longer exist.

Here is how I've confirmed that "Ghost Rooms" exist. When a user attempts to find a battle in our game, a request is sent to the server to find an available room on the server side. To get a list of all available rooms on the server, I call:

Code:

var allRooms = zone.getRooms();

This returns an array that we iterate through to assemble a list of qualifying rooms:

Code:

// Keep only game rooms whose name starts with the "AUTO_" prefix
for (var i = 0; i < allRooms.length; i++) {
    if (allRooms[i].isGame() &&
        allRooms[i].getName().substr(0, 5) == "AUTO_") {
        rooms.push(allRooms[i]);
    }
}

Now that we have an array of qualifying rooms, we iterate through each room to determine the best match. When a good match is found, we attempt to join the player to the room:

Code:

var ok = _server.joinRoom(user, 
                         currRoomId, 
                         true, 
                         rooms[i].getId(), 
                         "", 
                         false, 
                         true
                         );

if(!ok){
    trace("join failed!");
    trace(" -- username: " + user.getName());
    trace(" -- currRoomId: " + currRoomId);
    trace(" -- roomId: " + rooms[i].getId());
}

The problem occurs after our server has been online for a few days and thousands of rooms have been created and destroyed. It's at this point that we begin seeing the "Ghost Rooms". The above joinRoom call begins to fail, tracing loads of errors that look like the following:

Code:

[ WARNING ] > JavaException: java.lang.NullPointerException: null
[main.as]: join failed!
[main.as]:  -- username: tester
[main.as]:  -- currRoomId: 8
[main.as]:  -- roomId: 137408

[ WARNING ] > JavaException: java.lang.NullPointerException: null
[main.as]: join failed!
[main.as]:  -- username: tester2
[main.as]:  -- currRoomId: 54
[main.as]:  -- roomId: 137408

[ WARNING ] > JavaException: java.lang.NullPointerException: null
[main.as]: join failed!
[main.as]:  -- username: tester3
[main.as]:  -- currRoomId: 32
[main.as]:  -- roomId: 137408

The most interesting part is that the room ID the system is trying to join (roomId: 137408) no longer exists. If I scroll through the rooms in the SmartFox AdminTool, a room with the ID 137408 does NOT appear. In our traces we receive the above error hundreds of times from different users. The only thing consistent among the errors is that roomId 137408 appears in every single one.

If we restart the server, the "Ghost Rooms" are seemingly cleared because the error will go away for a few days until another room gets stuck, at which point errors will begin firing repeatedly again.

It doesn't seem possible that the id could be incorrect because of something I've written. I simply call the room.getId() method on the server to determine the number each time. If the room didn't exist this would throw a null error... so clearly the room still exists somehow.

Here are our specs:
SmartFoxServer Pro 1.6.6
API 1.5.8

Any thoughts on what's going on here? Why are we seeing these leftover "Ghost Rooms"?

Thanks for any help.

Posted: 13 Aug 2009, 08:44
by Lapo
Hi,
I think we could help with this problem. There have been occasional reports of similar cases, but we were unable to gather enough information to reproduce it. It seems quite erratic and probably depends on long-lasting connections or ghost users.

At the moment our team is not fully operational as we're taking a bit of vacation. We'll get in touch next week and see if we can send a diagnostic extension that helps us better understand what's going on.

It would be great if you could simply drop us an email as a reminder at the usual contact address.

Thanks

Posted: 13 Aug 2009, 13:32
by jjduran
Lapo,

Thanks for the reply. I'm almost certain it has to do with ghost users due to poor disconnects, etc. Somehow these users are hanging certain rooms and causing the above problems.

It doesn't sound like you're on "a bit of a vacation" if you were able to reply on the forums? ... It sounds like you're suffering from a bit of "work-a-holic" syndrome =)

Anyway, I will eagerly await your response, and I'll send you a reminder email soon. For now, I've built a new mechanism that automatically attempts to destroy these "ghost rooms" when the errors begin to fire. I haven't been able to test it out as the errors haven't started yet, but I should know within a day or so if my workaround was successful.
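
A workaround of the kind described above could be sketched roughly as follows. This is not the actual mechanism mentioned in the post, which the thread does not show. It is a minimal sketch in plain JavaScript (close to the server-side ActionScript used earlier) that counts failed joins per room ID and hands the room to a destroy callback once it has failed a set number of times in a row. `GhostRoomTracker`, `joinOrQuarantine`, `joinFn`, and `destroyFn` are hypothetical names. In a real extension, `joinFn` would wrap the `_server.joinRoom(...)` call shown earlier and `destroyFn` would wrap whatever room-removal call your server API version provides. Whether a stuck room can actually be removed that way is untested here.

```javascript
// Hypothetical helper: tracks consecutive failed joins per room id.
function GhostRoomTracker(maxFailures) {
    this.maxFailures = maxFailures;
    this.failures = {};   // roomId -> consecutive failed joins
}

// Record one failed join. Returns true once the room has reached the threshold.
GhostRoomTracker.prototype.recordFailure = function (roomId) {
    var count = (this.failures[roomId] || 0) + 1;
    this.failures[roomId] = count;
    return count >= this.maxFailures;
};

// Forget a room, e.g. after a successful join or after destroying it.
GhostRoomTracker.prototype.reset = function (roomId) {
    delete this.failures[roomId];
};

// Attempt a join. joinFn(roomId) should return true on success;
// destroyFn(roomId) is called when the room looks like a ghost.
// Both are injected so the sketch can run without a server.
function joinOrQuarantine(tracker, roomId, joinFn, destroyFn) {
    var ok = false;
    try {
        ok = joinFn(roomId) === true;
    } catch (e) {
        ok = false;   // an exception is treated like a failed join
    }
    if (ok) {
        tracker.reset(roomId);
        return true;
    }
    if (tracker.recordFailure(roomId)) {
        destroyFn(roomId);
        tracker.reset(roomId);
    }
    return false;
}
```

Requiring several consecutive failures before destroying a room is a deliberate choice in this sketch: a single failed join can have ordinary causes (for example, the room filled up in the meantime), so acting on the first failure could remove healthy rooms.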

As always, we appreciate the help.

Posted: 13 Aug 2009, 15:59
by Lapo
jjduran wrote: It doesn't sound like you're on "a bit of a vacation" if you were able to reply on the forums? ... It sounds like you're suffering from a bit of "work-a-holic" syndrome =)

Ah yes, I know the syndrome... it's difficult to cure :)
But when I said "we" I was referring to 2 of my colleagues, who are out of office for the next two weeks :P

Posted: 24 Aug 2009, 10:01
by AlecMcE
This sounds extremely similar to a bug that we are also experiencing. If you find a solution, please make it public.

Regards, Alec

Posted: 24 Aug 2009, 10:29
by Lapo
To address the problem properly, we would need to be able to reproduce it. It seems that this issue arises only under particular scenarios or conditions, and we haven't yet found a way to recreate such a scenario.

We have already investigated possible thread-safety issues, but nothing specific was found. Additionally, we have load-testing procedures that create and destroy thousands of rooms in search of a possible room-management bug, and they did not find one.

At the moment we suggest running a timed thread or scheduled task that checks for empty rooms and destroys them.
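
The suggested sweep could look something like the sketch below. It is written in plain JavaScript in the style of the extension code earlier in the thread and makes several assumptions: that room objects expose a `getUserCount()` method alongside the `isGame()`, `getName()`, and `getId()` calls already used above, that the room list comes from `zone.getRooms()`, and that `destroyFn` wraps whatever room-removal call your server API provides. `findEmptyRooms`, `sweepEmptyRooms`, and `makeFakeRoom` are hypothetical names; the last one only exists so the sketch can be run outside the server.

```javascript
// Return the ids of game rooms with the given name prefix that have no users.
function findEmptyRooms(rooms, prefix) {
    var ids = [];
    for (var i = 0; i < rooms.length; i++) {
        var room = rooms[i];
        var hasPrefix = room.getName().substr(0, prefix.length) == prefix;
        // getUserCount() is an assumption about the room API
        if (room.isGame() && hasPrefix && room.getUserCount() == 0) {
            ids.push(room.getId());
        }
    }
    return ids;
}

// Destroy every empty matching room; returns how many were handed to destroyFn.
function sweepEmptyRooms(rooms, prefix, destroyFn) {
    var ids = findEmptyRooms(rooms, prefix);
    for (var i = 0; i < ids.length; i++) {
        destroyFn(ids[i]);
    }
    return ids.length;
}

// Stand-in room object for trying the sweep without a running server.
function makeFakeRoom(id, name, game, userCount) {
    return {
        getId: function () { return id; },
        getName: function () { return name; },
        isGame: function () { return game; },
        getUserCount: function () { return userCount; }
    };
}
```

In an extension, `sweepEmptyRooms(zone.getRooms(), "AUTO_", destroyFn)` would be called periodically from whatever timer facility the server offers. Two caveats apply: a room that was just created may be briefly empty before its first player joins, so a grace period may be needed, and if the stuck rooms still hold ghost users their user count may not be zero, in which case this sweep would not catch them.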

Since we are in the process of releasing a new update we will devote some extra time to inspect from a different perspective. If nothing is found we are available for dedicated consulting to locate the problem in your specific application.

Hope it helps

Posted: 24 Aug 2009, 10:37
by AlecMcE
"In order to be able to properly address the problem we would need to be able to reproduce it."

You've been reading our emails to our client!

We have also been unable to recreate locally but have numerous user reports that we have narrowed down to a failure of the joinServer call when there is definitely space available in the target room. We were thinking along the lines of 'ghost users' before we came across this post.

Thanks for the response, we'll give your suggestions some thought.

Alec

Posted: 24 Aug 2009, 11:28
by Lapo
Sorry, which emails?
This post was not started by you. I am not able to correlate your name with recent emails received on this subject. Can you be more specific?
If you prefer, you can send us an email with the details.

Posted: 24 Aug 2009, 11:33
by AlecMcE
Sorry Lapo, it was an attempt at a joke. Your reply to us was the same as our reply to our clients.