Flood attack on websocket
I'm facing a flood attack on my server. The attacker simply connects to the websocket continuously, and then the server becomes laggy and other users either cannot connect or have to wait a long time. Many threads named I/O dispatcher are created, and it takes a long time for them to be released. How can I fix it?
Attachments:
- Capture.PNG (150.7 KiB)
- photo_2024-08-09_00-32-36.jpg (142.56 KiB)
Re: Flood attack on websocket
Hi,
from the admin screenshot there seem to be ~100 users... are you sure it's an attack? 100 users shouldn't bother your server.
I would be more concerned about the 6K NPCs that I see in the admin screenshot, and the relative performance cost of handling their logic.
In any case, if you're sure there's an ongoing attack, using your server's firewall is usually the best approach, especially if the connections are coming from a specific family of addresses.
Cheers
Re: Flood attack on websocket
We also get an issue with Tomcat - seemingly at random (we've not been able to figure it out yet) where it will just use up huge amounts of CPU and the server lags out. Reboot fixes it.
I've passed stack traces and such into analysis tools and they all point to the nio threads being stuck waiting. It's definitely related to usage, though, as the sister test server never experiences the same issue, but only has tens of users instead of thousands.
Some threads on StackOverflow/Exchange suggest it's a problem with the SSL support provided by the Java Secure Socket Extension (JSSE) module and that it's primarily fixed by switching over to native OpenSSL. This involves rebuilding the Tomcat modules with native support though, and I'm not sure if that would be supported in SFS terms.
Re: Flood attack on websocket
@Void*
Thanks, we have never seen this in action, even when testing with thousands of websocket connections.
Still, it is possible that this is due to SSL, as our stress tests were primarily run without encryption.
We stay away from native implementations since SFS2X is intended to be multi-platform out of the box.
I know that Tomcat has an optional native component which, fortunately, does not require rebuilding the whole server but it adds complexity to the setup.
If you want to give it a try --> https://tomcat.apache.org/native-doc/
They have a binary distribution for Windows but not for Linux/MacOS. For those one must get the sources and build them for their architecture.
Cheers
Re: Flood attack on websocket
We don't believe it to be an SFS problem itself, but I've googled the symptoms and it comes up a lot with Tomcat and the common theme is SSL.
To give you an idea of our scenario, we have approximately 1500 sessions at any one time split between WebSocket and TCP, all running encrypted, and approximately 10,000-20,000 unique users per day.
There is no pattern to the issue. Sometimes the server can run for 10-12 days continuously without the problem occurring, other times it might happen an hour after a reboot.
The only thing in common is that the nio-jsse threads start getting "stuck" waiting on "something" (we can't see what) and we see our extension requests burst queue into the thousands for say half a second, then the queue empties itself. We also see that the outbound message queue increases as well, sometimes reaching 200-300 messages before clearing - but again, within a second or so.
Unfortunately our game does require a fairly rapid response from the server, so a second of lag is quite noticeable.
Given that we don't think this is an SFS issue, we took to Stack Overflow/Exchange to try and find a solution and nearly all fingers point to JSSE as the problem.
As our system runs on Linux, we would need to build the native module and drop it in, like you say. We have, as of yet, been unable to complete the compilation process due to various dependency problems (our server runs in Docker and apparently this causes issues building the module). However, this is just a staffing problem: we haven't had time to figure that out yet.
Given that it rarely affects us, we haven't made it a priority. Once we get it running, if it fixes the problem I'll let you know.
For extended info, we have tried switching to nio2 which has improved performance considerably, however, the problem did not disappear.
Re: Flood attack on websocket
Lapo wrote: Hi,
from the admin screenshot there seem to be ~100 users... are you sure it's an attack? 100 users shouldn't bother your server.
I would be more concerned about the 6K NPCs that I see in the admin screenshot, and the relative performance cost of handling their logic.
In any case if you're sure there's an ongoing attack using your server's firewall is usually the best approach, especially if the connections are coming from a specific family of addresses.
Cheers
Hi, the 6K NPCs aren't the problem. The attacker connects to the websocket, disconnects, changes IP, and repeats this again and again. They use many devices to do it, so the server starts lagging and normal users cannot connect to the websocket.
I tried blocking them with Cloudflare. It worked, but normal users were blocked too.
Re: Flood attack on websocket
@Stewie
I wouldn't really call that an attack. You can see in the log that 15 sessions were created within 4 seconds, or at a speed of roughly 3.75 connections per second.
If that's sufficient for your server to fall over, you may want to improve your hardware, or look at other places in the code that might be causing the issue.
If we've restarted our server we'll be processing roughly 50-75 legitimate connections per second for around 30 seconds and it doesn't affect the stability at all.
I feel like you might be saturating your maximum connection limit of the server HTTP/S connections. It's worth checking the number of max connections and max threads for the HTTP/S connectors.
Re: Flood attack on websocket
@Void
I checked Cloudflare at the time of the attack and there was a sudden increase in traffic. After the attack passed, our server operated normally again: the thread count dropped back to its original number and users could connect smoothly again. I'm sure this is an attack.
Re: Flood attack on websocket
But the attack isn't very strong is what I'm saying. It shouldn't be sufficient to lag out your server. However, what *would* lag it out, is if you reach the maximum limit of connections. Then your users trying to log in would experience a delay trying to get in whilst connection slots become free.
So either your screenshot of the log isn't representative of your attack (in which case, fair enough), or something else that's part of your login/connection process is causing the server to hang, or you need to increase the number of maximum connections/threads on your connector. 3.75 cps really isn't much. I'm pretty sure my home PC router gets attacked with more frequency than that.
As a quick rule of thumb, DoS flood protection generally doesn't kick in until you reach a couple thousand packets per second which is probably why Cloudflare isn't doing anything unless you block it outright.
Edit: Is max connections set like this for you, or is it something like 100?
Re: Flood attack on websocket
Void* wrote: But the attack isn't very strong is what I'm saying. It shouldn't be sufficient to lag out your server. However, what *would* lag it out, is if you reach the maximum limit of connections. Then your users trying to log in would experience a delay trying to get in whilst connection slots become free.
The max number of connections is usually very high on a production server. And it's easy to crank it up via ulimit or similar settings (e.g. to 50K or more). If you're talking about maxing out the CCU of the license, only legit logged in Users count, not just connections.
Just wanted to clarify.
Cheers
Re: Flood attack on websocket
I also agree with Void* as regards the connections per second. At less than 4cps it's very hard to call that a DDoS attack, at least if the screenshots are representative of what's happening.
When we run our stress tests we usually connect 100 clients per second and push the tests into the tens of thousands of CCU, even on relatively small server instances, such as a 4-core Linode or AWS machine. Even at that rate it's not common to see issues that can cause server failures or make it unreachable.
Cheers
Re: Flood attack on websocket
Lapo wrote: The max number of connections is usually very high on a production server. And it's easy to crank it up via ulimit or similar settings (e.g. to 50K or more). If you're talking about maxing out the CCU of the license, only legit logged in Users count, not just connections.
Just wanted to clarify.
Cheers
I meant the max connections in the HTTPS Web Server configuration, not the CCU. Though I think he's close to hitting the CCU on NPCs alone... Anyhow, if he has that set to 100 or 150 then the web server will max out even if the connections aren't logged-in users. I believe one of the old 2.18 config files had this set to 150 or something. I remember our team needing to change it manually at some point because the file wasn't writable by the server tool.
Re: Flood attack on websocket
Hi,
I have found the problem. When a user logs in, an HTTP request is made. When a large number of users connect, these requests cause congestion and block the current flow, making it difficult for other users to connect.
So is there any way to handle this problem? We don't care about the response to this HTTP request. Can it be moved to a separate thread?
Re: Flood attack on websocket
Sure, an http call inside the login handler is going to slow down everything. No wonder you could not scale effectively.
Yes, that seems like a good idea.
Instead of using a separate thread per http call I would rather put those calls into a queue and let it be processed by a small thread pool.
This would avoid using too many threads, which can impact performance at some point.
This takes a bit of evaluation on your side by investigating how long it takes (on average) to process one http request. Since every thread is blocked until the call completes, you can do the math and find out a reasonable amount of threads to use.
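The queue plus small thread pool idea could look roughly like the Java sketch below. It rests on assumptions: `LoginCallQueue` and its method names are made up for illustration and are not part of the SFS2X API, and the `Runnable` you submit stands in for whatever blocking http call your login handler makes today. `estimateThreads` is the "do the math" part: for example, 50 logins per second at an assumed average of 0.25 s per call keeps about 12.5 calls in flight, so roughly 13 threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs blocking login-time http calls on a small fixed pool. */
final class LoginCallQueue {
    private final ThreadPoolExecutor pool;

    LoginCallQueue(int threads, int queueCapacity) {
        pool = new ThreadPoolExecutor(
                threads, threads,                        // fixed size: core == max
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded backlog of pending calls
                new ThreadPoolExecutor.AbortPolicy());   // reject new work when the backlog is full
    }

    /** Rough sizing: calls per second times average seconds per call, rounded up. */
    static int estimateThreads(double callsPerSecond, double avgSecondsPerCall) {
        return Math.max(1, (int) Math.ceil(callsPerSecond * avgSecondsPerCall));
    }

    /** Queues the call and returns at once; false means the backlog was full. */
    boolean submit(Runnable httpCall) {
        try {
            pool.execute(httpCall);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // drop the call rather than block the login handler
        }
    }

    /** Stops accepting work and waits for queued calls; false on timeout or interrupt. */
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The login handler would then only call `submit(...)` and carry on. Since the response is not needed, a `false` return (backlog full) can simply be logged. The bounded queue is a deliberate choice here: under a connection burst it caps memory and thread usage instead of letting pending calls pile up without limit.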
Another possibility would be to look into an async http library that could deal with those calls using a few threads.
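As one example of the async route, the JDK's own `java.net.http.HttpClient` (Java 11 or later, which is an assumption about your runtime) can send requests without parking a thread per call. The sketch below is illustrative only: `AsyncLoginNotifier`, the endpoint URL and the `user` query parameter are hypothetical placeholders for whatever your login call actually looks like.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper: fire-and-forget login notification over non-blocking http. */
final class AsyncLoginNotifier {
    // One shared client; it manages its own small set of I/O threads.
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    private final String endpoint; // assumed URL of the service called on login

    AsyncLoginNotifier(String endpoint) {
        this.endpoint = endpoint;
    }

    /**
     * Starts the request and returns immediately. The future completes with the
     * http status code, or -1 if the call failed or timed out.
     */
    CompletableFuture<Integer> notifyLogin(String userName) {
        String query = "?user=" + URLEncoder.encode(userName, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint + query))
                .timeout(Duration.ofSeconds(3)) // never wait forever on a slow backend
                .GET()
                .build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .thenApply(HttpResponse::statusCode)
                .exceptionally(error -> -1);
    }
}
```

The login handler can call `notifyLogin(...)` and ignore the returned future, or attach a callback that only logs failures. The timeouts matter more than the exact values chosen here: without them a slow backend can still tie up resources for a long time.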
Cheers