Some of our customers have recently seen an increase in DDoS attacks, which has increased the number of connections from unique IPs. We track connecting IPs in memory (RAM) to maintain a loose session and to keep any single IP from connecting excessively. The attacks filled the RAM buffer that stores this list of IPs beyond the limits we had anticipated. Overnight, a couple of servers filled the buffer and would not accept connections from new IPs. We had not anticipated this issue, so no monitors were in place on this particular buffer. Denied connections were monitored, but they were reported as if they were attack traffic, which we see all the time, so nothing looked unusual.
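To make the failure mode concrete, here is a minimal sketch of a fixed-size IP table that denies new IPs once it is full. It is an illustration only, not our production code: the class name `IPTracker`, its parameters, and the example addresses are all hypothetical.

```python
class IPTracker:
    """Hypothetical fixed-capacity table of connecting IPs (illustration only)."""

    def __init__(self, capacity, per_ip_limit):
        self.capacity = capacity          # maximum number of distinct IPs tracked
        self.per_ip_limit = per_ip_limit  # maximum connections allowed per IP
        self.counts = {}                  # ip -> connections seen so far

    def allow(self, ip):
        """Return True if the connection is accepted, False if it is denied."""
        if ip in self.counts:
            if self.counts[ip] >= self.per_ip_limit:
                return False  # the same IP is connecting excessively
            self.counts[ip] += 1
            return True
        if len(self.counts) >= self.capacity:
            # Table is full: a new, possibly legitimate IP is denied, and the
            # denial is indistinguishable from ordinary blocked attack traffic.
            return False
        self.counts[ip] = 1
        return True


tracker = IPTracker(capacity=2, per_ip_limit=3)
first = tracker.allow("198.51.100.1")     # accepted, table has room
second = tracker.allow("198.51.100.2")    # accepted, table is now full
newcomer = tracker.allow("203.0.113.9")   # denied, even though this IP did nothing wrong
```

In this sketch the only signal that something is wrong is a rise in denied connections, which is exactly the number that already looks high during an attack.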

There are three things we're doing to fix this going forward:

  1. We've immediately increased the RAM allocated to this buffer on all servers to four times its previous size. This is in place now.
  2. We've added monitoring to this buffer to give an early warning when it gets close to capacity. This will be in place by the end of the day.
  3. We are working on a change to the underlying architecture so that it defaults to allowing connections, rather than denying them, if the IP memory buffer is unavailable. This will take about a week to code, test, and deploy.
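The second and third fixes can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: the names `IPTable`, `warn_ratio`, and `fail_open` are hypothetical, the 80% warning threshold is an arbitrary example value, and a full table is treated here as the "unavailable" case.

```python
class IPTable:
    """Hypothetical IP table with an early-warning check and a fail-open default."""

    def __init__(self, capacity, per_ip_limit, warn_ratio=0.8, fail_open=True):
        self.capacity = capacity          # maximum number of distinct IPs tracked
        self.per_ip_limit = per_ip_limit  # maximum connections allowed per IP
        self.warn_ratio = warn_ratio      # assumed warning threshold (fraction full)
        self.fail_open = fail_open        # True: allow untracked IPs when full
        self.counts = {}                  # ip -> connections seen so far

    def utilization(self):
        """Fraction of the table currently in use, from 0.0 to 1.0."""
        return len(self.counts) / self.capacity

    def near_capacity(self):
        """Early-warning signal a monitor could poll before the table fills."""
        return self.utilization() >= self.warn_ratio

    def allow(self, ip):
        """Return True if the connection is accepted, False if it is denied."""
        count = self.counts.get(ip)
        if count is not None:
            if count >= self.per_ip_limit:
                return False  # per-IP limit still applies to tracked IPs
            self.counts[ip] = count + 1
            return True
        if len(self.counts) >= self.capacity:
            # Table is full: the IP cannot be tracked, so fall back to the
            # configured default instead of always denying.
            return self.fail_open
        self.counts[ip] = 1
        return True
```

The trade-off in this sketch is that IPs arriving while the table is full are not rate limited at all, which is why the early warning matters: it gives time to react before the fail-open path is ever taken.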

In addition to the above, over the next 30 days we are rolling out a significant expansion of the CloudFlare network. This will expand us far beyond our current 5 data centers, completely update our server hardware (including more RAM) to take into account everything we've learned over the last year of operating the network, and dramatically expand our resources to keep up with the growth in demand for our services. The first new data center in this expansion (Los Angeles) is scheduled to begin coming online at the end of next week.

Tuesday, March 29, 2011
