A bug causes Cloudflare to lose customer logs

However, a failsafe designed to handle exactly this kind of problem backfired. When the Logfwdr configuration is unavailable, the failsafe sends logs for all customers. In this case, the five-minute problem caused a huge spike in the number of logs to be sent, overloading the buffering system, Buftee, and making it unresponsive.
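
The failsafe behavior described above can be illustrated with a minimal sketch. This is not Cloudflare's code: the function name `customers_to_forward`, the customer identifiers, and the counts are all hypothetical, chosen only to show how treating a missing configuration as "forward everything" multiplies the volume of logs.

```python
from typing import Optional


def customers_to_forward(config: Optional[set[str]], all_customers: set[str]) -> set[str]:
    """Return the customers whose logs should be forwarded.

    Hypothetical fail-open rule: if the configuration is missing or blank,
    forward logs for every customer rather than for none.
    """
    if not config:
        # Fail open: no usable configuration, so select everyone.
        return set(all_customers)
    # Normal case: only customers that are explicitly configured.
    return config & all_customers


# Illustrative numbers only (assumption, not from the incident report).
all_customers = {f"customer-{i}" for i in range(1000)}
configured = {"customer-1", "customer-2", "customer-3"}

normal = customers_to_forward(configured, all_customers)
failed_open = customers_to_forward(None, all_customers)

print(len(normal), len(failed_open))
```

In this toy setup the forwarded set grows from the handful of configured customers to the entire customer base the moment the configuration goes missing, which is the kind of spike that downstream systems then have to absorb.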

Buftee provides a buffer for each Logpush job, containing 100% of the logs generated by the zone or account that the job references, so a failure to process one customer's job does not affect the progress of others. It includes safeguards to prevent it from being overwhelmed by a large increase in the number of buffers, but those safeguards had not been configured, Cloudflare said.
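
To make the idea of a safeguard on the number of buffers concrete, here is a small sketch under stated assumptions. The class `BufferPool`, the `max_buffers` limit, and the `job_id` keys are hypothetical names for illustration; nothing here reflects how Buftee is actually implemented.

```python
class BufferPool:
    """Toy model: one isolated buffer per job, with a cap on buffer count."""

    def __init__(self, max_buffers: int) -> None:
        self.max_buffers = max_buffers          # hypothetical safeguard setting
        self.buffers: dict[str, list[str]] = {}
        self.rejected = 0                       # records refused by the cap

    def append(self, job_id: str, record: str) -> bool:
        """Store a record in the job's buffer; refuse to create buffers past the cap."""
        buf = self.buffers.get(job_id)
        if buf is None:
            if len(self.buffers) >= self.max_buffers:
                # Safeguard: shed load instead of creating unbounded buffers.
                self.rejected += 1
                return False
            buf = self.buffers[job_id] = []
        buf.append(record)
        return True


pool = BufferPool(max_buffers=3)

# Five distinct jobs arrive, but only three buffers may exist.
results = [pool.append(f"job-{i}", "log line") for i in range(5)]

# A job that already has a buffer is unaffected by the cap.
existing_ok = pool.append("job-0", "another log line")

print(results, pool.rejected, existing_ok)
```

A limit like this only helps if it is actually set to a sensible value; with the cap left unset or effectively unlimited, a sudden flood of new buffers would go straight through.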

“A temporary malfunction that lasted five minutes caused massive traffic that took us several hours to repair and recover from,” the blog said. “Because our backstops were not configured properly, the subsystems became so overloaded that we could not communicate with them normally. A complete reset and restart was required.”
