Today GWM was down for about an hour.
What happened was this: every time someone loads a page, a PageView entry containing a small amount of information about that page is written to the database. These entries are deleted shortly afterward; they mostly exist to gather usage stats about traffic to the site.
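However, each new PageView entry also gets a unique integer ID, one higher than the previous entry's. This is similar to how every new image uploaded to the site gets a new integer ID, which in that case is visible in the URL. IDs are never reused, so even though old PageView entries are deleted, the counter keeps climbing.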
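To see why deleting old PageViews doesn't hold the counter back, here is a small sketch using SQLite, chosen only because it ships with Python; the post doesn't say which database GWM uses, and the `pageview` table and `path` column below are hypothetical. With `AUTOINCREMENT`, SQLite remembers the largest ID it has ever handed out, so even after every row is deleted, the next ID is one past that.

```python
import sqlite3


def ids_after_deleting_everything(n_before: int) -> tuple[list[int], int]:
    """Insert n_before rows, delete them all, insert one more; return (old IDs, new ID)."""
    conn = sqlite3.connect(":memory:")
    # AUTOINCREMENT tells SQLite never to reuse an ID, even after rows are deleted.
    conn.execute("CREATE TABLE pageview (id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT)")
    for i in range(n_before):
        conn.execute("INSERT INTO pageview (path) VALUES (?)", (f"/page/{i}",))
    old_ids = [row[0] for row in conn.execute("SELECT id FROM pageview ORDER BY id")]
    # Clear the table, much like PageViews are cleaned up shortly after being written.
    conn.execute("DELETE FROM pageview")
    new_id = conn.execute("INSERT INTO pageview (path) VALUES (?)", ("/page/new",)).lastrowid
    conn.close()
    return old_ids, new_id


old_ids, new_id = ids_after_deleting_everything(3)
print(old_ids, new_id)  # [1, 2, 3] 4
```

The table was empty when the last row went in, yet it still received ID 4: the counter only ever moves forward.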
The database stores each integer in a fixed amount of space, which imposes a maximum size on it. The PageView ID field is a signed 32-bit integer, so its maximum is 2^31 - 1 (half of 2^32, minus one), or 2,147,483,647. This was reached this afternoon. As a result, every time the site tried to write a new PageView entry to the database, the write failed, which in turn caused the whole page load to fail, and you got a 500 Internal Server Error.
I fixed the immediate problem by clearing the PageViews and resetting the ID counter back to 1.
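A toy model of the failure and the immediate fix, assuming (as the 2,147,483,647 limit implies) a signed 32-bit ID column. `Int32IdCounter` is a hypothetical stand-in, not the site's code; a real database reports the overflow with its own error message rather than Python's `OverflowError`.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647: largest value a signed 32-bit integer can hold


class Int32IdCounter:
    """Hypothetical stand-in for an auto-increment ID stored as a signed 32-bit integer."""

    def __init__(self, last_id: int = 0) -> None:
        self.last_id = last_id  # the most recently issued ID

    def next_id(self) -> int:
        candidate = self.last_id + 1
        if candidate > INT32_MAX:
            # A real database rejects the write at this point; the sketch raises instead.
            raise OverflowError(f"ID {candidate} does not fit in a signed 32-bit column")
        self.last_id = candidate
        return candidate

    def reset(self) -> None:
        # Mirrors the immediate fix: the next ID issued will be 1 again.
        self.last_id = 0


counter = Int32IdCounter(last_id=INT32_MAX - 1)
print(counter.next_id())  # 2147483647, the last ID that fits
try:
    counter.next_id()
except OverflowError as err:
    print(err)  # ID 2147483648 does not fit in a signed 32-bit column
counter.reset()
print(counter.next_id())  # 1
```

Note that the reset only buys time: at the same traffic level, the counter will climb back to the limit eventually, which is why a wider type is the real fix.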
The longer-term fix is to use a 64-bit integer type for this field. Even a signed one, like the current field, tops out at 2^63 - 1, or about 9.2 quintillion. That number is so large that even at 10 million page loads every second, it would take roughly 29,000 years to reach it.
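The back-of-the-envelope arithmetic, as a few lines of Python. The 10-million-per-second rate is purely illustrative, the signed 64-bit limit is the conservative case (an unsigned 64-bit column would last about twice as long), and `years_to_exhaust` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds in an average year

INT32_MAX = 2**31 - 1  # the old PageView ID limit
INT64_MAX = 2**63 - 1  # limit of a signed 64-bit integer


def years_to_exhaust(limit: int, ids_per_second: float) -> float:
    """Years until an ID counter starting at 0 reaches `limit` at a constant insert rate."""
    return limit / ids_per_second / SECONDS_PER_YEAR


rate = 10_000_000  # 10 million page loads per second (illustrative)
print(f"{INT32_MAX / rate:.0f} seconds")  # 215 seconds
print(f"{years_to_exhaust(INT64_MAX, rate):,.0f} years")  # 29,227 years
```

At that same rate, the old 32-bit limit would be used up in under four minutes, while the 64-bit one lasts for millennia.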