Here is one of the example calls I am making:
https://api.themoviedb.org//3/discover/movie?api_key=&with_genres=99&language=en&page=2&vote_count.gte=5&sort_by=release_date.desc
The same thing happens with list calls.
It doesn't happen on every call, but almost 70% of the time the call fails.
Reply by Travis Bell
on February 8, 2014 at 11:08 PM
Hey hosam,
We had one of our API servers fail and the remaining servers got overwhelmed with the resulting load.
We're looking into ways to make sure our system can survive the failure of a single server. Everything is already duplicated for redundancy in cases like this (2 database servers, 2 search servers, 2 cache servers, 2 web servers), but our 4 API servers were apparently unable to handle the load with one down. This will be fixed.
Cheers.
Reply by hosam
on February 9, 2014 at 6:36 AM
No worries! Thanks a lot for your speedy reply, as always :)
Reply by Travis Bell
on February 9, 2014 at 10:21 AM
If you're curious about what happened and the fix, here's a more detailed rundown.
We use Amazon EC2 for just about everything, and they have the concept of availability zones. These are discrete areas within the EC2 service where you can increase your availability by protecting yourself from issues that arise within a single zone: anything from hardware failures to network failures, basically problems on Amazon's side that are out of our hands.
In our case we had 2 servers in one zone, and 2 others in another. Awesome! ... until one of those instances failed.
The way our load balancer splits traffic is pretty much 50/50 to each zone. I had originally thought that if one server failed, the remaining three would take over the load. Unfortunately that's not quite right, as our ops guy was quick to point out. (It's also why we pay him to do this and not me :D) Only the server in the same availability zone takes over the dead server's traffic. So in that one availability zone we doubled the load on the one remaining box, and that much load was enough to saturate it, so it started dropping requests too. Now we weren't just down one server, we were down almost two. The problem basically compounded itself, resulting in a dead availability zone.
To fix this we're now running 8 servers, 4 in each zone. This makes each server's job less critical to the whole stack: if we go down one, there are three servers left in that zone to take over the resulting load (before it was one, now it's three).
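To put rough numbers on this, here is a minimal Python sketch of the arithmetic. It assumes what is described above: each zone keeps receiving its full share of traffic and spreads it evenly over the servers still healthy in that zone. The function names are hypothetical, and the model ignores what happens once a box saturates and starts dropping requests.

```python
from fractions import Fraction


def zone_load_factor(servers_per_zone: int, failed: int = 1) -> Fraction:
    """Per-server load multiplier inside one zone after `failed` servers die.

    Assumption: the zone still gets its full share of traffic, split evenly
    across the servers that remain healthy in that same zone.
    """
    healthy = servers_per_zone - failed
    if healthy <= 0:
        raise ValueError("no healthy servers left in the zone")
    return Fraction(servers_per_zone, healthy)


def pooled_load_factor(total_servers: int, failed: int = 1) -> Fraction:
    """Per-server load multiplier if ALL surviving servers shared the load.

    This is the (incorrect) mental model of the remaining three servers
    taking over, included only for comparison.
    """
    healthy = total_servers - failed
    if healthy <= 0:
        raise ValueError("no healthy servers left")
    return Fraction(total_servers, healthy)


if __name__ == "__main__":
    print(zone_load_factor(2))    # old setup, 2 per zone, one down
    print(pooled_load_factor(4))  # what was expected with 4 servers pooled
    print(zone_load_factor(4))    # new setup, 4 per zone, one down
```

Under these assumptions the old setup doubles the load on the surviving box in the affected zone, whereas the expected pooled behavior would have raised each server's load by only a third. With 4 servers per zone, losing one raises the load on its three zone neighbors by a third each.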
We have invested a lot to make sure we are able to deliver our service to everyone all the time. Sometimes, though, it takes the odd failure to point out flaws in the design. The changes we made last night should help prevent this situation from happening again.
Thanks!
Reply by hosam
on February 9, 2014 at 10:32 AM
Thanks for the extensive explanation :). You guys have some of the best support on the planet; that's why I am a die-hard fan of the website and service.
Keep up the great work!
Reply by Travis Bell
on February 9, 2014 at 10:40 AM
You're very welcome!