We have stumbled across an issue when using TMDb Discover to fetch IDs.
When no sort option is defined and the default sorting is used, the number of IDs returned varies wildly (by hundreds of items) from run to run.
To demonstrate this we have set up a script which:
The Python script is available here; simply replace APIKEY with a valid API key.
For this test, I set the runs to be 30 seconds apart with 10 total runs.
My first run's parameters are this, with no sorting defined:
url_params = {
    "api_key": f"{apikey}",
    "with_companies": "33",
    "language": "en",
    "page": 1
}
The log file can be found here: GitHub Gist - No Sorting
As can be seen in the log, the ID Count fluctuates from 2614 to 2129, back up to 2614, down to 2129, etc.
The log also shows that there are IDs in subsequent runs that do not appear in the first run, and vice-versa.
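For anyone wanting to reproduce the comparison without the linked script, here is a minimal sketch of the idea. It is not the script itself: `collect_ids`, `compare_runs` and `make_fake_fetcher` are hypothetical names, the `results` / `total_pages` / `id` fields are assumed from the Discover response shape, and the HTTP call is replaced by a stand-in so the example runs offline.

```python
from typing import Callable, Dict, List, Set


def collect_ids(fetch_page: Callable[[int], dict]) -> List[int]:
    """Walk every page of a Discover-style response and return all IDs in order."""
    first = fetch_page(1)
    ids = [item["id"] for item in first["results"]]
    for page in range(2, first["total_pages"] + 1):
        ids.extend(item["id"] for item in fetch_page(page)["results"])
    return ids


def compare_runs(baseline: List[int], current: List[int]) -> Dict[str, Set[int]]:
    """Report IDs that vanished from, or newly appeared in, the current run."""
    base, cur = set(baseline), set(current)
    return {"missing": base - cur, "new": cur - base}


def make_fake_fetcher(pages: List[List[int]]) -> Callable[[int], dict]:
    """Stand-in for the real HTTP request (assumption: swap in an actual API call)."""
    def fetch_page(page: int) -> dict:
        return {
            "page": page,
            "total_pages": len(pages),
            "results": [{"id": i} for i in pages[page - 1]],
        }
    return fetch_page


run_1 = collect_ids(make_fake_fetcher([[1, 2], [3, 4]]))
run_2 = collect_ids(make_fake_fetcher([[1, 2], [2, 5]]))
diff = compare_runs(run_1, run_2)
print(len(set(run_1)), len(set(run_2)), diff)
```

In a real run, `fetch_page` would issue the request with the parameters shown above and return the decoded JSON.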
My second run's parameters are identical, apart from adding sorting:
url_params = {
    "api_key": f"{apikey}",
    "with_companies": "33",
    "language": "en",
    "sort_by": "primary_release_date.asc",
    "page": 1
}
The log file can be found here: GitHub Gist - Primary Release Date Sorting
As can be seen in the log, the ID Count is consistently 2614 and the results never feature IDs that have not appeared before: it is always the same 2614 IDs.
Reply by yozoraxcii
on September 21, 2024 at 2:08 PM
A couple other things I have tried to no success:
"sort_by": "popularity.desc"
In both of these scenarios, the number of unique IDs differed from run to run, meaning that the same ID appears on more than one page of the response, but only when the sort order is popularity (which I assume to be the default sort order if none is specified).
EDIT:
Interestingly, popularity.asc works with no issues; this seems to be limited to popularity.desc.
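One way to quantify the duplication described above is to compare the total number of IDs collected across all pages with the number of unique ones. The sketch below assumes the IDs have already been gathered into a flat list; `duplicate_report` is a hypothetical helper name and the sample IDs are made up.

```python
from collections import Counter
from typing import Dict, List


def duplicate_report(ids: List[int]) -> Dict[str, object]:
    """Summarise how many IDs were returned and which ones appear more than once."""
    counts = Counter(ids)
    duplicates = {item_id: n for item_id, n in counts.items() if n > 1}
    return {"total": len(ids), "unique": len(counts), "duplicates": duplicates}


# Made-up IDs standing in for one full run across all pages
report = duplicate_report([10, 11, 12, 11, 13, 10])
print(report)
```

If `total` and `unique` differ, at least one ID was served on more than one page during that run.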
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 3:58 PM
I'm not an expert.
If I understand your question correctly, then...
If you don't specify the Sort, the default &sort_by=popularity.desc will be used.
Popularity and Vote are values that change constantly throughout the day.
Read here:
https://developer.themoviedb.org/docs/popularity-and-trending
That's why it's normal for a movie that was on page 3 to move to page 2.
But, we have the problem of page CACHE.
By definition, the CACHE has a lifetime.
The CACHE is not cleared instantly or simultaneously with changes in Popularity, Vote or Trend values.
That's why this "error" occurs, with the same movie appearing on both page 2 and page 3.
I don't know if this helps you.
EDIT
I believe this explanation also applies to the use of the Trending Request.
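To make the cache explanation concrete, here is a toy model. It is only an illustration of the hypothesis, not a description of how TMDb's cache actually works: the titles, popularity values and page size are invented, and `paginate` is a hypothetical helper. Page 1 is served from a copy built before a popularity change, page 2 from a copy built after it.

```python
from typing import Dict, List


def paginate(popularity: Dict[str, float], page_size: int) -> List[List[str]]:
    """Sort by popularity descending (name as tie-breaker) and split into pages."""
    ranked = sorted(popularity, key=lambda name: (-popularity[name], name))
    return [ranked[i:i + page_size] for i in range(0, len(ranked), page_size)]


old = {"A": 9.0, "B": 7.0, "C": 6.0, "D": 2.0}
new = dict(old, C=8.0)  # C becomes more popular and overtakes B

stale_pages = paginate(old, 2)  # pages as they were cached before the change
fresh_pages = paginate(new, 2)  # pages as rebuilt after the change

# The client receives a stale page 1 and a fresh page 2
seen = stale_pages[0] + fresh_pages[1]
duplicates = sorted({name for name in seen if seen.count(name) > 1})
missing = sorted(set(old) - set(seen))
print(seen, duplicates, missing)
```

In this toy case one title shows up twice and another is never returned, which matches both symptoms reported in this thread, although it does not prove that caching is the cause.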
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 4:05 PM
Regarding the different behavior between "popularity.asc" and "popularity.desc".
I believe that the less popular ones are less seen and/or their values change throughout the day.
So if the answer starts with the less popular ones it will probably be a more stable answer, with no issues.
Reply by yozoraxcii
on September 21, 2024 at 4:08 PM
The issue isn't just duplicate entries - it's also that items that are returned from run 1 are completely absent from run 2 happening 30 seconds later, and can then reappear in run 3 30 seconds after that.
Cache is probably a good avenue to explore further, this doesn't seem to be intended behaviour though - if you and I both run a script to fetch movies from a company we should get the same results.
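The disappear-then-reappear pattern can be picked out of a series of runs with something like the sketch below. It assumes each run has already been reduced to a set of IDs; `flickering_ids` is a hypothetical helper and the sample runs are made up.

```python
from typing import Dict, List, Set


def flickering_ids(runs: List[Set[int]]) -> Dict[int, List[bool]]:
    """Return IDs that are present, then absent, then present again across runs."""
    result = {}
    for item_id in set().union(*runs):
        presence = [item_id in run for run in runs]
        first = presence.index(True)
        last = len(presence) - 1 - presence[::-1].index(True)
        # Any gap between the first and last sighting means the ID flickered
        if not all(presence[first:last + 1]):
            result[item_id] = presence
    return result


# Made-up runs: ID 2 drops out of the second run and returns in the third
runs = [{1, 2, 3}, {1, 3, 4}, {1, 2, 3, 4}]
flicker = flickering_ids(runs)
print(flicker)
```

IDs that simply appear for the first time in a later run (such as 4 here) are not reported, since a genuinely new item would look the same.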
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 4:13 PM
Well, as I said at the beginning, I'm not an expert.
Let's wait for administrator Travis Bell to stop by.
Reply by yozoraxcii
on September 21, 2024 at 4:19 PM
Just to add: after running more tests, this doesn't appear to impact TV discover (from what I've seen), and it also doesn't impact companies with fewer results (company ID 1632 has 24 pages and doesn't seem to have this issue, whereas ID 33 has 130 pages and does).
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 4:24 PM
In any case, I think it would be good to know which API Request is having problems.
Not your code that builds the request, but the Request that is sent to our server.
Probably something like:
Reply by yozoraxcii
on September 22, 2024 at 7:13 AM
Sorry for delayed response - the queries being sent are visible in the logs
https://api.themoviedb.org:443 "GET /3/discover/movie?api_key=MYAPIKEY&with_companies=33&language=en&page=1
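For reference, the request line in the log is just the url_params dictionary encoded as a query string. The sketch below rebuilds it with the standard library and swaps the key for a placeholder before printing. `loggable_url` is a hypothetical helper, and the base URL is assumed from the host and path shown in the log.

```python
from urllib.parse import urlencode

# Assumed from the host and path visible in the log line
BASE_URL = "https://api.themoviedb.org/3/discover/movie"


def loggable_url(params: dict) -> str:
    """Build the request URL with the API key replaced by a placeholder."""
    safe = dict(params, api_key="MYAPIKEY")
    return f"{BASE_URL}?{urlencode(safe)}"


url_params = {
    "api_key": "not-a-real-key",
    "with_companies": "33",
    "language": "en",
    "page": 1,
}
default_sort_url = loggable_url(url_params)
explicit_sort_url = loggable_url(dict(url_params, sort_by="popularity.desc"))
print(default_sort_url)
print(explicit_sort_url)
```

Printing both forms makes it easy to confirm whether a given run relied on the default sort or set sort_by explicitly.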
Reply by superboy97
on September 22, 2024 at 7:51 AM
Correct languages are of the form en-US. With just en, the result is not guaranteed.
Reply by yozoraxcii
on September 22, 2024 at 8:13 AM
Good spot - I'll make that amendment and run some more tests.
Reply by yozoraxcii
on September 22, 2024 at 11:58 AM
I have made this amendment and can confirm the issue still exists - items are only appearing in some runs.
URL is
https://api.themoviedb.org:443 "GET /3/discover/movie?api_key=MYAPIKEY&with_companies=33&language=en-US&page=1
Log file is available here: TMDb Log with language set to en-US
I'll also highlight that a lot of these runs have approximately 500 duplicate IDs (approximately 20% of the result set).
Here is a popularity.asc log which does not have any of these issues.
And here is one where I've specifically set popularity.desc rather than relying on a "fallback" sort.