We have stumbled across an issue when using TMDb Discover to fetch IDs.
When no sort option is defined and the default sorting is used, the number of IDs returned varies wildly between runs (by hundreds of items).
To demonstrate this we have set up a script which:
The Python script is available here; simply replace APIKEY with a valid API key.
For this test, I set the runs to be 30 seconds apart with 10 total runs.
My first run used these parameters, with no sorting defined:
url_params = {
    "api_key": f"{apikey}",
    "with_companies": "33",
    "language": "en",
    "page": 1
}
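For readers who cannot open the linked script, here is a minimal sketch of how the IDs could be collected across every page. It is not the original script: the helper names `fetch_page` and `collect_ids` are made up for illustration, and it assumes the Discover response carries `results` (each with an `id`) and `total_pages`.

```python
import json
import urllib.parse
import urllib.request

DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"


def fetch_page(params):
    """Fetch one Discover page and return the decoded JSON body."""
    url = f"{DISCOVER_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_ids(base_params, fetch=fetch_page):
    """Walk every page of a Discover query and return all IDs in order.

    Duplicates are kept on purpose so they can be counted afterwards.
    `fetch` is injectable so the loop can be exercised without the network.
    """
    ids = []
    page = 1
    total_pages = 1  # corrected once the first response arrives
    while page <= total_pages:
        body = fetch({**base_params, "page": page})
        ids.extend(item["id"] for item in body.get("results", []))
        total_pages = body.get("total_pages", page)
        page += 1
    return ids
```

Calling `collect_ids({"api_key": apikey, "with_companies": "33", "language": "en"})` would then return the full ID list for one run, which can be logged and compared against later runs.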
The log file can be found here: GitHub Gist - No Sorting
As can be seen in the log, the ID count fluctuates from 2614 to 2129, back up to 2614, down to 2129, and so on.
The log also shows that there are IDs in subsequent runs that do not appear in the first run, and vice-versa.
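The comparison between runs can be sketched with plain set differences. This is only an illustration of the idea, not the code that produced the log; `compare_runs` is a hypothetical helper and the IDs in the usage line are invented.

```python
def compare_runs(baseline_ids, later_ids):
    """Report IDs that vanished from, or newly appeared in, a later run."""
    baseline, later = set(baseline_ids), set(later_ids)
    return {
        "missing": sorted(baseline - later),  # in the first run only
        "new": sorted(later - baseline),      # in the later run only
    }


# Made-up IDs, purely to show the shape of the result.
report = compare_runs([101, 102, 103, 104], [102, 103, 104, 105])
print(report)
```

If the endpoint were stable, both lists would be empty for every pair of runs.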
My second run's parameters are identical, apart from adding sorting:
url_params = {
    "api_key": f"{apikey}",
    "with_companies": "33",
    "language": "en",
    "sort_by": "primary_release_date.asc",
    "page": 1
}
The log file can be found here: GitHub Gist - Primary Release Date Sorting
As can be seen in the log, the ID count is consistently 2614 and the results never include IDs that have not appeared before - it is always the same 2614 IDs.
Reply by yozoraxcii
on September 21, 2024 at 2:08 PM
A couple of other things I have tried without success:
"sort_by": "popularity.desc"
In both of these scenarios, the number of unique IDs differed from run to run, meaning that the same ID is appearing on more than one page of the response, but only when the sort order is popularity (which I assume to be the default sort order if none is specified).
EDIT:
Interestingly, popularity.asc works with no issues; this seems to be limited to popularity.desc.
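A quick way to confirm that the same ID is showing up on more than one page is to compare the total count with the unique count. The sketch below assumes the IDs from all pages of one run have already been gathered into a single list; `duplicate_report` is a hypothetical helper name and the sample IDs are invented.

```python
from collections import Counter


def duplicate_report(ids):
    """Summarise how many IDs in one run are repeated across pages."""
    counts = Counter(ids)
    repeated = {movie_id: n for movie_id, n in counts.items() if n > 1}
    return {
        "total": len(ids),       # every entry returned, duplicates included
        "unique": len(counts),   # distinct IDs only
        "repeated": repeated,    # ID -> number of times it was returned
    }


# Made-up IDs: 7 appears twice, as if it were served on two pages.
print(duplicate_report([5, 7, 9, 7, 11]))
```

A stable result set would show `total == unique` and an empty `repeated` mapping.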
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 3:58 PM
I'm not an expert.
If I understand your question correctly, then...
If you don't specify the Sort, the default &sort_by=popularity.desc will be used.
Popularity and Vote are values that change constantly throughout the day.
Read here:
https://developer.themoviedb.org/docs/popularity-and-trending
That's why it's normal for a movie that was on page 3 to move to page 2.
But, we have the problem of page CACHE.
By definition, the CACHE has a lifetime.
The CACHE is not cleared instantly, nor simultaneously with changes in the Popularity, Vote or Trend values.
That's why you can see this "error" of the same movie appearing on both page 2 and page 3.
I don't know if this helps you.
EDIT
I believe this explanation also applies to the use of the Trending Request.
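To make the explanation above concrete, here is a toy sketch of what happens when two pages are served from rankings taken at different moments. It is not how TMDb's cache actually works (that is unknown here); the titles, the page size of 3 and the `paginate` helper are all made up for illustration.

```python
def paginate(ranking, page, page_size):
    """Return the slice of a ranked list that belongs to a 1-based page."""
    start = (page - 1) * page_size
    return ranking[start:start + page_size]


# Ranking used to serve page 1 (for example, an older cached copy).
before = ["A", "B", "C", "D", "E", "F"]
# Ranking used to serve page 2: "D" gained popularity and overtook "C".
after = ["A", "B", "D", "C", "E", "F"]

page_1 = paginate(before, 1, 3)
page_2 = paginate(after, 2, 3)
combined = page_1 + page_2

# "C" slid from page 1 to page 2, so it is returned twice;
# "D" moved up to a page that was already fetched, so it is never returned.
duplicated = sorted({t for t in combined if combined.count(t) > 1})
missing = sorted(set(before) - set(combined))
print(combined, duplicated, missing)
```

The same reshuffle produces both symptoms at once: one title is duplicated and another is absent from the combined result.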
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 4:05 PM
Regarding the different behavior between "popularity.asc" and "popularity.desc".
I believe that the less popular ones are viewed less and/or their values change less throughout the day.
So if the answer starts with the less popular ones it will probably be a more stable answer, with no issues.
Reply by yozoraxcii
on September 21, 2024 at 4:08 PM
The issue isn't just duplicate entries - items returned in run 1 can be completely absent from run 2 just 30 seconds later, and then reappear in run 3 another 30 seconds after that.
Cache is probably a good avenue to explore further, but this doesn't seem to be intended behaviour - if you and I both run a script to fetch movies from a company, we should get the same results.
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 4:13 PM
Well, as I said at the beginning, I'm not an expert.
Let's wait for administrator Travis Bell to stop by.
Reply by yozoraxcii
on September 21, 2024 at 4:19 PM
Just to add - after running more tests, this doesn't appear to affect TV discover (from what I've seen), nor does it affect companies with fewer results (company ID 1632 has 24 pages and doesn't seem to have this issue, whereas ID 33 has 130 pages and does).
Reply by ticao2 🇧🇷 pt-BR
on September 21, 2024 at 4:24 PM
In any case, I think it would be good to know which API Request is having problems.
Not your code that builds the request, but the Request that is sent to our server.
Probably something like:
Reply by yozoraxcii
on September 22, 2024 at 7:13 AM
Sorry for the delayed response - the queries being sent are visible in the logs:
https://api.themoviedb.org:443 "GET /3/discover/movie?api_key=MYAPIKEY&with_companies=33&language=en&page=1
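For anyone wanting to reproduce the exact request outside the script, the query string can be rebuilt from the parameters with the standard library. This is a small sketch; `build_discover_url` is a hypothetical helper, and the optional `sort_by` argument is only there to show how an explicit sort would change the request.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.themoviedb.org/3/discover/movie"


def build_discover_url(params, sort_by=None):
    """Return the full Discover URL, optionally pinning an explicit sort."""
    query = dict(params)  # copy so the caller's dict is left untouched
    if sort_by is not None:
        query["sort_by"] = sort_by
    return f"{BASE_URL}?{urlencode(query)}"


params = {
    "api_key": "MYAPIKEY",  # placeholder, as in the log line
    "with_companies": "33",
    "language": "en",
    "page": 1,
}
default_url = build_discover_url(params)
explicit_url = build_discover_url(params, sort_by="popularity.desc")
print(default_url)
print(explicit_url)
```

The first URL relies on whatever sort the server applies by default; the second states it explicitly.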
Reply by superboy97
on September 22, 2024 at 7:51 AM
Correct languages are of the form en-US. With just en, the result is not guaranteed.
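A script could guard against the short form before sending a request. The check below is only a sketch that assumes the expected shape is a two-letter lowercase language code, a hyphen and a two-letter uppercase region code, as in en-US; `is_full_language_tag` is a hypothetical helper, and it does not verify that the code is one TMDb actually supports.

```python
import re

# Shape check only: two lowercase letters, a hyphen, two uppercase letters.
LANGUAGE_TAG = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_full_language_tag(value):
    """Return True when value looks like a language-region tag such as en-US."""
    return LANGUAGE_TAG.fullmatch(value) is not None


print(is_full_language_tag("en-US"), is_full_language_tag("en"))
```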
Reply by yozoraxcii
on September 22, 2024 at 8:13 AM
Good spot - I'll make that amendment and run some more tests.
Reply by yozoraxcii
on September 22, 2024 at 11:58 AM
I have made this amendment and can confirm the issue still exists - items are only appearing in some runs.
URL is
https://api.themoviedb.org:443 "GET /3/discover/movie?api_key=MYAPIKEY&with_companies=33&language=en-US&page=1
Log file is available here: TMDb Log with language set to en-US
I'll also highlight that a lot of these runs have approximately 500 duplicate IDs (approximately 20% of the result set).
Here is a popularity.asc log, which does not have any of these issues.
And here is one where I've specifically set popularity.desc rather than relying on a "fallback" sort.