Duplicate items in search results on different pages

The Movie Database Support

Support → API Support

Bene8493
Posted on June 12, 2020 at 12:22 PM

Hi,

Using /search/multi, I sometimes get duplicate results on consecutive pages. For example, Jon Baker (id: 1546757) is the last item in /search/multi?query=jon&page=7 and the first item in /search/multi?query=jon&page=8. Is this a bug, or should I always take this into account and filter out duplicates when using pagination?

16 replies (page 1 of 2)


Reply by ticao2 🇧🇷 pt-BR
on June 12, 2020 at 1:06 PM

In order for someone to help you with API request questions, it is critical that you post the API request you are using here.
Remember to replace your key with THE_KEY, or something similar.

I made these two API requests, and ID 1546757 was not on page 7 or page 8.

https://api.themoviedb.org/3/search/multi?api_key=THE_KEY&query=jon&page=7
https://api.themoviedb.org/3/search/multi?api_key=THE_KEY&query=jon&page=8

Please post the API requests you made.

Reply by Bene8493
on June 12, 2020 at 2:11 PM

I'm sorry. These were the requests:
https://api.themoviedb.org/3/search/multi?query=Jon&page=7
https://api.themoviedb.org/3/search/multi?query=Jon&page=8

I use the Authorization header instead of the api_key query parameter. The problem is not specific to ID 1546757; it happens with many other people too. I just tested it again and there were no duplicates until page 11. Then, once more, the last item from page 10 was the first item on page 11: "Jon Ekstrand", ID: 1077404.

I tested again with the query "The". This time there was a duplicate movie on page 3:
https://api.themoviedb.org/3/search/multi?query=The&page=2
https://api.themoviedb.org/3/search/multi?query=The&page=3

Again, the last item from page 2 is the first item on page 3. The duplicates seem to differ each time I try, so this might not be easy to reproduce. Most of the time the first duplicate appears after page 5.

Reply by ticao2 🇧🇷 pt-BR
on June 12, 2020 at 2:43 PM

I tried to reproduce the error and failed.
I checked 10 pages of a request.
Perhaps the error would show up if I checked 20 pages.
I believe that only Travis Bell can have an answer.
So let's leave your question open and wait for him to see it.

Reply by Bene8493
on June 15, 2020 at 6:42 AM

It would be nice if Travis could take a look at it. I wrote a Python script to reproduce it. So far I have found at least one duplicate every time I ran it. The results seem to change quite often, but it seems there are always duplicates in the first 10 pages. Let me know if you need anything else.

import json
import urllib.request

AUTH_TOKEN = "your v4 auth token"
headers = {"Authorization": "Bearer " + AUTH_TOKEN}

# Maps "media_type:id" to the first page on which that item was seen.
seen = {}

# Fetch pages 1 through 10 and report every item that was already
# returned on an earlier page, including the immediately preceding one.
for page in range(1, 11):
    url = "https://api.themoviedb.org/3/search/multi?query=The&page=" + str(page)
    req = urllib.request.Request(url, None, headers)
    with urllib.request.urlopen(req) as response:
        json_response = json.load(response)
    for item in json_response["results"]:
        key = item["media_type"] + ":" + str(item["id"])
        if key in seen:
            print("found duplicate")
            print("id:", key)
            print("pages:", seen[key], page)
        else:
            seen[key] = page

Reply by ticao2 🇧🇷 pt-BR
on June 15, 2020 at 10:34 AM

@Bene8493
I have notified Travis Bell.
This problem is far beyond my ability.
Thank you.

Reply by Travis Bell
on June 17, 2020 at 11:27 AM

Hi @Bene8493, I've created a ticket to track this here. Unfortunately I don't have any time to look at this in the near future, but at least it is being tracked now.

Reply by Bene8493
on June 22, 2020 at 2:41 PM

Thanks. FYI: this also happens in v4 recommendations (/account/{account_id}/movie/recommendations). I got movie ID 393 on pages 2 and 4.

Reply by bicelis
on June 23, 2024 at 8:32 AM

Hello :) I'm still experiencing this issue in 2024. @travisbell, is there perhaps a fix in progress? :)

Reply by Travis Bell
on June 25, 2024 at 2:05 PM

As items shift around due to either changing data, or things like updated popularity scores during the day, we make no guarantee that there won't be duplicate items across multiple pages. This is because a lot of data is cached.

You can track which IDs have been "seen" by your app and skip ones that have already been returned.
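
A minimal sketch of that "seen" tracking, assuming a hypothetical `fetch_page` callable that returns the parsed JSON for one page (a dict with a "results" list). The helper name `iter_unique` and the stubbed pages are illustrative only and are not part of the TMDB API. Like the reproduction script earlier in this thread, it keys on both `media_type` and `id`.

```python
from typing import Callable, Dict, Iterator, Set, Tuple


def iter_unique(fetch_page: Callable[[int], Dict], pages: int) -> Iterator[Dict]:
    """Yield each result once, skipping items already returned earlier."""
    seen: Set[Tuple[str, int]] = set()
    for page in range(1, pages + 1):
        for item in fetch_page(page)["results"]:
            # Key on media type and ID together, as the reproduction
            # script in this thread does for /search/multi results.
            key = (item.get("media_type", ""), item["id"])
            if key in seen:
                continue
            seen.add(key)
            yield item


# Stubbed pages standing in for real API responses (hypothetical data).
FAKE_PAGES = {
    1: {"results": [{"media_type": "person", "id": 1},
                    {"media_type": "person", "id": 2}]},
    2: {"results": [{"media_type": "person", "id": 2},
                    {"media_type": "movie", "id": 2}]},
}

unique_items = list(iter_unique(lambda page: FAKE_PAGES[page], pages=2))
print([(item["media_type"], item["id"]) for item in unique_items])
```

Note that this only removes repeats. It cannot recover an item that moved onto a page the app had already fetched.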

Reply by Zsolt Bertalan
on October 25, 2024 at 5:53 PM

That's a bigger problem than you believe. It's not the client's job to fix a bug on the server. I understand it's due to caching, but the caching needs to be better coordinated. The problem is easier to reproduce on frequently changing lists like popular or now-playing. It's also not only duplication: the problem is caused by movies swapping between pages, so for every duplicate there is a missing movie. And because the page caches are generated at different times, the frequently changing lists have duplicates and missing movies when you hit the cloud cache for the first time, for example when starting an app for the first time.

Reply by ticao2 🇧🇷 pt-BR
on October 25, 2024 at 7:59 PM

It may be better to clear the entire cache.
Generate a new list at a specific time of day.
Keep this new list in the cache for 24 hours.
And repeat this operation the next day.

Reply by Zsolt Bertalan
on October 26, 2024 at 11:09 AM

And how will this fix the 'bug' on the server side? The problem is that the various pages are different resources and are cached on the server at different times, even though they are related. Maybe a simplified example will help to explain it.

Imagine two pages with two items each. At midnight, Page 1 initially holds A and B, while Page 2 holds C and D. Page 1 is cached at odd hours, Page 2 at even hours. At 1 o'clock Page 1 is updated, and at this point B and C are swapped.

If I start an app at 1:30, Page 1 returns A and C, while Page 2, which has an old cache and will only update at 2 o'clock, will return C and D. So B is missing altogether, while C is duplicated. After 2 o'clock Page 2 will correctly return B, but by that time something else is swapped, like D and E between Page 2 and Page 3.
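
The two-page example above can be replayed in a few lines. This is a toy model under the same assumptions as the example (two pages of two items, Page 1 refreshed at odd hours, Page 2 at even hours, B and C swapping at 1 o'clock). The names `ranking_at`, `last_refresh` and `cached_page` are made up for illustration and say nothing about how the real cache is implemented.

```python
import math
from typing import List

PAGE_SIZE = 2


def ranking_at(hour: int) -> List[str]:
    """Server-side ordering: B and C swap places at 1 o'clock."""
    return ["A", "B", "C", "D"] if hour < 1 else ["A", "C", "B", "D"]


def last_refresh(page: int, now: float) -> int:
    """Hour at which the given page was last cached before `now`."""
    whole = math.floor(now)
    if page == 1:
        # Largest odd hour <= now, falling back to the midnight fill.
        odd = whole if whole % 2 == 1 else whole - 1
        return max(odd, 0)
    # Page 2: largest even hour <= now (midnight counts as hour 0).
    return whole if whole % 2 == 0 else whole - 1


def cached_page(page: int, now: float) -> List[str]:
    """Return the page as it was when its cache entry was last rebuilt."""
    ranking = ranking_at(last_refresh(page, now))
    start = (page - 1) * PAGE_SIZE
    return ranking[start:start + PAGE_SIZE]


# Times are in hours after midnight: 0:30, 1:30 and 2:30.
for now in (0.5, 1.5, 2.5):
    fetched = cached_page(1, now) + cached_page(2, now)
    missing = sorted(set("ABCD") - set(fetched))
    print(now, fetched, "missing:", missing)
```

At 1:30 the model returns C twice and omits B, matching the description above. At 2:30 both pages agree again.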

Currently I have popular, upcoming and now-playing movies in my app, each of which returns 20 movies per page. When I open my app for the first time and scroll through 10 pages of each, instead of 200 movies per list I see 199, 190 and 193, because there are 1, 10 and 7 swaps respectively that are not fully cached yet at that point. AFAIK there is no way to get the missing movies, because the server doesn't respect cache-control directives, which is understandable. I'm not sure what the exact logic of the CloudFront cache is; it would be interesting to know.

Reply by ticao2 🇧🇷 pt-BR
on October 27, 2024 at 11:20 AM

Delete the entire cache.
The system generates a new list of 500 pages.
These 500 pages are what is sent for any request made throughout the day.
The 500 pages are not updated during the day.
The next day, a new list is generated that reflects the score changes that occurred.

It would be something like "The most popular yesterday" or "Yesterday's Trending"
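
The daily-snapshot idea above could be sketched roughly as follows. This only illustrates the proposal and is not how TMDB works: `DailySnapshot`, `build_ranking` and the page size are hypothetical names and values.

```python
import datetime
from typing import Callable, List, Optional


class DailySnapshot:
    """Serve every page from one ranking rebuilt at most once per day."""

    def __init__(self, build_ranking: Callable[[], List[str]], page_size: int = 20):
        self._build_ranking = build_ranking
        self._page_size = page_size
        self._built_on: Optional[datetime.date] = None
        self._ranking: List[str] = []

    def page(self, number: int, today: datetime.date) -> List[str]:
        # Rebuild the whole list only when the date changes, so all pages
        # served on the same day are slices of the same ordering.
        if self._built_on != today:
            self._ranking = list(self._build_ranking())
            self._built_on = today
        start = (number - 1) * self._page_size
        return self._ranking[start:start + self._page_size]


# A "live" ranking that changes during the day (hypothetical data).
live = ["A", "B", "C", "D"]
snapshot = DailySnapshot(lambda: live, page_size=2)
day = datetime.date(2024, 10, 27)

first = snapshot.page(1, day)
live[1], live[2] = live[2], live[1]  # B and C swap during the day
second = snapshot.page(2, day)
print(first + second)  # both pages come from the same snapshot

next_day = snapshot.page(1, day + datetime.timedelta(days=1))
print(next_day)  # the swap becomes visible only on the next day
```

The trade-off is freshness: changes made during the day stay invisible until the next rebuild.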

Reply by Zsolt Bertalan
on October 27, 2024 at 12:14 PM

Yes, this is how it should work, but it's not. What is your point?

Reply by ticao2 🇧🇷 pt-BR
on October 27, 2024 at 12:19 PM

@zsolt.bertalan said:

Yes, this is how it should work, but it's not. What is your point?

There are several changes in votes or trending throughout the day.
These changes throughout the day are considered with each new API Request.
My point is, do not consider these changes that occur throughout the day.
