The Movie Database Support

This is a bit of a loaded question, sorry in advance. I, along with (it seems) a lot of people, thought that the original_language field in the movie-details model was used to indicate the main language spoken during the movie. Apparently it's not meant for that; it is meant to reflect the language of the production companies:

https://www.themoviedb.org/movie/240832-lucy/discuss/662e2bd9c56d2d0126cccf58
https://www.themoviedb.org/movie/467244-the-zone-of-interest/discuss/65d72da4c5c1ef017d8bf4ea
https://www.themoviedb.org/talk/65f7d443242f94017dce2645
https://www.themoviedb.org/talk/63d7ab07c15b550079fbafd5

I don't understand why we need a field on the movie-details model for a piece of information that could be gotten more accurately by iterating over the production companies or why such a field would not be better named "original_production_language" (or "production_country") as that would make it clear what it is to everyone.

I'll repost what I already said in the Lucy discussion:

@takeshi2010 said:

Also, if this field is truly meant to become a reflection of the origin of the production companies, which I guarantee you, 99% of the people contributing to this site didn't know, how are we supposed to know what's the main language of a movie? Conventional wisdom would say "take the first value from spoken_languages", but here's the output I get, I'm sure you'll see the problem:

title | original_language | spoken languages | correct original language
The Man from U.N.C.L.E. (2015) | en | ['it', 'en', 'ru', 'de'] | en
Risky Business (1983) | en | ['de', 'en'] | en
John Wick: Chapter 4 (2023) | en | ['ar', 'cn', 'en', 'fr', 'de', 'ja', 'la', 'ru', 'es'] | en
Taxi (1998) | fr | ['pt', 'fr', 'de', 'ko'] | fr
Asterix & Obelix: Mission Cleopatra (2002) | fr | ['de', 'fr', 'ar', 'cn', 'la'] | fr
The Mask (1994) | en | ['sv', 'en'] | en
The Long Good Friday (1980) | en | ['fr', 'en'] | en
GoodFellas (1990) | en | ['it', 'en'] | en
The Green Mile (1999) | en | ['fr', 'en'] | en

I've just selected a few examples but I have close to 2000 titles that pose this problem from my curated list of 15000 movies. A lot are well known, popular movies. The spoken_language field is not reliable for finding out the original language of a movie, because I suspect a lot of people don't care about its ordering, since they fill out the original_language field with the proper value. There's also a bunch of cases where people don't even bother with the spoken_language field because, again, I suspect they think original_language is enough. Forcing a re-interpretation of the original_language field this late in the game means a bunch of data is wrong.
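For anyone who wants to reproduce this check, here is a minimal sketch of the comparison described above. It assumes the data has already been fetched and flattened into (title, original_language, spoken languages) tuples; the rows are copied from the table above, and the function name is mine, not part of any TMDb client.

```python
# Rows copied from the table above: (title, original_language, spoken languages).
MOVIES = [
    ("The Man from U.N.C.L.E. (2015)", "en", ["it", "en", "ru", "de"]),
    ("Risky Business (1983)", "en", ["de", "en"]),
    ("Taxi (1998)", "fr", ["pt", "fr", "de", "ko"]),
    ("GoodFellas (1990)", "en", ["it", "en"]),
]


def first_spoken_mismatches(movies):
    """Return titles whose first spoken language differs from original_language."""
    mismatches = []
    for title, original_language, spoken_languages in movies:
        # A movie with no spoken languages listed cannot be compared at all.
        if not spoken_languages:
            continue
        if spoken_languages[0] != original_language:
            mismatches.append(title)
    return mismatches


if __name__ == "__main__":
    for title in first_spoken_mismatches(MOVIES):
        print(title)
```

Every row in this sample is reported, which is the point: "take the first value from spoken_languages" gives the wrong answer for all of them.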

There are the odd cases where original_language is wrong, but this concerns more obscure, often non-English speaking movies like:

title | original_language | spoken languages | correct original language
Taoism Drunkard (1984) | en | ['zh'] | cn

But these are sufficiently rare that correcting them by hand is doable.

Here's the problem: I don't think most people who contribute to the site got the meaning behind that field. As a result, trying to enforce that rule now makes it impossible to get accurate information either way. If I understand the field to mean "production language", then I have to fall back to "spoken_language" to get the info I need and that's just not accurate at all. If I understand it as it has been used by most of the community, I get incorrect data in the rare cases where mods take action (Lucy, Zone of Interest....).

So considering all this, how do I reliably get the main language of a movie?

As an aside question, who cares more about what language the producers were speaking during the shoot than they do about the language of the script or the recorded language the actors were speaking? Production country, I get, but original production language...

6 replies (page 1 of 1)

I don't understand why we need a field on the movie-details model for a piece of information that could be gotten more accurately by iterating over the production companies.

In fact, this is one of the most important fields of each record, as it is mandatory, fairly well-maintained, and allows simple and quick filtering of records by language. In any case, it cannot be generated from production companies as you suggest, because production companies only contain information about the country of production, not the language (that wouldn't make any sense anyway).

By comparison, similar databases do not contain comparable fields at all - IMDb only has spoken languages (but I think it's much more efficient to use production country here, because spoken language doesn't work if you want to get a German silent film, for example) and Wikidata doesn't systematically maintain the language of the film (although it does allow you to enter it, of course). Filtering entries by language is almost impossible in these databases, unlike TMDb.

or why such a field would not be better named "original_production_language" (or "production_country") as that would make it clear what it is to every one.

Apart from the fact that it's really not a good idea to change the field names in a database actively used by thousands of projects, your suggestion doesn't make sense anyway, because our "original movie language" has about as much in common with "original production language" as with "spoken language" - it will always match those languages 99% of the time, and the other 1% will be various exceptions that someone will have a problem with.

The film industry is so diverse that no cataloguing system can be expected to fit it 100%. Our "original language" field is described in the rules here. I'm mainly dealing with Czech and Slovak film, which has quite a few mutual co-productions and co-productions with other countries, and after all the years I've been here, my experience is that our "original movie language" most accurately corresponds to the language of the onscreen credits of the original version of the movie. There are always exceptions, though, and each mod has their own approach to this field that suits them best, because as I've already mentioned, no 100% applicable rule can simply be created. However, the differences in the mods' methodologies are completely negligible anyway, as they only affect a tiny fraction of entries where the "original movie language" isn't obvious at first glance, and even though you might not believe it, the mods usually discuss the problematic entries with others to reach a mutual consensus.

Here's the problem: I don't think most people who contribute to the site got the meaning behind that field.

This is a general problem with most of the fields in our database. There will always be someone who doesn't read the rules or modifies the database to suit their personal needs and expectations, regardless of others.

As a result, trying to enforce that rule now makes it impossible to get accurate information either way.

Our database is open to all and its 100% correctness can never be guaranteed. No other database will provide you with 100% correct data either, because it is either also based on the assumption that users will fill it in correctly (IMDb, Wikidata) or it is proprietary, in which case it is also often incomplete, as there is usually no one to keep it up to date and maintain it (except for some publicly funded national databases that have very good quality and up-to-date data, but are by definition only local and therefore not usable by most users).

If I understand it as it has been used by most of the community, I get incorrect data in the rare cases where mods take action (Lucy, Zone of Interest....).

I think you really very much underestimate the effort the mods put into maintaining this database.

So considering all this, how do I reliably get the main language of a movie?

I don't know what you mean by "main language of the movie", but if it's supposed to be the main language spoken in the movie, you won't find that information with 100% accuracy, because (aside from the fact that we will always have a certain percentage of errors) our "spoken language" field doesn't allow sorting languages by the frequency of occurrence in the movie/TV show.

Furthermore, the "spoken language" field is often misunderstood by users who tend to include dubbed languages, and it doesn't handle content with no spoken language well anyway, so your best chance of getting "something like the main language of the movie" is to use the "original movie language".
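As a rough sketch of that advice: take original_language as the answer, and only flag a title for manual review when the listed spoken languages don't contain it at all. This assumes a dict shaped like a movie-details response, with original_language as a string and spoken_languages as a list of objects carrying an iso_639_1 code. The function name and the needs_review flag are my own invention, not something TMDb provides.

```python
def main_language_with_flag(movie):
    """Return (language, needs_review) for a movie-details-like dict.

    The language is simply original_language. needs_review is True only
    when spoken languages are listed and none of them matches it.
    """
    original = movie.get("original_language")
    # Collect the language codes, skipping entries without one.
    spoken = [lang.get("iso_639_1") for lang in movie.get("spoken_languages") or []]
    spoken = [code for code in spoken if code]
    # An empty list (silent films, unfilled entries) is not treated as a conflict.
    needs_review = bool(spoken) and original not in spoken
    return original, needs_review


if __name__ == "__main__":
    taoism_drunkard = {
        "original_language": "en",
        "spoken_languages": [{"iso_639_1": "zh"}],
    }
    print(main_language_with_flag(taoism_drunkard))
```

This doesn't decide anything on its own. It just narrows a long list down to the handful of entries worth checking by hand or raising in a report.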

Thank you for the thoughtful reply. A couple things I think deserve a follow up:

  • I feel like maybe you thought I was criticizing the work of the mods here. That's absolutely not the case. I'm in disagreement with a couple of their decisions, but that's about it. I have and will always have respect for the amount of work that goes into this database.
  • Regarding the rules, they state The purpose of this field is to try and pair a language with the "original version of the film" then goes on to give examples of an English film, Avatar, a French film, Amelie Poulain, and so on. The rule, however doesn't say what criteria makes each film a <insert_language> film, except for the language of the title, and the language of release, which can sometimes supersede the language of the title (Bonjour Tristesse, Boy 7). What's unclear to me is what makes a movie an English release if a movie like Lucy is considered a French one. One could easily argue it has an English title, it was shot from an original script in English (https://assets.scriptslug.com/live/pdf/scripts/lucy-2014.pdf), with 2 US actors as the main stars (Scarlett Johansson and Morgan Freeman) and it was made for the American market first and foremost. Most importantly, if the movie were to be shown with its French audio track, it would be considered a dub by everyone. The rules make no mention of the language spoken in the country of origin of the production companies that made the movie, which is the reason the mod stated for labeling Lucy as French. Another way of saying the same thing: is French the "original version of" Lucy? How could anyone justifiably say yes?
  • I get your point about silent films, and I agree. Still I think using the language of the original script as a baseline, maybe the language of the original intertitles when that's possible, can give a good result. I also think it wouldn't be too far off from how this field has been used so far in the majority of cases. Again, I'm not complaining about the majority of cases, but about the exceptions, like Lucy, for which the manner in which this field is interpreted makes a big difference.

Regarding the rules, they state The purpose of this field is to try and pair a language with the "original version of the film" then goes on to give examples of an English film, Avatar, a French film, Amelie Poulain, and so on. The rule, however doesn't say what criteria makes each film a <insert_language> film, except for the language of the title, and the language of release, which can sometimes supersede the language of the title (Bonjour Tristesse, Boy 7)...

The criteria that make each film a <insert_language> film are specified in the Contribution Bible and are as follows:

1/ "Original movie language" must be paired with "original version of the film".

2/ "Original version of the film" is the first "local" public and official non-festival release of the film.

3/ A "local" release is a release in one of the production countries.

4/ The purpose of "original movie language" is to create a "default translation" which is then linked to one of the supported TMDb translations (e.g. en-US, de-DE...).

5/ "Original film language" is not automatically the same as the language of the title of the film, as there can be, for example, a Polish film with the English title "Help" (a creator's reference to the Beatles), completely spoken in Polish and released only in Poland.
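Rules 2/ and 3/ above can be read as a small selection procedure. The sketch below is only my reading of them, not how TMDb implements anything: the Release class and its fields are hypothetical, the "public and official" part of rule 2/ is not modelled, and the US date is illustrative (the thread only establishes that it preceded the French release on 2014-08-06).

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical release record; not a TMDb API type."""
    country: str       # ISO 3166-1 code of the release country
    day: date          # release date
    is_festival: bool  # True for festival screenings


def original_release(production_countries: Iterable[str],
                     releases: Iterable[Release]) -> Optional[Release]:
    """Pick the earliest non-festival release in one of the production countries."""
    local = set(production_countries)
    candidates = [
        r for r in releases
        if r.country in local and not r.is_festival
    ]
    # No local release on record means the rule gives no answer.
    return min(candidates, key=lambda r: r.day, default=None)


if __name__ == "__main__":
    lucy_releases = [
        Release("US", date(2014, 7, 25), False),  # illustrative earlier US release
        Release("FR", date(2014, 8, 6), False),
    ]
    print(original_release(["FR"], lucy_releases))
```

With France as the only production country, the earlier US release is ignored and the French one is picked, which is why the language of that particular release ends up mattering so much.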

Spoken language/

Apart from the cases where "spoken language" corresponds to the title of the film and also to the expected "original film language", there are also cases where the film can be:

a/ without spoken dialogue (silent movies, often animated movies, but also AAA movies like Baraka),

b/ with dialogue spoken in a language not supported by the TMDb or even in a language not listed in ISO 639-1 at all (e.g. Romani),

c/ obviously spoken in a language other than the intended language of the film as delivered to its target audience – e.g. a Czech documentary about Mongolian culture that contains only Mongolian dialogue with no Czech spoken word (subtitled in Czech), but with a Czech title and released only in the Czech Republic for the Czech audience.

Trying to use a "spoken language" as an "original movie language" in any of the above cases defeats one of the basic purposes of the "original movie language" – to serve as a "default translation" to be paired with one of the supported TMDb translations. These cases cover a minor, but still quite substantial, portion of the world's movie production. For this reason, strictly using "spoken language" as the "original language" is highly impractical.

Conclusion/

It must be stressed that although the "original movie language" is not automatically the same as the "spoken language" or the language of the movie title, in 99% of cases it naturally corresponds to them. Thus, the problem with the definition of the "original movie language" is only in the remaining 1% of cases where it differs from the "spoken language" or the language of the movie title for some reason.

In these edge cases, it is up to the user (and ultimately the mod) to determine the correct "original movie language" using the above principles. In these cases, I think most mods have their own key, which, however, overwhelmingly leads to the same result.

For example, I have personally observed a strong coincidence between the on-screen credits language and the "original movie language" as implied by the rules above. This is, in a way, an extension of the principle that "original movie language" should correspond to the language of the movie title, but it takes into account cases where: a/ the use of a foreign language title (e.g. English) clearly serves an artistic purpose and is not intended to imply that the movie is made in that language, b/ the movie title is language neutral (e.g. it is a common name like "Adam"). However, this is only my personal guide, which I apply only to Czech and Slovak content and to which I make very rare exceptions when justified.

What's unclear to me is what makes a movie an English release if a movie like Lucy is considered a French one...

I don't feel at all qualified to comment on the French content or edit it in any way. But generally – the film Lucy is a 100% French production, so its local release could only be in France. "Original version of the film" in this case is the first non-festival release in France, which happened on 2014-08-06. The "original film language" should therefore be based on this version of the film. I have not seen the film, but imo it is quite likely that this version was:

a/ shown with French spoken dialogue,
b/ released with the language neutral title "Lucy"
c/ shown with opening and closing credits in French.

The above assumptions need to be confirmed by someone who understands this field better, but if they are valid, then I understand that a mod set its "original movie language" to French.

Most importantly, if the movie were to be shown with its French audio track, it would be considered a dub by everyone.

Personally, I would not put much importance on whether and to what extent some of the dialogue in the film was dubbed. "Original movie language" isn't automatically based on the "spoken language" (for the practical reasons mentioned above) and it is also quite common that a movie is provided with postsyncs that may be spoken by a person other than the actor. But that's just my opinion, because in the Czech Republic dubbed movies are almost always accompanied by the original subtitled version, so I've never actually had to make a decision like this (i.e. the few Czech productions shot in English were always locally released either in English or both in English and Czech and I've always set their "original movie language" to English).

Another way of saying the same thing: is French the "original version of" Lucy? How could anyone justifiably say yes?

I don't know anything about the film, but for me personally, the fact that the original version of the film has a language neutral title and French credits (i.e. the director is listed as "Réalisateur") would be reason enough.

I get your point about silent films, and I agree. Still I think using the language of the original script as a baseline, maybe the language of the original intertitles when that's possible, can give a good result. I also think it wouldn't be too far off from how this field has been used so far in the majority of cases. Again, I'm not complaining about the majority of cases, but about the exceptions, like Lucy, for which the manner in which this field is interpreted makes a big difference.

How do you find out the language of the original script? Script is something that is not normally available, all we have is the final film and sometimes promo materials. And how is the language of the script better than the language of on-screen credits (if I may advocate my own system)? As for intertitles - not all movies have them, because intertitles were a thing of the silent movie era. And if you take the idea of intertitles to its logical conclusion, you will end up with on-screen credits anyway...

Overall, I can't shake the impression that any system you propose merely serves to justify relying primarily on the "spoken language", because there are edge cases where our "original movie language" doesn't meet your expectations. However, I don't think that this has ever been the goal of the "original movie language" field. This field has a given technical purpose in the database (hence the requirement for it to serve as a sort of "default translation", since the "original movie language" determines what TMDb translation should use the "original title" as a default and such a translation should have its "translated title" field locked blank). The fact that this field can also be used with very good effect to filter movies by their language is, after all, just a very nice side effect. Other databases do not contain such a field at all. IMDb allows you to filter based on the production country, other databases may allow you to filter based on spoken language, but they don't take into account all the annoying cases where spoken language cannot be used (see above).

So in the end, I personally think that our current system is the best compromise we can have. If anyone has a problem with the "original movie language" for any particular film, then they should properly justify their objection and challenge it within the report. The fact that some particular "original movie language" does not meet the expectations of most users (even if the mod made a mistake and that "original movie language" is actually incorrect) is not a reason to rework the entire system.

Again, thank you for the thoughtful reply. I think there's been a couple misunderstandings:

  • I was only talking about inter-titles for silent movies not every movie.
  • Scripts or original script pages tend to be easily available, if you look for them. Cinematheques have truckloads of them, and a lot of them can be found online (even if only partially). Finding the Lucy script took me 5s. It seems to me like a more robust system than relying on the language of the end credit at the time of release in a particular country, because this info is not easily available after the fact (couldn't find it for Lucy). Also, with DCP, the end-credit technique has limitations, as production companies can easily distribute translated credits for every territory and language track. So you could have a movie released in France in its original language with same language credits, and a second French release with a French dub and French credits. Ultimately, what is going to determine the original language of the movie is the language of the original audio track (for sound movies) and the language of the inter-titles (for silent movies).
  • I'm not talking about overhauling the system. As I said, I think the current system gets it right most of the time. What I am saying is that the current rule is open to interpretation. By your own admission, "in these cases, I think most mods have their own key". What I'm saying is that clarifying the rule to make it not open to interpretation gets us more robust data. One such way would be to fall back on the script language or the inter-titles (for silent movies), or, if push comes to shove, the credits, when the current rule doesn't give us a clear answer. Having each mod apply their own rule for these cases makes the data inconsistent and the justification rightfully dubious (there's a reason Lucy has multiple reports made for the original language field).
  • Lucy was released first in the USA (a week prior to its French release). I didn't get to see it in theaters at the time, but I can tell you every video release I've had so far has had English credits (including French releases). If I'm following your logic, English should be the original language. But the open interpretation of the current rule allows a mod to say "no it's French because the production companies spoke French". Also, I don't think any French person thinks of Lucy as a French movie, but that's beside the point. As for the language it was first released in, in France, this is problematic, because France always releases films in their original language with French subtitles. Popular releases like Lucy also get a French dubbed version. As for which one is more prevalent, it depends. In the countryside the dubbed versions get more play, so more screens, I think. In metropolitan areas (especially Paris), the original versions are just as popular, if not more so than the dubs (I live in Paris and almost never check the language beforehand, because most of the time it will be original). So the release language of Lucy, in France, doesn't really help unless you care about which one was considered original (VO, the English track) and which one was considered the dub (VF, the French one).
  • I think the way Wikipedia describes the film is interesting: "Lucy is a 2014 English-language French science fiction action film". So a film where the original language is English, but that was produced by companies originating from France. To me, it shows quite well my problem with the original_language field. If we say Lucy is English, we're true to the meaning of the name of the field. If we say French, we introduce the notion of country of origin to a language field and, because it's a language field, we choose the language of the country of origin. This seems contrived at best. So yes, either Lucy is English or that field's name is very ambiguous and the documentation doesn't clarify the ambiguity. Although I did question (in my first post) why it was named like this in the first place if the idea was to put a different concept in it, I'm obviously not saying we should rename the field and break the API. But then either the data needs to match the field name unambiguously or the documentation should be clearer.

If anyone has a problem with the "original movie language" for any particular film, then they should properly justify their objection and challenge it within the report.

I would agree with you if there weren't already multiple reports open for these movies, all shot down with the same reason, while the reasons given by the users are not acknowledged. If people keep opening reports, it's probably because the mod-given reason doesn't seem justified. Some even say so. But that point is never acknowledged either. The same reason is given and treated as if it was the rule (despite the fact that it is not in the rule). It really feels like there is no discussion to be had in these reports. I wouldn't have opened this thread here, if I felt the discussion was open there.

I hope I've managed to get my point across. I mean no disrespect to the work the mods do. But I also know a rule that is open to interpretation leads to these weird corner cases. Clarifying the rule will help everyone in the long run. And if the clarification ends up being "whatever language was spoken in the country of origin of the main production companies", then, however arbitrary that seems to me, I'll shut up and eat my broccoli :)

This is getting a little out of proportion with its scope, so I'd like to put an end to it. I was responding to your original question mainly because it was generally about what fields from the TMDb API to use to filter content by its language (this is something I do daily when dealing with Czech and Slovak content, and I can help with) and why it's not a good idea to rework the current definition of original movie language. I'm able to talk Czech and Slovak content to death with you, but I certainly don't want to spend my time discussing an area I only understand superficially, which is French cinema. So I'm just making a few quick additional comments below.

@takeshi2010 said:

Again, thank you for the thoughtful reply. I think there's been a couple misunderstandings:

  • I was only talking about inter-titles for silent movies not every movie.
  • Scripts or original script pages tend to be easily available, if you look for them. Cinematheques have truckloads of them, and a lot of them can be found online (even if only partially). Finding the Lucy script took me 5s. It seems to me like a more robust system than relying on the language of the end credit at the time of release in a particular country, because this info is not easily available after the fact (couldn't find it for Lucy). Also, with DCP, the end-credit technique has limitations, as production companies can easily distribute translated credits for every territory and language track. So you could have a movie released in France in its original language with same language credits, and a second French release with a French dub and French credits. Ultimately, what is going to determine the original language of the movie is the language of the original audio track (for sound movies) and the language of the inter-titles (for silent movies).

Feature films in theatres where there is some possibility of finding a script make up (rough estimate) about 20-25% of the database. Of this, non-English content accounts for half. For Czech and Slovak content I can reliably say that out of about 10k items I am able to find the script for a few dozen, and that's only if I go to a brick-and-mortar library. Script language may be a supporting argument for determining "original movie language", but it is certainly highly impractical to refer to it in general.

  • I'm not talking about overhauling the system. As I said, I think the current system gets it right most of the time. What I am saying is that the current rule is open to interpretation. By your own admission, "in these cases, I think most mods have their own key".

It will never be possible to establish a 100% applicable rule. It's just inevitable that users and mods have to use reason for a certain percentage of films.

What I'm saying is that clarifying the rule to make it not open to interpretation gets us more robust data. One such way would be to fall back on the script language or the inter-titles (for silent movies), or, if push comes to shove, the credits, when the current rule doesn't give us a clear answer. Having each mod apply their own rule for these cases makes the data inconsistent and the justification rightfully dubious (there's a reason Lucy has multiple reports made for the original language field).

  • Lucy was released first in the USA (a week prior to its French release).

I really can't talk about Lucy or any French film. I'm just pointing out that according to our data (list of production companies) it is a 100% French production and therefore the French local release should be used to determine "original movie language" and "original title".

FYI, parallel to our discussion, a question was raised on the mod forum, which clarified a few principles for setting "original movie language", and Lucy has had it set back to English. This is based on the exception for films produced in English in a non-English speaking country, under which the first non-festival release in an English speaking country is considered the "original release" (the US release in the case of Lucy).
