This is a bit of a loaded question, sorry in advance. I, along with (it seems) a lot of people, thought that the original_language field in the movie-details model was used to indicate the main language spoken during the movie. Apparently it's not meant for that; it is meant to reflect the origin of the production companies:
https://www.themoviedb.org/movie/240832-lucy/discuss/662e2bd9c56d2d0126cccf58
https://www.themoviedb.org/movie/467244-the-zone-of-interest/discuss/65d72da4c5c1ef017d8bf4ea
https://www.themoviedb.org/talk/65f7d443242f94017dce2645
https://www.themoviedb.org/talk/63d7ab07c15b550079fbafd5
I don't understand why we need a field on the movie-details model for a piece of information that could be obtained more accurately by iterating over the production companies, or why such a field would not be better named "original_production_language" (or "production_country"), as that would make its meaning clear to everyone.
I'll repost what I already said in the Lucy discussion:
@takeshi2010 said:
Also, if this field is truly meant to become a reflection of the origin of the production companies, which, I guarantee you, 99% of the people contributing to this site didn't know, how are we supposed to know what the main language of a movie is? Conventional wisdom would say "take the first value from spoken_languages", but here's the output I get. I'm sure you'll see the problem:
title | original_language | spoken languages | correct original language
The Man from U.N.C.L.E. (2015) | en | ['it', 'en', 'ru', 'de'] | en
Risky Business (1983) | en | ['de', 'en'] | en
John Wick: Chapter 4 (2023) | en | ['ar', 'cn', 'en', 'fr', 'de', 'ja', 'la', 'ru', 'es'] | en
Taxi (1998) | fr | ['pt', 'fr', 'de', 'ko'] | fr
Asterix & Obelix: Mission Cleopatra (2002) | fr | ['de', 'fr', 'ar', 'cn', 'la'] | fr
The Mask (1994) | en | ['sv', 'en'] | en
The Long Good Friday (1980) | en | ['fr', 'en'] | en
GoodFellas (1990) | en | ['it', 'en'] | en
The Green Mile (1999) | en | ['fr', 'en'] | en
I've just selected a few examples, but close to 2,000 of the 15,000 movies in my curated list have this problem. Many are well-known, popular movies. The spoken_languages field is not reliable for finding out the original language of a movie: I suspect a lot of people don't care about its ordering because they already fill out the original_language field with the proper value. There are also a bunch of cases where people don't even bother with the spoken_languages field because, again I suspect, they think original_language is enough. Forcing a reinterpretation of the original_language field this late in the game means a bunch of data is wrong.
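For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch. It assumes movie-details records shaped like the TMDb API response, with `original_language` as an ISO 639-1 string and `spoken_languages` as a list of objects carrying an `iso_639_1` key. The helper names and the sample records are made up for illustration, and nothing is fetched from the API.

```python
from typing import Any


def spoken_codes(movie: dict[str, Any]) -> list[str]:
    """Return the ISO 639-1 codes from spoken_languages, in stored order."""
    return [lang["iso_639_1"] for lang in movie.get("spoken_languages", [])]


def first_spoken_mismatch(movie: dict[str, Any]) -> bool:
    """True when the first spoken language differs from original_language.

    Movies with an empty spoken_languages list are not flagged, since there
    is nothing to compare against.
    """
    codes = spoken_codes(movie)
    return bool(codes) and codes[0] != movie.get("original_language")


# Hand-written sample records: the first mirrors the GoodFellas row above,
# the second is invented.
movies = [
    {
        "title": "GoodFellas",
        "original_language": "en",
        "spoken_languages": [{"iso_639_1": "it"}, {"iso_639_1": "en"}],
    },
    {
        "title": "Hypothetical Movie",
        "original_language": "en",
        "spoken_languages": [{"iso_639_1": "en"}],
    },
]

flagged = [m["title"] for m in movies if first_spoken_mismatch(m)]
print(flagged)
```

A check like this only shows where the two fields disagree; it cannot say which of them is right.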
There are the odd cases where original_language is wrong, but these tend to be more obscure, often non-English-language movies, like:
title | original_language | spoken languages | correct original language
Taoism Drunkard (1984) | en | ['zh'] | cn
But these are sufficiently rare that correcting them by hand is doable.
Here's the problem: I don't think most people who contribute to the site got the meaning behind that field. As a result, trying to enforce that rule now makes it impossible to get accurate information either way. If I understand the field to mean "production language", then I have to fall back to spoken_languages to get the info I need, and that's just not accurate at all. If I understand it as it has been used by most of the community, I get incorrect data in the rare cases where mods take action (Lucy, Zone of Interest...).
So considering all this, how do I reliably get the main language of a movie?
As an aside question, who cares more about what language the producers were speaking during the shoot than they do about the language of the script or the recorded language the actors were speaking? Production country, I get, but original production language...
Reply by talestalker
on July 26, 2024 at 9:19 PM
In fact, this is one of the most important fields of each record, as it is mandatory, fairly well-maintained, and allows simple and quick filtering of records by language. In any case, it cannot be generated from production companies as you suggest, because production companies only contain information about the country of production, not the language (that wouldn't make any sense anyway).
By comparison, similar databases do not contain comparable fields at all - IMDb only has spoken languages (but I think it's much more efficient to use production country here, because spoken language doesn't work if you want to get a German silent film, for example) and Wikidata doesn't systematically maintain the language of the film (although it does allow you to enter it, of course). Filtering entries by language is almost impossible in these databases, unlike TMDb.
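As a rough illustration of that kind of filtering, the sketch below builds a request URL for the TMDb discover endpoint using its `with_original_language` parameter. The base URL and parameter names are written from memory of the v3 API and should be checked against the current documentation. `build_discover_url` is a hypothetical helper, the API key is a placeholder, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed base URL of the TMDb v3 discover endpoint (verify against the docs).
DISCOVER_URL = "https://api.themoviedb.org/3/discover/movie"


def build_discover_url(language_code: str, api_key: str, page: int = 1) -> str:
    """Build a discover URL restricted to one original_language value."""
    params = {
        "api_key": api_key,
        "with_original_language": language_code,  # ISO 639-1 code, e.g. "cs"
        "page": page,
    }
    return f"{DISCOVER_URL}?{urlencode(params)}"


url = build_discover_url("cs", api_key="YOUR_API_KEY")
print(url)
```

The filter is only as good as the original_language values behind it, which is why the definition of the field matters so much here.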
Apart from the fact that it's really not a good idea to change the field names in a database actively used by thousands of projects, your suggestion doesn't make sense anyway, because our "original movie language" has about as much in common with "original production language" as with "spoken language" - it will always match those languages 99% of the time, and the other 1% will be various exceptions that someone will have a problem with.
The film industry is so diverse that any cataloguing system cannot be expected to fit it 100%. Our "original language" field is described in the rules here. I'm mainly dealing with Czech and Slovak film, which has quite a few mutual co-productions and co-productions with other countries, and after all the years I've been here, my experience is that our "original movie language" most accurately corresponds to the language of the onscreen credits of the original version of the movie. There are always exceptions, though, and each mod has its own method to this field that suits them best, because as I've already mentioned, no 100% applicable rule can simply be created. However, the differences in the mods' methodologies are completely negligible anyway, as they only affect a tiny fraction of entries where the "original movie language" isn't obvious at first glance, and even though you might not believe it, the mods usually discuss the problematic entries with others to reach a mutual consensus.
This is a general problem with most of the fields in our database. There will always be someone who doesn't read the rules or modifies the database to suit their personal needs and expectations, regardless of others.
Our database is open to all and its 100% correctness can never be guaranteed. No other database will provide you with 100% correct data either, because it is either also based on the assumption that users will fill it in correctly (IMDb, Wikidata) or it is proprietary, in which case it is also often incomplete, as there is usually no one to keep it up to date and maintain it (except for some publicly funded national databases that have very good quality and up-to-date data, but are by definition only local and therefore not usable by most users).
I think you really very much underestimate the effort the mods put into maintaining this database.
I don't know what you mean by "main language of the movie", but if it's supposed to be the main language spoken in the movie, you won't find that information with 100% accuracy, because (aside from the fact that we will always have a certain percentage of errors) our "spoken language" field doesn't allow sorting languages by the frequency of occurrence in the movie/TV show.
Furthermore, the "spoken language" field is often misunderstood by users who tend to include dubbed languages and it doesn't handle well content with no spoken language anyway, so your best chance of getting "something like the main language of the movie" is to use the "original movie language".
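In practice, this advice and the hand corrections mentioned in the first post could be combined along these lines. This is only a sketch: `MANUAL_OVERRIDES` and `main_language` are hypothetical names, the movie ID is a placeholder, and the record shape (an `original_language` string plus a `spoken_languages` list of objects with `iso_639_1`) is assumed from the TMDb movie-details response.

```python
from typing import Any, Optional

# Hand-maintained corrections keyed by TMDb movie ID (placeholder ID below).
MANUAL_OVERRIDES: dict[int, str] = {
    999_999_999: "cn",
}


def main_language(movie: dict[str, Any]) -> Optional[str]:
    """Best-effort main language of a movie.

    Order of preference:
      1. a manual override, for the rare entries known to be wrong;
      2. original_language, as recommended in the reply above;
      3. the first spoken language, only if original_language is missing.
    """
    override = MANUAL_OVERRIDES.get(movie.get("id", -1))
    if override:
        return override
    original = movie.get("original_language")
    if original:
        return original
    spoken = movie.get("spoken_languages") or []
    return spoken[0]["iso_639_1"] if spoken else None


sample = {
    "id": 1,
    "original_language": "en",
    "spoken_languages": [{"iso_639_1": "it"}, {"iso_639_1": "en"}],
}
print(main_language(sample))
```

Since original_language is mandatory, the third step should rarely trigger; it is there only so that the function still returns something for incomplete records.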
Reply by takeshi2010
on July 27, 2024 at 10:26 AM
Thank you for the thoughtful reply. A couple things I think deserve a follow up:
The purpose of this field is to try and pair a language with the "original version of the film"
The rule then goes on to give examples of an English film, Avatar, a French film, Amelie Poulain, and so on. It doesn't say, however, what criteria make each film a <insert_language> film, except for the language of the title, and the language of release, which can sometimes supersede the language of the title (Bonjour Tristesse, Boy 7). What's unclear to me is what makes a movie an English release if a movie like Lucy is considered a French one. One could easily argue it has an English title, it was shot from an original script in English (https://assets.scriptslug.com/live/pdf/scripts/lucy-2014.pdf), with 2 US actors as the main stars (Scarlett Johansson and Morgan Freeman) and it was made for the American market first and foremost. Most importantly, if the movie were to be shown with its French audio track, it would be considered a dub by everyone. The rules make no mention of the language spoken in the country of origin of the production companies that made the movie, which is the reason the mod stated for labeling Lucy as French. Another way of saying the same thing: is French the "original version of" Lucy? How could anyone justifiably say yes?
Reply by talestalker
on July 27, 2024 at 2:14 PM
The criteria that make each film a <insert_language> film are specified in the Contribution Bible and are as follows:
1/ "Original movie language" must be paired with "original version of the film".
2/ "Original version of the film" is the first "local" public and official non-festival release of the film.
3/ A "local" release is a release in one of the production countries.
4/ The purpose of "original movie language" is to create a "default translation" which is then linked to one of the supported TMDb translations (e.g. en-US, de-DE...).
5/ "Original film language" is not automatically the same as the language of the title of the film, as there can be, for example, a Polish film with the English title "Help" (a creator's reference to the Beatles), completely spoken in Polish and released only in Poland.
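One possible reading of rules 2/ and 3/ can be sketched as a small selection function. This is an interpretation expressed in code, not a description of how TMDb itself works: the `Release` record, its `is_festival` flag and the `original_release` helper are made-up names, and all of the sample releases are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Release:
    country: str        # ISO 3166-1 alpha-2 code, e.g. "PL"
    released_on: date
    is_festival: bool   # True for festival screenings


def original_release(
    releases: list[Release], production_countries: set[str]
) -> Optional[Release]:
    """Earliest non-festival release in one of the production countries."""
    local = [
        r for r in releases
        if r.country in production_countries and not r.is_festival
    ]
    return min(local, key=lambda r: r.released_on, default=None)


# Invented releases for a hypothetical Polish production.
releases = [
    Release("PL", date(2020, 1, 10), is_festival=True),   # festival, skipped
    Release("GB", date(2020, 2, 1), is_festival=False),   # not a production country
    Release("PL", date(2020, 3, 5), is_festival=False),   # first local release
]
print(original_release(releases, {"PL"}))
```

The language of the release picked this way is then what rule 1/ pairs with the "original movie language"; deciding that language is still a human judgment.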
Spoken language/
Apart from the cases where "spoken language" corresponds to the title of the film and also to the expected "original film language", there are also cases where the film can be:
a/ without spoken dialogue (silent movies, often animated movies, but also AAA movies like Baraka),
b/ with dialogue spoken in a language not supported by the TMDb or even in a language not listed in ISO 639-1 at all (e.g. Romani),
c/ obviously spoken in a language other than the intended language of the film as delivered to its target audience – e.g. a Czech documentary about Mongolian culture that contains only Mongolian dialogue with no Czech spoken word (subtitled in Czech), but with a Czech title and released only in the Czech Republic for the Czech audience.
Trying to use a "spoken language" as an "original movie language" in any of the above cases prevents one of the basic purposes of the "original movie language" – to serve as a "default translation" to be paired with one of the supported TMDb translations. These cases cover a minor, but still quite substantial, portion of the world's movie production. For this reason, strictly using "spoken language" as the "original language" is highly impractical.
Conclusion/
It must be stressed that although the "original movie language" is not automatically the same as the "spoken language" or the language of the movie title, in 99% of cases it naturally corresponds to them. Thus, the problem with the definition of the "original movie language" is only in the remaining 1% of cases where it differs from the "spoken language" or the language of the movie title for some reason.
In these edge cases, it is up to the user (and ultimately the mod) to determine the correct "original movie language" using the above principles. In these cases, I think most mods have their own key, which, however, overwhelmingly leads to the same result.
For example, I have personally observed a strong coincidence between the on-screen credits language and the "original movie language" as implied by the rules above. This is, in a way, an extension of the principle that "original movie language" should correspond to the language of the movie title, but it takes into account cases where: a/ the use of a foreign language title (e.g. English) is clearly on artistic purpose and is not intended to imply that the movie is made in that language, b/ the movie title is language neutral (e.g. it is a common name like "Adam"). However, this is only my personal guide, which I apply only to Czech and Slovak content and to which I make very rare exceptions when justified.
I don't feel at all qualified to comment on the French content or edit it in any way. But generally – the film Lucy is a 100% French production, so its local release could only be in France. "Original version of the film" in this case is the first non-festival release in France, which happened on 2014-08-06. The "original film language" should therefore be based on this version of the film. I have not seen the film, but imo it is quite likely that this version was:
a/ shown with French spoken dialogue,
b/ released with the language neutral title "Lucy"
c/ shown with opening and closing credits in French.
The above assumptions need to be confirmed by someone who understands this field better, but if they are valid, then I understand that a mod set its "original movie language" to French.
Personally, I would not put much importance on whether and to what extent some of the dialogue in the film was dubbed. "Original movie language" isn't automatically based on the "spoken language" (for the practical reasons mentioned above) and it is also quite common that a movie is provided with postsyncs that may be spoken by a person other than the actor. But that's just my opinion, because dubbed movies in the Czech Republic are almost always accompanied by the original subtitled version, so I've never actually had to make a decision like this (i.e. the few Czech productions shot in English were always locally released either in English or both in English and Czech and I've always set their "original movie language" to English).
I don't know anything about the film, but for me personally, the fact that the original version of the film has a language neutral title and French credits (i.e. the director is listed as "Réalisateur") would be reason enough.
How do you find out the language of the original script? Script is something that is not normally available, all we have is the final film and sometimes promo materials. And how is the language of the script better than the language of on-screen credits (if I may advocate my own system)? As for intertitles - not all movies have them, because intertitles were a thing of the silent movie era. And if you take the idea of intertitles to its logical conclusion, you will end up with on-screen credits anyway...
Overall, I can't help the impression that any system you propose is merely to justify why to primarily rely on the "spoken language", because there are edge cases where our "original movie language" doesn't meet your expectations. However, I don't think that this has ever been the goal of the "original movie language" field. This field has a given technical purpose in the database (hence the requirement for it to serve as a sort of "default translation", since the "original movie language" determines what TMDb translation should use the "original title" as a default and such a translation should have its "translated title" field locked blank). The fact that this field can also be used with very good effect to filter movies by their language is, after all, just a very nice side effect. Other databases do not contain such a field at all. IMDb allows you to filter based on the production country, other databases may allow you to filter based on spoken language, but they don't take into account all the annoying cases where spoken language cannot be used (see above).
So in the end I personally think that our current system is the best compromise we can have. If anyone has a problem with the "original movie language" for any particular film, then they should properly justify their objection and challenge it within the report. The fact that some particular "original movie language" does not meet the expectations of most users (even if the mod made a mistake and that "original movie language" is actually incorrect) is not a reason to rework the entire system.
Reply by takeshi2010
on July 29, 2024 at 5:33 PM
Again, thank you for the thoughtful reply. I think there have been a couple of misunderstandings:
original_language field. If we say Lucy is English, we're true to the meaning of the name of the field. If we say French, we introduce the notion of country of origin to a language field and, because it's a language field, we choose the language of the country of origin. This seems contrived at best. So yes, either Lucy is English or that field's name is very ambiguous and the documentation doesn't clarify the ambiguity. Although I did question (in my first post) why it was named like this in the first place if the idea was to put a different concept in it, I'm obviously not saying we should rename the field and break the API. But then either the data needs to match the field name unambiguously or the documentation should be clearer.
I would agree with you if there weren't already multiple reports open for these movies, all shot down with the same reason, while the reasons given by the users are not acknowledged. If people keep opening reports, it's probably because the mod-given reason doesn't seem justified. Some even say so. But that point is never acknowledged either. The same reason is given and treated as if it were the rule (despite the fact that it is not in the rule). It really feels like there is no discussion to be had in these reports. I wouldn't have opened this thread here if I felt the discussion was open there.
I hope I've managed to get my point across. I mean no disrespect to the work the mods do. But I also know a rule that is open to interpretation leads to these weird corner cases. Clarifying the rule will help everyone in the long run. And if the clarification ends up being "whatever language was spoken in the country of origin of the main production companies", then, however arbitrary that seems to me, I'll shut up and eat my broccoli :)
Reply by talestalker
on July 30, 2024 at 4:38 AM
This is getting a little out of proportion with its scope, so I'd like to put an end to it. I was responding to your original question mainly because it was generally about what fields from the TMDb API to use to filter content by its language (this is something I do daily when dealing with Czech and Slovak content, and I can help with) and why it's not a good idea to rework the current definition of original movie language. I'm able to talk Czech and Slovak content to death with you, but I certainly don't want to spend my time discussing an area I only understand superficially, which is French cinema. So I'm just making a few quick additional comments below.
Feature films in theatres where there is some possibility of finding a script make up (rough estimate) about 20-25% of the database. Of this, non-English content accounts for half. For Czech and Slovak content I can reliably say that out of about 10k items I am able to find the script for a few dozen, and that's only if I go to a stone library. Script language may be a supporting argument for determining "original movie language", but it is certainly highly impractical to refer to it in general.
It will never be possible to establish a 100% applicable rule. It's just inevitable that users and mods have to use reason for a certain percentage of films.
I really can't talk about Lucy or any French film. I'm just pointing out that according to our data (list of production companies) it is a 100% French production and therefore the French local release should be used to determine "original movie language" and "original title".
Reply by talestalker
on July 31, 2024 at 6:46 AM
FYI, parallel to our discussion, a question was raised on the mod forum, which clarified a few principles for setting "original movie language", and Lucy now has it set back to English. This is based on the exception for films produced in English in a non-English speaking country, under which the first non-festival release in an English speaking country is considered the "original release" (the US release in the case of Lucy).
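The clarified exception could be layered on top of the basic local-release rule roughly as follows. Everything here is an assumption made for illustration: the thread does not say how "produced in English" or "English speaking country" is decided, so the sketch takes a `produced_in_english` flag as input and uses a short, deliberately incomplete `ENGLISH_SPEAKING` set. The helper names and the sample release dates are invented.

```python
from datetime import date
from typing import NamedTuple, Optional

# Illustrative and deliberately incomplete set of ISO 3166-1 codes.
ENGLISH_SPEAKING = {"US", "GB", "CA", "AU", "NZ", "IE"}


class Release(NamedTuple):
    country: str
    released_on: date
    is_festival: bool


def earliest(releases: list[Release], countries: set[str]) -> Optional[Release]:
    """Earliest non-festival release within the given countries."""
    pool = [r for r in releases if r.country in countries and not r.is_festival]
    return min(pool, key=lambda r: r.released_on, default=None)


def original_release_with_exception(
    releases: list[Release],
    production_countries: set[str],
    produced_in_english: bool,
) -> Optional[Release]:
    """Apply the exception first, then fall back to the local-release rule."""
    non_english_production = not (production_countries & ENGLISH_SPEAKING)
    if produced_in_english and non_english_production:
        exception_pick = earliest(releases, ENGLISH_SPEAKING)
        if exception_pick is not None:
            return exception_pick
    return earliest(releases, production_countries)


# Invented releases for a hypothetical French production shot in English.
releases = [
    Release("FR", date(2020, 5, 1), is_festival=False),
    Release("US", date(2020, 6, 1), is_festival=False),
]
picked = original_release_with_exception(releases, {"FR"}, produced_in_english=True)
print(picked.country if picked else None)
```

With `produced_in_english=False` the same data falls back to the French release, which is the behaviour described earlier in the thread.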