TMDB Support

When searching for a movie via the API it is useful to apply Unicode normalization to the search query.

For example, the title "Zurück in die Zukunft 1" can contain any of several encodings of the character "ü":

  • A single character, "ü" (LATIN SMALL LETTER U WITH DIAERESIS)
  • A two-character sequence: the letter "u" (LATIN SMALL LETTER U) followed by " ̈" (COMBINING DIAERESIS)
  • Some other possible encodings...
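To make the difference concrete, here is a minimal Python sketch using the standard `unicodedata` module. The variable names are only illustrative. It shows that the two spellings look identical on screen, compare as different strings, and become equal once both are normalized to the same form (NFC is the composed form, NFD the decomposed one).

```python
import unicodedata

composed = "Zur\u00fcck"      # "ü" as the single code point U+00FC
decomposed = "Zuru\u0308ck"   # "u" followed by U+0308 COMBINING DIAERESIS

# List the non-ASCII code points that each spelling actually contains.
for label, text in (("composed", composed), ("decomposed", decomposed)):
    names = [unicodedata.name(ch) for ch in text if ord(ch) > 127]
    print(label, len(text), names)

# The raw strings differ, but normalizing either one to the other's form
# makes them compare equal.
print(composed == decomposed)                                  # False
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True
```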

When I use the two-character encoding for the "ü", the API returns no result: http://api.tmdb.org/3/search/movie?api_key=f7f51775877e0bb6703520952b3c7840&query=Zuru%CC%88ck%20in%20die%20Zukunft%201&year=&language=de

When I use the single character "ü", the API behaves as expected and returns one result: http://api.tmdb.org/3/search/movie?api_key=f7f51775877e0bb6703520952b3c7840&query=Zur%C3%BCck%20in%20die%20Zukunft%201&year=&language=de
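The `query` values in the two URLs above differ only in how the "ü" is percent-encoded. As a small sketch, the following Python snippet reproduces both values from the same title using only the standard library. Nothing here is specific to TMDB.

```python
import unicodedata
from urllib.parse import quote

title = "Zur\u00fcck in die Zukunft 1"

nfd = unicodedata.normalize("NFD", title)  # decomposed: "u" + COMBINING DIAERESIS
nfc = unicodedata.normalize("NFC", title)  # composed: single "ü"

# quote() percent-encodes the UTF-8 bytes of each string.
print(quote(nfd))  # Zuru%CC%88ck%20in%20die%20Zukunft%201
print(quote(nfc))  # Zur%C3%BCck%20in%20die%20Zukunft%201
```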

This is currently a problem on Mac OS X/macOS in combination with Kodi, because Kodi uses filenames in their decomposed form for the API lookup. I filed a separate bug report there (http://trac.kodi.tv/ticket/17308). Because the Kodi bug report is only visible to logged-in users, I have duplicated its content here:


All built-in library scanners have problems with filenames containing umlaut characters.

Example:

  1. Name a movie "Zurück in die Zukunft 1" (Back to the Future).
  2. Add the containing folder to the library.
  3. Look in the event log.

The event log has an entry:

Video library scanner Failed to scan movie: Zurück in die Zukunft 1.mkv

The relevant part of the kodi.log is: ...

When I examine the opened URIs, the problem becomes clear. There are multiple ways to encode the character "ü" in Unicode. The URI called by the scraper uses the string "Zurück" with the "ü" encoded as two characters. See this Unicode inspector for an example: http://apps.timwhitlock.info/unicode/inspect?s=Zuru%CC%88ck

The two-character encoding is not understood by the TMDB API, so the movie is not found. Maybe one could do some kind of Unicode normalization?

This is a well-known problem on Mac OS X/macOS (see: http://apple.stackexchange.com/questions/83935/unicode-normalization-for-filenames-and-copied-text-from-pdfs).

QUOTE: HFS+ requires filenames to be in decomposed form (LATIN SMALL LETTER A + COMBINING DIAERESIS) instead of composed form (LATIN SMALL LETTER A WITH DIAERESIS).

2 replies (page 1 of 1)

Hi @Pendistic, thanks for the detailed report. Unfortunately, I don't have any time in the near future to look into this. There are multiple layers to this, which is why it's complicated. For now, all I can say is that we only support the single-character encoding, and it should be the client's job to normalize the query to a format we support.

It would be nice to improve this one day, no doubt about it, but I don't see myself having time to look into it in the very near future.
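A client-side workaround along these lines could look roughly like the Python sketch below. It is not Kodi's scraper code. `build_search_url` is a hypothetical helper, `"YOUR_API_KEY"` is a placeholder, and the endpoint and parameter names are simply copied from the URLs earlier in this thread. The only essential step is normalizing the title to NFC before it is percent-encoded.

```python
import unicodedata
from pathlib import PurePosixPath
from urllib.parse import quote, urlencode

# Endpoint as it appears in the URLs quoted in this thread.
SEARCH_URL = "http://api.tmdb.org/3/search/movie"


def build_search_url(filename, api_key, language="de"):
    """Hypothetical helper: derive a search URL from a media filename."""
    # Drop the extension, e.g. "... 1.mkv" -> "... 1".
    title = PurePosixPath(filename).stem
    # Filenames read from HFS+ arrive decomposed; compose them (NFC) so that
    # "u" + COMBINING DIAERESIS becomes the single character "ü".
    query = unicodedata.normalize("NFC", title)
    params = {"api_key": api_key, "query": query, "language": language}
    # quote_via=quote encodes spaces as %20, matching the URLs above.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


# A decomposed filename, as it might be read from an HFS+ volume.
url = build_search_url("Zuru\u0308ck in die Zukunft 1.mkv", "YOUR_API_KEY")
print(url)
```

Normalizing once, right where the filename enters the client, keeps the rest of the lookup code unaware of which form the filesystem happened to use.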

You are right that it's the client's job to normalize. Thank you for the quick reply.
