
Mapping of fullwidth latin characters to plain ASCII

 
    • jesus2099 wrote...
    • User
    • 7 Aug 2007, 13:33

    Mapping of fullwidth latin characters to plain ASCII

    It would be interesting to automatically map the fullwidth and halfwidth characters to their ASCII counterparts.

    "ａｂｃ", "ＡＢＣ" and "１２３" to "abc", "ABC" and "123"…

    There are also "（）＜＞［］" to map to "()<>[]"…
    A lot of characters could be mapped automatically this way without changing their shape or their meaning, IMHO.

    That way, many Japanese tracks and artists would no longer be duplicates (with no administrative work required):

    少年Ｓ = 少年S
    "Ｓ" = "S"

    ラッツ＆スター = ラッツ&スター
    "＆" = "&"

    愛　ありがとう = 愛 ありがとう
    "　" = " "

    プラットホーム～Merry Go Round～ = プラットホーム~Merry Go Round~
    "～" = "~"

    I think this would be harmless. Automatic and error free (or so it seems).


    PS. Is the search function of this forum really working?

    LA TÉLÉ FAIT GROSSIR ET NUIT À L'ÉVEIL DU CERVEAU
    Edited by jesus2099 on 7 Aug 2007, 13:54
    • DFA1979 wrote...
    • Subscriber
    • 7 Aug 2007, 13:39
    I don't see any problems with the main suggestion there, although I don't know if there may be reasons it wouldn't work too well. As for the search function: no, it's useless. Just like the search function on every web forum I know :-/

  • I think this would be a great implementation and something to work on. I too have noticed that this occurs semi-often. It would clear a lot of duplicates up in different languages other than English (genres that I frequent often span over languages).

    I am not sure how much this would take. The search function isn't entirely useless, and I see how this implementation could actually help the search function. Searching something and then seeing something nearly exactly like it but with fullwidth characters has happened more than once for me. Usually I go with the option that is not fullwidth, as it normally has more users and is correct.

  • I'm not a programmer, but could this be set up to work along the same lines as matching capitalization?
    As in how "Track" matches to "track".

    All comments reflect the views of the poster and not of last.fm or its management.

  • That is truly a good question.
    I wonder if someone would have to go through it manually and assign each fullwidth character to its non-fullwidth counterpart. Especially with Asian languages, this could get tricky and tedious.

    • DFA1979 wrote...
    • Subscriber
    • 8 Aug 2007, 11:05
    I'm no programmer either, but I see no reason it couldn't be set up that way. It would just be a matter of mapping the code for each fullwidth character to the corresponding plain one. It sounds simple enough, at least.
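
A minimal Python sketch of the code-to-code mapping described above, under stated assumptions: the function names `fold_fullwidth` and `match_key` are hypothetical, and nothing here reflects how Last.fm actually matches names. It relies on the fact that the fullwidth forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E, so the table can be generated rather than typed by hand. The ideographic space U+3000 is added as a special case.

```python
# Hypothetical helpers for illustration; not Last.fm's actual matching code.

# Fullwidth forms U+FF01..U+FF5E mirror ASCII U+0021..U+007E at a fixed
# offset of 0xFEE0, so the whole table can be generated in one line.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# The ideographic space U+3000 is the wide counterpart of the ASCII space.
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def fold_fullwidth(text: str) -> str:
    """Replace fullwidth ASCII variants with plain ASCII; leave the rest alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


def match_key(text: str) -> str:
    """Comparison key: width-folded, then case-folded (like "Track" vs "track")."""
    return fold_fullwidth(text).casefold()


if __name__ == "__main__":
    # "\uff33" is the fullwidth capital S from the thread's first example.
    print(fold_fullwidth("\u5c11\u5e74\uff33"))
    # Fullwidth "Track" and plain "track" end up with the same key.
    print(match_key("\uff34\uff52\uff41\uff43\uff4b") == match_key("track"))
```

Only the comparison key is folded here, so the name as originally submitted could still be kept for display. That is a design choice of this sketch, not something the thread specifies.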

    • jesus2099 wrote...
    • User
    • 8 Aug 2007, 11:36
    DFA1979 said:
    As for the search function: no, it's useless. Just like the search function on every web forum I know :-/

    Well, I really like forum search functions in general (phpBB, …), but with this one I can never find anything. The problem is that once a post slips past the second page, it's almost certain to be forgotten for good.

    Kerensky97 said:
    I'm not a programmer, but could this be set up to work along the same lines as matching capitalization?
    As in how "Track" matches to "track".

    Yes, I would think so.

    Carla_bruni said:
    I wonder if someone would have to go through it manually and assign each fullwidth character to its non-fullwidth counterpart. Especially with Asian languages, this could get tricky and tedious.

    I counted 128 halfwidth Asian characters (thanks to BabelMap: ｱ｢｣ﾰￗ… U+FF61 ~ U+FFDC, U+FFE8 ~ U+FFEE). I did not think of them at first but yes, why not.

    To start with, the Latin characters alone would be good enough, and there are only around 2×26 + 10 of them (abc, ABC, 012).

  • So we're looking at about 190 in total for both Asian and Latin (128 + 62)? I wonder whether the work involved would be too tedious to be worth the outcome, but it does sound simple enough. I believe it would also clear up some duplicate problems, which is always a good thing.
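
On the worry about assigning roughly 190 characters by hand: as a hedged sketch, Unicode's NFKC compatibility normalization already carries these width mappings, and Python's standard `unicodedata` module exposes it. The wrapper name `normalize_width` is hypothetical. NFKC is broader than what the thread proposes, because it also rewrites other compatibility characters such as ligatures and superscript digits, so it may change more than intended.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Fold fullwidth and halfwidth variants using Unicode NFKC normalization.

    Assumption: rewriting other compatibility characters (ligatures,
    superscripts, ...) is acceptable, since NFKC is not limited to
    width variants.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # Fullwidth Latin letters become plain ASCII.
    print(normalize_width("\uff21\uff22\uff23"))
    # Halfwidth katakana A (U+FF71) becomes the regular katakana A (U+30A2).
    print(normalize_width("\uff71") == "\u30a2")
```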
