Data mining musical profiles

2 Apr 2007, 22:06

Here's a preliminary data mining analysis of a sample population of Last.fm users. Automatically classifying the users into clusters, or subpopulations, with related musical genres reveals some of the structure of musical preferences in a relatively large sample. Musical tag clouds are used to characterise both users and populations, which makes the results more descriptive and easier to interpret.
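
To make the idea of a user tag cloud concrete, here is a minimal sketch in Python. It assumes a user's cloud is built by combining the tags of the artists they listen to, weighted by play count; the linked article may do this differently. The function name and the input dictionaries are hypothetical, and no Last.fm API calls are shown.

```python
from collections import Counter


def user_tag_vector(top_artists, artist_tags):
    """Combine artist tags into one normalised tag vector for a user.

    top_artists: dict mapping artist name -> play count (hypothetical input)
    artist_tags: dict mapping artist name -> {tag: weight} (hypothetical input)
    """
    totals = Counter()
    for artist, plays in top_artists.items():
        # Artists without any known tags contribute nothing.
        for tag, weight in artist_tags.get(artist, {}).items():
            totals[tag] += plays * weight
    norm = sum(totals.values())
    if not norm:
        return {}
    # Normalise so the weights sum to 1 and users are comparable.
    return {tag: value / norm for tag, value in totals.items()}
```

Normalising the vector means a heavy listener and a casual one with the same taste end up with the same cloud, which is what a clustering step would want to compare.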

Here's a figure from the article, displaying all users in the sample in their colour-coded clusters:



Read more at http://anthony.liekens.net/static/DataMiningLastFm.html.

Comments

  • illya23b

    Interesting article, makes me want to learn the last.fm API :) Good call on using artist tags! My biggest question, I guess, is how representative/unbiased you think your user sample was; walking the graph of friends and neighbors seems likely to keep you in similar musical waters. Is there a simple way to get random last.fm users? Also, out of curiosity, do you have the coordinates in tag-space of the principal component vectors? How many do you need to explain, say, 90% of the variance? Lack of classical/jazz could be a demographics thing, or could have to do with the average length of jazz and classical tracks, so that jazz and classical artists fall off the top-50 charts of mixed-genre listeners. That is also a problem with post-rock and electronica, but not to the same degree. Classical also suffers from highly inconsistent tagging. What did you write your code in? Great work!

    7 May 2007, 19:34
  • aliekens

    I have had multiple questions with respect to my initial bias. However, I can check whether my sample is biased, and indeed: there is no significant difference between the average tag vector in my sample and the top tags in last.fm (for the whole last.fm population). I don't really remember, but I think you need about 15 principal components to retain 95% of the initial variance (but that was just a quick plot I made to check this myself). So exploring only the first 4 principal components can indeed be considered insufficient. I think jazz and classical music are simply underrepresented in the whole last.fm community. For the data mining experiment, I used a mix of bash, C++ and MATLAB code. Most of the scripts to generate user tag clouds were rewritten in PHP and can be found at http://anthony.liekens.net/pub/scripts/last.fm/ Thanks for the peer-reviewing! Anthony,-

    7 May 2007, 22:20
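
The question in the comments above, how many principal components are needed to explain a given share of the variance, comes down to a cumulative sum. A minimal sketch in Python, assuming the per-component variances (the eigenvalues of the covariance matrix of the tag vectors) have already been computed by whatever PCA routine is in use; the function name is hypothetical and no figures from the article are reproduced here.

```python
def components_for_variance(eigenvalues, target=0.95):
    """Return the smallest number of principal components whose
    combined variance reaches `target` (a fraction of the total).

    eigenvalues: per-component variances, in any order (assumed input)
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    # Only reached if rounding keeps the ratio just under the target.
    return len(ordered)
```

For instance, with variances 5, 3, 1 and 1, the first two components already cover 80% of the total, so `components_for_variance([5, 3, 1, 1], target=0.75)` returns 2.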