Polish Your GNU/Linux

All About GNU/Linux and BSD - reviews, comparisons, articles
Updated: 8 weeks 6 days ago

Open source film recommendation engine from Filmaster.com

Fri, 2010-05-21 23:53

Filmaster.com, a social network for film lovers, has recently presented a new movie recommendation engine. The algorithm that generates the recommendations is open source and has been released under the AGPLv3 license.

How does it work?

The new algorithm fetches the current ratings from the database, processes them using a teaching algorithm, generates recommendations for each user/film pair, and finally stores them in a relational database (PostgreSQL). The first and last parts of the process are obviously Filmaster-specific. The teaching algorithm, on the other hand, is universal and can easily be used in any external open source project.

The recommendation engine has been designed and implemented based on the best practices of the Netflix contest participants. The accuracy of the recommendations, measured with the recommendation engine's test, is, according to Filmaster, almost 20% better than that of the previous algorithm. Specifically, the RMSE (root-mean-square error; lower is better) computed on about 250 thousand ratings is as follows:
- 1.55 for the old algorithm
- 1.30 for the new one
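
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between guessed and real ratings. The Python sketch below is only an illustration and is not taken from the Filmaster code; the function name and the sample ratings are made up. It also shows one possible reading of the "almost 20%" figure: 1.55 is about 19% higher than 1.30, while the drop from 1.55 to 1.30 is about 16% of the old value.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equally long rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up ratings, only to show the call.
sample_error = rmse([7.0, 4.5, 9.0], [8.0, 4.0, 9.0])

# RMSE values quoted in the article.
old_rmse, new_rmse = 1.55, 1.30
drop_vs_old = (old_rmse - new_rmse) / old_rmse   # about 0.16
old_vs_new = old_rmse / new_rmse - 1.0           # about 0.19
```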

Technical details

Here is how the programmers describe the algorithm in their documentation.

Movies have a number of different features, each of which can be rated (e.g. the script or the level of violence). Obviously, every user has a different attitude to those features: one may enjoy violence, another may not be able to stand it. So we can also rate each user’s preference for each feature. This gives us our U and M matrices, whose f_num dimension is the number of features we consider. Notice that in R', the matrix of guessed ratings, every guessed rating (u,m) now has the form: the sum of U[u][f]*M[f][m] over every feature f. So the higher the user's preference for feature f and the higher that feature's level in a movie, the higher the guessed rating will be. We also need to scale it so that it falls in the <1,10> range. All that remains is to find the best possible U and M matrices.
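
The guessed rating described above can be sketched in a few lines of Python. This is a minimal illustration, not the Filmaster implementation: the function name and the sample matrices are hypothetical, and since the documentation does not say how the result is scaled into the <1,10> range, the sketch simply assumes clamping.

```python
def predict_rating(U, M, u, m, lo=1.0, hi=10.0):
    """Guess the rating of movie m by user u.

    U[u][f] is user u's preference for feature f,
    M[f][m] is the level of feature f in movie m.
    """
    raw = sum(U[u][f] * M[f][m] for f in range(len(M)))
    # Assumption: "scaling" is modelled here as clamping into [lo, hi].
    return max(lo, min(hi, raw))

# Hypothetical data: 2 users, 2 features, 3 movies.
U = [[2.0, 0.5],
     [0.5, 3.0]]
M = [[3.0, 1.0, 0.2],
     [1.0, 2.0, 4.0]]

first_guess = predict_rating(U, M, 0, 0)   # 2.0*3.0 + 0.5*1.0
```

With these numbers, user 0 cares mostly about the first feature, which movie 0 has plenty of, so the guess is fairly high; a raw sum above 10 or below 1 is cut off at the range boundary.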

None of the existing SVD algorithms seemed to fit, so a teaching approach with a heuristic has been used instead. We start with both matrices U and M filled with the same insignificantly small real number in each entry. We then modify them separately for each feature: we begin by teaching the first feature and, after a certain number of teaching cycles, proceed to the second, and so on. In each teaching cycle we compute the current R' matrix, then take all nonempty entries (u,m) in the R matrix and apply the following formula to each of them:

err = lrate * (R[u][m] - R'[u][m])
U[u][f] += err * M[f][m]
M[f][m] += err * U[u][f]

where lrate is a constant real number (the learning rate) and f is the index of the feature currently being taught.

Notice that when our guessed rating is too low, err is positive, so the entries in U and M increase. In the opposite case, err is negative and the entries decrease. Over successive teaching cycles the absolute value of err decreases, as our guessed ratings get closer to the real ones. Eventually, going through more cycles has no significant effect, so we proceed to the next feature. This simple formula happens to give quite satisfactory results.
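
The teaching procedure described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the Filmaster C++ code: the function names, the toy ratings, and the values of init, lrate and cycles are all made up, a fixed number of cycles stands in for "until more cycles have no significant effect", and no scaling to the <1,10> range is applied during teaching.

```python
import math

def guess(U, M, u, m):
    """Unscaled guessed rating: sum of U[u][f] * M[f][m] over all features."""
    return sum(U[u][f] * M[f][m] for f in range(len(M)))

def train(ratings, n_users, n_movies, f_num=2,
          init=0.1, lrate=0.01, cycles=200):
    """Teach U and M one feature at a time.

    ratings maps (u, m) -> known rating R[u][m].
    Returns U (n_users x f_num) and M (f_num x n_movies).
    """
    U = [[init] * f_num for _ in range(n_users)]
    M = [[init] * n_movies for _ in range(f_num)]
    for f in range(f_num):              # one feature at a time
        for _ in range(cycles):         # teaching cycles for feature f
            # First compute the current guesses R' for all known entries.
            guessed = {(u, m): guess(U, M, u, m) for (u, m) in ratings}
            for (u, m), r in ratings.items():
                err = lrate * (r - guessed[(u, m)])
                U[u][f] += err * M[f][m]
                # Uses the just-updated U[u][f], following the formula
                # literally in the order it is written.
                M[f][m] += err * U[u][f]
    return U, M

def rmse(U, M, ratings):
    """RMSE of the guesses over the known ratings."""
    se = sum((r - guess(U, M, u, m)) ** 2 for (u, m), r in ratings.items())
    return math.sqrt(se / len(ratings))

# Toy data: 3 users, 3 movies, 7 known ratings (two entries left empty).
ratings = {(0, 1): 2.0, (0, 2): 1.0,
           (1, 0): 6.0, (1, 1): 4.0, (1, 2): 2.0,
           (2, 0): 9.0, (2, 1): 6.0}

U, M = train(ratings, n_users=3, n_movies=3, cycles=500)
error_after = rmse(U, M, ratings)
```

On such a tiny data set the error over the known ratings should shrink well below its starting value; on real data the learning rate and the number of cycles per feature would need tuning.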

Who stands behind the algorithm?

The algorithm was implemented solely by Jakub Tlałka, currently a mathematics and computer science student at the University of Warsaw and a part-time Filmaster developer.

Despite the fact that Filmaster is coded in Python using the Django Web Framework, the recommendation engine has been implemented in C++ for performance reasons. It outperforms the previous one (written in Python) by an order of magnitude.

The code and the documentation

Try it yourself!

You can test the film recommendation engine yourself by rating at least 20 movies on Filmaster (an account is required; OpenID and Facebook Connect can be used as well) and then switching to the film recommendations page to see the suggestions.

Alternatively, you can use the open source code to apply the algorithm to your own data in any open source project you’re currently working on. If you do so, or if you’d like to help enhance the current algorithm, please leave a comment under this article and write to the Filmaster project maintainers at filmaster@filmaster.com.

The author of this article is one of the founders of the Filmaster project, but not the author of the new algorithm.

Categories: Polish Your GNU/Linux

Stefano Zacchiroli (Debian Leader): Local communities are important

Tue, 2010-05-18 19:24

Stefano Zacchiroli, the new Debian Project Leader, speaks in an interview with the Polish Debian Portal about this year’s campaign, the project’s release policy, the Debian GNU/kFreeBSD port and other non-Linux ports, and the role of local Debian communities.

Thanks to azhag, we’re happy to publish the interview for you!

Stefano Zacchiroli’s term began on April 17. Over the past month he has already appointed several delegates, taken part, as the DPL, in several discussions, and given a talk at the UDS conference about the Debian-Ubuntu relationship.

He took some time out to speak to us about some aspects of Debian.

First of all: congratulations on being elected. Was it a hard campaign? How was it different from last year’s?

Thanks, I feel honored by the result. I’m not really able to judge whether it has been “hard” or not: the most interesting part of DPL campaigning is not really the challenge, but rather that it is the moment when the Debian project discusses its “politics”, something which rarely happens on other occasions. For sure, by comparison with last year, I can tell that it has been a very intense campaign, as the volume of discussed topics shows.

In the rebuttals you wrote that you like some of your opponents’ ideas. Do you plan to realize some of them, or to let your opponents realize them themselves?

Sure, I’ve no preconceptions about others’ ideas, and I’d generally love to see implemented all ideas that can improve Debian, no matter who advances them. Regarding who is going to implement them, well, that doesn’t really matter, and in my first “bits from the DPL” I’ve already made quite clear that anyone who needs some kind of “DPL blessing” to implement an idea can come to me. The DPL role is mainly about coordination and communication; the real hard work gets done by all the developers who decide to take responsibility on their shoulders.

During the last DebConf, Debian accepted a new release policy (actually, a freeze policy, but this leads to releases). Do you think it’s a good policy?

No, not really. That idea was presented during the Release Team talk and then retracted a few days later because of the disruptive effect that a very short development cycle would have had on many of our teams.

That said, all choices about release cycles belong mostly to the Release Team, which should discuss them with the rest of the project. Personally, I don’t see Debian going to a time-based release policy anytime soon, as we very much like to “release when it’s ready”, a value that other users appreciate too.

On the other hand, I find the idea of time-based freezes quite interesting: it enables teams to plan their development initiatives for a given release more accurately, without trading off our “when it’s ready” mantra. Again, that ought to be discussed at large in the project, though, and it would be better postponed until after the release of Squeeze.

Debian GNU/kFreeBSD is a release architecture in Squeeze. What do you think about non-Linux ports of Debian? Are they just whims and a waste of time and resources, or actually valuable projects?

I think they are very valuable projects. In general, Debian is one of the free software distributions with the largest number of ports (at least as far as Linux is concerned), and that makes us a very valuable platform both for upstreams that care about portability and for users of non-mainstream *NIX platforms.

The addition of any new port increases our value, and I’m particularly proud of the idea of having a non-Linux port in the next Debian stable release. Kudos go to the kFreeBSD Debian porters and to all the developers who have worked towards that goal!

Do you have any “bits from the DPL” for ordinary Debian users, especially those from Poland?

In particular for Poland, I’d like to thank the country for all the valuable Debian Developers and contributors it has “given” to Debian. I really believe in local communities as the key place to attract new users and to teach them how cool it is to participate and get involved in Debian. Keep up the good work, and remember that Debian will continue only if we manage to explain our values to more and more people and have them join us!

Thank you very much. We wish you a fruitful term.

Categories: Polish Your GNU/Linux
