
Am I right in thinking that a naive Bayes classifier is beyond "not even the best out there," and is in fact about as simple a learning algorithm as you can get, and straight out of AI 101?


Pretty much, yes. (Though that doesn't mean it's not a good technique. Lots of quite effective spam filters are more or less naive-Bayes.)


It's sometimes a good technique, but only because some problems are really simple. There are almost no problems where the extreme independence assumptions of naive Bayes produce a reasonable likelihood function. The consequence is that when it's wrong, it tends to be very, very certain that it's right. I think the aphorism that gets passed around is "Naive Bayes classifiers are often in error but never uncertain".
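To make the overconfidence point concrete, here is a minimal sketch (the function name and the numbers are made up for illustration). It assumes a single piece of evidence with a 3:1 likelihood ratio that shows up as several perfectly correlated features. Naive Bayes multiplies the ratio once per copy, so the posterior races toward 1 even though no new information arrived.

```python
def naive_posterior(prior, likelihood_ratio, copies):
    """Posterior P(class | evidence) when naive Bayes treats `copies`
    perfectly correlated features as independent pieces of evidence.

    Hypothetical helper for illustration only.
    """
    prior_odds = prior / (1.0 - prior)
    # Independence assumption: each copy multiplies the odds again.
    posterior_odds = prior_odds * likelihood_ratio ** copies
    return posterior_odds / (1.0 + posterior_odds)


# The honest answer uses the evidence once.
once = naive_posterior(prior=0.5, likelihood_ratio=3.0, copies=1)

# Naive Bayes sees ten redundant features and counts the evidence ten times.
ten_times = naive_posterior(prior=0.5, likelihood_ratio=3.0, copies=10)

print(f"evidence counted once:      {once:.6f}")
print(f"evidence counted ten times: {ten_times:.6f}")
```

If the underlying evidence happens to point the wrong way, the ten-copy version is just as wrong as the one-copy version, only far more sure of itself.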


Yup. But some problems -- for instance, discriminating between spam and non-spam emails, and keeping up decent discrimination as spammers vary their tactics -- are (1) "really simple" in that sense and (2) apparently quite difficult to solve, given that there basically were no really effective spam filters before naive-Bayes ones came along.


We use a modified naive Bayes extensively in a commercial application -- from what I understand it's extremely quick to classify, easy to modify/customize, and deals very well with gaps in data. For a lot of applications, things like SVMs and WAODE are only minor incremental improvements.
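One common way naive Bayes copes with gaps in data is simply to leave out the factor for any feature that is missing, which is the same as marginalizing it out. The sketch below assumes that approach; it is not the commercial system described above, and the feature names and probabilities are invented.

```python
import math


def log_scores(priors, likelihoods, observation):
    """Log of prior * product of P(value | label) over observed features.

    priors:      {label: P(label)}
    likelihoods: {label: {feature: {value: P(value | label)}}}
    observation: {feature: value, or None if the value is missing}
    """
    scores = {}
    for label, prior in priors.items():
        score = math.log(prior)
        for feature, value in observation.items():
            if value is None:
                continue  # missing feature contributes a factor of 1
            score += math.log(likelihoods[label][feature][value])
        scores[label] = score
    return scores


# Hypothetical toy model: will a customer churn or stay?
priors = {"churn": 0.3, "stay": 0.7}
likelihoods = {
    "churn": {"plan": {"basic": 0.8, "pro": 0.2},
              "support_calls": {"many": 0.7, "few": 0.3}},
    "stay": {"plan": {"basic": 0.4, "pro": 0.6},
             "support_calls": {"many": 0.2, "few": 0.8}},
}

full = log_scores(priors, likelihoods,
                  {"plan": "basic", "support_calls": "many"})
partial = log_scores(priors, likelihoods,
                     {"plan": "basic", "support_calls": None})

print(max(full, key=full.get))
print(max(partial, key=partial.get))
```

No imputation or retraining is needed for the incomplete record; the classifier just works with whatever features are present.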


Partly this is because naive Bayes's unreasonable independence assumptions (which are almost always badly violated) turn out not to actually hurt classification performance in a lot of cases, even in theory, because under a lot of distributions the independence violations basically cancel out: http://www.aaai.org/Papers/FLAIRS/2004/Flairs04-097.pdf


Naive Bayes classifiers are simple, (relatively) easy to understand and fairly straightforward to code.

They are also fairly robust and work well for a wide variety of problem sets.

Other techniques sometimes offer some improvement, but often don't. Generally a Bayesian classifier is a good place to start.
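As a rough illustration of how little code is involved, here is a sketch of a word-count (multinomial) naive Bayes text classifier with add-one smoothing. The class name, the toy messages, and the choice to skip words never seen in training are all assumptions made for the example; real spam filters add a lot of tokenization and tuning on top.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength
        self.class_counts = Counter()           # documents seen per label
        self.word_counts = defaultdict(Counter) # token counts per label
        self.vocab = set()

    def train(self, tokens, label):
        # Training is just counting, so new examples can be added any time.
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for token in tokens:
                if token not in self.vocab:
                    continue  # assumption: ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha)
                                  / (total_words + self.alpha * vocab_size))
            scores[label] = score
        return scores

    def classify(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


clf = NaiveBayes()
clf.train("cheap pills buy now".split(), "spam")
clf.train("buy cheap watches".split(), "spam")
clf.train("meeting notes attached".split(), "ham")
clf.train("lunch meeting tomorrow".split(), "ham")

print(clf.classify("cheap pills".split()))
print(clf.classify("meeting tomorrow".split()))
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing term keeps a word that was never seen with one label from zeroing out that label entirely.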



