Apache Mahout

Fth-Hokage · posted 2011-7-28 10:28:19

Apache Mahout is a machine learning framework built on Hadoop that supports processing large-scale datasets. The latest version at the moment is 0.4.

Highlights of this release:

Model refactoring and CLI changes to improve integration and consistency
New ClusterEvaluator and CDbwClusterEvaluator offer new ways to evaluate clustering effectiveness
New Spectral Clustering and MinHash Clustering (still experimental)
New VectorModelClassifier allows any set of clusters to be used for classification
Map/Reduce job to compute the pairwise similarities of the rows of a matrix using a customizable similarity measure
Map/Reduce job to compute the item-item-similarities for item-based collaborative filtering
RecommenderJob has evolved into a fully distributed item-based recommender
Distributed Lanczos SVD implementation
More support for distributed operations on very large matrices
Easier access to Mahout operations via the command line
New HMM-based sequence classification from GSoC (currently sequential only and still experimental)
Sequential logistic regression training framework
New SGD classifier
Experimental new type of NB classifier, and feature reduction options for existing one
New vector encoding framework for high speed vectorization without a pre-built dictionary
Additional elements of supervised model evaluation framework
Promoted several pieces of old Colt framework to tested status (QR decomposition, in particular)
Can now save random forests and use them to classify new data

The fifth item is exactly what I tried to do and couldn't get working.
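
For reference, the computation behind that fifth item (pairwise similarity of the rows of a matrix, with a pluggable similarity measure) can be sketched on a single machine roughly like this. This is not Mahout's API and nothing here runs on Hadoop; the names `cosine` and `row_similarities` are made up for illustration, and storing each sparse row as a dict is an assumption.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse rows given as {column: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero row as similar to nothing
    return dot / (norm_a * norm_b)


def row_similarities(rows, similarity=cosine):
    """Return {(i, j): similarity} for every pair of rows with i < j.

    `similarity` can be swapped for any function of two rows.
    """
    return {
        (i, j): similarity(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


# Hypothetical toy matrix: three sparse rows.
rows = [
    {0: 1.0, 1: 2.0},
    {0: 2.0, 1: 4.0},  # same direction as row 0
    {2: 3.0},          # shares no columns with the others
]
sims = row_similarities(rows)
```

The all-pairs loop is quadratic in the number of rows, which is why a distributed job makes sense for large matrices. This sketch only shows what is being computed, not how to scale it.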

zouquan · posted 2011-7-28 11:04:40

Nice. Do you have any working code? Post some so we can take a look.

xmubingo · posted 2011-7-28 12:14:49

Let's go all in on Hadoop first... the machines and everything else are already in place. :D
