|
Apache Mahout是一个机器学习的框架,构建在hadoop上支持大规模数据集的处理,目前最新版本0.4。
Mahout currently has:
- Collaborative Filtering
- User and Item based recommenders
- K-Means, Fuzzy K-Means clustering
- Mean Shift clustering
- Dirichlet process clustering
- Latent Dirichlet Allocation
- Singular value decomposition
- Parallel Frequent Pattern mining
- Complementary Naive Bayes classifier
- Random forest decision tree based classifier
- High performance java collections (previously colt collections)
- A vibrant community
Highlights of it:
- Model refactoring and CLI changes to improve integration and consistency
- New ClusterEvaluator and CDbwClusterEvaluator offer new ways to evaluate clustering effectiveness
- New Spectral Clustering and MinHash Clustering (still experimental)
- New VectorModelClassifier allows any set of clusters to be used for classification
- Map/Reduce job to compute the pairwise similarities of the rows of a matrix using a customizable similarity measure
- Map/Reduce job to compute the item-item-similarities for item-based collaborative filtering
- RecommenderJob has been evolved to a fully distributed item-based recommender
- Distributed Lanczos SVD implementation
- More support for distributed operations on very large matrices
- Easier access to Mahout operations via the command line
- New HMM based sequence classification from GSoC (currently as sequential version only and still experimental)
- Sequential logistic regression training framework
- New SGD classifier
- Experimental new type of NB classifier, and feature reduction options for existing one
- New vector encoding framework for high speed vectorization without a pre-built dictionary
- Additional elements of supervised model evaluation framework
- Promoted several pieces of old Colt framework to tested status (QR decomposition, in particular)
- Can now save random forests and use it to classify new data
第五条就是我尝试做又没成功的地方啊 |
|