机器学习和生物信息学实验室联盟

 找回密码
 注册

QQ登录

只需一步,快速开始

搜索
查看: 4802|回复: 2
打印 上一主题 下一主题

Apache Mahout

[复制链接]
跳转到指定楼层
楼主
发表于 2011-7-28 10:28:19 | 只看该作者 回帖奖励 |倒序浏览 |阅读模式
Apache Mahout是一个机器学习的框架,构建在hadoop上支持大规模数据集的处理,目前最新版本0.4。

   Mahout currently has:
  • Collaborative Filtering
  • User and Item based recommenders
  • K-Means, Fuzzy K-Means clustering
  • Mean Shift clustering
  • Dirichlet process clustering
  • Latent Dirichlet Allocation
  • Singular value decomposition
  • Parallel Frequent Pattern mining
  • Complementary Naive Bayes classifier
  • Random forest decision tree based classifier
  • High performance java collections (previously colt collections)
  • A vibrant community


Highlights of it:
  • Model refactoring and CLI changes to improve integration and consistency
  • New ClusterEvaluator and CDbwClusterEvaluator offer new ways to evaluate clustering effectiveness
  • New Spectral Clustering and MinHash Clustering (still experimental)
  • New VectorModelClassifier allows any set of clusters to be used for classification
  • Map/Reduce job to compute the pairwise similarities of the rows of a matrix using a customizable similarity measure
  • Map/Reduce job to compute the item-item-similarities for item-based collaborative filtering
  • RecommenderJob has been evolved to a fully distributed item-based recommender
  • Distributed Lanczos SVD implementation
  • More support for distributed operations on very large matrices
  • Easier access to Mahout operations via the command line
  • New HMM based sequence classification from GSoC (currently as sequential version only and still experimental)
  • Sequential logistic regression training framework
  • New SGD classifier
  • Experimental new type of NB classifier, and feature reduction options for existing one
  • New vector encoding framework for high speed vectorization without a pre-built dictionary
  • Additional elements of supervised model evaluation framework
  • Promoted several pieces of old Colt framework to tested status (QR decomposition, in particular)
  • Can now save random forests and use it to classify new data


第五条就是我尝试做又没成功的地方啊
分享到:  QQ好友和群QQ好友和群 QQ空间QQ空间 腾讯微博腾讯微博 腾讯朋友腾讯朋友
收藏收藏 转播转播 分享分享
回复

使用道具 举报

沙发
发表于 2011-7-28 11:04:40 | 只看该作者
赞,有成功的代码么?发一个上来,看看
回复 支持 反对

使用道具 举报

板凳
发表于 2011-7-28 12:14:49 | 只看该作者
先大力搞hadoop把...机器什么都到位了。:D
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 注册

本版积分规则

机器学习和生物信息学实验室联盟  

GMT+8, 2024-11-23 13:30 , Processed in 0.134412 second(s), 19 queries .

Powered by Discuz! X3.2

© 2001-2013 Comsenz Inc.

快速回复 返回顶部 返回列表