Machine Learning and Bioinformatics Laboratory Alliance

Title: libsvm Chih-Jen Lin Some Thoughts on Large-scale Data Classi

Author: Genie Time: 2012-10-24 15:02
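I was lucky enough to attend a talk by Chih-Jen Lin today, mainly about large-scale data classification. He pointed out some common misconceptions about classification tasks, and covered classification in distributed environments (MPI/MapReduce) and where large-scale data classification is heading. Recommended reading.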

[attach]1084[/attach]

Author: cwc Time: 2012-10-24 15:51
I hear Andrew Ng is giving a talk at Baidu next week. So jealous!

Author: chenwq Time: 2012-10-24 19:34
"A framework is like a language or a specification. You can then have different implementations"

"let problems drive the tools"

Author: Genie Time: 2012-10-24 21:45

chenwq posted on 2012-10-24 19:34
"A framework is like a language or a specification. You can then have different implementations"

...

And the point he stressed again and again: "Focus on ease of use"

Author: Genie Time: 2012-10-24 22:50

chenwq posted on 2012-10-24 19:34
"A framework is like a language or a specification. You can then have different implementations"

...

Hadoop and MapReduce were not designed specifically for machine learning applications, so we need to know when and where they are suitable. Why is Hadoop insufficient for iterative algorithms? Because its disk I/O is expensive.
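To make the disk I/O point concrete, here is a minimal sketch in plain Python (not Hadoop code, and not from the talk). It assumes a toy dataset and a one-parameter least-squares model, and all function names are made up for illustration. It only counts how many times the input is read when every iteration behaves like a separate job, versus loading the data once and keeping it in memory.

```python
import os
import tempfile


def write_dataset(path, n=1000):
    # Hypothetical toy data: y = 3x, one "x,y" pair per line.
    with open(path, "w") as f:
        for i in range(n):
            x = i / n
            f.write(f"{x},{3.0 * x}\n")


def load(path):
    # One full pass over the file on disk.
    with open(path) as f:
        return [tuple(map(float, line.split(","))) for line in f]


def gradient(data, w):
    # Gradient of the mean squared error for the model y = w * x.
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def train_rereading_disk(path, iters=50, lr=0.5):
    # Each iteration acts like an independent job: it re-reads its input.
    w, reads = 0.0, 0
    for _ in range(iters):
        data = load(path)
        reads += 1
        w -= lr * gradient(data, w)
    return w, reads


def train_cached(path, iters=50, lr=0.5):
    # Load once, then iterate over the in-memory copy.
    data, reads = load(path), 1
    w = 0.0
    for _ in range(iters):
        w -= lr * gradient(data, w)
    return w, reads


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.csv")
    write_dataset(path)
    print(train_rereading_disk(path))  # same weight, 50 reads
    print(train_cached(path))          # same weight, 1 read
```

Both versions do the same arithmetic; the only difference is the number of passes over the file, which is the cost that grows with the iteration count.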

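Author: twinsken Time: 2012-11-22 12:10
The Mahout community argued over this issue for a long time and in the end stuck with Hadoop HDFS, because it is widely used in industry, while Spark, Puma and the like are still quite immature.
PS: could someone share the download?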

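Author: Genie Time: 2012-11-23 14:51
Which issue was it that they argued over for so long?? Spark is already in use at some companies; companies here in China are always a step behind everyone else.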

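Author: twinsken Time: 2012-11-26 00:35
Developing large-scale machine learning algorithms depends heavily on a parallel computing framework. Hadoop was not originally designed for machine learning, so some people wanted to decouple the algorithms from it so that they could adapt to several underlying computation models.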
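A rough sketch of what that decoupling could look like, in plain Python. This is not Mahout's actual design; the `Backend` interface and both implementations are hypothetical names made up for illustration. The algorithm is written once against a small map/reduce-style interface, and the execution layer underneath can be swapped.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class Backend(ABC):
    # Hypothetical interface: the "specification" the algorithm codes against.
    @abstractmethod
    def map_reduce(self, map_fn, partitions, reduce_fn):
        ...


class SerialBackend(Backend):
    # Runs every partition in the calling thread.
    def map_reduce(self, map_fn, partitions, reduce_fn):
        return reduce(reduce_fn, map(map_fn, partitions))


class ThreadBackend(Backend):
    # Runs the map step on a thread pool; the reduce step stays local.
    def __init__(self, workers=4):
        self.workers = workers

    def map_reduce(self, map_fn, partitions, reduce_fn):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return reduce(reduce_fn, pool.map(map_fn, partitions))


def partial_stats(part):
    # Per-partition (sum, count).
    return (sum(part), len(part))


def combine(a, b):
    # Merge two (sum, count) pairs.
    return (a[0] + b[0], a[1] + b[1])


def mean(backend, partitions):
    # The algorithm itself never mentions how the work is executed.
    total, count = backend.map_reduce(partial_stats, partitions, combine)
    return total / count


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5], [6]]
    print(mean(SerialBackend(), parts))
    print(mean(ThreadBackend(), parts))
```

A Hadoop, MPI or Spark backend would be another subclass of the same interface, which is roughly the "different implementations" idea quoted earlier in the thread.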
