Machine Learning and Bioinformatics Lab Alliance
Title:
Writing an MLP classifier with TensorFlow
[Print this page]
Author:
guojiasheng
Time:
2016-7-26 19:27
Title:
Writing an MLP classifier with TensorFlow
Last edited by guojiasheng on 2016-8-13 21:28
MLP: multilayer perceptron
The MLP is one of the most common ANN algorithms. In its basic form it has just three layers — an input layer, a hidden layer, and an output layer — so it does not really belong to so-called deep learning.
TensorFlow: an open-source machine learning framework from Google, mainly aimed at deep learning, but its API also lets us write ordinary classifiers such as linear regression, logistic regression, or an MLP.
Below is the code for the experiment, including reading the data (load_data), the definition of the learning parameters, the MLP classifier, and the training loop:
As you can see, roughly 30 lines of code are enough to implement the MLP algorithm itself.
I have attached the corresponding training data; the run output looks like this:
0 auc rate: 0.151080259165 loss: 0.690847
1 auc rate: 0.876808885537 loss: 0.303999
2 auc rate: 0.876960498924 loss: 0.294305
3 auc rate: 0.876822668572 loss: 0.288582
4 auc rate: 0.877036581278 loss: 0.284765
5 auc rate: 0.877435737977 loss: 0.282044
6 auc rate: 0.877720771144 loss: 0.28001
The data format can be seen from the code: the first column is the label and the remaining columns are features, separated by tabs ("\t").
1 0 0 0.0 3.43398720449 4.39444915467 0.0 0.0 0.0 48.7500 105.1875 9.6667
1 0 0 1.79175946923 7.50769007782 6.82328612236 2.63905732962 4.42159069547
0 0 0 0.0 2.3978952728 3.55534806149 0.0 0.0 0.0 8.2500 17.6875 2.0000 20.0000 0.7273
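The tab-separated format above (label in the first column, float features after it) can be read with a few lines of Python. This is only an illustrative sketch, not part of the original post — note that the posted script below actually reads a comma-separated arff file instead, with the label in the last column:

```python
import numpy as np

def load_tab_data(lines):
    # column 1 is the class label, the remaining columns are float features
    labels, features = [], []
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        labels.append(int(parts[0]))
        features.append([float(v) for v in parts[1:]])
    return np.asarray(labels), features

# rows in the sample above have different lengths, so features stays a plain list
labels, features = load_tab_data(["1\t0\t0\t3.43", "0\t0\t2.39\t3.55\t8.25"])
```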
import numpy as np
import tensorflow as tf
from sklearn.metrics import roc_auc_score
from sklearn import metrics
from sklearn.cross_validation import KFold  # sklearn < 0.20


def dense_to_one_hot(labels_dense, num_classes=2):
    """Convert class labels from scalars to one-hot vectors."""
    labels_dense = np.asarray(labels_dense)
    num_labels = labels_dense.shape[0]
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot


def kfold(trainData, trainClass, nFold=10):
    """Split the data into nFold train/test partitions."""
    skf = KFold(len(trainData), nFold, shuffle=True, random_state=1234)
    kDataTrain = []
    kDataTrainC = []
    kDataTest = []
    kDataTestC = []
    trainData = np.asarray(trainData)
    trainClass = np.asarray(trainClass)
    for train_index, test_index in skf:
        X_train, X_test = trainData[train_index], trainData[test_index]
        y_train, y_test = trainClass[train_index], trainClass[test_index]
        kDataTrain.append(X_train)
        kDataTrainC.append(y_train)
        kDataTest.append(X_test)
        kDataTestC.append(y_test)
    return kDataTrain, kDataTrainC, kDataTest, kDataTestC


def load_data(fileName):
    """Read an arff file: '@' lines are headers, the last column is the label."""
    lables = []
    feature = []
    for line in open(fileName):
        if line.startswith("@") or line.strip() == "":
            continue
        listV = line.strip().split(",")
        feature.append([float(v) for v in listV[0:-1]])  # all but the label column
        lables.append(int(listV[-1]))
    return lables, feature
    # return dense_to_one_hot(lables), np.asarray(feature)


ty, tx = load_data("feature_91.arff")
kDataTrain, kDataTrainC, kDataTest, kDataTestC = kfold(tx, ty)
acc = []
for index in range(len(kDataTrain)):
    print "cross validation:", index
    ty, tx = kDataTrainC[index], kDataTrain[index]
    testy, testx = kDataTestC[index], kDataTest[index]
    ty = dense_to_one_hot(ty)
    testy = dense_to_one_hot(testy)

    # hyper-parameters
    learning_rate = 0.0005
    training_epochs = 500
    batch_size = 100
    n_hidden_1 = 300
    n_hidden_2 = 300
    n_input = tx.shape[1]
    n_class = 2

    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_class])

    def mlp(x, weights, biases):
        """Two ReLU hidden layers followed by a linear output layer."""
        layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
        layer_1 = tf.nn.relu(layer_1)
        layer_2 = tf.add(tf.matmul(layer_1, weights["h2"]), biases["b2"])
        layer_2 = tf.nn.relu(layer_2)
        out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
        return out_layer

    weights = {
        'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
        'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
        'out': tf.Variable(tf.random_normal([n_hidden_2, n_class]))
    }
    biases = {
        'b1': tf.Variable(tf.random_normal([n_hidden_1])),
        'b2': tf.Variable(tf.random_normal([n_hidden_2])),
        'out': tf.Variable(tf.random_normal([n_class]))
    }

    pred = mlp(x, weights, biases)
    # TF 0.x signature: softmax_cross_entropy_with_logits(logits, labels)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    init = tf.initialize_all_variables()

    with tf.Session() as sess:
        sess.run(init)
        for i in range(training_epochs):
            avg_cost = 0.
            total_batch = int(tx.shape[0] / batch_size)
            # mini-batch training (the final partial batch is skipped)
            for start, end in zip(range(0, len(tx), batch_size),
                                  range(batch_size, len(tx), batch_size)):
                _, loss = sess.run([optimizer, cost],
                                   feed_dict={x: tx[start:end], y: ty[start:end]})
                avg_cost += loss / total_batch
            # print i, "loss:", avg_cost
        correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(testy, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        result = accuracy.eval({x: testx, y: testy})
        acc.append(result)
        print "Accuracy:", result

print "cross validation result"
print "accuracy:", np.mean(acc)
Author:
zouquan
Time:
2016-7-27 13:15
Is this Python code? What packages does the server need (sklearn?)?
Also, don't you have to set the number of layers and the number of nodes per layer yourself? Can it optimize those on its own?
What is the output? A confusion matrix? The optimized network structure? Or the trained model?
Author:
guojiasheng
Time:
2016-7-28 13:15
zouquan posted on 2016-7-27 13:15:
Is this Python code? What packages does the server need (sklearn?)?
Also, don't you have to set the number of layers and nodes per layer your ...
1. It needs numpy, sklearn, and tensorflow.
2. My network has just three layers; the number of hidden nodes can be set by hand, and the rest is optimized automatically.
3. Right now the output is just the ROC value; anything else could be output too — the confusion matrix, the network, or the model.
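The confusion matrix mentioned in point 3 is available directly from sklearn.metrics; a short sketch with made-up labels (in the posted script the predictions would come from np.argmax over the network's logits):

```python
from sklearn.metrics import confusion_matrix

# hypothetical true labels and predicted class indices
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
```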
Author:
shixiang
Time:
2016-7-29 22:47
Nice work — bookmarking this to study.
Author:
maoyaozong
Time:
2016-8-4 11:39
Awesome!
Author:
guojiasheng
Time:
2016-8-13 21:28
Notes on the updated code:
(1) Added cross validation, 10-fold by default
(2) Handles multi-class problems by converting labels to one-hot format
(3) Can output accuracy (acc)
(4) The training data is in arff format
(5) There are now two hidden layers (hidden_layer)
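The one-hot conversion in point (2) generalizes beyond two classes. A compact numpy sketch equivalent to the dense_to_one_hot helper in the code, using an identity matrix as a lookup table:

```python
import numpy as np

def dense_to_one_hot(labels_dense, num_classes):
    # row i of np.eye(num_classes) is the one-hot vector for class i,
    # so fancy indexing maps each integer label to its one-hot row
    labels_dense = np.asarray(labels_dense)
    return np.eye(num_classes)[labels_dense]

onehot = dense_to_one_hot([0, 2, 1], 3)
```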
Welcome to the Machine Learning and Bioinformatics Lab Alliance (http://123.57.240.48/)
Powered by Discuz! X3.2