标题: Tag completion for image retrieval-TPAMI2012 [打印本页] 作者: zhupengfei 时间: 2016-11-16 01:18 标题: Tag completion for image retrieval-TPAMI2012 Tag completion for image retrieval-TPAMI2012作者: zhupengfei 时间: 2016-11-16 14:07
1 工作动机
Tag completion 针对标签缺失的问题,进行标签补全。主要思想是要求sample-level tag correlation和特征空间相似度相似,同时恢复出来的标签应该保持class-level label correlation。
[attach]3008[/attach]
2 模型
[attach]3009[/attach]
其中T是恢复出来的标签矩阵 V是特征矩阵 w是特征权重 T^是观察到的缺失标签矩阵
模型主要包含三个部分
||TT^T-VV^T||_F^2是指样本在特征空间的相似度与在tag空间的相似度越接近越好;
||T^T T- R||_F^2 是指恢复出来的标签矩阵应该是保持原始标签矩阵中的co-occurrence关系,比如蓝天和白天两个类别的关联性很强 那么恢复出来的标签矩阵也应该保持这种关系。
||T-T^||_F^2是指恢复出来的T和原始的T^也应该保持相似关系。
因为每个样本只属于某几个Tag 因此T需要稀疏, w作为特征的权重 也有特征选择的作用 因此对w加入了稀疏正则。
3 优化
subgradient descent based approach 迭代求解T和w
subgradient如下
[attach]3010[/attach]
更新过程
[attach]3011[/attach]
由于直接计算梯度会造成每一次得到T很dense,增大计算复杂度,因此作者提出使用composite function optimization的方法。
首先求解以下问题得到最新的T和w
[attach]3012[/attach]
之后求解下列的sparse coding问题,具体求解可以采用soft thresholding的方法
[attach]3013[/attach]
关于收敛性,这段解释可以在以后的论文写作中follow下
Although the proposed formulation is non-convex and therefore cannot guarantee to find the global optimal, this however is not a serious issue from the viewpoint of learning theory [5]. This is because as the empirical error goes down during the process of optimization, the generalization error will become the leading term in the prediction error. As a result, finding the global optima will not have a significant impact on the final prediction result. In fact, [51] shows that only an approximately good solution would be enough to achieve similar performance as the exact optimal one. To alleviate the problem of local optima, we run the algorithm 20 times and choose the run with the lowest objective function.
4实验
数据
• Corel dataset [43]. It consists of 4, 993 images, with each image being annotated by at most five tags.
There are a total of 260 unique keywords used in this dataset.
• Labelme photo collection. It consists of 2,900 online photos, annotated by 495 non-abstract noun
tags. The maximum number of annotated tags per image is 48.
• Flickr photo collection. It consists of one million images that are annotated by more than 10,000 tags. The maximum number of annotated tags
per image is 76. Since most of the tags are only used by a small number of images, we reduce the vocabulary to the first 1, 000 most popular tags used in this dataset, which reduces the database to 897,500 images.
• TinyImg image collection. It consists of 79,302,017 images collected from the web, annotated by 75,062 non-abstract noun tags. The maximum
number of annotated tags per image is 82. Similar to the Flickr photo collection, we reduce the vocabulary to the first 1, 000 most popular tags
in the dataset, which reduces the database size to 997,420 images
对比算法:
(i) Multiple Bernoulli Relevance Models (MBRM) [50] that models the joint distribution of annotation tags and visual features by a mixture distribution, (ii) Joint Equal Contribution method (JEC) [4] that finds appropriate annotation words for a test image by a k nearest neighbor classifier that combines multiple distance measures derived from different visual features,
(iii) Inference Network method (InfNet) [17] that applies the Bayesian network to model the relationship between visual features and annotation words, (iv) Large scale max-marin multi-label classification (LM3L) [8], that overcomes the training bias by incorporating correlation prior,
(v) Tag Propagation method (TagProp) [38] that propagates the label information from the labeled instances to the unlabeled instances via a weighted nearest neighbor graph,
(vi) social tag relevance by neighbor voting (TagRel) [53], that explores the tag relevance based on a neighborhood voting approach.
实验设置
评价指标 Mean Average Precision
Single-tag Queries and queries with multiple tags