Tag completion for image retrieval-TPAMI2012

zhupengfei · 发表于 2016-11-16 01:18:05

zhupengfei · 发表于 2016-11-16 14:07:51

1 工作动机
Tag completion 针对标签缺失的问题，进行标签补全。主要思想是要求sample-level tag  correlation和特征空间相似度相似，同时恢复出来的标签应该保持class-level label correlation。

2 模型

其中T是恢复出来的标签矩阵  V是特征矩阵  w是特征权重  T^是观察到的缺失标签矩阵
模型主要包含三个部分
||TT^T-VV^T||_F^2是指样本在特征空间的相似度与在tag空间的相似度越接近越好；
||T^T T- R||_F^2 是指恢复出来的标签矩阵应该是保持原始标签矩阵中的co-occurrence关系，比如蓝天和白天两个类别的关联性很强  那么恢复出来的标签矩阵也应该保持这种关系。
||T-T^||_F^2是指恢复出来的T和原始的T^也应该保持相似关系。
因为每个样本只属于某几个Tag 因此T需要稀疏， w作为特征的权重也有特征选择的作用因此对w加入了稀疏正则。

3 优化
subgradient descent based approach  迭代求解T和w
subgradient如下

更新过程

由于直接计算梯度会造成每一次得到T很dense，增大计算复杂度，因此作者提出使用composite function optimization的方法。
首先求解以下问题得到最新的T和w

之后求解下列的sparse coding问题，具体求解可以采用soft thresholding的方法

关于收敛性，这段解释可以在以后的论文写作中follow下
Although the proposed formulation is non-convex and therefore cannot guarantee to find the global optimal, this however is not a serious issue from the viewpoint of learning theory [5]. This is because as the empirical error goes down during the process of optimization, the generalization error will become the leading term in the prediction error. As a result, finding the global optima will not have a significant impact on the final prediction result. In fact, [51] shows that only an approximately good solution would be enough to achieve similar performance as the exact optimal one. To alleviate the problem of local optima, we run the algorithm 20 times and choose the run with the lowest objective function.
4实验
数据
• Corel dataset [43]. It consists of 4, 993 images, with each image being annotated by at most five tags.
There are a total of 260 unique keywords used in this dataset.
• Labelme photo collection. It consists of 2,900 online photos, annotated by 495 non-abstract noun
tags. The maximum number of annotated tags per image is 48.
• Flickr photo collection. It consists of one million images that are annotated by more than 10,000 tags. The maximum number of annotated tags
per image is 76. Since most of the tags are only used by a small number of images, we reduce the vocabulary to the first 1, 000 most popular tags used in this dataset, which reduces the database to 897,500 images.
• TinyImg image collection. It consists of 79,302,017 images collected from the web, annotated by 75,062 non-abstract noun tags. The maximum
number of annotated tags per image is 82. Similar to the Flickr photo collection, we reduce the vocabulary to the first 1, 000 most popular tags
in the dataset, which reduces the database size to 997,420 images
对比算法:
(i) Multiple Bernoulli Relevance Models (MBRM) [50] that models the joint distribution of annotation tags and visual features by a mixture distribution, (ii) Joint Equal Contribution method (JEC) [4] that finds appropriate annotation words for a test image by a k nearest neighbor classifier that combines multiple distance measures derived from different visual features,
(iii) Inference Network method (InfNet) [17] that applies the Bayesian network to model the relationship between visual features and annotation words, (iv) Large scale max-marin multi-label classification (LM3L) [8], that overcomes the training bias by incorporating correlation prior,
(v) Tag Propagation method (TagProp) [38] that propagates the label information from the labeled instances to the unlabeled instances via a weighted nearest neighbor graph,
(vi) social tag relevance by neighbor voting (TagRel) [53], that explores the tag relevance based on a neighborhood voting approach.
实验设置

评价指标 Mean Average Precision
Single-tag Queries and queries with multiple tags

5 启发和感悟
这篇论文发表在2013年，投稿应该是2011年，曾经在sparse比较火热的时候有不少follow的paper发表在CVPR ECCV等会议上。这种类型的工作第一步需要确定如何恢复标签，第二步嵌入相应的先验信息，比如样本的特征空间相似性，标签的样本层次相似性、类别层次相似性。万变不离其踪，关键在于如何填坑。这个topic应该还可以发A类的文章，但是很难再有高引用论文出现。

		自动登录	找回密码
密码			注册

Tag completion for image retrieval-TPAMI2012

本帖子中包含更多资源

本帖子中包含更多资源