Page 1

Tianjin University

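Dimensionality Reduction in Complex Data Environments

Pengfei Zhu (朱鹏飞)

School of Computer Science and Technology, Tianjin University

2016-10-26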

Page 2

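Complex data environments

Special issue topic: Machine Learning Research in Complex Environments

Scope of the call for papers:
1. Processing and modeling of uncertain data
2. Machine learning for multi-source, heterogeneous, complex data
3. Applications of machine learning to complex tasks

People are no longer satisfied with learning tasks that have fixed scenarios and clear objectives. They have begun to attempt more challenging tasks, such as exploratory learning and multi-task collaborative learning in open environments and complex scenes, and to validate them in settings such as autonomous driving, robotics, large-scale system optimization, and big data modeling. To meet these challenges, learning mechanisms that are more flexible, more robust, more autonomous, and self-evolving need to be developed according to the complexity of the task being modeled.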

Page 3

Usage rate of mobile phone photography:
2010: 6%
2012: 82%
2015: 100%

High dimensionality of big data

Page 4

Curse of dimensionality

High dimensionality of big data

Page 5

Zhai, Ong, Tsang. The emerging “Big dimensionality”. IEEE CI Magazine, 2014

Why feature selection

The evolution (rise) of feature dimensionality in correlation matrices: (a) Diabetes (8 features); (b) Lung Cancer (56 features); (c) Psoriasis (529,651 features)

Page 6

Why feature selection

Storage burden

Computational complexity

Model generalization ability

Page 7

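Why feature selection

As the dimensionality of the feature space grows, the number of model parameters increases and the model becomes harder to solve. This easily leads to overfitting, which in turn degrades the generalization performance of the model.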
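The effect of growing dimensionality on generalization can be seen in a small experiment. The following is a minimal sketch that is not taken from the slides: it assumes synthetic Gaussian data in which only the first feature carries signal and fits ordinary least squares with NumPy. The function name `train_test_mse` and all sample sizes are illustrative choices.

```python
import numpy as np

def train_test_mse(n_features, n_train=50, n_test=1000, noise=0.5, seed=0):
    """Fit least squares on n_features inputs; return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)
    X_train = rng.standard_normal((n_train, n_features))
    X_test = rng.standard_normal((n_test, n_features))
    # Assumption: only feature 0 is informative, every other feature is noise.
    y_train = X_train[:, 0] + noise * rng.standard_normal(n_train)
    y_test = X_test[:, 0] + noise * rng.standard_normal(n_test)
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = float(np.mean((X_train @ w - y_train) ** 2))
    test_mse = float(np.mean((X_test @ w - y_test) ** 2))
    return train_mse, test_mse

# Same number of training samples, increasing number of features.
results = {d: train_test_mse(d) for d in (2, 10, 45)}
for d, (tr, te) in results.items():
    print(f"features={d:3d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

With the training set fixed at 50 samples, adding uninformative features should lower the training error while raising the test error, which is the overfitting pattern described above.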

Page 8

Sample sparsity: some regions of the feature space contain essentially no samples
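Sample sparsity can be illustrated by counting how many grid cells of the unit cube receive at least one sample. This is a rough sketch under assumed settings (uniform random samples, two bins per axis); `occupied_fraction` is a hypothetical helper, not something from the talk.

```python
import numpy as np

def occupied_fraction(n_samples, dim, bins_per_axis=2, seed=0):
    """Fraction of grid cells in [0, 1)^dim that contain at least one sample."""
    rng = np.random.default_rng(seed)
    samples = rng.random((n_samples, dim))
    # Map every sample to the integer index of its grid cell along each axis.
    cells = np.floor(samples * bins_per_axis).astype(int)
    n_occupied = np.unique(cells, axis=0).shape[0]
    return n_occupied / bins_per_axis ** dim

fractions = {d: occupied_fraction(1000, d) for d in (2, 10, 20)}
for d, frac in fractions.items():
    print(f"dim={d:2d}  occupied cells={frac:.6f}")
```

The number of cells grows exponentially with the dimension, so a fixed budget of 1000 samples covers a vanishing share of the space.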

Why feature selection: metric concentration in high-dimensional feature spaces

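In a high-dimensional data space, the distances from a sample point to its nearest neighbor and to its farthest neighbor tend to become equal, which degrades machine learning algorithms that rely on distance measures. This phenomenon is usually called "concentration of measure" and was first introduced by Milman to describe high-dimensional probability distributions.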
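A quick numerical check makes the concentration effect concrete. The sketch below assumes uniformly distributed points and Euclidean distance, and measures the relative contrast (d_max - d_min) / d_min between the farthest and nearest neighbor of one query point; the name `relative_contrast` is illustrative.

```python
import numpy as np

def relative_contrast(n_points, dim, seed=0):
    """(d_max - d_min) / d_min for distances from one query to random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {d: relative_contrast(1000, d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dim={d:5d}  relative contrast={c:.3f}")
```

As the dimension grows the contrast should shrink toward zero, meaning the nearest and farthest neighbors become almost equally far away.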

Page 9

Complex data environments

Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., & Tang, J., et al. (2016). Feature selection: a data perspective.

Page 10

Multimodal heterogeneous information

Complex data environments: multimodality

Page 11

Multimodal heterogeneous information

Complex data environments: multimodality

Page 12

Complex data environments: structure

Undirected graph structure

Ye J, Liu J. Sparse methods for biomedical data[J]. ACM SIGKDD Explorations Newsletter, 2012, 14(1): 4-15.

Tree group lasso

Structured features

Page 13

Complex data environments: structure

Jun Liu and Jieping Ye. Moreau-Yosida regularization for grouped tree structure learning. NIPS 2010.

Structured features

Page 14

Complex data environments: structure

J. Tang and H. Liu. Feature selection with linked data in social media. In SDM, 2012.

Twitter (tweets linked through hyperlinks)

Facebook (people connected by friendships)

Biological networks (protein interaction networks)

Structured samples

Page 15

Complex data environments: structure

A part of the semantic hierarchy of Corel 5k

Structured labels

Wu B, Lyu S, Ghanem B. ML-MG: Multi-label Learning with Missing Labels Using a Mixed Graph[C]// IEEE International Conference on Computer Vision. IEEE, 2015.

Page 16

Complex data environments: missing data

Image annotation

AU recognition

Missing labels

Sun Y Y, Zhang Y, Zhou Z H. Multi-Label Learning with Weak Label[C]// Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 2010.

Page 17

Complex data environments: missing data

Recommendation system

Multi-view clustering

Missing views

Handong Zhao, Hongfu Liu, and Yun Fu. Incomplete Multimodal Visual Data Grouping. International Joint Conference on Artificial Intelligence (IJCAI), 2016.

Page 18

Complex data environments: noise

Xiangyong Cao, Qian Zhao, Deyu Meng, Yang Chen, Zongben Xu. Robust Low-rank Matrix Factorization under General Mixture Noise Distributions. IEEE Transactions on Image Processing, 2016.

Images from the Yale Face Database with different noises

Attribute noise

Page 19

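Complex data environments: noise

Label noise

Errors from manual annotation or automatic machine annotation

Tongliang Liu, Dacheng Tao: Classification with Noisy Labels by Importance Reweighting. IEEE Trans. Pattern Anal. Mach. Intell. 38(3): 447-461 (2016)

Misdiagnosis rates in medical diagnosis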

Page 20

Complex data environments: streaming data

Streaming feature selection: new features are added

Hao Huang, Shinjae Yoo, and S Kasiviswanathan. Unsupervised feature selection on data streams. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 1031–1040. ACM, 2015.

Page 21

Complex data environments: streaming data

Streaming feature selection: new samples are added

Jing Wang, Meng Wang, Peipei Li, Luoqi Liu, Zhongqiu Zhao, Xuegang Hu, and Xindong Wu. Online feature selection with group structure analysis. IEEE Transactions on Knowledge and Data Engineering, 27(11):3029–3041, 2015.

New users

Page 22

Challenges of feature selection

Storage burden

Computational complexity

Generalization ability

Page 23

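Research progress: unsupervised

• One key to unsupervised feature selection is how to generate pseudo class labels, so that the unsupervised problem is converted into a supervised one;

• The manifold structure of the data, sample similarity, sample distribution, and the self-similarity of features are important elements for building embedded unsupervised feature selection algorithms;

• Current work on unsupervised feature selection is validated mainly on existing benchmark datasets and has not addressed feature selection for ultra-high-dimensional data.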
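The pseudo-label idea in the first point can be sketched as a two-step pipeline. The code below is only an illustration under stated assumptions: it uses plain k-means as the pseudo-label generator and the Fisher score as the supervised criterion, both chosen here as simple stand-ins rather than any specific method from the talk. The helper names `kmeans_labels` and `fisher_scores` and the synthetic data are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every sample to its closest chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def fisher_scores(X, labels):
    """Between-class over within-class variance, computed per feature."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)

# Synthetic data: features 0 and 1 separate two clusters, features 2..9 are noise.
rng = np.random.default_rng(0)
informative = np.vstack([rng.normal(4, 1, (100, 2)), rng.normal(-4, 1, (100, 2))])
noise = rng.normal(0, 1, (200, 8))
X = np.hstack([informative, noise])

pseudo_labels = kmeans_labels(X, k=2)          # step 1: generate pseudo labels
ranking = np.argsort(-fisher_scores(X, pseudo_labels))  # step 2: supervised ranking
print("top-2 features:", ranking[:2])
```

Any clustering method and any supervised selection criterion could be substituted; the quality of the selected features depends on how well the pseudo labels reflect the true structure of the data.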

Page 24

Regularized self-representation (RSR)

Research progress: unsupervised

A feature can be represented by a linear combination of other features

For all the features

Pengfei Zhu, Wangmeng Zuo, Lei Zhang, Qinghua Hu, Simon C.K. Shiu. Unsupervised feature selection by regularized self-representation. Pattern Recognition, 2015.
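The self-representation idea can be sketched in a few lines of NumPy. The transcript does not preserve the slide's formulas, so the objective used here, min over W of ||X - XW||_{2,1} + lam * ||W||_{2,1}, is an assumption, and the iteratively reweighted least squares solver is one common way to handle such norms rather than a confirmed reproduction of the authors' algorithm. The function name `rsr_feature_scores`, the value of `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def rsr_feature_scores(X, lam=10.0, n_iter=50, eps=1e-8):
    """IRLS sketch for min_W ||X - XW||_{2,1} + lam * ||W||_{2,1}.

    Returns the row norms of W: a larger norm means the feature contributes
    more to reconstructing all features.
    """
    n, d = X.shape
    g_left = np.ones(n)   # reweights residual rows (one weight per sample)
    g_right = np.ones(d)  # reweights rows of W (one weight per feature)
    for _ in range(n_iter):
        Xw = X * g_left[:, None]
        # Closed-form update: (X^T G_L X + lam * G_R) W = X^T G_L X
        A = X.T @ Xw + lam * np.diag(g_right)
        W = np.linalg.solve(A, X.T @ Xw)
        residual = X - X @ W
        g_left = 1.0 / (2.0 * np.sqrt((residual ** 2).sum(axis=1) + eps))
        g_right = 1.0 / (2.0 * np.sqrt((W ** 2).sum(axis=1) + eps))
    return np.sqrt((W ** 2).sum(axis=1))

# Toy data: features 0..2 are independent; features 3 and 4 are scaled,
# slightly noisy copies of features 0 and 1, so they are redundant.
rng = np.random.default_rng(0)
base = rng.standard_normal((200, 3))
copies = 0.5 * base[:, :2] + 0.01 * rng.standard_normal((200, 2))
X = np.hstack([base, copies])

scores = rsr_feature_scores(X)
ranking = np.argsort(-scores)
print("feature ranking:", ranking)
```

In this toy setting the redundant copies should receive small row norms, because the same reconstruction is cheaper to express through the features they were copied from.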

Page 25

Regularized self-representation (RSR)

Research progress: unsupervised

Zhu P, Hu Q, Zhang L, et al. A Discriminative Self-representation induced Classifier[C]// IJCAI. 2016.

Page 26

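Research progress: unsupervised

Coupled Dictionary Learning

Analysis dictionary, synthesis dictionary

Predefined: fast

Learned: local structure of images

Analysis-synthesis dictionary pair learning

Analysis dictionary, synthesis dictionary

Use the analysis dictionary for feature selection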

Zhu P, Hu Q, Zhang C, et al. Coupled Dictionary Learning for Unsupervised Feature Selection[C]// AAAI. 2016.

Page 27

Research progress: unsupervised

Subspace clustering guided Unsupervised Feature Selection

Pengfei Zhu, Wencheng Zhu, Qinghua Hu, Changqing Zhang. Subspace Clustering guided Unsupervised Feature Selection.

[Diagram comparing SCUFS with existing models; labels: X, similarity matrix S, F, W]

Sample self-representation can better reveal the relationships between samples

Page 28

Multi-view feature selection

Research progress: multi-view

Lei Zhao, Qinghua Hu, Wenwu Wang. Heterogeneous Feature Selection with Multi-Modal Deep Neural Networks and Sparse Group Lasso. TMM 2015.

Page 29

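Reflections and discussion

• Data modeling in complex and open environments: noise, missing data, multi-source heterogeneity, and so on;

• "Can old bottles hold new wine?": how do traditional models generalize in complex environments?

For example: feature selection under noise and missing data

• Curse of dimensionality vs Blessing of dimensionality

Are Deep Networks a Solution to Curse of Dimensionality? (Professor Stéphane Mallat)

Blessing of Dimensionality: High Dimensional Feature and Its Efficient Compression for Face Verification (Jian Sun)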

Page 30

Reflections and discussion

Deep learning in complex data environments

Low-quality data

Z. Wang, S. Chang, Y. Yang, D. Liu and T. Huang, "Studying Very Low Resolution Recognition Using Deep Networks", In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

feature enhancement and recognition simultaneously
