Big Data and AI in P2P Industry: Knowledge Graph and Inference

Big Data and AI in P2P Industry Wenzhe Li [email protected] Feb 1, 2016

Upload: sfbiganalytics

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Big Data and AI in P2P Industry

Wenzhe Li

[email protected]

Feb 1, 2016

Page 2: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Puhui Finance (www.puhuifinance.com)

• Internet financing (P2P) company, headquartered in Beijing

• Founded in July 2013

• $50M Series A funding in Dec 2014

• ~5500 employees, 100+ offline stores

Services

• 爱钱进 (iqianjin)

• 普惠信贷 (Puhui Credit)

• 创新资产 (Innovative Assets)

• 普惠财富 (Puhui Wealth)

Offline Financing Service

Online Financing Service

Online Lending Service

Offline Lending Service

Page 3: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Puhui Finance (cont.)

Fastest-growing P2P company. Big data technology is the key.

Page 4: Bigdata and ai in p2 p industry:  Knowledge graph and inference

What the talk is about

In this talk, I will mainly focus on the techniques used in lending-side risk control. Similar techniques can be applied to the financing side.

Page 5: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Page 6: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Why big data & AI

• The credit system in China is not yet mature

• We target the under-served market: people who do not have enough credit history to borrow from a bank

• Credit history data alone is not enough to build scoring models

• A more efficient application review process is needed as more transactions move from offline to online

Page 7: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Page 8: Bigdata and ai in p2 p industry:  Knowledge graph and inference

The central problem is risk control

The solution is to use big data

Page 9: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Measure the risk for a person

• Individual feature analysis: Feature Compute (FC) Engine

• Relation analysis: Knowledge Graph

Page 10: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Data

• Data entered explicitly by the user (e.g. application form)

• Authorized* user data: mobile history, purchasing history, ……

• Open search: Baidu.com, 360.com, others (e.g. Craigslist)

• Third-party data (e.g. blacklists)

Unstructured Data

* Users authorize us to use their data

Page 11: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Feature Compute Engine

The goal is to convert unstructured data to structured features

Page 12: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Feature Compute Engine

[Diagram: data sources (credit card history, mobile history, purchasing history, ...) → Feature Compute Engine → feature container (tens of thousands of features) → scoring model → fraud score, risk score, precision marketing]

Page 13: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Purchasing History

Example features computed from purchasing history (a few thousand features):

• Total amount spent during the last 6 months

• User level (e.g. Prime, Normal…)

• Total number of transactions during the last 6 months

• The length of time he/she has used the account

• Total number of transactions related to virtual products

• Total number of transactions related to luxury products

• ………
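As an illustration of how such features could be computed, here is a minimal sketch. The record layout, the category labels, and the function name `purchasing_features` are assumptions made for this example; they are not the actual FC Engine.

```python
from datetime import date, timedelta

# Hypothetical transaction records: (purchase date, amount, product category).
transactions = [
    (date(2016, 1, 10), 120.0, "virtual"),
    (date(2015, 12, 5), 3000.0, "luxury"),
    (date(2015, 11, 20), 45.5, "grocery"),
    (date(2015, 3, 1), 80.0, "virtual"),  # outside the 6-month window
]


def purchasing_features(transactions, account_opened, today, window_days=183):
    """Turn raw purchase records into a flat dict of numeric features."""
    cutoff = today - timedelta(days=window_days)
    recent = [t for t in transactions if t[0] >= cutoff]
    return {
        "total_amount_6m": sum(amount for _, amount, _ in recent),
        "num_transactions_6m": len(recent),
        "account_age_days": (today - account_opened).days,
        # Category counts are taken over the whole history (an assumption).
        "num_virtual": sum(1 for _, _, cat in transactions if cat == "virtual"),
        "num_luxury": sum(1 for _, _, cat in transactions if cat == "luxury"),
    }


features = purchasing_features(transactions, date(2014, 2, 1), date(2016, 2, 1))
print(features)
```

Each key of the returned dict is one structured feature; a real feature container would hold thousands of them per data source.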

Page 14: Bigdata and ai in p2 p industry:  Knowledge graph and inference

What is a knowledge graph

• It is a semantic network

• It is based on a graph data structure consisting of points and edges: a point represents an entity, an edge represents a relationship

• A knowledge graph connects heterogeneous information and makes it possible to analyze data from the perspective of relationships
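A minimal sketch of this structure, storing the graph as (entity, relation, entity) triples with an adjacency index. All entity names and relation labels here are made up for illustration.

```python
from collections import defaultdict

# Each fact is a (head entity, relation, tail entity) triple. Names are hypothetical.
triples = [
    ("person:alice", "HAS_PHONE", "phone:555-0100"),
    ("person:alice", "WORKS_AT", "company:acme"),
    ("person:bob", "HAS_PHONE", "phone:555-0100"),
    ("person:bob", "HAS_EMAIL", "email:bob@example.com"),
]

# Adjacency index: entity -> list of (relation, neighbor, direction).
adjacency = defaultdict(list)
for head, relation, tail in triples:
    adjacency[head].append((relation, tail, "out"))
    adjacency[tail].append((relation, head, "in"))


def neighbors(entity):
    """Return the entities directly connected to `entity`, ignoring edge direction."""
    return sorted({other for _, other, _ in adjacency[entity]})


print(neighbors("phone:555-0100"))
```

Looking at the graph from the phone node immediately shows that two different people are attached to it, which is the kind of relational view the slide refers to.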

Page 15: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Some knowledge graphs

Page 16: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Knowledge graph – search engine

Page 17: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Knowledge graph – search engine

Page 19: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Storing Knowledge Graph

Ranking   DBMS
21        Neo4j (graph database)
32        MarkLogic (XML)
42        Titan (graph database)
46        OrientDB (graph database)
61        Virtuoso (RDF)
80        Jena (RDF)
88        Sesame (RDF)
90        ArangoDB (graph database)
120       AllegroGraph (RDF)

Graph/RDF database ranking [3]

[Chart: trends for different types of database [2]]

Page 20: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Key techniques for knowledge graph

Link Prediction

• Logic-based approach

• Probabilistic approach (e.g. distributed representation)

• Hybrid approach

Page 21: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Logic-based approach

Simple approach: predefine some rules

e.g. (Peter FatherOf Tom) -> (Tom SonOf Peter)

(Peter ColleagueOf Tom), (Sarah ColleagueOf Peter) -> (Peter ColleagueOf Sarah)
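The two example rules can be sketched as a small forward-chaining loop. This is only an illustration: the rule set is hard-coded, and the colleague rule is implemented as plain symmetry, which is enough to derive the conclusion shown on the slide.

```python
def apply_rules(facts):
    """Apply the hand-written rules repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        derived = set()
        for subject, relation, obj in facts:
            if relation == "FatherOf":
                derived.add((obj, "SonOf", subject))        # inverse rule from the slide
            elif relation == "ColleagueOf":
                derived.add((obj, "ColleagueOf", subject))  # symmetry
        if derived <= facts:
            return facts
        facts |= derived


known = [
    ("Peter", "FatherOf", "Tom"),
    ("Peter", "ColleagueOf", "Tom"),
    ("Sarah", "ColleagueOf", "Peter"),
]
inferred = apply_rules(known) - set(known)
print(sorted(inferred))
```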

Page 22: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Methods based on distributed representation

• Translating Embedding [4]

• Tensor Factorization (RESCAL) [5]

• Neural Tensor Network (NTN) [6]
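To make the first method concrete: Translating Embeddings (TransE) [4] models a relation as a translation in vector space, so a triple (h, r, t) is plausible when h + r is close to t. The sketch below only shows the scoring step, with hand-picked 2-dimensional toy vectors; real embeddings are learned from data, and the entity and relation names are hypothetical.

```python
import math

# Toy embeddings chosen by hand for illustration only.
entity_vec = {
    "applicant_a": [0.0, 0.0],
    "phone_1": [1.0, 0.0],
    "phone_2": [0.0, 3.0],
}
relation_vec = {"HAS_PHONE": [1.0, 0.0]}


def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    h, r, t = entity_vec[head], relation_vec[relation], entity_vec[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


close = transe_distance("applicant_a", "HAS_PHONE", "phone_1")
far = transe_distance("applicant_a", "HAS_PHONE", "phone_2")
print(close, far)
```

For link prediction, candidate tails would be ranked by this distance and the closest ones proposed as missing links.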

Page 23: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Hybrid Approach – Logic + Probabilistic

Simple approach:

1. Generate all the new links using predefined rules

2. Apply statistical learning

Advanced approaches (e.g.):

• Incorporation of Rules into Embeddings [7]

• Injecting Logical Background [8]

Page 24: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Use Cases

Page 25: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Connects person, phone, address, email, company……

Domain-specific knowledge graph

Page 26: Bigdata and ai in p2 p industry:  Knowledge graph and inference

10 types of entities

~50 types of relations

~50M entities

0.2B relations

We expect it to become ~20 times bigger by the end of this year due to business growth

Domain-specific knowledge graph

Page 27: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Page 28: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud - rules

Applicant shares the same personal phone with another applicant

[Diagram: Applicant and Other applicant both link to the same Phone through a "Personal phone" edge]
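A minimal sketch of this rule, assuming applications are available as (applicant id, personal phone) pairs. The data and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical applications: (applicant id, personal phone number).
applications = [("A1", "555-0100"), ("A2", "555-0199"), ("A3", "555-0100")]


def shared_personal_phones(applications):
    """Return phones that are declared as personal phone by more than one applicant."""
    by_phone = defaultdict(set)
    for applicant, phone in applications:
        by_phone[phone].add(applicant)
    return {phone: sorted(ids) for phone, ids in by_phone.items() if len(ids) > 1}


flagged_phones = shared_personal_phones(applications)
print(flagged_phones)
```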

Page 29: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – rules (cont.)

Applicant and another applicant share the same colleague phone, but with different company names

[Diagram: Applicant (Company 1) and Other applicant (Company 2) both link to the same Phone through a "Colleague phone" edge]
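This rule can be sketched as a pairwise check. The field names and sample values below are assumptions; on a large graph one would group by phone first instead of comparing all pairs.

```python
from itertools import combinations

# Hypothetical application records.
applications = [
    {"id": "A1", "company": "Acme Trading", "colleague_phone": "555-0123"},
    {"id": "A2", "company": "Sunrise Logistics", "colleague_phone": "555-0123"},
    {"id": "A3", "company": "Acme Trading", "colleague_phone": "555-0123"},
    {"id": "A4", "company": "Acme Trading", "colleague_phone": "555-0177"},
]


def conflicting_colleague_phones(applications):
    """Flag applicant pairs with the same colleague phone but different company names."""
    flagged = []
    for a, b in combinations(applications, 2):
        same_phone = a["colleague_phone"] == b["colleague_phone"]
        if same_phone and a["company"] != b["company"]:
            flagged.append((a["id"], b["id"]))
    return flagged


conflicts = conflicting_colleague_phones(applications)
print(conflicts)
```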

Page 30: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – rules (cont.)

Some of the applicant’s contacts didn’t pay back the loan on time

[Diagram: the applicant's personal phone is linked to several other phones, two of which are marked "Overdue"]
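One way to turn this rule into a number is the share of an applicant's contacts that are known to be overdue. This is a sketch under the assumption that contacts and overdue borrowers are both identified by phone; the identifiers are made up.

```python
# Hypothetical data: phones in the applicant's contact list, and phones of overdue borrowers.
applicant_contacts = ["p1", "p2", "p3", "p4", "p5", "p6"]
overdue_phones = {"p2", "p5", "p9"}


def overdue_contact_ratio(contacts, overdue):
    """Fraction of the applicant's contacts that belong to overdue borrowers."""
    if not contacts:
        return 0.0
    hits = sum(1 for phone in contacts if phone in overdue)
    return hits / len(contacts)


ratio = overdue_contact_ratio(applicant_contacts, overdue_phones)
print(round(ratio, 3))
```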

Page 31: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – cycle detection

[Diagram: Person 1, Person 2 and Person 3 linked in a triangle relationship]
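A minimal sketch of triangle detection on an undirected graph, the smallest case of cycle detection. The edge list is hypothetical, and a production system would more likely run this kind of query inside the graph database.

```python
from itertools import combinations

# Hypothetical undirected relationships between people.
edges = [
    ("person1", "person2"),
    ("person2", "person3"),
    ("person3", "person1"),
    ("person3", "person4"),
]


def find_triangles(edges):
    """Return every set of three nodes that are pairwise connected."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    triangles = set()
    for node, nbrs in adj.items():
        # Two neighbors of `node` that are also connected to each other close a triangle.
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                triangles.add(tuple(sorted((node, u, v))))
    return sorted(triangles)


triangles = find_triangles(edges)
print(triangles)
```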

Page 32: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – inconsistent relationship

[Diagram: Applicant, Applicant 1 and Applicant 2 linked by two "Parent of" edges and one "Spouse" edge]

Inconsistent relations
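One possible reading of this check, sketched below: flag any two people who are connected by relation types that cannot hold at the same time. The conflict table and the sample declarations are assumptions made for this example and do not reproduce the exact case in the diagram.

```python
from collections import defaultdict

# Assumed table of relation types that cannot both hold between the same two people.
CONFLICTING = {frozenset({"ParentOf", "Spouse"})}

# Hypothetical relations declared across different applications.
declared = [
    ("applicant_1", "ParentOf", "applicant"),
    ("applicant_2", "ParentOf", "applicant"),
    ("applicant_1", "Spouse", "applicant"),  # conflicts with the first declaration
]


def inconsistent_pairs(declared):
    """Return pairs of people whose declared relations contain a conflicting combination."""
    relations = defaultdict(set)
    for a, rel, b in declared:
        relations[frozenset({a, b})].add(rel)
    return sorted(
        tuple(sorted(pair))
        for pair, rels in relations.items()
        if any(conflict <= rels for conflict in CONFLICTING)
    )


bad_pairs = inconsistent_pairs(declared)
print(bad_pairs)
```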

Page 33: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – suspicious group

[Diagram: Person 1, Person 2 and Person 3 share a lot of common attributes]
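"Share a lot of common attributes" can be quantified in many ways. The sketch below uses Jaccard similarity between attribute sets, which is a choice made for this example rather than the method used in the talk; the profiles and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical attribute sets per person.
profiles = {
    "person1": {"addr:12 Main St", "device:d-77", "company:acme", "email_domain:mail.example"},
    "person2": {"addr:12 Main St", "device:d-77", "company:acme", "email_domain:other.example"},
    "person3": {"addr:9 Oak Ave", "device:d-12", "company:globex", "email_domain:mail.example"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def suspicious_pairs(profiles, threshold=0.5):
    """Return pairs of people whose attribute overlap reaches the threshold."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if jaccard(profiles[p], profiles[q]) >= threshold
    ]


pairs = suspicious_pairs(profiles)
print(pairs)
```

Linked suspicious pairs can then be merged into groups, for example by taking connected components.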

Page 34: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – design by observation

Knowledge graph visualization

• Visualize entities and relationships

• Design anti-fraud rules via observational study

Page 35: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – evolution of graph structure

Rapid change of relationship structure within a short time period

Page 36: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Antifraud – fraud score

[Diagram: Variables (features extracted from raw data, results from anti-fraud rules, user direct attributes) → Models (LR, Decision Tree, Random Forest, SVM, ANN, DNN) → Prediction (score)]

The score is used to directly reject or accept the loan
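As a sketch of the scoring step, here is a logistic regression (LR) scorer with a reject/accept cut-off. The variable names, weights, bias, and threshold are all hypothetical; in practice the weights would be learned from labeled loans, and any of the listed models could take the place of LR.

```python
import math

# Hypothetical weights for three input variables.
WEIGHTS = {
    "shared_phone_rule_hit": 2.0,   # result from an anti-fraud rule
    "overdue_contact_ratio": 2.0,   # extracted feature
    "years_at_address": -0.5,       # user direct attribute
}
BIAS = -3.0
REJECT_THRESHOLD = 0.5  # assumed cut-off


def fraud_score(variables):
    """Logistic function of a weighted sum; the result lies between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in variables.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(variables):
    """Reject the loan when the fraud score reaches the threshold."""
    return "reject" if fraud_score(variables) >= REJECT_THRESHOLD else "accept"


risky = {"shared_phone_rule_hit": 1, "overdue_contact_ratio": 1.0, "years_at_address": 0}
safe = {"shared_phone_rule_hit": 0, "overdue_contact_ratio": 0.0, "years_at_address": 4}
print(decide(risky), decide(safe))
```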

Page 37: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Page 38: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Lost contact recovery – what is it

The borrowers disappear, and all the contact information they explicitly provided becomes invalid. How can we reach them?

Implicitly infer potential contact information

Page 39: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Building phone network – 1st order extension

[Diagram: the applicant's personal phone connected to its directly linked phones]

Rank the phone numbers, and predict relationship

Page 40: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Building phone network – 2nd order extension

[Diagram: the phone network extended by one more hop, to phones linked to the 1st order phones]

Rank the phone numbers, and predict relationship
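The 1st, 2nd, and higher order extensions can be sketched as a breadth-first search with a depth limit. The call graph below is hypothetical, and `phones_within` is a name chosen for this example.

```python
from collections import deque

# Hypothetical call graph: phone -> phones it has been in contact with.
calls = {
    "applicant": ["p1", "p2"],
    "p1": ["applicant", "p3"],
    "p2": ["p4"],
    "p3": ["p5"],
}


def phones_within(call_graph, start, max_order):
    """Return {phone: order} for every phone reachable within `max_order` hops."""
    order = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if order[current] == max_order:
            continue  # do not expand beyond the requested order
        for nxt in call_graph.get(current, []):
            if nxt not in order:
                order[nxt] = order[current] + 1
                queue.append(nxt)
    del order[start]
    return order


first_order = phones_within(calls, "applicant", 1)
second_order = phones_within(calls, "applicant", 2)
print(first_order, second_order)
```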

Page 41: Bigdata and ai in p2 p industry:  Knowledge graph and inference

3rd order ..

[Diagram: the phone network extended to a third hop]

Page 42: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Building phone network – Rank

Simple ranking criteria

• Total call duration

• Call frequency

Advanced approach

• Learn the ranking score with a machine learning model
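A sketch of the simple criteria: rank candidate phones by total call duration, breaking ties by number of calls. How the two criteria are combined is an assumption here, and the call log is made up.

```python
# Hypothetical call log: (other phone, call duration in seconds).
call_log = [("p1", 60), ("p2", 600), ("p1", 30), ("p3", 45), ("p1", 20)]


def rank_phones(call_log):
    """Order phones by total duration, then by call count, both descending."""
    stats = {}
    for phone, seconds in call_log:
        total, count = stats.get(phone, (0, 0))
        stats[phone] = (total + seconds, count + 1)
    return sorted(stats, key=lambda phone: stats[phone], reverse=True)


ranking = rank_phones(call_log)
print(ranking)
```

Here `p2` ranks first on duration even though `p1` was called more often, which is one reason a learned ranking score can work better than fixed criteria.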

Page 43: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Building phone network – Predict the relation

~100 features, for example:

• Total # of calls made

• Total call duration

• Total # of calls received

• Average time per call

• Maximum call duration

• # of calls between 0-4am

• # of calls between 4-8am

• ……

[Diagram: ~100 features → Models (LR, Decision Tree, Random Forest, SVM, ANN, DNN) → Prediction of relation]

With very limited training data, our model provides ~30% accuracy
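A few of these call features can be derived from raw call records as sketched below. The record layout is hypothetical, and the time-of-day buckets count calls in both directions, which is an assumption.

```python
from datetime import datetime

# Hypothetical call records between the applicant and one other phone:
# (timestamp, direction, duration in seconds).
records = [
    (datetime(2016, 1, 5, 1, 30), "out", 120),
    (datetime(2016, 1, 6, 6, 10), "in", 60),
    (datetime(2016, 1, 7, 14, 0), "out", 300),
]


def call_features(records):
    """Aggregate raw call records into numeric features for a relation classifier."""
    durations = [seconds for _, _, seconds in records]
    return {
        "num_outgoing": sum(1 for _, direction, _ in records if direction == "out"),
        "num_incoming": sum(1 for _, direction, _ in records if direction == "in"),
        "total_seconds": sum(durations),
        "avg_seconds": sum(durations) / len(durations) if durations else 0.0,
        "max_seconds": max(durations, default=0),
        "num_calls_0_4am": sum(1 for ts, _, _ in records if 0 <= ts.hour < 4),
        "num_calls_4_8am": sum(1 for ts, _, _ in records if 4 <= ts.hour < 8),
    }


pair_features = call_features(records)
print(pair_features)
```

The resulting dict would be one input row for the relation classifier.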

Page 44: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Other approach – Link prediction (on-going work)

[Diagram: Applicant (personal phone), Other applicant and two Person nodes; link prediction estimates whether a "knows?" edge exists between them]

Page 45: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Page 46: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Detect Bad People via Search

From the search results, we label each entity in the knowledge graph, e.g. black, green, etc.

Page 47: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Search for basic information….

Search Fields

• Phone number

• Email

• QQ

• Other IDs

Search Engines & Public Sites

• Baidu.com

• 360.com

• Other public websites

Page 48: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Search for phone number…

Page 49: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Search for Email…

Fraud

Page 50: Bigdata and ai in p2 p industry:  Knowledge graph and inference

• Clustering analysis

• Precision marketing

• ……

Other Applications we are working on

Page 51: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Page 52: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Challenges : Unstructured Data

Unstructured data: images, text, audio, video

Techniques: machine learning, natural language processing, data mining

Page 53: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Challenges : Name Disambiguation

[Diagram: Applicant and Other applicant are linked to two company nodes, "Puhui Finance Ltd." and "Puhui Finance"]

Same company, can we merge?

It is a very important problem to deal with!
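A very simple baseline for this merge decision is to normalize names before comparing them. The sketch below strips punctuation and a small, assumed list of legal suffixes; it only handles Latin-script names, so Chinese company names and harder cases such as abbreviations and typos would need more than this.

```python
import re

# Assumed list of legal-form suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "co", "corp", "llc"}


def normalize_company(name):
    """Lowercase the name, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def same_company(a, b):
    """Treat two names as the same company when their normalized forms match."""
    return normalize_company(a) == normalize_company(b)


print(same_company("Puhui Finance Ltd.", "Puhui Finance"))
```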

Page 54: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Challenges : Reasoning

Link Prediction

• Logic-based approach

• Probabilistic approach (e.g. distributed representation)

• Hybrid approach

However, it is still an open problem

Page 55: Bigdata and ai in p2 p industry:  Knowledge graph and inference

Challenges : Insufficient Samples

Big data, but small samples

Page 56: Bigdata and ai in p2 p industry:  Knowledge graph and inference

We are hiring! (in Beijing)

Open positions include, but are not limited to:

• Senior/Lead Machine Learning/NLP Engineers

• Senior/Lead Data Engineer/Scientist

• Senior/Lead Architect

• Senior/Lead Software Engineer

Contact

[email protected]

[email protected]

Company Website

www.puhuifinance.com

www.iqianjin.com

Page 58: Bigdata and ai in p2 p industry:  Knowledge graph and inference

[1] http://www.datapop.com/

[2] http://db-engines.com/en/blog_post//43

[3] http://db-engines.com/en/ranking

[4] Bordes, Antoine, et al. "Translating Embeddings for Modeling Multi-relational Data." Advances in Neural Information Processing Systems (2013): 2787-2795.

[5] Nickel, Maximilian, V. Tresp, and H. P. Kriegel. "A Three-Way Model for Collective Learning on Multi-Relational Data." International Conference on Machine Learning (2011): 809-816.

References

Page 59: Bigdata and ai in p2 p industry:  Knowledge graph and inference

[6] Socher, Richard, Danqi Chen, Christopher D. Manning, and Andrew Ng. "Reasoning With Neural Tensor Networks for Knowledge Base Completion." Advances in Neural Information Processing Systems (2013).

[7] Wang, Quan, Bin Wang, and Li Guo. "Knowledge Base Completion Using Embeddings and Rules." Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press, 2015.

[8] Rocktäschel, T., S. Singh, and S. Riedel. "Injecting Logical Background Knowledge into Embeddings for Relation Extraction." http://talks.cam.ac.uk/talk/index/58360

References