Big Data and AI in P2P Industry: Knowledge Graph and Inference

Posted on 15-Apr-2017


Big Data and AI in P2P Industry

Wenzhe Li

nadalwz1115@gmail.com

Feb 1, 2016

Puhui Finance (www.puhuifinance.com)

Services

• 爱钱进 (iqianjin)
• 普惠信贷 (Puhui Credit)
• 创新资产 (Innovative Assets)
• 普惠财富 (Puhui Wealth)

• Internet financing P2P company, headquartered in Beijing
• Founded in July 2013
• $50M series A funding in Dec 2014
• ~5500 employees, 100+ offline stores

• Offline Financing Service
• Online Financing Service
• Online Lending Service
• Offline Lending Service

Puhui Finance (cont.)

Fastest-growing P2P company. Big data technology is the key.

What the talk is about

In this talk, I will mainly focus on the techniques used in lending-side risk control. Similar techniques can be applied to the financing side.

Outline

• Why we need big data and AI

• Intro to FC Engine and Knowledge Graph

• Case 1: Anti-Fraud

• Case 2: Lost Contact Recovery

• Case 3: Detect Bad People via Search

• More use cases

• Challenges

Why big data & AI

• The credit system in China is not mature
• We target the under-served market: people who do not have enough credit to borrow from a bank
• Credit history data alone is not enough to build the scoring models
• A more efficient application review process is needed as we move more transactions from offline to online

The central problem is risk control. The solution is to use big data.

Measure the risk for a person

• Individual feature analysis: Feature Compute (FC) Engine
• Relation analysis: Knowledge Graph

Data

• Data the user enters explicitly (e.g. the application form)
• Authorized* user data
  • Mobile history
  • Purchasing history
  • ……
• Open search
  • Baidu.com
  • 360.com
  • Others (e.g. Craigslist)
• Third-party data (e.g. blacklists)

Unstructured Data

* The user authorizes us to use their data

Feature Compute Engine

The goal is to convert unstructured data to structured features.

[Diagram: data sources (credit card history, mobile history, purchasing history, ......) feed the Feature Compute Engine, which fills a feature container (tens of thousands of features). The features feed the scoring model, with outputs labelled risk score, fraud score and precision marketing.]

Purchasing History

Example features computed from purchasing history:

• Total amount spent during the last 6 months
• User level (e.g. Prime, Normal, …)
• Total number of transactions during the last 6 months
• Length of time the user has had the account
• Total number of transactions related to virtual products
• Total number of transactions related to luxury products
• ………

A few thousand features
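As a rough illustration of how a few of the purchasing-history features above might be computed, here is a minimal Python sketch. The record layout (`date`, `amount`, `category`), the category labels and the 182-day approximation of "6 months" are assumptions made for this example; the talk does not describe the FC Engine's actual data model.

```python
from datetime import date, timedelta

def purchasing_features(transactions, account_opened, today):
    """Turn raw purchase records into a flat feature dict.

    `transactions` is a list of dicts with hypothetical keys
    "date" (datetime.date), "amount" (float) and "category" (str).
    """
    cutoff = today - timedelta(days=182)  # roughly the last 6 months
    recent = [t for t in transactions if t["date"] >= cutoff]
    return {
        "amount_6m": sum(t["amount"] for t in recent),
        "n_transactions_6m": len(recent),
        "account_age_days": (today - account_opened).days,
        "n_virtual": sum(1 for t in transactions if t["category"] == "virtual"),
        "n_luxury": sum(1 for t in transactions if t["category"] == "luxury"),
    }

today = date(2016, 2, 1)
transactions = [
    {"date": date(2016, 1, 10), "amount": 120.0, "category": "virtual"},
    {"date": date(2015, 12, 5), "amount": 900.0, "category": "luxury"},
    {"date": date(2015, 3, 1), "amount": 50.0, "category": "grocery"},
]
features = purchasing_features(transactions, date(2014, 2, 1), today)
```

Each data source would get its own function of this kind, and the resulting dicts would be merged into one feature vector per applicant.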

What is a knowledge graph

• It is a semantic network.
• It is based on a graph data structure consisting of points (nodes) and edges. A point represents an entity; an edge represents a relationship.
• A knowledge graph connects heterogeneous information. It provides the ability to analyze data from the perspective of relationships.
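To make the entity/relationship structure concrete, here is a minimal in-memory sketch in Python that stores the graph as (head, relation, tail) triples. The class, the relation names and the sample identifiers are invented for illustration; a production system would more likely use one of the graph or RDF databases.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny in-memory graph of (head, relation, tail) triples."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # head -> [(relation, tail)]
        self.in_edges = defaultdict(list)   # tail -> [(relation, head)]

    def add(self, head, relation, tail):
        self.out_edges[head].append((relation, tail))
        self.in_edges[tail].append((relation, head))

    def heads(self, relation, tail):
        """All entities linked to `tail` by `relation`."""
        return [h for r, h in self.in_edges[tail] if r == relation]

kg = KnowledgeGraph()
kg.add("applicant:A", "HasPersonalPhone", "phone:555-0100")
kg.add("applicant:B", "HasPersonalPhone", "phone:555-0100")
kg.add("applicant:A", "WorksAt", "company:Puhui Finance")

# Who lists this number as their personal phone?
owners = kg.heads("HasPersonalPhone", "phone:555-0100")
```

Keeping an index in both directions makes "who else points at this phone?" style questions cheap, which is the kind of relationship-centric query a plain table of applications answers poorly.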

Some knowledge graphs

Knowledge graph – search engine

Storing the knowledge graph

Graph/RDF database ranking [3]

Ranking   DBMS
21        Neo4j (graph database)
32        MarkLogic (XML)
42        Titan (graph database)
46        OrientDB (graph database)
61        Virtuoso (RDF)
80        Jena (RDF)
88        Sesame (RDF)
90        ArangoDB (graph database)
120       AllegroGraph (RDF)

[Chart: trends for different types of database [2]]

Key techniques for knowledge graph

Link Prediction

• Logic-based approach
• Probabilistic approach (e.g. distributed representation)
• Hybrid approach

Logic-based approach

Simple approach: pre-define some rules, e.g.

(Peter FatherOf Tom) -> (Tom SonOf Peter)

(Peter ColleagueOf Tom), (Sarah ColleagueOf Peter) -> (Peter ColleagueOf Sarah)
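The rule-based idea above can be sketched as a small forward-chaining loop in Python. The two rules encoded here (FatherOf implies the inverse SonOf, and ColleagueOf is symmetric) are one reading of the slide's examples; the function name and the fact set are hypothetical.

```python
def apply_rules(triples):
    """Forward-chain two hand-written rules until no new triple appears."""
    known = set(triples)
    while True:
        new = set()
        for head, rel, tail in known:
            if rel == "FatherOf":
                new.add((tail, "SonOf", head))        # inverse rule from the slide
            if rel == "ColleagueOf":
                new.add((tail, "ColleagueOf", head))  # symmetry (assumed reading)
        if new <= known:   # nothing new was derived: fixed point reached
            return known
        known |= new

facts = {
    ("Peter", "FatherOf", "Tom"),
    ("Sarah", "ColleagueOf", "Peter"),
}
inferred = apply_rules(facts) - facts
```

Here `inferred` holds only the links that the rules added on top of the stated facts. Such rules are precise but only cover the patterns someone thought to write down.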

Methods based on distributed representation

• Translating Embedding [4]
• Tensor Factorization (RESCAL) [5]
• Neural Tensor Network (NTN) [6]
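As an illustration of the first method, the translating-embedding model [4] represents a relation as a vector offset, so that head + relation should land close to tail for a true triple. The Python sketch below shows only the scoring step; the 2-d vectors are hand-picked toy values, not learned embeddings, and the helper names are made up for this example.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; smaller means the triple is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy 2-d embeddings, hand-picked for illustration (not learned).
entity_vec = {"Peter": [0.0, 0.0], "Tom": [1.0, 0.0], "Sarah": [0.0, 1.0]}
relation_vec = {"FatherOf": [1.0, 0.0]}

def rank_tails(head, rel):
    """Order candidate tails from most to least plausible for (head, rel, ?)."""
    candidates = [e for e in entity_vec if e != head]
    return sorted(
        candidates,
        key=lambda e: transe_distance(entity_vec[head], relation_vec[rel], entity_vec[e]),
    )

ranking = rank_tails("Peter", "FatherOf")
```

In a real system the vectors would be trained so that observed triples score better than corrupted ones; link prediction then amounts to ranking candidate tails as done here.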

Hybrid Approach – Logic + Probabilistic

Simple approach:

1. Generate all the new links using pre-defined rules
2. Apply statistical learning

Advanced approaches (e.g.):

• Incorporation of Rules into Embeddings [7]
• Injecting Logical Background [8]

Use Cases

Domain-specific knowledge graph

Connects person, phone, address, email, company, ……

• 10 types of entities
• ~50 types of relations
• ~50M entities
• 0.2B relations

We expect it to become ~20 times bigger by the end of this year due to business growth.

Antifraud – rules

The applicant shares the same personal phone with another applicant.

[Diagram: Applicant and Other applicant are both linked to the same Phone node by "Personal phone" edges.]
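A minimal Python sketch of the shared-personal-phone rule follows. The field names `applicant_id` and `personal_phone` and the sample numbers are assumptions for illustration only.

```python
from collections import defaultdict

def shared_personal_phones(applications):
    """Map each personal phone used by more than one applicant to those applicants."""
    by_phone = defaultdict(set)
    for app in applications:
        by_phone[app["personal_phone"]].add(app["applicant_id"])
    return {phone: ids for phone, ids in by_phone.items() if len(ids) > 1}

applications = [
    {"applicant_id": "A1", "personal_phone": "555-0100"},
    {"applicant_id": "A2", "personal_phone": "555-0100"},
    {"applicant_id": "A3", "personal_phone": "555-0199"},
]
flagged = shared_personal_phones(applications)
```

In graph terms this is simply a phone node with more than one incoming "Personal phone" edge.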

Antifraud – rules (cont.)

The applicant and another applicant share the same colleague phone, but with different company names.

[Diagram: Applicant (Company 1) and Other applicant (Company 2) are both linked to the same Phone node by "Colleague phone" edges.]
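The colleague-phone rule can be sketched in the same style. The field names are hypothetical, and company names are compared as exact strings here, which is a simplification.

```python
from collections import defaultdict

def colleague_phone_conflicts(applications):
    """Find colleague phones that are listed under more than one company name."""
    by_phone = defaultdict(set)
    for app in applications:
        by_phone[app["colleague_phone"]].add((app["applicant_id"], app["company"]))
    conflicts = {}
    for phone, pairs in by_phone.items():
        companies = {company for _, company in pairs}
        if len(companies) > 1:  # same colleague, different employers claimed
            conflicts[phone] = sorted(pairs)
    return conflicts

applications = [
    {"applicant_id": "A1", "colleague_phone": "555-0111", "company": "Company 1"},
    {"applicant_id": "A2", "colleague_phone": "555-0111", "company": "Company 2"},
    {"applicant_id": "A3", "colleague_phone": "555-0122", "company": "Company 3"},
]
conflicts = colleague_phone_conflicts(applications)
```

Because the comparison is on raw strings, two spellings of the same company would also be flagged, so this rule is only as good as the company-name matching behind it.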

Antifraud – rules (cont.)

Some of the applicant's contacts didn't pay back their loans on time.

[Diagram: Applicant is linked to a Phone node by a "Personal phone" edge; that phone is connected to several other Phone nodes, some of which are marked "Overdue".]
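One way to turn the overdue-contacts rule into a number is the share of an applicant's contacts that belong to overdue borrowers. The sketch below assumes a simple dict from a phone to the set of phones it is connected to; that structure and the sample numbers are illustrative only.

```python
def overdue_contact_ratio(applicant_phone, contacts, overdue_phones):
    """Share of an applicant's phone contacts that belong to overdue borrowers."""
    neighbours = contacts.get(applicant_phone, set())
    if not neighbours:
        return 0.0
    return len(neighbours & overdue_phones) / len(neighbours)

contacts = {
    "555-0100": {"555-0101", "555-0102", "555-0103", "555-0104", "555-0105"},
}
overdue_phones = {"555-0102", "555-0104"}
ratio = overdue_contact_ratio("555-0100", contacts, overdue_phones)
```

A ratio is used rather than a raw count so that applicants with many contacts are not penalized merely for having a large phone book.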

Antifraud – cycle detection

Triangle relationship

[Diagram: Person 1, Person 2 and Person 3 are linked to one another in a cycle.]
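Triangle relationships can be found with a short neighbour-intersection check. This Python sketch treats the graph as undirected and ignores relation types, which is a simplification of whatever the production rule does.

```python
from itertools import combinations

def find_triangles(edges):
    """Return every set of three people who are all pairwise connected."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    triangles = set()
    for node, adjacent in neighbours.items():
        # Two neighbours of `node` that also know each other close a triangle.
        for u, v in combinations(sorted(adjacent), 2):
            if v in neighbours[u]:
                triangles.add(frozenset((node, u, v)))
    return triangles

edges = [
    ("Person 1", "Person 2"),
    ("Person 2", "Person 3"),
    ("Person 3", "Person 1"),
    ("Person 3", "Person 4"),
]
triangles = find_triangles(edges)
```

Using `frozenset` means each triangle is reported once, regardless of which of its three members the search started from.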

Antifraud – inconsistent relationship

Inconsistent relations

[Diagram: Applicant, Applicant 1 and Applicant 2 are connected by two "Parent of" edges and one "Spouse" edge that contradict each other.]

Antifraud – suspicious group

Person 1, Person 2 and Person 3 share a lot of common attributes.
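A simple way to surface people who share many attributes is to compare profiles pairwise. In the sketch below the attribute names, the sample values and the threshold of three shared attributes are all assumptions for illustration.

```python
from itertools import combinations

def shared_attribute_pairs(profiles, min_shared=3):
    """Pairs of people whose profiles agree on at least `min_shared` attributes."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        pa, pb = profiles[a], profiles[b]
        shared = sorted(k for k in pa.keys() & pb.keys() if pa[k] == pb[k])
        if len(shared) >= min_shared:
            pairs.append((a, b, shared))
    return pairs

profiles = {
    "Person 1": {"address": "X", "company": "C", "device": "D1", "wifi": "W"},
    "Person 2": {"address": "X", "company": "C", "device": "D1", "wifi": "Z"},
    "Person 3": {"address": "Y", "company": "C", "device": "D9", "wifi": "Q"},
}
suspicious = shared_attribute_pairs(profiles)
```

Pairwise comparison is quadratic in the number of people, so at the scale of tens of millions of entities one would first group candidates by a shared attribute value and only compare within each group.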

Antifraud – design by observation

Knowledge graph visualization

• Visualize entities and relationships
• Design anti-fraud rules via observational study

Antifraud – evolution of graph structure

Rapid change of relationship structure within a short time period

Antifraud – fraud score

Variables:

• Features extracted from raw data
• Results from anti-fraud rules
• Direct user attributes

Models: LR, Decision Tree, Random Forest, SVM, ANN, DNN

Prediction: a fraud score. The score is used to directly reject or accept the loan.
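To show how rule results and other variables could be combined into a single score, here is a minimal logistic-regression-style scorer in Python. LR is just one of the model families listed; the variable names, weights, bias and the 0.5 threshold are invented for this sketch, and real weights would be fitted on labelled loans.

```python
import math

# Hypothetical weights; in practice they would be learned from labelled data.
WEIGHTS = {
    "shared_personal_phone": 2.0,   # 1 if the shared-phone rule fired, else 0
    "overdue_contact_ratio": 3.0,   # share of contacts that are overdue
    "account_age_years": -0.5,      # a direct user attribute
}
BIAS = -2.0

def fraud_score(variables):
    """Logistic function of a weighted sum; the result lies strictly between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in variables.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(variables, threshold=0.5):
    """Reject the loan when the score reaches the threshold, otherwise accept."""
    return "reject" if fraud_score(variables) >= threshold else "accept"

risky = {"shared_personal_phone": 1, "overdue_contact_ratio": 0.4, "account_age_years": 0.5}
safe = {"shared_personal_phone": 0, "overdue_contact_ratio": 0.0, "account_age_years": 4}
```

With these toy weights `decide(risky)` rejects and `decide(safe)` accepts. The threshold is a business decision that trades missed fraud against wrongly rejected applicants.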

Lost contact recovery – what is it

The borrowers disappear, and all the contact information they explicitly provided becomes invalid. How do we reach them?

Implicitly infer potential contact information.

Building phone network – 1st order extension

[Diagram: Applicant is linked to a Phone node by a "Personal phone" edge; that phone is connected to further Phone nodes.]

Rank the phone numbers, and predict the relationship.

Building phone network – 2nd order extension

[Diagram: the network is extended by one more hop, to the phones connected to the 1st-order phones.]

Rank the phone numbers, and predict the relationship.

3rd order ..

[Diagram: the network is extended by a further hop.]
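The 1st, 2nd and 3rd order extensions described above amount to a breadth-first search from the applicant's phone with a hop limit. The adjacency-dict layout and the sample phone labels in this Python sketch are assumptions for illustration.

```python
from collections import deque

def phones_by_order(call_graph, start, max_order):
    """Breadth-first search: map each reachable phone to its hop distance from `start`."""
    order = {start: 0}
    queue = deque([start])
    while queue:
        phone = queue.popleft()
        if order[phone] == max_order:
            continue  # do not expand beyond the requested order
        for other in call_graph.get(phone, ()):
            if other not in order:
                order[other] = order[phone] + 1
                queue.append(other)
    del order[start]  # the applicant's own phone is not a candidate contact
    return order

call_graph = {
    "P0": ["P1", "P2"],
    "P1": ["P0", "P3"],
    "P2": ["P4"],
    "P3": ["P5"],
}
second_order = phones_by_order(call_graph, "P0", max_order=2)
```

The number of candidate phones grows quickly with each extra hop, which is why ranking the candidates matters more as the order increases.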

Building phone network – Rank

Simple ranking criteria

• Total length of call time
• Frequency of calls

Advanced approach

• Learn the ranking score using a machine learning approach
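The simple ranking criteria can be sketched as a sort on total call time with call frequency as a tie-breaker. The `(number, seconds)` record format and the choice of tie-breaker are assumptions made for this example.

```python
from collections import defaultdict

def rank_contacts(call_records):
    """Rank contacted numbers by total talk time, then by number of calls."""
    total_seconds = defaultdict(int)
    n_calls = defaultdict(int)
    for number, seconds in call_records:
        total_seconds[number] += seconds
        n_calls[number] += 1
    return sorted(
        total_seconds,
        key=lambda n: (total_seconds[n], n_calls[n]),
        reverse=True,
    )

call_records = [
    ("555-0101", 600),
    ("555-0102", 30), ("555-0102", 45),
    ("555-0103", 300), ("555-0103", 300),
]
ranking = rank_contacts(call_records)
```

A learned ranking score would replace the hand-written sort key with a model output computed from the same kind of call statistics.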

Building phone network – Predict the relation

• Total # of calls made
• Total length of call time
• Total # of calls received
• Average time per call
• Maximum call length
• # of calls made between 0-4am
• # of calls made between 4-8am
• ……
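The call features listed above could be derived from raw call logs roughly as follows. The tuple layout `(start, seconds, outgoing)` and the feature names are hypothetical; only a handful of the ~100 features are shown.

```python
from datetime import datetime

def call_features(calls):
    """Features for one (applicant, contact) pair.

    Each call is a hypothetical tuple (start: datetime, seconds: int, outgoing: bool).
    """
    durations = [seconds for _, seconds, _ in calls]
    outgoing = [c for c in calls if c[2]]
    return {
        "n_outgoing": len(outgoing),
        "n_incoming": len(calls) - len(outgoing),
        "total_seconds": sum(durations),
        "avg_seconds": sum(durations) / len(durations) if durations else 0.0,
        "max_seconds": max(durations, default=0),
        "n_outgoing_0_4am": sum(1 for c in outgoing if 0 <= c[0].hour < 4),
        "n_outgoing_4_8am": sum(1 for c in outgoing if 4 <= c[0].hour < 8),
    }

calls = [
    (datetime(2016, 1, 5, 1, 30), 120, True),
    (datetime(2016, 1, 6, 6, 0), 60, True),
    (datetime(2016, 1, 7, 14, 0), 300, False),
]
features = call_features(calls)
```

The time-of-day buckets are the interesting part: calls in the small hours plausibly say more about a close relationship than the same number of calls at midday, which is the kind of signal a relation classifier can pick up.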

Input: ~100 features

Models: LR, Decision Tree, Random Forest, SVM, ANN, DNN

Output: prediction of the relation

With very limited training data, our model provides ~30% accuracy.

Other approach – Link prediction (on-going work)

[Diagram: Applicant, Other applicant and Person nodes, with a "Personal phone" edge and a candidate "knows?" edge to be decided by link prediction.]

Detect Bad People via Search

From the search results, we label each entity in the knowledge graph, e.g. black, green, etc.

Search for basic information….

Search fields

• Phone number
• Email
• QQ
• Other IDs

Search engines & public sites

• Baidu.com
• 360.com
• Other public websites

Search for phone number…

Search for Email…

Fraud

Other applications we are working on

• Clustering analysis
• Precision marketing
• ……

Challenges: Unstructured Data

Unstructured data: images, text, audio, video

Techniques: machine learning, natural language processing, data mining

Challenges: Name Disambiguation

[Diagram: Applicant is linked to "Puhui Finance Ltd." and Other applicant is linked to "Puhui Finance".]

Same company, can we merge? It is a very important problem to deal with!
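A very basic first step toward merging such nodes is to normalize names before comparing them. The Python sketch below is a naive baseline, not the approach used in the talk; the suffix list is a made-up sample, and real disambiguation would also need to handle abbreviations, typos and Chinese company names.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "co", "corp", "llc"}

def normalize_company(name):
    """Lower-case, drop punctuation and strip trailing legal-form words."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def same_company(a, b):
    """True when two names are identical after normalization."""
    return normalize_company(a) == normalize_company(b)

merged = same_company("Puhui Finance Ltd.", "Puhui Finance")
different = same_company("Puhui Finance", "Puhui Wealth")
```

Getting this wrong cuts both ways: missed merges hide shared employers, while over-eager merges create false links between unrelated applicants.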

Challenges: Reasoning

Link Prediction

• Logic-based approach
• Probabilistic approach (e.g. distributed representation)
• Hybrid approach

However, it is still an open problem.

Challenges: Insufficient Samples

Big data, but small samples

We are hiring! (in Beijing)

Open positions include, but are not limited to:

• Senior/Lead Machine Learning/NLP Engineers
• Senior/Lead Data Engineer/Scientist
• Senior/Lead Architect
• Senior/Lead Software Engineer

liwenzhe@puhuifinance.com
zhaopin@puhuifinance.com

Contact

Company Website

www.puhuifinance.com

www.iqianjin.com

Email:nadalwz1115@hotmail.com

nadalwz1115@gmail.com

Wechat(微信):liwenzhe595675

Thanks!

References

[1] http://www.datapop.com/

[2] http://db-engines.com/en/blog_post//43

[3] http://db-engines.com/en/ranking

[4] Bordes, A., et al. "Translating Embeddings for Modeling Multi-relational Data." Advances in Neural Information Processing Systems (2013): 2787-2795.

[5] Nickel, M., Tresp, V., and Kriegel, H.-P. "A Three-Way Model for Collective Learning on Multi-Relational Data." International Conference on Machine Learning (2011): 809-816.

[6] Socher, R., Chen, D., Manning, C. D., and Ng, A. "Reasoning With Neural Tensor Networks for Knowledge Base Completion." Advances in Neural Information Processing Systems (2013).

[7] Wang, Q., Wang, B., and Guo, L. "Knowledge Base Completion Using Embeddings and Rules." Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press, 2015.

[8] Rocktäschel, T., Singh, S., and Riedel, S. "Injecting Logical Background Knowledge into Embeddings for Relation Extraction." http://talks.cam.ac.uk/talk/index/58360
