read and write files - seoul national universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. ·...

Python for machine learning & deep learning

References 1. 2017년 한국통계학회 여름학교, <기계학습 및 딥러닝 과정> 2. Cs231n, <Python/numpy tutorial>, http://cs231n.github.io/python-

numpy-tutorial/

http://cs231n.github.io/python-numpy-tutorial/





Install Python, Anaconda, Jupyter notebook

Data types (integer, real, list, dictionary)

Basic statements

Functions, modules & packages, classes

Read and write files

Regression, logistic regression, LDA, SVM

Since 1991, created by Guido van Rossum.

Open source.

C support (easily extendable)

high-level programming language

(very powerful ideas in very few lines)

Object-oriented programming (classes)

Codes can be grouped in modules and packages.

Parallel computing

Go to https://www.python.org/

https://www.python.org/

Data Science Platform powered by Python

Over 720 useful packages!

(numpy, scikit-learn, matplotlib, tensorflow, pytorch, …)

Go to https://www.continuum.io/downloads

https://www.continuum.io/downloads

https://www.continuum.io/downloads

Create a notebook file(.ipynb)

Code mode

Markdown mode

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 1/8

Data types1. Numeric types: integer, float, boolean2. Sequence types: string, tuple, range, list3. Set type: set4. Mapping type: dictionary

(These are all objects.)

Like Java and R, Python doesn't require to explicitly declare variable types.

Assignment is done using: (variable name) = (value).

Variable names can be composed of a~z, A~Z, 0~9 and _, but should not start with numbers!

1. Numeric types

Integer and floats

In [1]:

x = 3 print( x+1 ) print( x*2 ) print( x**2 )

In [2]:

x = x+1 print(x) x += 1 print(x) x *= 2 print(x)

4 6 9

4 5 10

8/24/2018 Datatypes


In [3]:

print(2/3); print(2./3.) # ※In Python 3, division always gives a real number. print(int(2./3.)) print(3//2) print(3%2)

Booleans

In [4]:

x = (2==3) print(x)

y = (3//2!=1) print(y)

print(x and y) print(x or y) print(not x)

In [5]:

6<9, 'a'<'A', 'a'>'A', 'abcd'>'adb'

In [6]:

(1,2,3)<(2,1,3)

0 0.666666666667 0 1 1

False False False False True

Out[5]:

(True, False, True, False)

Out[6]:

True

8/24/2018 Datatypes


2. Sequences: string, tuple, range, list1. indexing: s[i] selects i-th item of sequence s2. slicing: s[i:j] selects from i-th to (j-1)-th items of s3. immutable vs mutable sequences

immutable sequence: string, tuple, range / mutable sequence: list

4. operations: +, - , len(), in, not in

Strings

In [7]:

s='Deep learning'

print(s[0], s[1], s[-1])

## Note that indexing starts from 0, unlike in R!

print(s[2:5], s[:5])

In [8]:

s[5]='L' # This does not work.

In [9]:

s=s+' is great!' print(s) print(s*2)

In [10]:

len(s)

('D', 'e', 'g') ('ep ', 'Deep ')

--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-8-550143e29e73> in <module>() ----> 1 s[5]='L' # This does not work. TypeError: 'str' object does not support item assignment

Deep learning is great! Deep learning is great!Deep learning is great!

Out[10]:

23

8/24/2018 Datatypes


In [11]:

'Deep' in s

In [12]:

s.count('e')

In [13]:

s.split()

In [14]:

s2 = 'March %dth' % (12) print(s2)

s3= 'Today is %s %dth' % ('March',12) print(s3)

Tuples

In [15]:

#Items can be of arbitrary type t=(1,2,'abc') print(t)

#Tuples can be nested t2=(t,3,'de') print(t2)

Out[11]:

True

Out[12]:

4

Out[13]:

['Deep', 'learning', 'is', 'great!']

March 12th Today is March 12th

(1, 2, 'abc') ((1, 2, 'abc'), 3, 'de')

8/24/2018 Datatypes


In [16]:

print(t[0],t[-1], t[1:2])

print(t+(3,'de')) print(t*4)

In [17]:

t[0]=5 # This does not work.

In [19]:

print(len(t)) print(2 in t)

In [20]:

x,y,z=t print(x) print(y) print(z)

Ranges

(1, 'abc', (2,)) (1, 2, 'abc', 3, 'de') (1, 2, 'abc', 1, 2, 'abc', 1, 2, 'abc', 1, 2, 'abc')

--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-17-e34127f16d5a> in <module>() ----> 1 t[0]=5 # This does not work. TypeError: 'tuple' object does not support item assignment

3 True

1 2 abc

8/24/2018 Datatypes


In [21]:

print(range(10))

print(range(1,10))

print(range(1,10,2))

print(range(0,-10,-2))

Lists

In [22]:

#Lists are similar to tuples, except that they are mutable ! #Also, lists cannot be used as elements of sets while tuples can.

l=[3,7,5,'a']

l[0], l[-1], l[1:2]

In [23]:

print(len(l))

print(l+[8,'b'])

#Be careful !! [1,2,3]+[1,1,1] is not [2,3,4], but [1,2,3,1,1,1]

In [24]:

'A' in l

In [25]:

l[0]=4 # This works. 'list' object supports item assignment.

Lists are objects!!

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 3, 5, 7, 9] [0, -2, -4, -6, -8]

Out[22]:

(3, 'a', [7])

4 [3, 7, 5, 'a', 8, 'b']

Out[24]:

False

8/24/2018 Datatypes


In [26]:

l.append(10); print(l)

del l[3]; print(l)

l.reverse(); print(l)

l2=[3,4,2,5]

l2.sort(); print(l2)

l2.pop(0); print(l2)

l2.remove(5);print(l2)

3. Sets

In [27]:

a={2,2,3,4,5,'a'} print(a)

a=[2,2,3,4,5,5] a=set(a) print(a) a.add(6) print(a) a.remove(5) print(a)

We can use subtractions for sets!

In [28]:

numbers=set(range(10)) evens={0,2,4,6,8} odds=numbers-evens print(odds)

Sets also allow union(|) and intersections(&)!

[4, 7, 5, 'a', 10] [4, 7, 5, 10] [10, 5, 7, 4] [2, 3, 4, 5] [3, 4, 5] [3, 4]

set(['a', 2, 3, 4, 5]) set([2, 3, 4, 5]) set([2, 3, 4, 5, 6]) set([2, 3, 4, 6])

set([1, 3, 9, 5, 7])

8/24/2018 Datatypes


In [29]:

f={2,4,6,8} g={4,8,12} print(f|g) print(f&g)

4. Dictionaries

In [30]:

# finite set of items indexed by keys

d={'beer':14,'wine':10,'whiskey':9,'brandy':'not left'}

d['beer']

In [31]:

d['gin']=10 print(d)

In [32]:

print(d.items()) print(d.keys()) print(d.values())

In [33]:

d={'beer':14,'wine':10,'whiskey':9,'brandy':'not left'} i,k=list(d.items())[1] print(i,k) print('We have only %dL of %s left' % (k, i))

set([2, 4, 6, 8, 12]) set([8, 4])

Out[30]:

14

{'whiskey': 9, 'beer': 14, 'gin': 10, 'brandy': 'not left', 'wine': 10}

[('whiskey', 9), ('beer', 14), ('gin', 10), ('brandy', 'not left'), ('wine', 10)] ['whiskey', 'beer', 'gin', 'brandy', 'wine'] [9, 14, 10, 'not left', 10]

('beer', 14) We have only 14L of beer left

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 1/9

Traditional Control Flow1. Conditional execution: if, else, elif2. Repeated execution: for, while

***Python uses indents instead of parentheses to represent nested structures in the code.

1. Conditional execution (if, else, elif)

In [1]:

x=7

if x%2==0: print('x is even.')

else: print('x is not even.')

Nested structure:

In [2]:


else: if x%3==0: print('x is divisible by 3.') else: print('x is not even and not divisible by 3.')

Use of elif:

In [3]:


elif x%3==0: print('x is divisible by 3.')

else: print('x is not even and not divisible by 3.')

2. Repeated execution (for, while)

x is not even.

x is not even and not divisible by 3.

x is not even and not divisible by 3.



In [4]:

x=0 for i in [1,2,3]: x+=i print(x)

In [5]:

x=0 for i in range(1,10): x+=i

print(x)

In [6]:

for l in 'python is magic': print (l)

In [7]:

D={'Bob':10, 'Steven':9, 'Anna':80}

for k,v in D.items(): print('%s is %d years old'%(k,v))

We can use for and if together:

1 3 6

45

p y t h o n i s m a g i c

Bob is 10 years old Anna is 80 years old Steven is 9 years old



In [8]:

#Fibonacci Sequence

x=[] for i in range(1,10): if len(x)==0: x=x+[1,2] else: x.append(x[-1]+x[-2])

print(x)

break statement:

In [9]:

x=[] for i in range(1,100): if len(x)==0: x=x+[1,2] else: x.append(x[-1]+x[-2]) if x[-1]>1000: break

print(x)

while statement

In [10]:

x=1 while x<100: x*=2

print(x)

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]

128



In [11]:

#Prime numbers

x=[2] n=2

while len(x)<100: prime=True n+=1 for i in x: if n%i==0: prime=False; break if prime: x.append(n)

print(x)

※Using for statement to construct lists, tuples, sets,dictionaries

In [12]:

nums = [0, 1, 2, 3, 4] squares = [] for x in nums: squares.append(x ** 2)

print(squares)

In [13]:

nums = [0, 1, 2, 3, 4] squares = [x ** 2 for x in nums] print(squares)

In [14]:

nums = [0, 1, 2, 3, 4] even_squares = [x ** 2 for x in nums if x % 2 == 0] print(even_squares)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541]

[0, 1, 4, 9, 16]

[0, 1, 4, 9, 16]

[0, 4, 16]



In [15]:

words='Give me a full tank of gas.'.split() short_words=[w for w in words if len(w)<3] print(short_words)

In [16]:

D={'Alice':(1,'woman'),'Bob':(1,'man'),'Anna':(2,'woman'),'Frank':(3,'man'),'Jane':(1,'woman')}

womens={k:v for k,v in D.items() if D[k][1]=='woman'} print(womens)

In [17]:

First_graders={k for k,v in D.items() if v[0]==1} print(First_graders)

In [18]:

First_graders={k for k,v in D.items() if 1 in v} print(First_graders)

FunctionsFunctions are defined using the def keyword.

Functions are also objects !

In [19]:

def sign(x): if x>0: return 'positive' elif x<0: return 'negative' else: return 'zero'

In [20]:

print(sign(1), sign(-1), sign(0))

['me', 'a', 'of']

{'Jane': (1, 'woman'), 'Alice': (1, 'woman'), 'Anna': (2, 'woman')}

set(['Jane', 'Bob', 'Alice'])

set(['Jane', 'Bob', 'Alice'])

('positive', 'negative', 'zero')



In [21]:

def comb(n,r): nom=1 for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [22]:

comb(10,2)

In [23]:

comb(r=2,n=10)

About *args and *kwargs.

In [24]:

def comb(*args): nom=1 n=args[0]; r=args[1] for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [25]:

comb(10,2)

Out[22]:

45

Out[23]:

45

Out[25]:

45



In [26]:

def comb(**kwargs): nom=1 n=kwargs.pop('n'); r=kwargs.pop('r') for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [27]:

comb(n=10,r=2)

In [28]:

def some_func(arg, *args, **kwargs): print "First, there is one argument: ", arg print "Then, there are multiple non-keyworded arguments: ", args print "Finally, there are multiple keyworded arguments", kwargs print kwargs.pop('a')

In [29]:

some_func("fruit","apple","banana",a="orange",b="strawberry")

Lambda functions

In [30]:

g=lambda x, y:x+y; g(3,2)

In [31]:

def multiply (n): return lambda x: x*n mult2=multiply(2) mult6=multiply(6) print (mult2(10), mult6(10))

Out[27]:

45

First, there is one argument: fruit Then, there are multiple non-keyworded arguments: ('apple', 'banana') Finally, there are multiple keyworded arguments {'a': 'orange', 'b': 'strawberry'} orange

Out[30]:

5

(20, 60)



Map function

usage: map(function,seq)

In [32]:

x=[1,2,3,4,5] y=map(lambda a:a**2,x) print(list(y))

In [33]:

def comb(n,r): nom=1 for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [34]:

x=[10,9,8] y=[2,2,2] z=map(comb,x,y) print(list(z))

Filter function

usage: filter(function,seq)

In [35]:

L=range(10) Even_numbers=filter(lambda x:x%2==0, L) print(Even_numbers)

Zip function

usage: zip(seq,seq,....)

In [36]:

data=zip([1,2,3],['a','b','c']) print(data)

[1, 4, 9, 16, 25]

[45, 36, 28]

[0, 2, 4, 6, 8]

[(1, 'a'), (2, 'b'), (3, 'c')]



In [37]:

data=zip([1,2,3],[10,20,30],[100,200,300]) print(data)

In [38]:

data=zip([1,2,3,4],[10,20,30]) print(data)

[(1, 10, 100), (2, 20, 200), (3, 30, 300)]

[(1, 10), (2, 20), (3, 30)]

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 1/9

Modules and PackagesModules are python files(.py) or C files(.c) that define variables or functions.

We can use functions in these files using import.

In [1]:

import random print (random.choice(range(10))) myL = ['a','b','c','d'] print(random.choice(myL))

In [2]:

print(random.sample([1,3,4,7,8,9,20],3))

In [3]:

print(random.uniform(-1,1))

In [4]:

print(random.gauss(0,1))

In [5]:

import random as rand

print(rand.gauss(0,1))

In [6]:

import numpy as np from random import shuffle x = np.arange(10) print (x) shuffle(x) print (x)

6 c

[8, 4, 1]

0.592482310435

-0.883743686456

-0.959601735309

[0 1 2 3 4 5 6 7 8 9] [1 6 4 8 7 9 5 2 3 0]



In [7]:

from random import shuffle as shf shf(x) print(x)

In [8]:

import math print(math.pi)

In [9]:

from math import * print(pi)

Packages are groups of modules. (folders containing modules)

Importation is done similarly. (Usually, we don't need to distinguish between packages and modules.)

There are some popular packages: Numpy, matplotlib, Pandas, sklearn

1. Numpy-provides support for large, multi-dimensional arrays and matrices, and tools for working with these arrays.

In [10]:

import numpy as np

a = [1,2,3] b = np.array([1,2,3])

print(a) print(b)

In [11]:

print([1,2,3]+[4,5,6]) print(np.array([1,2,3])+np.array([4,5,6]))

[2 5 9 0 8 3 1 4 7 6]

3.14159265359

3.14159265359

[1, 2, 3] [1 2 3]

[1, 2, 3, 4, 5, 6] [5 7 9]



In [12]:

print(type(b)) print(b.shape)

In [13]:

mat=np.array([[2,5,18,14,4], [12,15,1,2,8]]) print(mat[1,2]) print(mat[1][2]) print(mat.shape)

※ Numpy arrays are not restricted to 2-dimensional arrays. We can construct n-dimensional arrays for any n:

In [14]:

mat2=np.array([[[2,5,18,14,4], [12,15,1,2,8]],[[3,5,8,12,7], [11,1,0,20,8]]]) print(mat2[0,1,2]) print(mat2[0][1][2]) print(mat2.shape)

In [15]:

print(mat[1,2:4])

In [16]:

print(mat[:,2:4])

In [17]:

x = np.random.rand(5,5) print(x)

<type 'numpy.ndarray'> (3L,)

1 1 (2L, 5L)

1 1 (2L, 2L, 5L)

[1 2]

[[18 14] [ 1 2]]

[[ 0.33602657 0.08713678 0.3112669 0.99729453 0.20404397] [ 0.1867075 0.65289183 0.10794725 0.18783397 0.61317855] [ 0.34434668 0.84085011 0.48526731 0.12453967 0.73810552] [ 0.99193341 0.49008858 0.99953952 0.8560266 0.79620392] [ 0.43853113 0.73186459 0.59863405 0.78534825 0.39061338]]



In [18]:

x= np.random.rand(5,5,3) print(x)

In [19]:

x = np.random.randint(10,size=(2,3)) print(x) print(x.T)

In [20]:

x = np.zeros((4,4)) print(x)

[[[ 0.17519698 0.57862758 0.39816557] [ 0.5353808 0.71617415 0.6992798 ] [ 0.09908896 0.72867946 0.99160388] [ 0.87185827 0.82128772 0.15102032] [ 0.29208183 0.93137749 0.10133339]] [[ 0.97454328 0.44561974 0.23556668] [ 0.20297059 0.33170493 0.29382776] [ 0.6938783 0.72384771 0.6870527 ] [ 0.92138283 0.34110331 0.00642824] [ 0.53595384 0.71680027 0.82789677]] [[ 0.33550908 0.50187063 0.50336394] [ 0.72172305 0.01282984 0.59020119] [ 0.58070422 0.43158476 0.27933153] [ 0.74492476 0.99898476 0.4756136 ] [ 0.68603434 0.85872936 0.00503372]] [[ 0.60770832 0.54820912 0.70841008] [ 0.92211787 0.96545553 0.05764555] [ 0.39403186 0.40494627 0.1096706 ] [ 0.60754665 0.36105868 0.6888219 ] [ 0.97182466 0.26650012 0.89539913]] [[ 0.62632298 0.93303173 0.28578147] [ 0.60996479 0.29316039 0.56087895] [ 0.35029291 0.4967372 0.09356206] [ 0.03701591 0.96378074 0.55941634] [ 0.09588203 0.72256591 0.16571038]]]

[[5 1 2] [6 7 6]] [[5 6] [1 7] [2 6]]

[[ 0. 0. 0. 0.] [ 0. 0. 0. 0.] [ 0. 0. 0. 0.] [ 0. 0. 0. 0.]]



In [21]:

x = np.ones((4,4)) print(x)

In [22]:

x = np.eye(4) print(x)

In [23]:

x = np.random.rand(5,3) print(x) print('Mean:', np.mean(x)) print('Column means: ', np.mean(x,0)) print('Row means: ', np.mean(x,1))

In [24]:

print(np.std(x)) print(np.std(x,1)) print(np.median(x)) print(np.sum(x,1))

[[ 1. 1. 1. 1.] [ 1. 1. 1. 1.] [ 1. 1. 1. 1.] [ 1. 1. 1. 1.]]

[[ 1. 0. 0. 0.] [ 0. 1. 0. 0.] [ 0. 0. 1. 0.] [ 0. 0. 0. 1.]]

[[ 0.70862641 0.49457898 0.8370175 ] [ 0.2759447 0.72783086 0.16280874] [ 0.17098052 0.62116806 0.77728114] [ 0.02145235 0.46639665 0.24476309] [ 0.32827653 0.35513869 0.81404162]] ('Mean:', 0.46708705629868918) ('Column means: ', array([ 0.3010561 , 0.53302265, 0.56718242])) ('Row means: ', array([ 0.6800743 , 0.38886143, 0.52314324, 0.24420403, 0.49915228]))

0.258313892683 [ 0.14125025 0.24409717 0.25704313 0.18164818 0.22293028] 0.466396649999 [ 2.0402229 1.16658429 1.56942972 0.73261209 1.49745684]



In [25]:

#Be carefull!!!

Mat=np.array([[1,2,3],[4,5,6]]) B=Mat[:2,:2] print(B) B[0,0]=0 print(B) print(Mat)

To avoid the above problem, use numpy.copy() function:

In [26]:

Mat=np.array([[1,2,3],[4,5,6]]) B=np.copy(Mat[:2,:2]) print(B) B[0,0]=0 print(B) print(Mat)

[[1 2] [4 5]] [[0 2] [4 5]] [[0 2 3] [4 5 6]]

[[1 2] [4 5]] [[0 2] [4 5]] [[1 2 3] [4 5 6]]



In [27]:

x = np.random.rand(4,3) print(x)

x[1,2] = -5 print(x) x[0:2,:] += 1 print(x) x[2:4,1:3] = 0.5 print(x)

x[x>0.5] = 0 print(x)

In [28]:

print(2*x+1)

In [29]:

#inner products and outer products

y = np.array([2,-1,3]) z = np.array([-1,2,2]) print(np.dot(y,z)) print(np.outer(y,z))

[[ 0.24443871 0.13558183 0.32119583] [ 0.89104852 0.00282243 0.04900889] [ 0.80992828 0.272393 0.34706002] [ 0.09410757 0.76380115 0.40320505]] [[ 2.44438709e-01 1.35581835e-01 3.21195827e-01] [ 8.91048516e-01 2.82243423e-03 -5.00000000e+00] [ 8.09928280e-01 2.72393000e-01 3.47060015e-01] [ 9.41075701e-02 7.63801149e-01 4.03205050e-01]] [[ 1.24443871 1.13558183 1.32119583] [ 1.89104852 1.00282243 -4. ] [ 0.80992828 0.272393 0.34706002] [ 0.09410757 0.76380115 0.40320505]] [[ 1.24443871 1.13558183 1.32119583] [ 1.89104852 1.00282243 -4. ] [ 0.80992828 0.5 0.5 ] [ 0.09410757 0.5 0.5 ]] [[ 0. 0. 0. ] [ 0. 0. -4. ] [ 0. 0.5 0.5 ] [ 0.09410757 0.5 0.5 ]]

[[ 1. 1. 1. ] [ 1. 1. -7. ] [ 1. 2. 2. ] [ 1.18821514 2. 2. ]]

2 [[-2 4 4] [ 1 -2 -2] [-3 6 6]]



In [30]:

#matrix multiplication

y = np.array([1,0,0]) print(x.dot(y))

y = np.array([1,0,1,0]) print(y.dot(x))

y = np.random.rand(3,2) z = x.dot(y) print (z)

In [31]:

# matrix inverse

x=np.random.rand(4,4) y=np.linalg.inv(x) print(x) print(y) print(x.dot(y))

2. Matplotlib-provides a plotting system. Matplotlib.pyplot module is the mostly used module.

[ 0. 0. 0. 0.09410757] [ 0. 0.5 0.5] [[ 0. 0. ] [-2.81630857 -1.43198197] [ 0.73427449 0.59637278] [ 0.82198746 0.60246007]]

[[ 0.78794593 0.62330993 0.04121658 0.15845135] [ 0.0643323 0.5537933 0.49065302 0.74759651] [ 0.71847476 0.64560113 0.66669248 0.85921365] [ 0.26377156 0.46578631 0.59758112 0.0417924 ]] [[ 0.17916293 -1.67106794 1.43254769 -0.23852615] [ 1.54590582 2.0290191 -2.07701599 0.54464513] [-1.26074494 -0.88293244 0.93033221 1.44738769] [-0.33313416 0.55786768 0.80472304 -1.33285939]] [[ 1.00000000e+00 -2.77555756e-17 -8.32667268e-17 0.00000000e+00] [ -5.55111512e-17 1.00000000e+00 -1.11022302e-16 0.00000000e+00] [ -5.55111512e-17 -2.77555756e-16 1.00000000e+00 -2.22044605e-16] [ -7.80625564e-17 6.24500451e-17 1.38777878e-16 1.00000000e+00]]



In [32]:

import matplotlib.pyplot as plt

x=np.arange(0,4*np.pi,0.1) y=np.sin(x) z=np.cos(x)

plt.plot(x, y) plt.plot(x, z) plt.xlabel('x') plt.ylabel('y') plt.title('Sine and Cosine functions') plt.legend(['Sine','Cosine']) plt.show()

8/24/2018 Classes

file:///C:/Users/user/Downloads/Classes.html 1/3

Classes-Classes are a group of variables and functions.

-Some modules define functions inside classes.

These can be imported similarly as modules and packages, via import.

In [1]:

class MyClass: def set(self, v): self.value = v def put(self): print(self.value)

c = MyClass()

In [2]:

c.set('orange') c.put()

In [3]:

MyClass.set(c, 'orange') MyClass.put(c)

In [4]:

class Person:

def __init__(self, first, last): self.firstname = first self.lastname = last

def Name(self): return self.firstname + " " + self.lastname

x = Person("Marge", "Simpson") print(x.Name())

Classes can be inherited !!

orange

orange

Marge Simpson

8/24/2018 Classes


In [5]:

class Employee(Person):

def __init__(self, first, last, staffnum): Person.__init__(self,first,last) self.staffnumber = staffnum

def GetEmployee(self): return self.Name() + ", " + self.staffnumber

y = Employee("Homer", "Simpson", "1007")

print(y.GetEmployee())

We can use super().init instead of Person.init.

In this case, we must remove self :

Why do we use classes?

In [6]:

import numpy as np

class Dataset(): def __init__(self,X,Y): self.dataset=np.array([X,Y]) def summary(self): print('The dimension fo the dataset is', np.shape(self.dataset)) def Hat(self): XtX=self.dataset.T.dot(self.dataset) temp=self.dataset.dot(np.linalg.inv(XtX)) return temp.dot(self.dataset.T) def update(self,x): self.dataset=np.vstack((self.dataset,np.array(x)))

data=Dataset([1,0,1],[2,3,1])

print(data.dataset)

print(data.summary())

print(data.Hat())

If we use return instead of print, the output None disappears.

Homer Simpson, 1007

[[1 0 1] [2 3 1]] ('The dimension fo the dataset is', (2L, 3L)) None [[ 0.5 -0.75] [-1. -0.5 ]]

8/24/2018 Classes


In [7]:

import numpy as np

class Dataset(): def __init__(self,X,Y): self.dataset=np.array([X,Y]) def summary(self): return 'The dimension fo the dataset is', np.shape(self.dataset) def Hat(self): XtX=self.dataset.T.dot(self.dataset) temp=self.dataset.dot(np.linalg.inv(XtX)) return temp.dot(self.dataset.T) def update(self,x): self.dataset=np.vstack((self.dataset,np.array(x)))

data=Dataset([1,0,1],[2,3,1])

print(data.dataset)


print(data.Hat())

In [8]:

data.update([5,1,2])

In [9]:

print(data.dataset)


print(data.Hat())

[[1 0 1] [2 3 1]] ('The dimension fo the dataset is', (2L, 3L)) [[ 0.5 -0.75] [-1. -0.5 ]]

[[1 0 1] [2 3 1] [5 1 2]] ('The dimension fo the dataset is', (3L, 3L)) [[ 1.00000000e+00 4.44089210e-16 4.44089210e-16] [ 4.44089210e-16 1.00000000e+00 9.43689571e-16] [ 0.00000000e+00 5.55111512e-17 1.00000000e+00]]

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 1/15

Modules and Packages 2

3. Pandas-fast and efficient data manipulation and analysis.

-tools for reading and writing data in different formats: CSV, Microsoft Excel, textfiles, etc.

3.1 Read CSV, Excel, text files

※ 'White' data is available in UCI data repository (https://archive.ics.uci.edu/ml/datasets/Wine).

Input variables include physicochecmical properties of white wines and the output variable is a quality score(0~10). The wines are from different cultivars.

In [1]:

import pandas as pd

data=pd.read_csv('./white.csv', header=0) # if there are no headers, type header=None # data=pd.read_csv('C://Users/user/Desktop/Dropbox/python/tutorial/white.csv') # data=pd.read_csv('./white.csv').values

In [2]:

data=pd.read_excel('./white.xlsx', sheet_name='Sheet 1') # data=pd.read_excel('./white.xlsx', sheetname=0)

In [3]:

data1=pd.read_table('./data1.txt')

In [4]:

print(type(data))

<class 'pandas.core.frame.DataFrame'>

https://archive.ics.uci.edu/ml/datasets/Wine



In [5]:

data.head()

In [6]:

data.tail()

In [7]:

data.shape

Out[5]:

fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dio

0 7.0 0.27 0.36 20.7 0.045 45.0

1 6.3 0.30 0.34 1.6 0.049 14.0

2 8.1 0.28 0.40 6.9 0.050 30.0

3 7.2 0.23 0.32 8.5 0.058 47.0

4 7.2 0.23 0.32 8.5 0.058 47.0

Out[6]:

fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulf

4893 6.2 0.21 0.29 1.6 0.039 24.0

4894 6.6 0.32 0.36 8.0 0.047 57.0

4895 6.5 0.24 0.19 1.2 0.041 30.0

4896 5.5 0.29 0.30 1.1 0.022 20.0

4897 6.0 0.21 0.38 0.8 0.020 22.0

Out[7]:

(4898, 12)



In [8]:

data.quality #data["quality"] #Indexing by string can be done only in data frames.



Out[8]:

0 6 1 6 2 6 3 6 4 6 5 6 6 6 7 6 8 6 9 6 10 5 11 5 12 5 13 7 14 5 15 7 16 6 17 8 18 6 19 5 20 8 21 7 22 8 23 5 24 6 25 6 26 6 27 6 28 6 29 7 .. 4868 6 4869 6 4870 7 4871 6 4872 5 4873 6 4874 6 4875 6 4876 7 4877 5 4878 4 4879 6 4880 6 4881 6 4882 5 4883 6 4884 5 4885 6 4886 7 4887 7 4888 5 4889 6 4890 6 4891 6 4892 5 4893 6 4894 5 4895 6



In [9]:

data.quality.head()

In [10]:

data[["quality","fixed_acidity"]].head()

In [11]:

X=data.drop(["quality"],1) # 1 means drop the colum, 0 means drop the row

In [12]:

X.shape

※ mnist data is vailable in MNIST (http://yann.lecun.com/exdb/mnist/)

Attributes are images for handwritten digits. The digits have been size-normalized and centered in a fixed-size image.

In [13]:

import numpy as np

mnist_csv=pd.read_csv('./mnist.csv',header=0).values print(mnist_csv.shape)

4896 7 4897 6 Name: quality, dtype: int64

Out[9]:

0 6 1 6 2 6 3 6 4 6 Name: quality, dtype: int64

Out[10]:

quality fixed_acidity

0 6 7.0

1 6 6.3

2 6 8.1

3 6 7.2

4 6 7.2

Out[12]:

(4898, 11)

(42000L, 785L)

http://yann.lecun.com/exdb/mnist/



In [14]:

Y=mnist_csv[:,0] X=mnist_csv[:,1:] #X=mnist_csv.drop(['label'],1)

In [15]:

print(Y[0])

In [16]:

print(X[0])

1

[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 188 255 94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 191 250 253 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 123 248 253 167 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 247 253 208 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29 207 253 235 77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54 209 253 253 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 93 254 253 238 170 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 210 254 253 159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 209 253 254 240 81 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27 253 253 254 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 206 254 254 198 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 168 253 253 196 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 203 253 248 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 188 253 245 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 103 253 253 191 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 240 253 195 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 220 253 253 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 253 253 253 94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 251 253 250 131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 214 218 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]



In [17]:

from matplotlib import pyplot as plt

plt.imshow(np.reshape(X[10],(28,28)), cmap=plt.cm.Blues) plt.show()

3.2 Save files

Suppose we want to extract the handwritten digits corresponding to 1 and 7 only and save it in a separateCSV file.

In [18]:

mnist_1=mnist_csv[(mnist_csv[:,0]==1),:] mnist_7=mnist_csv[(mnist_csv[:,0]==7),:] mnist_1and7=np.vstack((mnist_1,mnist_7))

In [19]:

np.unique(mnist_1and7[:,0])

In [20]:

mnist_1and7_df=pd.DataFrame(mnist_1and7)

In [21]:

pd.DataFrame.to_csv(mnist_1and7_df,'C://Users/user/Desktop/mnist_1and7.csv')

Dictionaries can also be converted to Data frames (not only numpy arrays.)

Out[19]:

array([1, 7], dtype=int64)



In [22]:

A={'D':(1,0,1),'A':(2,3,1),'B':(4,3,1)} pd.DataFrame(A)

Numpy arrays can also be directly saved in .npy files, without the need to convert them to a dataFrame.

The .npy file can also be loaded using the load() function in NumPy:

In [23]:

np.save('C://Users/user/Desktop/mnist_1and7.npy',mnist_1and7)

In [24]:

data2=np.load('C://Users/user/Desktop/mnist_1and7.npy')

In [25]:

type(data2)

What about image files ?? Scipy package.

In [26]:

from scipy.misc import imread

# read a JPEG image into a numpy array img = imread('./puggle.jpg') print(img.shape) print(type(img))

⟹

Out[22]:

A B D

0 2 4 1

1 3 3 0

2 1 1 1

Out[25]:

numpy.ndarray

(1334L, 1778L, 3L) <type 'numpy.ndarray'>



In [27]:

img=np.array(img) print(img.shape) print(type(img))

In [28]:

plt.imshow(img) plt.show()

In [29]:

from skimage.transform import resize

# resize the image img_resized = resize(img, (300, 300)) print(img_resized.shape)

In [30]:

plt.imshow(img_resized) plt.show()

(1334L, 1778L, 3L) <type 'numpy.ndarray'>

(300L, 300L, 3L)



You can go here (http://www.scipy-lectures.org/advanced/image_processing/) to see more details aboutimage manipulation and processing.

4. Sklearn-provides machine learning tools such as regression, logistic regression, lda, svm.

4.1 Regression

In [31]:

from sklearn.linear_model import LinearRegression

data = data[-np.isnan(data["quality"])] x, y = data.fixed_acidity, data.quality plt.scatter(x,y,color="blue") plt.show()

Simple linear regression

In [32]:

dataM=np.matrix(data) x,y=dataM[:,0],dataM[:,11]

In [33]:

slm = LinearRegression().fit(x,y)

http://www.scipy-lectures.org/advanced/image_processing/



In [34]:

m, b = slm.coef_[0], slm.intercept_ x0, x1 = x.min(), x.max() plt.plot([x0,x1],[m*x0+b,m*x1+b],'r') plt.show()

Multiple linear regression

In [35]:

from sklearn import model_selection

X = np.array(data.drop(["quality"],1)) y = np.array(data["quality"]) X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=100) lrm = LinearRegression() lrm.fit(X_train, y_train) forecast = lrm.predict(X_test[0:10,:]) print(forecast) accuracy = lrm.score(X_test, y_test) #R^2 print(accuracy)

In [36]:

forecast = lrm.predict(X_test) forecast=np.round(forecast) accuracy=0. for i in range(len(X_test)): accuracy+=(forecast[i]==y[i])

accuracy=accuracy/len(X_test) print(accuracy)

[ 6.13867211 4.87485387 5.80282258 6.02253147 6.30505131 6.55048122 5.98089896 6.06284824 5.56466996 5.94534116] 0.248039622626

0.37074829932



4.2 Logistic regression

In [37]:

from sklearn.linear_model import LogisticRegression

logit = LogisticRegression()

y_binary = data.quality > 5 X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_binary, test_size=0.3, random_state=100)

logit.fit(X_train, y_train, sample_weight=None) m, b = logit.coef_, logit.intercept_ print(m);print(b)

In [38]:

forecast=logit.predict(X_test) #cut_value is set to 0.5 proba=logit.predict_proba(X_test) print(forecast[0:10]) print(proba[0:10][:,1])

In [39]:

logit.score(X_test, y_test, sample_weight=None) #In this case, score is accuracy

[[ -2.54101935e-01 -5.07112918e+00 9.77045630e-02 5.73480006e-02 -5.16717380e-01 1.26275386e-02 -3.55827563e-03 -2.75407735e+00 -5.07781468e-01 1.76446414e+00 9.43517636e-01]] [-2.75150728]

[ True False True True True True True True True True] [ 0.87042112 0.16304045 0.63426903 0.77031113 0.87390307 0.93482278 0.7470677 0.77873055 0.53333388 0.72938666]

Out[39]:

0.75510204081632648



In [40]:

logit = LogisticRegression(multi_class='multinomial', solver='newton-cg') # if we set multi_class='ovr', a binary model is fit for each label

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=100) logit.fit(X_train, y_train, sample_weight=None) m, b = logit.coef_, logit.intercept_ print(m);print(b)

[[ 3.03822169e-01 5.20006957e-01 5.70793120e-02 2.52024817e-02 1.37418094e-01 9.92565226e-03 6.62321351e-03 3.45751835e-03 6.98984650e-02 -1.07101059e-01 -3.51365112e-01] [ 1.76233667e-01 4.42864412e+00 -8.38201087e-01 -7.22917079e-03 9.79631636e-02 -4.53363939e-02 -1.35749144e-03 2.81612870e-02 3.22029200e-01 -8.99026833e-01 -9.40667587e-01] [ -1.67885470e-01 1.96679883e+00 2.57749276e-01 3.90402440e-02 1.32059161e-01 -5.96778648e-03 2.89408890e-03 1.85104583e-02 -1.01096324e+00 -1.05479279e+00 -9.58060002e-01] [ -2.98037268e-01 -2.18790337e+00 4.41067374e-01 9.15911543e-02 5.82063242e-01 2.57283714e-03 -4.51177531e-04 3.73901190e-02 -8.03534939e-01 5.62845839e-01 -1.29928635e-01] [ -1.34709624e-01 -3.17074661e+00 -5.06020453e-01 1.16626033e-01 -9.14451306e-01 1.35920689e-02 -4.40606147e-03 -6.08232951e-02 3.76385846e-01 1.26681851e+00 4.86588744e-01] [ -1.98073955e-01 -1.50326460e+00 4.59343675e-01 1.48578203e-01 -7.39300047e-03 3.52421603e-02 -6.04836250e-03 -2.55110065e-02 5.69567218e-01 1.85448770e-01 7.56482181e-01] [ 3.18650479e-01 -5.35353221e-02 1.28981902e-01 -4.13808945e-01 -2.76593536e-02 -1.00285382e-02 2.74579052e-03 -1.18508101e-03 4.76617451e-01 4.58075625e-02 1.13695041e+00]] [ -1.86460597 8.89474224 16.42370967 8.67125805 -3.50837565 -9.45832038 -19.15840797]



In [41]:

print(logit.predict(X_test[0:10,:])) print(logit.predict_proba(X_test[0:10,:]))

In [42]:

logit.score(X_test, y_test, sample_weight=None)

4.3 LDA

In [43]:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

linearDA=LDA()

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_binary, test_size=0.3, random_state=100)

linearDA.fit(X_train, y_train)

forecast=linearDA.predict(X_test)

proba=linearDA.predict_proba(X_test)

print(forecast[0:10]) print(proba[0:10][:,1])

[6 5 6 6 6 6 6 6 6 6] [[ 1.57883139e-03 1.20669223e-02 1.17149639e-01 5.74021502e-01 2.60200083e-01 3.49737419e-02 9.27968181e-06] [ 6.66543067e-03 2.00156054e-01 6.12938749e-01 1.61386978e-01 1.71784657e-02 1.48427391e-03 1.90049742e-04] [ 3.57704274e-03 4.55598780e-03 3.99066598e-01 4.99271740e-01 7.44847448e-02 1.90432883e-02 5.98173614e-07] [ 3.08581613e-03 3.23311320e-02 1.98428658e-01 5.55725186e-01 1.84702955e-01 2.47806539e-02 9.45598495e-04] [ 3.05674304e-03 1.11861450e-02 1.09435533e-01 5.00398227e-01 2.90048552e-01 8.04262616e-02 5.44853895e-03] [ 1.71792900e-03 2.47716940e-03 5.60518502e-02 4.80689681e-01 3.64754552e-01 9.35826687e-02 7.26149637e-04] [ 1.79913878e-03 2.05673238e-02 2.27805021e-01 5.61597694e-01 1.70178000e-01 1.80497622e-02 3.05955408e-06] [ 4.00381106e-03 6.97931564e-02 1.54430060e-01 4.63733307e-01 2.76902156e-01 2.88425399e-02 2.29496966e-03] [ 6.28710508e-03 8.34143545e-02 4.03979358e-01 4.13691486e-01 8.55659589e-02 6.32431923e-03 7.37417642e-04] [ 2.92794322e-03 3.60213543e-02 2.08301643e-01 5.25820461e-01 2.07603930e-01 1.88314473e-02 4.93220946e-04]]

Out[42]:

0.54761904761904767

[ True False True True True True True True True True] [ 0.87695609 0.14798489 0.62047705 0.78864165 0.86929275 0.94244011 0.79706656 0.81488264 0.54117851 0.79219861]



In [44]:

linearDA.score(X_test, y_test)

4.4 SVM

In [45]:

from sklearn.svm import SVC

svm_model=SVC(kernel='linear')

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_binary, test_size=0.3, random_state=100)

svm_model.fit(X_train, y_train)

forecast=svm_model.predict(X_test) print(forecast[0:10])

In [46]:

svm_model.score(X_test,y_test)

In [47]:

#For multiclass, there is only on option: 'ovr' (default)

svm_model2=SVC(kernel='linear')

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=100)

svm_model2.fit(X_train, y_train)

forecast=svm_model2.predict(X_test) print(forecast[0:10])

In [48]:

svm_model2.score(X_test,y_test)

Out[44]:

0.75646258503401365

[ True False True True True True True True True True]

Out[46]:

0.75986394557823134

[6 5 6 6 6 6 6 6 5 6]

Out[48]:

0.53401360544217691

read and write files - seoul national universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. ·...

Documents