read and write files - seoul national universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. ·...

60
Python for machine learning & deep learning References 1. 2017한국통계학회 여름학교, <기계학습 딥러닝 과정> 2. Cs231n, <Python/numpy tutorial>, http://cs231n.github.io/python- numpy-tutorial/

Upload: others

Post on 19-Aug-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Python for machine learning & deep learning

References 1. 2017년 한국통계학회 여름학교, <기계학습 및 딥러닝 과정> 2. Cs231n, <Python/numpy tutorial>, http://cs231n.github.io/python-

numpy-tutorial/

Page 2: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Install Python, Anaconda, Jupyter notebook

Data types (integer, real, list, dictionary)

Basic statements

Functions, modules & packages, classes

Read and write files

Regression, logistic regression, LDA, SVM

Page 3: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Since 1991, created by Guido van Rossum.

Open source.

C support (easily extendable)

high-level programming language

(very powerful ideas in very few lines)

Object-oriented programming (classes)

Codes can be grouped in modules and packages.

Parallel computing

Page 4: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Go to https://www.python.org/

Page 5: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)
Page 6: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Data Science Platform powered by Python

Over 720 useful packages!

(numpy, scikit-learn, matplotlib, tensorflow, pytorch, …)

Page 7: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Go to https://www.continuum.io/downloads

Page 8: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)
Page 9: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)
Page 10: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)
Page 11: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)
Page 12: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Create a notebook file(.ipynb)

Page 13: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)
Page 14: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Code mode

Page 15: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Markdown mode

Page 16: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

Markdown mode

Page 17: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 1/8

Data types1. Numeric types: integer, float, boolean2. Sequence types: string, tuple, range, list3. Set type: set4. Mapping type: dictionary

(These are all objects.)

Like Java and R, Python doesn't require to explicitly declare variable types.

Assignment is done using: (variable name) = (value).

Variable names can be composed of a~z, A~Z, 0~9 and _, but should not start with numbers!

1. Numeric types

Integer and floats

In [1]:

x = 3 print( x+1 ) print( x*2 ) print( x**2 )

In [2]:

x = x+1 print(x) x += 1 print(x) x *= 2 print(x)

4 6 9

4 5 10

Page 18: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 2/8

In [3]:

print(2/3); print(2./3.) # ※In Python 3, division always gives a real number. print(int(2./3.)) print(3//2) print(3%2)

Booleans

In [4]:

x = (2==3) print(x)

y = (3//2!=1) print(y)

print(x and y) print(x or y) print(not x)

In [5]:

6<9, 'a'<'A', 'a'>'A', 'abcd'>'adb'

In [6]:

(1,2,3)<(2,1,3)

0 0.666666666667 0 1 1

False False False False True

Out[5]:

(True, False, True, False)

Out[6]:

True

Page 19: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 3/8

2. Sequences: string, tuple, range, list1. indexing: s[i] selects i-th item of sequence s2. slicing: s[i:j] selects from i-th to (j-1)-th items of s3. immutable vs mutable sequences

immutable sequence: string, tuple, range / mutable sequence: list

4. operations: +, - , len(), in, not in

Strings

In [7]:

s='Deep learning'

print(s[0], s[1], s[-1])

## Note that indexing starts from 0, unlike in R!

print(s[2:5], s[:5])

In [8]:

s[5]='L' # This does not work.

In [9]:

s=s+' is great!' print(s) print(s*2)

In [10]:

len(s)

('D', 'e', 'g') ('ep ', 'Deep ')

--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-8-550143e29e73> in <module>() ----> 1 s[5]='L' # This does not work. TypeError: 'str' object does not support item assignment

Deep learning is great! Deep learning is great!Deep learning is great!

Out[10]:

23

Page 20: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 4/8

In [11]:

'Deep' in s

In [12]:

s.count('e')

In [13]:

s.split()

In [14]:

s2 = 'March %dth' % (12) print(s2)

s3= 'Today is %s %dth' % ('March',12) print(s3)

Tuples

In [15]:

#Items can be of arbitrary type t=(1,2,'abc') print(t)

#Tuples can be nested t2=(t,3,'de') print(t2)

Out[11]:

True

Out[12]:

4

Out[13]:

['Deep', 'learning', 'is', 'great!']

March 12th Today is March 12th

(1, 2, 'abc') ((1, 2, 'abc'), 3, 'de')

Page 21: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 5/8

In [16]:

print(t[0],t[-1], t[1:2])

print(t+(3,'de')) print(t*4)

In [17]:

t[0]=5 # This does not work.

In [19]:

print(len(t)) print(2 in t)

In [20]:

x,y,z=t print(x) print(y) print(z)

Ranges

(1, 'abc', (2,)) (1, 2, 'abc', 3, 'de') (1, 2, 'abc', 1, 2, 'abc', 1, 2, 'abc', 1, 2, 'abc')

--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-17-e34127f16d5a> in <module>() ----> 1 t[0]=5 # This does not work. TypeError: 'tuple' object does not support item assignment

3 True

1 2 abc

Page 22: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 6/8

In [21]:

print(range(10))

print(range(1,10))

print(range(1,10,2))

print(range(0,-10,-2))

Lists

In [22]:

#Lists are similar to tuples, except that they are mutable ! #Also, lists cannot be used as elements of sets while tuples can.

l=[3,7,5,'a']

l[0], l[-1], l[1:2]

In [23]:

print(len(l))

print(l+[8,'b'])

#Be careful !! [1,2,3]+[1,1,1] is not [2,3,4], but [1,2,3,1,1,1]

In [24]:

'A' in l

In [25]:

l[0]=4 # This works. 'list' object supports item assignment.

Lists are objects!!

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 2, 3, 4, 5, 6, 7, 8, 9] [1, 3, 5, 7, 9] [0, -2, -4, -6, -8]

Out[22]:

(3, 'a', [7])

4 [3, 7, 5, 'a', 8, 'b']

Out[24]:

False

Page 23: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 7/8

In [26]:

l.append(10); print(l)

del l[3]; print(l)

l.reverse(); print(l)

l2=[3,4,2,5]

l2.sort(); print(l2)

l2.pop(0); print(l2)

l2.remove(5);print(l2)

3. Sets

In [27]:

a={2,2,3,4,5,'a'} print(a)

a=[2,2,3,4,5,5] a=set(a) print(a) a.add(6) print(a) a.remove(5) print(a)

We can use subtractions for sets!

In [28]:

numbers=set(range(10)) evens={0,2,4,6,8} odds=numbers-evens print(odds)

Sets also allow union(|) and intersections(&)!

[4, 7, 5, 'a', 10] [4, 7, 5, 10] [10, 5, 7, 4] [2, 3, 4, 5] [3, 4, 5] [3, 4]

set(['a', 2, 3, 4, 5]) set([2, 3, 4, 5]) set([2, 3, 4, 5, 6]) set([2, 3, 4, 6])

set([1, 3, 9, 5, 7])

Page 24: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Datatypes

file:///C:/Users/user/Downloads/Datatypes.html 8/8

In [29]:

f={2,4,6,8} g={4,8,12} print(f|g) print(f&g)

4. Dictionaries

In [30]:

# finite set of items indexed by keys

d={'beer':14,'wine':10,'whiskey':9,'brandy':'not left'}

d['beer']

In [31]:

d['gin']=10 print(d)

In [32]:

print(d.items()) print(d.keys()) print(d.values())

In [33]:

d={'beer':14,'wine':10,'whiskey':9,'brandy':'not left'} i,k=list(d.items())[1] print(i,k) print('We have only %dL of %s left' % (k, i))

set([2, 4, 6, 8, 12]) set([8, 4])

Out[30]:

14

{'whiskey': 9, 'beer': 14, 'gin': 10, 'brandy': 'not left', 'wine': 10}

[('whiskey', 9), ('beer', 14), ('gin', 10), ('brandy', 'not left'), ('wine', 10)] ['whiskey', 'beer', 'gin', 'brandy', 'wine'] [9, 14, 10, 'not left', 10]

('beer', 14) We have only 14L of beer left

Page 25: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 1/9

Traditional Control Flow1. Conditional execution: if, else, elif2. Repeated execution: for, while

***Python uses indents instead of parentheses to represent nested structures in the code.

1. Conditional execution (if, else, elif)

In [1]:

x=7

if x%2==0: print('x is even.')

else: print('x is not even.')

Nested structure:

In [2]:

if x%2==0: print('x is even.')

else: if x%3==0: print('x is divisible by 3.') else: print('x is not even and not divisible by 3.')

Use of elif:

In [3]:

if x%2==0: print('x is even.')

elif x%3==0: print('x is divisible by 3.')

else: print('x is not even and not divisible by 3.')

2. Repeated execution (for, while)

x is not even.

x is not even and not divisible by 3.

x is not even and not divisible by 3.

Page 26: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 2/9

In [4]:

x=0 for i in [1,2,3]: x+=i print(x)

In [5]:

x=0 for i in range(1,10): x+=i

print(x)

In [6]:

for l in 'python is magic': print (l)

In [7]:

D={'Bob':10, 'Steven':9, 'Anna':80}

for k,v in D.items(): print('%s is %d years old'%(k,v))

We can use for and if together:

1 3 6

45

p y t h o n i s m a g i c

Bob is 10 years old Anna is 80 years old Steven is 9 years old

Page 27: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 3/9

In [8]:

#Fibonacci Sequence

x=[] for i in range(1,10): if len(x)==0: x=x+[1,2] else: x.append(x[-1]+x[-2])

print(x)

break statement:

In [9]:

x=[] for i in range(1,100): if len(x)==0: x=x+[1,2] else: x.append(x[-1]+x[-2]) if x[-1]>1000: break

print(x)

while statement

In [10]:

x=1 while x<100: x*=2

print(x)

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

[1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597]

128

Page 28: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 4/9

In [11]:

#Prime numbers

x=[2] n=2

while len(x)<100: prime=True n+=1 for i in x: if n%i==0: prime=False; break if prime: x.append(n)

print(x)

※Using for statement to construct lists, tuples, sets,dictionaries

In [12]:

nums = [0, 1, 2, 3, 4] squares = [] for x in nums: squares.append(x ** 2)

print(squares)

In [13]:

nums = [0, 1, 2, 3, 4] squares = [x ** 2 for x in nums] print(squares)

In [14]:

nums = [0, 1, 2, 3, 4] even_squares = [x ** 2 for x in nums if x % 2 == 0] print(even_squares)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541]

[0, 1, 4, 9, 16]

[0, 1, 4, 9, 16]

[0, 4, 16]

Page 29: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 5/9

In [15]:

words='Give me a full tank of gas.'.split() short_words=[w for w in words if len(w)<3] print(short_words)

In [16]:

D={'Alice':(1,'woman'),'Bob':(1,'man'),'Anna':(2,'woman'),'Frank':(3,'man'),'Jane':(1,'woman')}

womens={k:v for k,v in D.items() if D[k][1]=='woman'} print(womens)

In [17]:

First_graders={k for k,v in D.items() if v[0]==1} print(First_graders)

In [18]:

First_graders={k for k,v in D.items() if 1 in v} print(First_graders)

FunctionsFunctions are defined using the def keyword.

Functions are also objects !

In [19]:

def sign(x): if x>0: return 'positive' elif x<0: return 'negative' else: return 'zero'

In [20]:

print(sign(1), sign(-1), sign(0))

['me', 'a', 'of']

{'Jane': (1, 'woman'), 'Alice': (1, 'woman'), 'Anna': (2, 'woman')}

set(['Jane', 'Bob', 'Alice'])

set(['Jane', 'Bob', 'Alice'])

('positive', 'negative', 'zero')

Page 30: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 6/9

In [21]:

def comb(n,r): nom=1 for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [22]:

comb(10,2)

In [23]:

comb(r=2,n=10)

About *args and *kwargs.

In [24]:

def comb(*args): nom=1 n=args[0]; r=args[1] for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [25]:

comb(10,2)

Out[22]:

45

Out[23]:

45

Out[25]:

45

Page 31: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 7/9

In [26]:

def comb(**kwargs): nom=1 n=kwargs.pop('n'); r=kwargs.pop('r') for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [27]:

comb(n=10,r=2)

In [28]:

def some_func(arg, *args, **kwargs): print "First, there is one argument: ", arg print "Then, there are multiple non-keyworded arguments: ", args print "Finally, there are multiple keyworded arguments", kwargs print kwargs.pop('a')

In [29]:

some_func("fruit","apple","banana",a="orange",b="strawberry")

Lambda functions

In [30]:

g=lambda x, y:x+y; g(3,2)

In [31]:

def multiply (n): return lambda x: x*n mult2=multiply(2) mult6=multiply(6) print (mult2(10), mult6(10))

Out[27]:

45

First, there is one argument: fruit Then, there are multiple non-keyworded arguments: ('apple', 'banana') Finally, there are multiple keyworded arguments {'a': 'orange', 'b': 'strawberry'} orange

Out[30]:

5

(20, 60)

Page 32: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 8/9

Map function

usage: map(function,seq)

In [32]:

x=[1,2,3,4,5] y=map(lambda a:a**2,x) print(list(y))

In [33]:

def comb(n,r): nom=1 for i in range(n,n-r,-1): nom*=i denom=1 for j in range(r,0,-1): denom*=j return nom/denom

In [34]:

x=[10,9,8] y=[2,2,2] z=map(comb,x,y) print(list(z))

Filter function

usage: filter(function,seq)

In [35]:

L=range(10) Even_numbers=filter(lambda x:x%2==0, L) print(Even_numbers)

Zip function

usage: zip(seq,seq,....)

In [36]:

data=zip([1,2,3],['a','b','c']) print(data)

[1, 4, 9, 16, 25]

[45, 36, 28]

[0, 2, 4, 6, 8]

[(1, 'a'), (2, 'b'), (3, 'c')]

Page 33: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Basic_statements_and_functions

file:///C:/Users/user/Downloads/Basic_statements_and_functions.html 9/9

In [37]:

data=zip([1,2,3],[10,20,30],[100,200,300]) print(data)

In [38]:

data=zip([1,2,3,4],[10,20,30]) print(data)

[(1, 10, 100), (2, 20, 200), (3, 30, 300)]

[(1, 10), (2, 20), (3, 30)]

Page 34: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 1/9

Modules and PackagesModules are python files(.py) or C files(.c) that define variables or functions.

We can use functions in these files using import.

In [1]:

import random print (random.choice(range(10))) myL = ['a','b','c','d'] print(random.choice(myL))

In [2]:

print(random.sample([1,3,4,7,8,9,20],3))

In [3]:

print(random.uniform(-1,1))

In [4]:

print(random.gauss(0,1))

In [5]:

import random as rand

print(rand.gauss(0,1))

In [6]:

import numpy as np from random import shuffle x = np.arange(10) print (x) shuffle(x) print (x)

6 c

[8, 4, 1]

0.592482310435

-0.883743686456

-0.959601735309

[0 1 2 3 4 5 6 7 8 9] [1 6 4 8 7 9 5 2 3 0]

Page 35: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 2/9

In [7]:

from random import shuffle as shf shf(x) print(x)

In [8]:

import math print(math.pi)

In [9]:

from math import * print(pi)

Packages are groups of modules. (folders containing modules)

Importation is done similarly. (Usually, we don't need to distinguish between packages and modules.)

There are some popular packages: Numpy, matplotlib, Pandas, sklearn

1. Numpy-provides support for large, multi-dimensional arrays and matrices, and tools for working with these arrays.

In [10]:

import numpy as np

a = [1,2,3] b = np.array([1,2,3])

print(a) print(b)

In [11]:

print([1,2,3]+[4,5,6]) print(np.array([1,2,3])+np.array([4,5,6]))

[2 5 9 0 8 3 1 4 7 6]

3.14159265359

3.14159265359

[1, 2, 3] [1 2 3]

[1, 2, 3, 4, 5, 6] [5 7 9]

Page 36: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 3/9

In [12]:

print(type(b)) print(b.shape)

In [13]:

mat=np.array([[2,5,18,14,4], [12,15,1,2,8]]) print(mat[1,2]) print(mat[1][2]) print(mat.shape)

※ Numpy arrays are not restricted to 2-dimensional arrays. We can construct n-dimensional arrays for any n:

In [14]:

mat2=np.array([[[2,5,18,14,4], [12,15,1,2,8]],[[3,5,8,12,7], [11,1,0,20,8]]]) print(mat2[0,1,2]) print(mat2[0][1][2]) print(mat2.shape)

In [15]:

print(mat[1,2:4])

In [16]:

print(mat[:,2:4])

In [17]:

x = np.random.rand(5,5) print(x)

<type 'numpy.ndarray'> (3L,)

1 1 (2L, 5L)

1 1 (2L, 2L, 5L)

[1 2]

[[18 14] [ 1 2]]

[[ 0.33602657 0.08713678 0.3112669 0.99729453 0.20404397] [ 0.1867075 0.65289183 0.10794725 0.18783397 0.61317855] [ 0.34434668 0.84085011 0.48526731 0.12453967 0.73810552] [ 0.99193341 0.49008858 0.99953952 0.8560266 0.79620392] [ 0.43853113 0.73186459 0.59863405 0.78534825 0.39061338]]

Page 37: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 4/9

In [18]:

x= np.random.rand(5,5,3) print(x)

In [19]:

x = np.random.randint(10,size=(2,3)) print(x) print(x.T)

In [20]:

x = np.zeros((4,4)) print(x)

[[[ 0.17519698 0.57862758 0.39816557] [ 0.5353808 0.71617415 0.6992798 ] [ 0.09908896 0.72867946 0.99160388] [ 0.87185827 0.82128772 0.15102032] [ 0.29208183 0.93137749 0.10133339]] [[ 0.97454328 0.44561974 0.23556668] [ 0.20297059 0.33170493 0.29382776] [ 0.6938783 0.72384771 0.6870527 ] [ 0.92138283 0.34110331 0.00642824] [ 0.53595384 0.71680027 0.82789677]] [[ 0.33550908 0.50187063 0.50336394] [ 0.72172305 0.01282984 0.59020119] [ 0.58070422 0.43158476 0.27933153] [ 0.74492476 0.99898476 0.4756136 ] [ 0.68603434 0.85872936 0.00503372]] [[ 0.60770832 0.54820912 0.70841008] [ 0.92211787 0.96545553 0.05764555] [ 0.39403186 0.40494627 0.1096706 ] [ 0.60754665 0.36105868 0.6888219 ] [ 0.97182466 0.26650012 0.89539913]] [[ 0.62632298 0.93303173 0.28578147] [ 0.60996479 0.29316039 0.56087895] [ 0.35029291 0.4967372 0.09356206] [ 0.03701591 0.96378074 0.55941634] [ 0.09588203 0.72256591 0.16571038]]]

[[5 1 2] [6 7 6]] [[5 6] [1 7] [2 6]]

[[ 0. 0. 0. 0.] [ 0. 0. 0. 0.] [ 0. 0. 0. 0.] [ 0. 0. 0. 0.]]

Page 38: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 5/9

In [21]:

x = np.ones((4,4)) print(x)

In [22]:

x = np.eye(4) print(x)

In [23]:

x = np.random.rand(5,3) print(x) print('Mean:', np.mean(x)) print('Column means: ', np.mean(x,0)) print('Row means: ', np.mean(x,1))

In [24]:

print(np.std(x)) print(np.std(x,1)) print(np.median(x)) print(np.sum(x,1))

[[ 1. 1. 1. 1.] [ 1. 1. 1. 1.] [ 1. 1. 1. 1.] [ 1. 1. 1. 1.]]

[[ 1. 0. 0. 0.] [ 0. 1. 0. 0.] [ 0. 0. 1. 0.] [ 0. 0. 0. 1.]]

[[ 0.70862641 0.49457898 0.8370175 ] [ 0.2759447 0.72783086 0.16280874] [ 0.17098052 0.62116806 0.77728114] [ 0.02145235 0.46639665 0.24476309] [ 0.32827653 0.35513869 0.81404162]] ('Mean:', 0.46708705629868918) ('Column means: ', array([ 0.3010561 , 0.53302265, 0.56718242])) ('Row means: ', array([ 0.6800743 , 0.38886143, 0.52314324, 0.24420403, 0.49915228]))

0.258313892683 [ 0.14125025 0.24409717 0.25704313 0.18164818 0.22293028] 0.466396649999 [ 2.0402229 1.16658429 1.56942972 0.73261209 1.49745684]

Page 39: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 6/9

In [25]:

#Be carefull!!!

Mat=np.array([[1,2,3],[4,5,6]]) B=Mat[:2,:2] print(B) B[0,0]=0 print(B) print(Mat)

To avoid the above problem, use numpy.copy() function:

In [26]:

Mat=np.array([[1,2,3],[4,5,6]]) B=np.copy(Mat[:2,:2]) print(B) B[0,0]=0 print(B) print(Mat)

[[1 2] [4 5]] [[0 2] [4 5]] [[0 2 3] [4 5 6]]

[[1 2] [4 5]] [[0 2] [4 5]] [[1 2 3] [4 5 6]]

Page 40: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 7/9

In [27]:

x = np.random.rand(4,3) print(x)

x[1,2] = -5 print(x) x[0:2,:] += 1 print(x) x[2:4,1:3] = 0.5 print(x)

x[x>0.5] = 0 print(x)

In [28]:

print(2*x+1)

In [29]:

#inner products and outer products

y = np.array([2,-1,3]) z = np.array([-1,2,2]) print(np.dot(y,z)) print(np.outer(y,z))

[[ 0.24443871 0.13558183 0.32119583] [ 0.89104852 0.00282243 0.04900889] [ 0.80992828 0.272393 0.34706002] [ 0.09410757 0.76380115 0.40320505]] [[ 2.44438709e-01 1.35581835e-01 3.21195827e-01] [ 8.91048516e-01 2.82243423e-03 -5.00000000e+00] [ 8.09928280e-01 2.72393000e-01 3.47060015e-01] [ 9.41075701e-02 7.63801149e-01 4.03205050e-01]] [[ 1.24443871 1.13558183 1.32119583] [ 1.89104852 1.00282243 -4. ] [ 0.80992828 0.272393 0.34706002] [ 0.09410757 0.76380115 0.40320505]] [[ 1.24443871 1.13558183 1.32119583] [ 1.89104852 1.00282243 -4. ] [ 0.80992828 0.5 0.5 ] [ 0.09410757 0.5 0.5 ]] [[ 0. 0. 0. ] [ 0. 0. -4. ] [ 0. 0.5 0.5 ] [ 0.09410757 0.5 0.5 ]]

[[ 1. 1. 1. ] [ 1. 1. -7. ] [ 1. 2. 2. ] [ 1.18821514 2. 2. ]]

2 [[-2 4 4] [ 1 -2 -2] [-3 6 6]]

Page 41: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 8/9

In [30]:

#matrix multiplication

y = np.array([1,0,0]) print(x.dot(y))

y = np.array([1,0,1,0]) print(y.dot(x))

y = np.random.rand(3,2) z = x.dot(y) print (z)

In [31]:

# matrix inverse

x=np.random.rand(4,4) y=np.linalg.inv(x) print(x) print(y) print(x.dot(y))

2. Matplotlib-provides a plotting system. Matplotlib.pyplot module is the mostly used module.

[ 0. 0. 0. 0.09410757] [ 0. 0.5 0.5] [[ 0. 0. ] [-2.81630857 -1.43198197] [ 0.73427449 0.59637278] [ 0.82198746 0.60246007]]

[[ 0.78794593 0.62330993 0.04121658 0.15845135] [ 0.0643323 0.5537933 0.49065302 0.74759651] [ 0.71847476 0.64560113 0.66669248 0.85921365] [ 0.26377156 0.46578631 0.59758112 0.0417924 ]] [[ 0.17916293 -1.67106794 1.43254769 -0.23852615] [ 1.54590582 2.0290191 -2.07701599 0.54464513] [-1.26074494 -0.88293244 0.93033221 1.44738769] [-0.33313416 0.55786768 0.80472304 -1.33285939]] [[ 1.00000000e+00 -2.77555756e-17 -8.32667268e-17 0.00000000e+00] [ -5.55111512e-17 1.00000000e+00 -1.11022302e-16 0.00000000e+00] [ -5.55111512e-17 -2.77555756e-16 1.00000000e+00 -2.22044605e-16] [ -7.80625564e-17 6.24500451e-17 1.38777878e-16 1.00000000e+00]]

Page 42: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages

file:///C:/Users/user/Downloads/Modules+and+Packages.html 9/9

In [32]:

import matplotlib.pyplot as plt

x=np.arange(0,4*np.pi,0.1) y=np.sin(x) z=np.cos(x)

plt.plot(x, y) plt.plot(x, z) plt.xlabel('x') plt.ylabel('y') plt.title('Sine and Cosine functions') plt.legend(['Sine','Cosine']) plt.show()

Page 43: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Classes

file:///C:/Users/user/Downloads/Classes.html 1/3

Classes-Classes are a group of variables and functions.

-Some modules define functions inside classes.

These can be imported similarly as modules and packages, via import.

In [1]:

class MyClass: def set(self, v): self.value = v def put(self): print(self.value)

c = MyClass()

In [2]:

c.set('orange') c.put()

In [3]:

MyClass.set(c, 'orange') MyClass.put(c)

In [4]:

class Person:

def __init__(self, first, last): self.firstname = first self.lastname = last

def Name(self): return self.firstname + " " + self.lastname

x = Person("Marge", "Simpson") print(x.Name())

Classes can be inherited !!

orange

orange

Marge Simpson

Page 44: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Classes

file:///C:/Users/user/Downloads/Classes.html 2/3

In [5]:

class Employee(Person):

def __init__(self, first, last, staffnum): Person.__init__(self,first,last) self.staffnumber = staffnum

def GetEmployee(self): return self.Name() + ", " + self.staffnumber

y = Employee("Homer", "Simpson", "1007")

print(y.GetEmployee())

We can use super().init instead of Person.init.

In this case, we must remove self :

Why do we use classes?

In [6]:

import numpy as np

class Dataset(): def __init__(self,X,Y): self.dataset=np.array([X,Y]) def summary(self): print('The dimension fo the dataset is', np.shape(self.dataset)) def Hat(self): XtX=self.dataset.T.dot(self.dataset) temp=self.dataset.dot(np.linalg.inv(XtX)) return temp.dot(self.dataset.T) def update(self,x): self.dataset=np.vstack((self.dataset,np.array(x)))

data=Dataset([1,0,1],[2,3,1])

print(data.dataset)

print(data.summary())

print(data.Hat())

If we use return instead of print, the output None disappears.

Homer Simpson, 1007

[[1 0 1] [2 3 1]] ('The dimension fo the dataset is', (2L, 3L)) None [[ 0.5 -0.75] [-1. -0.5 ]]

Page 45: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Classes

file:///C:/Users/user/Downloads/Classes.html 3/3

In [7]:

import numpy as np

class Dataset(): def __init__(self,X,Y): self.dataset=np.array([X,Y]) def summary(self): return 'The dimension fo the dataset is', np.shape(self.dataset) def Hat(self): XtX=self.dataset.T.dot(self.dataset) temp=self.dataset.dot(np.linalg.inv(XtX)) return temp.dot(self.dataset.T) def update(self,x): self.dataset=np.vstack((self.dataset,np.array(x)))

data=Dataset([1,0,1],[2,3,1])

print(data.dataset)

print(data.summary())

print(data.Hat())

In [8]:

data.update([5,1,2])

In [9]:

print(data.dataset)

print(data.summary())

print(data.Hat())

[[1 0 1] [2 3 1]] ('The dimension fo the dataset is', (2L, 3L)) [[ 0.5 -0.75] [-1. -0.5 ]]

[[1 0 1] [2 3 1] [5 1 2]] ('The dimension fo the dataset is', (3L, 3L)) [[ 1.00000000e+00 4.44089210e-16 4.44089210e-16] [ 4.44089210e-16 1.00000000e+00 9.43689571e-16] [ 0.00000000e+00 5.55111512e-17 1.00000000e+00]]

Page 46: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 1/15

Modules and Packages 2

3. Pandas-fast and efficient data manipulation and analysis.

-tools for reading and writing data in different formats: CSV, Microsoft Excel, textfiles, etc.

3.1 Read CSV, Excel, text files

※ 'White' data is available in UCI data repository (https://archive.ics.uci.edu/ml/datasets/Wine).

Input variables include physicochecmical properties of white wines and the output variable is a quality score(0~10). The wines are from different cultivars.

In [1]:

import pandas as pd

data=pd.read_csv('./white.csv', header=0) # if there are no headers, type header=None # data=pd.read_csv('C://Users/user/Desktop/Dropbox/python/tutorial/white.csv') # data=pd.read_csv('./white.csv').values

In [2]:

data=pd.read_excel('./white.xlsx', sheet_name='Sheet 1') # data=pd.read_excel('./white.xlsx', sheetname=0)

In [3]:

data1=pd.read_table('./data1.txt')

In [4]:

print(type(data))

<class 'pandas.core.frame.DataFrame'>

Page 47: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 2/15

In [5]:

data.head()

In [6]:

data.tail()

In [7]:

data.shape

Out[5]:

fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulfur_dio

0 7.0 0.27 0.36 20.7 0.045 45.0

1 6.3 0.30 0.34 1.6 0.049 14.0

2 8.1 0.28 0.40 6.9 0.050 30.0

3 7.2 0.23 0.32 8.5 0.058 47.0

4 7.2 0.23 0.32 8.5 0.058 47.0

Out[6]:

fixed_acidity volatile_acidity citric_acid residual_sugar chlorides free_sulf

4893 6.2 0.21 0.29 1.6 0.039 24.0

4894 6.6 0.32 0.36 8.0 0.047 57.0

4895 6.5 0.24 0.19 1.2 0.041 30.0

4896 5.5 0.29 0.30 1.1 0.022 20.0

4897 6.0 0.21 0.38 0.8 0.020 22.0

Out[7]:

(4898, 12)

Page 48: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 3/15

In [8]:

data.quality #data["quality"] #Indexing by string can be done only in data frames.

Page 49: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 4/15

Out[8]:

0 6 1 6 2 6 3 6 4 6 5 6 6 6 7 6 8 6 9 6 10 5 11 5 12 5 13 7 14 5 15 7 16 6 17 8 18 6 19 5 20 8 21 7 22 8 23 5 24 6 25 6 26 6 27 6 28 6 29 7 .. 4868 6 4869 6 4870 7 4871 6 4872 5 4873 6 4874 6 4875 6 4876 7 4877 5 4878 4 4879 6 4880 6 4881 6 4882 5 4883 6 4884 5 4885 6 4886 7 4887 7 4888 5 4889 6 4890 6 4891 6 4892 5 4893 6 4894 5 4895 6

Page 50: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 5/15

In [9]:

data.quality.head()

In [10]:

data[["quality","fixed_acidity"]].head()

In [11]:

X=data.drop(["quality"],1) # 1 means drop the colum, 0 means drop the row

In [12]:

X.shape

※ mnist data is vailable in MNIST (http://yann.lecun.com/exdb/mnist/)

Attributes are images for handwritten digits. The digits have been size-normalized and centered in a fixed-size image.

In [13]:

import numpy as np

mnist_csv=pd.read_csv('./mnist.csv',header=0).values print(mnist_csv.shape)

4896 7 4897 6 Name: quality, dtype: int64

Out[9]:

0 6 1 6 2 6 3 6 4 6 Name: quality, dtype: int64

Out[10]:

quality fixed_acidity

0 6 7.0

1 6 6.3

2 6 8.1

3 6 7.2

4 6 7.2

Out[12]:

(4898, 11)

(42000L, 785L)

Page 51: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 6/15

In [14]:

Y=mnist_csv[:,0] X=mnist_csv[:,1:] #X=mnist_csv.drop(['label'],1)

In [15]:

print(Y[0])

In [16]:

print(X[0])

1

[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 188 255 94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 191 250 253 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 123 248 253 167 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80 247 253 208 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29 207 253 235 77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54 209 253 253 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 93 254 253 238 170 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 210 254 253 159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 209 253 254 240 81 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27 253 253 254 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 206 254 254 198 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 168 253 253 196 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 203 253 248 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 188 253 245 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 103 253 253 191 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 240 253 195 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 220 253 253 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94 253 253 253 94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 251 253 250 131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 214 218 95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

Page 52: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 7/15

In [17]:

from matplotlib import pyplot as plt

plt.imshow(np.reshape(X[10],(28,28)), cmap=plt.cm.Blues) plt.show()

3.2 Save files

Suppose we want to extract the handwritten digits corresponding to 1 and 7 only and save it in a separateCSV file.

In [18]:

mnist_1=mnist_csv[(mnist_csv[:,0]==1),:] mnist_7=mnist_csv[(mnist_csv[:,0]==7),:] mnist_1and7=np.vstack((mnist_1,mnist_7))

In [19]:

np.unique(mnist_1and7[:,0])

In [20]:

mnist_1and7_df=pd.DataFrame(mnist_1and7)

In [21]:

pd.DataFrame.to_csv(mnist_1and7_df,'C://Users/user/Desktop/mnist_1and7.csv')

Dictionaries can also be converted to Data frames (not only numpy arrays.)

Out[19]:

array([1, 7], dtype=int64)

Page 53: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 8/15

In [22]:

A={'D':(1,0,1),'A':(2,3,1),'B':(4,3,1)} pd.DataFrame(A)

Numpy arrays can also be directly saved in .npy files, without the need to convert them to a dataFrame.

The .npy file can also be loaded using the load() function in NumPy:

In [23]:

np.save('C://Users/user/Desktop/mnist_1and7.npy',mnist_1and7)

In [24]:

data2=np.load('C://Users/user/Desktop/mnist_1and7.npy')

In [25]:

type(data2)

What about image files ?? Scipy package.

In [26]:

from scipy.misc import imread

# read a JPEG image into a numpy array img = imread('./puggle.jpg') print(img.shape) print(type(img))

Out[22]:

A B D

0 2 4 1

1 3 3 0

2 1 1 1

Out[25]:

numpy.ndarray

(1334L, 1778L, 3L) <type 'numpy.ndarray'>

Page 54: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 9/15

In [27]:

img=np.array(img) print(img.shape) print(type(img))

In [28]:

plt.imshow(img) plt.show()

In [29]:

from skimage.transform import resize

# resize the image img_resized = resize(img, (300, 300)) print(img_resized.shape)

In [30]:

plt.imshow(img_resized) plt.show()

(1334L, 1778L, 3L) <type 'numpy.ndarray'>

(300L, 300L, 3L)

Page 55: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 10/15

You can go here (http://www.scipy-lectures.org/advanced/image_processing/) to see more details aboutimage manipulation and processing.

4. Sklearn-provides machine learning tools such as regression, logistic regression, lda, svm.

4.1 Regression

In [31]:

from sklearn.linear_model import LinearRegression

data = data[-np.isnan(data["quality"])] x, y = data.fixed_acidity, data.quality plt.scatter(x,y,color="blue") plt.show()

Simple linear regression

In [32]:

dataM=np.matrix(data) x,y=dataM[:,0],dataM[:,11]

In [33]:

slm = LinearRegression().fit(x,y)

Page 56: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 11/15

In [34]:

m, b = slm.coef_[0], slm.intercept_ x0, x1 = x.min(), x.max() plt.plot([x0,x1],[m*x0+b,m*x1+b],'r') plt.show()

Multiple linear regression

In [35]:

from sklearn import model_selection

X = np.array(data.drop(["quality"],1)) y = np.array(data["quality"]) X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=100) lrm = LinearRegression() lrm.fit(X_train, y_train) forecast = lrm.predict(X_test[0:10,:]) print(forecast) accuracy = lrm.score(X_test, y_test) #R^2 print(accuracy)

In [36]:

forecast = lrm.predict(X_test) forecast=np.round(forecast) accuracy=0. for i in range(len(X_test)): accuracy+=(forecast[i]==y[i])

accuracy=accuracy/len(X_test) print(accuracy)

[ 6.13867211 4.87485387 5.80282258 6.02253147 6.30505131 6.55048122 5.98089896 6.06284824 5.56466996 5.94534116] 0.248039622626

0.37074829932

Page 57: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 12/15

4.2 Logistic regression

In [37]:

from sklearn.linear_model import LogisticRegression

logit = LogisticRegression()

y_binary = data.quality > 5 X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_binary, test_size=0.3, random_state=100)

logit.fit(X_train, y_train, sample_weight=None) m, b = logit.coef_, logit.intercept_ print(m);print(b)

In [38]:

forecast=logit.predict(X_test) #cut_value is set to 0.5 proba=logit.predict_proba(X_test) print(forecast[0:10]) print(proba[0:10][:,1])

In [39]:

logit.score(X_test, y_test, sample_weight=None) #In this case, score is accuracy

[[ -2.54101935e-01 -5.07112918e+00 9.77045630e-02 5.73480006e-02 -5.16717380e-01 1.26275386e-02 -3.55827563e-03 -2.75407735e+00 -5.07781468e-01 1.76446414e+00 9.43517636e-01]] [-2.75150728]

[ True False True True True True True True True True] [ 0.87042112 0.16304045 0.63426903 0.77031113 0.87390307 0.93482278 0.7470677 0.77873055 0.53333388 0.72938666]

Out[39]:

0.75510204081632648

Page 58: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 13/15

In [40]:

logit = LogisticRegression(multi_class='multinomial', solver='newton-cg') # if we set multi_class='ovr', a binary model is fit for each label

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=100) logit.fit(X_train, y_train, sample_weight=None) m, b = logit.coef_, logit.intercept_ print(m);print(b)

[[ 3.03822169e-01 5.20006957e-01 5.70793120e-02 2.52024817e-02 1.37418094e-01 9.92565226e-03 6.62321351e-03 3.45751835e-03 6.98984650e-02 -1.07101059e-01 -3.51365112e-01] [ 1.76233667e-01 4.42864412e+00 -8.38201087e-01 -7.22917079e-03 9.79631636e-02 -4.53363939e-02 -1.35749144e-03 2.81612870e-02 3.22029200e-01 -8.99026833e-01 -9.40667587e-01] [ -1.67885470e-01 1.96679883e+00 2.57749276e-01 3.90402440e-02 1.32059161e-01 -5.96778648e-03 2.89408890e-03 1.85104583e-02 -1.01096324e+00 -1.05479279e+00 -9.58060002e-01] [ -2.98037268e-01 -2.18790337e+00 4.41067374e-01 9.15911543e-02 5.82063242e-01 2.57283714e-03 -4.51177531e-04 3.73901190e-02 -8.03534939e-01 5.62845839e-01 -1.29928635e-01] [ -1.34709624e-01 -3.17074661e+00 -5.06020453e-01 1.16626033e-01 -9.14451306e-01 1.35920689e-02 -4.40606147e-03 -6.08232951e-02 3.76385846e-01 1.26681851e+00 4.86588744e-01] [ -1.98073955e-01 -1.50326460e+00 4.59343675e-01 1.48578203e-01 -7.39300047e-03 3.52421603e-02 -6.04836250e-03 -2.55110065e-02 5.69567218e-01 1.85448770e-01 7.56482181e-01] [ 3.18650479e-01 -5.35353221e-02 1.28981902e-01 -4.13808945e-01 -2.76593536e-02 -1.00285382e-02 2.74579052e-03 -1.18508101e-03 4.76617451e-01 4.58075625e-02 1.13695041e+00]] [ -1.86460597 8.89474224 16.42370967 8.67125805 -3.50837565 -9.45832038 -19.15840797]

Page 59: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 14/15

In [41]:

print(logit.predict(X_test[0:10,:])) print(logit.predict_proba(X_test[0:10,:]))

In [42]:

logit.score(X_test, y_test, sample_weight=None)

4.3 LDA

In [43]:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

linearDA=LDA()

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_binary, test_size=0.3, random_state=100)

linearDA.fit(X_train, y_train)

forecast=linearDA.predict(X_test)

proba=linearDA.predict_proba(X_test)

print(forecast[0:10]) print(proba[0:10][:,1])

[6 5 6 6 6 6 6 6 6 6] [[ 1.57883139e-03 1.20669223e-02 1.17149639e-01 5.74021502e-01 2.60200083e-01 3.49737419e-02 9.27968181e-06] [ 6.66543067e-03 2.00156054e-01 6.12938749e-01 1.61386978e-01 1.71784657e-02 1.48427391e-03 1.90049742e-04] [ 3.57704274e-03 4.55598780e-03 3.99066598e-01 4.99271740e-01 7.44847448e-02 1.90432883e-02 5.98173614e-07] [ 3.08581613e-03 3.23311320e-02 1.98428658e-01 5.55725186e-01 1.84702955e-01 2.47806539e-02 9.45598495e-04] [ 3.05674304e-03 1.11861450e-02 1.09435533e-01 5.00398227e-01 2.90048552e-01 8.04262616e-02 5.44853895e-03] [ 1.71792900e-03 2.47716940e-03 5.60518502e-02 4.80689681e-01 3.64754552e-01 9.35826687e-02 7.26149637e-04] [ 1.79913878e-03 2.05673238e-02 2.27805021e-01 5.61597694e-01 1.70178000e-01 1.80497622e-02 3.05955408e-06] [ 4.00381106e-03 6.97931564e-02 1.54430060e-01 4.63733307e-01 2.76902156e-01 2.88425399e-02 2.29496966e-03] [ 6.28710508e-03 8.34143545e-02 4.03979358e-01 4.13691486e-01 8.55659589e-02 6.32431923e-03 7.37417642e-04] [ 2.92794322e-03 3.60213543e-02 2.08301643e-01 5.25820461e-01 2.07603930e-01 1.88314473e-02 4.93220946e-04]]

Out[42]:

0.54761904761904767

[ True False True True True True True True True True] [ 0.87695609 0.14798489 0.62047705 0.78864165 0.86929275 0.94244011 0.79706656 0.81488264 0.54117851 0.79219861]

Page 60: Read and write files - Seoul National Universitystat.snu.ac.kr/mcp/python.pdf · 2018. 8. 27. · Install Python, Anaconda, Jupyter notebook Data types (integer, real, list, dictionary)

8/24/2018 Modules and Packages 2

file:///C:/Users/user/Downloads/Modules+and+Packages+2.html 15/15

In [44]:

linearDA.score(X_test, y_test)

4.4 SVM

In [45]:

from sklearn.svm import SVC

svm_model=SVC(kernel='linear')

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y_binary, test_size=0.3, random_state=100)

svm_model.fit(X_train, y_train)

forecast=svm_model.predict(X_test) print(forecast[0:10])

In [46]:

svm_model.score(X_test,y_test)

In [47]:

#For multiclass, there is only on option: 'ovr' (default)

svm_model2=SVC(kernel='linear')

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.3, random_state=100)

svm_model2.fit(X_train, y_train)

forecast=svm_model2.predict(X_test) print(forecast[0:10])

In [48]:

svm_model2.score(X_test,y_test)

Out[44]:

0.75646258503401365

[ True False True True True True True True True True]

Out[46]:

0.75986394557823134

[6 5 6 6 6 6 6 6 5 6]

Out[48]:

0.53401360544217691