Cache Technology Discussion_20130304
TRANSCRIPT
7/29/2019 CACHE _20130304
CACHE
"Caching is a temp location where I store data in (data that I
need it frequently) as the original data is expensive to be
fetched, so I can retrieve it faster.
INTRO
Why do we need a cache?
Without one:
1. The user will get upset, complain, and may never use this application again.
2. The storage backend will pack up its bags and leave your application, and that causes big problems (no place to store data).
What is a cache?
Cache Hit
1. When the client issues a request (let's say he wants to view product information) and our application needs to access the product data in our storage (the database), it first checks the cache.
2. If an entry can be found with a tag matching that of the desired data (say, the product ID), that entry is used instead. This is known as a cache hit (the cache hit is the primary measurement of caching effectiveness; we will discuss that later on).
3. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.
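The hit ratio described above can be sketched with a simple counter pair (a minimal illustration; the `CacheStats` class and the product IDs here are made up for the example):

```python
class CacheStats:
    """Track cache hits and misses to compute the hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = CacheStats()
cache = {"p1": "Product 1"}          # pretend "p1" was cached earlier
for product_id in ["p1", "p2", "p1", "p3"]:
    stats.record(product_id in cache)

print(stats.hit_ratio)  # 2 hits out of 4 lookups -> 0.5
```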
Cache Miss
On the contrary, when the tag is not found in the cache (no match is found), this is known as a cache miss. A hit to the back storage is made, the data is fetched and placed in the cache, so future requests will find it there and produce a cache hit.
When we encounter a cache miss, there are two possible scenarios:
1. There is free space in the cache (the cache has not reached its limit), so the object that caused the cache miss is retrieved from our storage and inserted into the cache.
2. There is no free space in the cache (the cache has reached its capacity), so the object that caused the miss is fetched from storage, and then we must decide which object in the cache to remove in order to place the newly retrieved one. This is done by a replacement policy (caching algorithm) that decides which entry to evict to make room, as discussed below.
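The two miss scenarios above can be sketched as a cache-aside lookup (a minimal Python sketch; `fetch_from_storage` and the deliberately arbitrary `evict_one` policy are illustrative stand-ins, not a real implementation):

```python
CAPACITY = 2
cache = {}

def fetch_from_storage(key):
    # Stand-in for the expensive database lookup.
    return f"value-for-{key}"

def evict_one(cache):
    # Placeholder replacement policy: drop an arbitrary entry.
    # Real caches use LRU, LFU, etc.
    cache.pop(next(iter(cache)))

def get(key):
    if key in cache:                      # cache hit
        return cache[key]
    value = fetch_from_storage(key)       # cache miss: go to back storage
    if len(cache) >= CAPACITY:            # scenario 2: no free space
        evict_one(cache)
    cache[key] = value                    # scenario 1 (or after eviction): insert
    return value
```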
Storage Cost
When a cache miss occurs, data is fetched from the back storage, loaded, and placed in the cache. But how much space does the data we just fetched take up in the cache memory? This is known as the storage cost.
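One rough way to account for storage cost in Python is `sys.getsizeof` (only an approximation of real memory use; the `put_with_cost` helper is a made-up name for this sketch):

```python
import sys

cache = {}

def put_with_cost(key, value):
    # Approximate the entry's storage cost by its in-memory size;
    # real caches often use an explicit size supplied by the caller.
    cost = sys.getsizeof(value)
    cache[key] = (value, cost)
    return cost

put_with_cost("p1", "x" * 100)
total_cost = sum(cost for _, cost in cache.values())  # bytes held by cached entries
```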
Retrieval Cost
When we need to load the data, we also need to know how much it costs to load it. This is known as the retrieval cost.
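Retrieval cost can be measured directly by timing the fetch (a minimal sketch; the `time.sleep` call stands in for real database or network latency):

```python
import time

def fetch_from_storage(key):
    # Stand-in for an expensive database or network read.
    time.sleep(0.01)
    return f"value-for-{key}"

start = time.perf_counter()
value = fetch_from_storage("p1")
retrieval_cost = time.perf_counter() - start  # seconds spent loading the data
```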
Replacement Policy
When a cache miss happens and there is not enough room, the cache ejects some other entry in order to make room for the newly fetched data. The heuristic used to select which entry to eject is known as the replacement policy.
Caching Algorithms
Least Frequently Used (LFU):
I am Least Frequently Used; I count how often an entry is needed by incrementing a counter associated with each entry, and I remove the entry with the lowest counter first.
I am not that fast, and I am not that good at adaptive actions (that is, keeping the entries that are really needed and discarding the ones that have not been needed for the longest period, based on the access pattern, or in other words the request pattern).
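A minimal LFU sketch of the counter idea above (illustrative only; ties are broken arbitrarily, and production LFU variants also fold in recency or aging):

```python
from collections import defaultdict

class LFUCache:
    """LFU sketch: evict the entry with the lowest access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.counts = defaultdict(int)   # per-entry access counter

    def get(self, key):
        if key not in self.data:
            return None
        self.counts[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least frequently used entry.
            victim = min(self.data, key=lambda k: self.counts[k])
            del self.data[victim]
            del self.counts[victim]
        self.data[key] = value
        self.counts[key] += 1
```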
Least Recently Used (LRU):
I am the Least Recently Used cache algorithm; I remove the least recently used items first, the ones that have not been used for the longest time.
I require keeping track of what was used and when, which is expensive if one wants to make sure I always discard the least recently used item.
Web browsers use me for caching. New items are placed at the top of the cache. When the cache exceeds its size limit, I discard items from the bottom. The trick is that whenever an item is accessed, I move it to the top, so items which are frequently accessed tend to stay in the cache. There are two ways to implement me: an array or a linked list (which keeps the least recently used entry at the back and the most recently used at the front).
I am fast and I am adaptive; in other words, I can adapt to the data access pattern. I have a large family which completes me, and they are even better than me (I do feel jealous sometimes, but it is OK). Some of my family members are LRU2 and 2Q; they were implemented in order to improve on LRU caching.
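The move-to-top trick above can be sketched with Python's `OrderedDict`, which is backed by a doubly linked list (a minimal sketch, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    """LRU sketch: most recently used at the end ("top"),
    least recently used at the front (the eviction candidate)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # accessed: move to the top
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)  # discard the least recently used
        self.data[key] = value
```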
Least Recently Used 2 (LRU2):
I am Least Recently Used 2; some people call me "least recently used twice," which I like more. I add entries to the cache only the second time they are accessed (it takes two accesses to place an entry in the cache); when the cache is full, I remove the entry that has the second most recent access. Because of the need to track the two most recent accesses, the access overhead increases with the cache size; if I am applied to a big cache, that can be a disadvantage. In addition, I have to keep track of some items that are not yet in the cache (they have not been requested twice yet). I am better than LRU, and I am also adaptive to access patterns.
Two Queues (2Q):
I am Two Queues; I add entries to an LRU cache as they are accessed. If an entry is accessed again, I move it to a second, larger LRU cache.
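The 2Q idea can be sketched with two ordered maps (a deliberately simplified sketch; the queue names and sizes are illustrative, and the real 2Q algorithm also keeps a history of evicted keys):

```python
from collections import OrderedDict

class TwoQueuesCache:
    """Simplified 2Q sketch: new entries land in a small FIFO "probation"
    queue; entries accessed a second time are promoted into a larger
    LRU "protected" queue."""

    def __init__(self, probation_size, protected_size):
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.probation_size = probation_size
        self.protected_size = protected_size

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)          # normal LRU behavior
            return self.protected[key]
        if key in self.probation:                    # second access: promote
            value = self.probation.pop(key)
            if len(self.protected) >= self.protected_size:
                self.protected.popitem(last=False)
            self.protected[key] = value
            return value
        return None

    def put(self, key, value):
        if key in self.protected or key in self.probation:
            return
        if len(self.probation) >= self.probation_size:
            self.probation.popitem(last=False)       # drop the oldest newcomer
        self.probation[key] = value
```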
Adaptive Replacement Cache (ARC):
I am Adaptive Replacement Cache; some people say that I balance between LRU and LFU to improve the combined result. Well, that's not 100% true. Actually, I am made of two LRU lists: one list, say L1, contains entries that have been seen only once recently, while the other list, say L2, contains entries that have been seen at least twice recently.
Items that have been seen twice within a short time have a low inter-arrival rate and hence are thought of as high-frequency. So we think of L1 as capturing recency and L2 as capturing frequency, which is why most people think I am a balance between LRU and LFU; but that is OK, I am not angry about it.
I am considered one of the best-performing replacement algorithms: a self-tuning, low-overhead replacement cache. I also keep a history of evicted entries equal in size to the cache itself; this lets me see whether a removed entry should have stayed and another one should have been removed instead. And yes, I am fast and adaptive.
Most Recently Used (MRU):
I am most recently used, in contrast to LRU; I remove the most recently used
items first. You will ask me why for sure, well let me tell you something when
access is unpredictable, and determining the least most recently used entry in
the cache system is a high time complexity operation, I am the best choice thats
why.
I am so common in the database memory caches, whenever a cached record is
used; I replace it to the top of stack. And when there is no room the entry on the
top of the stack, guess what? I will replace the top most entry with the new entry.
MRU
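The stack behavior above can be sketched like this (a minimal illustration; a plain list stands in for the stack, so `_touch` is linear-time here):

```python
class MRUCache:
    """MRU sketch: when full, evict the *most* recently used entry
    (the top of the stack), the opposite of LRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.stack = []                  # most recently used key at the end

    def _touch(self, key):
        # Move an accessed key to the top of the stack.
        self.stack.remove(key)
        self.stack.append(key)

    def get(self, key):
        if key not in self.data:
            return None
        self._touch(key)
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self._touch(key)
        elif len(self.data) >= self.capacity:
            victim = self.stack.pop()    # replace the top-most entry
            del self.data[victim]
            self.stack.append(key)
        else:
            self.stack.append(key)
        self.data[key] = value
```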
First In First Out (FIFO):
I am First In First Out; I am a low-overhead algorithm that requires little effort for managing the cache entries. The idea is that I keep track of all the cache entries in a queue, with the most recent entry at the back and the earliest entry at the front. When there is no space and an entry needs to be replaced, I remove the entry at the front of the queue (the oldest entry) and replace it with the currently fetched entry. I am fast, but I am not adaptive.
Hello, I am Second Chance; I am a modified form of the FIFO replacement algorithm, known as the Second Chance replacement algorithm. I am better than FIFO, at little cost for the improvement.
I am Clock, and I am a more efficient version of FIFO than Second Chance, because I don't push the cached entries to the back of the list like Second Chance does, but I perform the same general function as Second Chance.
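The plain FIFO queue described above can be sketched with a `deque` (a minimal sketch; note that, unlike LRU, accessing an entry does not change its position):

```python
from collections import deque

class FIFOCache:
    """FIFO sketch: newest entry at the back of the queue; when the
    cache is full, the entry at the front (the oldest) is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.queue = deque()

    def get(self, key):
        # Access does not reorder anything: FIFO is not adaptive.
        return self.data.get(key)

    def put(self, key, value):
        if key not in self.data:
            if len(self.data) >= self.capacity:
                oldest = self.queue.popleft()   # evict the front of the queue
                del self.data[oldest]
            self.queue.append(key)
        self.data[key] = value
```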
Distributed Caching:
1. Cached data can be stored in a memory area separate from the caching directory itself (which manages the cache entries and so on); it can live across the network or on disk, for example.
2. Distributing the cache allows an increase in the cache size.
3. In this case, the retrieval cost will also increase, due to network request time.
4. This will also lead to a hit-ratio increase, due to the larger size of the cache.
Thank You
SINA@Make Presentation much more fun