Lecture 3 Data Representation
Introduction to Information Technology
Dr. Ken Tsang 曾镜涛
Email: [email protected]
http://www.uic.edu.hk/~kentsang/IT/IT3.htm
Room E408 R9

Upload: sherman-watson

Post on 27-Dec-2015



2

Outline

• Distinguish between analogue and digital information
• Explain data compression and compression ratios
• Examine the binary formats for negative values
• Describe the characteristics of the ASCII and Unicode character sets
• Explain the nature of sound and its representation
• Explain how RGB values define a colour
• Look at representing audio information
• Look at representing images and graphics
• Look at representing video information

3

Data Representation

• Data comes in many forms
  ◦ Numbers: 235, 11.01, -24, …
  ◦ Text: “hello, world!” “你好!” (Chinese for “hello!”)
  ◦ Audio: .mp3
  ◦ Images and graphics: .bmp, GIF, JPEG
  ◦ Video: .avi
• All of the data is stored in computers as binary digits
• Data must be represented in a way that
  ◦ captures the essence of the information, and
  ◦ is in a form that is convenient for computer processing

4

Data Compression

• Data compression: reduction in the amount of space needed to store a piece of data
• Compression ratio: the size of the compressed data divided by the size of the original data
• Data compression techniques can be
  ◦ lossless, which means the data can be retrieved without any loss of the original information, or
  ◦ lossy, which means some information may be lost in the process of compression
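As a rough illustration of the compression ratio defined above, the sketch below compresses some sample bytes with Python's standard zlib module, which is a lossless compressor. The sample text and the helper name compression_ratio are assumptions made for this example; they do not come from the lecture.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the compressed data divided by the size of the original data."""
    return len(compressed) / len(original)


# Hypothetical sample data: highly repetitive text compresses well.
original = b"hello, world! " * 200
compressed = zlib.compress(original)

ratio = compression_ratio(original, compressed)
print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print(f"compression ratio: {ratio:.3f}")

# Lossless: decompressing gives back exactly the original bytes.
restored = zlib.decompress(compressed)
print("identical after round trip:", restored == original)
```

A ratio below 1 means the compressed copy is smaller than the original. A lossy technique would not pass the final round-trip check.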

5

WinRAR

• WinRAR is a widely used file archiver
• WinRAR tutorial: http://users.pandora.be/soulmaniacs/winrar.html

6

Data about the world around us

• Is the physical world around us smooth and continuous?
• At the microscopic level, materials are all made up of molecules and atoms, and energy comes in units of quanta.
• A smooth and continuous physical world is an illusion due to our limited senses.

7

Analogue Data: an example

• Analogue: something that is analogous or similar to something else (Webster)
• Analogue data: the use of continuously changing quantities to represent data.
• A mercury thermometer is an analogue device. The mercury rises and falls in a continuous flow in the tube in direct proportion to the temperature.
• The mathematical idealization of this smooth change as a continuous function leads to “analogue data”, an infinite amount of data

8

From Analogue to Digital Data

• Data can be represented in one of two ways: analogue or digital
  ◦ Analogue data: a continuous representation (using a mathematical function or smooth curve), analogous to the actual information it represents
  ◦ Digital data: a discrete representation, breaking the information up into separate elements (data)

9

Digital Data in Computers

• Computer components are discrete in nature
• Computer memory and other hardware (e.g. the CPU) have only finite room to store and manipulate data
• The goal is to represent enough of the world to satisfy our computational needs and our senses of sight and sound

10

Digitized Information

• Computers cannot deal with analogue information
• So we digitize information by breaking it into pieces and representing those pieces separately
• Why do we use binary?
  ◦ Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values

11

Electronic Signals

• An analogue signal continually fluctuates in voltage up and down
• A digital signal has only a high or low state, corresponding to the two binary digits
• All electronic signals (both analogue and digital) degrade as they move down a line
• The voltage of the signal fluctuates due to environmental effects

12

Analogue and Digital Information

• Periodically, a digital signal is reclocked to regain its original shape

[Figure: An analogue and a digital signal]

[Figure: Degradation of analogue and digital signals]

13

Binary Representation

• One bit can be either 0 or 1
• Therefore, one bit can represent only two things
• To represent more than two things, we need multiple bits
• Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, 11

14

Binary Representation

[Figure: the number of values that can be represented as bits are added: 2, 4, 8, 16, 32]

15

Binary Representation

• In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits
• Note that every time we increase the number of bits by 1, we double the number of things we can represent
• Questions:
  ◦ How many bits are needed to represent 128 things?
  ◦ How many bits are needed to represent 67 things?
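One way to check answers to questions like these is to search for the smallest n with 2^n at least as large as the number of things. The sketch below does this in Python; the function name bits_needed is an assumption made for this example.

```python
def bits_needed(things: int) -> int:
    """Smallest n such that 2**n >= things."""
    if things < 1:
        raise ValueError("things must be at least 1")
    # (things - 1).bit_length() is the number of bits needed to write
    # the largest code, things - 1, in binary.
    return (things - 1).bit_length()


# Each extra bit doubles the number of things we can represent.
for n in range(1, 6):
    print(n, "bits ->", 2 ** n, "things")

print(bits_needed(128))  # 7, because 2**7 = 128
print(bits_needed(67))   # 7, because 2**6 = 64 is too few and 2**7 = 128 is enough
```

Note that 67 things need as many bits as 128 things, because bit counts come in whole numbers.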

16

Representing Negative Values

• You have used the signed-magnitude representation of numbers before
  ◦ The sign represents the ordering/direction
  ◦ The digits represent the magnitude of the number

17

Representing Negative Values

• Problems with the sign-magnitude representation
  ◦ There are two representations of zero (plus zero and minus zero, +0 and -0), which can cause unnecessary complexity
  ◦ The negative sign itself has to be represented somehow
• If we allow only a fixed number of values (stored in n bits), we can represent numbers as just integer values, where half of them represent negative numbers

18

Representing Negative Values

• For example, if the maximum number of decimal digits we can represent is two, we can let 1 through 49 be the positive numbers 1 through 49 and let 50 through 99 represent the negative numbers -50 through -1
• This representation of negative numbers is called the ten’s complement
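The two-digit mapping described above can be sketched in a few lines of Python. The names MODULUS, encode and decode are assumptions chosen for this illustration.

```python
MODULUS = 100  # two decimal digits give 100 codes, 00 through 99


def encode(value: int) -> int:
    """Map a value in -50..49 to its two-digit ten's complement code."""
    if not -50 <= value <= 49:
        raise ValueError("value does not fit in two-digit ten's complement")
    return value % MODULUS  # negative values wrap around to 50..99


def decode(code: int) -> int:
    """Map a two-digit code 0..99 back to the value it represents."""
    if not 0 <= code < MODULUS:
        raise ValueError("code must be between 0 and 99")
    return code if code < 50 else code - MODULUS


print(encode(49))   # 49
print(encode(-1))   # 99
print(encode(-50))  # 50
print(decode(97))   # -3
```

Every code from 00 to 99 stands for exactly one value, so there is only one representation of zero.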

19

Advantages of Using 10’s Complement

• To perform addition within this scheme, you just add the numbers together and discard any carry

20

Advantages of Using 10’s Complement

• A - B = A + (-B). We can subtract one number from another by adding the negative of the second to the first
• Addition and subtraction become the same
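A minimal sketch of this idea, assuming two-digit ten's complement codes (0 through 49 for non-negative values, 50 through 99 for -50 through -1). The function names are hypothetical.

```python
def to_code(value: int) -> int:
    """Two-digit ten's complement code for a value in -50..49."""
    if not -50 <= value <= 49:
        raise ValueError("value out of range")
    return value % 100


def from_code(code: int) -> int:
    """Value represented by a two-digit code in 0..99."""
    return code if code < 50 else code - 100


def tens_add(code_a: int, code_b: int) -> int:
    """Add two codes and discard any carry out of the two digits."""
    return (code_a + code_b) % 100


def tens_subtract(code_a: int, code_b: int) -> int:
    """A - B computed as A + (-B), reusing the same addition."""
    negated_b = (100 - code_b) % 100
    return tens_add(code_a, negated_b)


# 5 + (-3): codes 05 + 97 = 102, discard the carry to get 02
print(from_code(tens_add(to_code(5), to_code(-3))))        # 2
# -20 + -10: codes 80 + 90 = 170, discard the carry to get 70
print(from_code(tens_add(to_code(-20), to_code(-10))))     # -30
# 5 - 8 computed as 5 + (-8)
print(from_code(tens_subtract(to_code(5), to_code(8))))    # -3
```

The results are only meaningful while the true answer stays within -50 to 49; outside that range the scheme overflows.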

21

2’s Complement

3 bits:

Bits   Value
000     0
001    +1
010    +2
011    +3
100    -4
101    -3
110    -2
111    -1

8 bits: [table not preserved in the transcript]
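The 3-bit two's complement table can be regenerated for any width with a short Python sketch. The function names below are assumptions made for this example.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Bit pattern of value in two's complement using the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking keeps the lowest `bits` bits, which wraps negatives around.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Value represented by a two's complement bit pattern."""
    raw = int(pattern, 2)
    # A leading 1 marks a negative number.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


# Reproduce the 3-bit table: 000 -> 0 ... 011 -> +3, 100 -> -4 ... 111 -> -1
for raw in range(8):
    pattern = format(raw, "03b")
    print(pattern, from_twos_complement(pattern))

# The same functions work for 8 bits, which cover -128 through +127.
print(to_twos_complement(-128, 8))  # 10000000
print(to_twos_complement(127, 8))   # 01111111
print(to_twos_complement(-1, 8))    # 11111111
```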

22

Overflow

• Overflow occurs when the value that we compute cannot fit into the number of bits we have allocated for the result
• For example, if each value is stored using eight bits, adding 127 to 3 causes overflow
• Overflow is a classic example of the type of problems we encounter by mapping an infinite world onto a finite machine

23

Overflow

  01111111     127
+ 00000011   +   3
----------
  10000010   (read as -126 in 8-bit two’s complement, not 130)
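Python integers do not overflow on their own, so the sketch below imitates an 8-bit signed register by keeping only the lowest eight bits of the sum. The function name add_8bit_signed is an assumption for this illustration.

```python
def add_8bit_signed(a: int, b: int) -> int:
    """Add two values the way an 8-bit two's complement register would."""
    raw = (a + b) & 0xFF          # keep only the lowest eight bits
    return raw - 256 if raw >= 128 else raw  # leading 1 means negative


total = add_8bit_signed(127, 3)
print(format((127 + 3) & 0xFF, "08b"))  # 10000010
print(total)                            # -126 instead of the expected 130

# A sum that fits in eight bits is unaffected.
print(add_8bit_signed(100, 27))         # 127
```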

24

Representing Text

• A text document can be decomposed into chapters, paragraphs, sentences, words, and ultimately individual characters
• To represent a text document in digital form, we simply need to be able to represent every character that may appear
  ◦ In English: “a, b, …, z, A, B, …, Z”
• The general approach for representing characters is to list them all and assign each a binary string
  ◦ ‘a’ = (01100001)_2 = (97)_10 = 61h
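The code assigned to ‘a’ can be checked with Python's built-in ord function, which returns a character's numeric code. The helper name char_codes is an assumption for this sketch.

```python
def char_codes(ch: str) -> tuple[str, int, str]:
    """Return the binary, decimal and hexadecimal forms of a character code."""
    code = ord(ch)
    return format(code, "08b"), code, format(code, "02X") + "h"


print(char_codes("a"))  # ('01100001', 97, '61h')

# Letters are listed in order, so consecutive letters get consecutive codes.
for ch in "abcABC":
    binary, decimal, hexadecimal = char_codes(ch)
    print(ch, binary, decimal, hexadecimal)

# chr goes the other way, from a code back to its character.
print(chr(97))  # a
```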

25

Character Set

• A character set is a list of characters and the codes used to represent them
• By agreeing to use a particular character set, computer manufacturers have made the processing of text data easier
• Examples: ASCII, Unicode, etc.

26

ASCII

• ASCII stands for American Standard Code for Information Interchange
• The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters
• Later, 8-bit extensions of ASCII (often called extended ASCII) used all eight bits, which allows for 256 characters

27

ASCII

28

ASCII

• Note that the first 32 characters in the ASCII character chart are control characters: they do not have a simple character representation that you could print to the screen (they are unprintable)

29

Unicode Character Set

• The extended version of the ASCII character set is not enough for international use
• The Unicode character set was originally designed to use 16 bits per character
• Therefore, 16-bit Unicode can represent 2^16, or over 65 thousand, characters (later versions of Unicode extend beyond 16 bits)
• Unicode was designed to be a superset of ASCII
• The first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set (specifically ISO 8859-1, Latin-1)
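Python strings are sequences of Unicode characters, so the relationship to ASCII can be checked directly. The sample text below is an assumption chosen for this sketch, and Latin-1 (ISO 8859-1) stands in for "extended ASCII".

```python
text = "hello, 你好!"

# ord gives each character's Unicode code point.
for ch in text:
    print(ch, ord(ch), f"U+{ord(ch):04X}")

# ASCII characters keep the same codes in Unicode.
ascii_codes = list("hello".encode("ascii"))
unicode_codes = [ord(ch) for ch in "hello"]
print(ascii_codes == unicode_codes)  # True

# The first 256 code points match the 256 byte values of Latin-1.
latin1_text = bytes(range(256)).decode("latin-1")
first_256_match = [ord(ch) for ch in latin1_text] == list(range(256))
print(first_256_match)  # True

# A character such as 你 fits in 16 bits (two bytes) in UTF-16.
ni_utf16 = "你".encode("utf-16-be")
print(len(ni_utf16) * 8, "bits")  # 16 bits
```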

30

Unicode

31

Representing Audio Information

• We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain
• A stereo sends an electrical signal to a speaker to produce sound
• This signal is an analogue representation of the sound wave
• The voltage in the signal varies in direct proportion to the sound wave

32

Representing Audio Information

• To digitize the signal we periodically measure the voltage of the signal and record the appropriate numeric value
  ◦ This process is called sampling
• In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction
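Sampling can be sketched by measuring an idealized signal at regular intervals and recording each measurement as an integer. In the Python example below, the 440 Hz sine wave, the 256 recording levels and the function name sample_sine are all assumptions made for illustration; only the rate of 40,000 samples per second comes from the lecture.

```python
import math


def sample_sine(frequency_hz: float, sample_rate_hz: int,
                duration_s: float, levels: int = 256) -> list[int]:
    """Sample a sine-wave 'voltage' and record each value as an integer level."""
    count = round(sample_rate_hz * duration_s)
    samples = []
    for i in range(count):
        t = i / sample_rate_hz                       # time of this measurement
        voltage = math.sin(2 * math.pi * frequency_hz * t)  # between -1 and 1
        # Map the continuous voltage onto a discrete level 0..levels-1.
        samples.append(round((voltage + 1) / 2 * (levels - 1)))
    return samples


# One millisecond of a 440 Hz tone at 40,000 samples per second.
samples = sample_sine(440, 40_000, 0.001)
print(len(samples), "samples")
print(samples[:10])
```

The continuous wave is replaced by a finite list of numbers; a higher sampling rate or more levels gives a closer approximation at the cost of more data.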

33

Representing Audio Information

34

Representing Audio Information

• A compact disc (CD) stores audio information digitally
• On the surface of the CD are microscopic pits that represent binary digits
• A low-intensity laser is pointed at the disc
• The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted

35

Representing Audio Information

• Audio formats: WAV, AU, AIFF, VQF, and MP3
• MP3 is dominant
  ◦ MP3 is short for MPEG (Moving Picture Experts Group) audio layer 3
  ◦ MP3 employs both lossy and lossless compression
  ◦ First it analyzes the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), then it discards information that can’t be heard by humans (the lossy step)
  ◦ Then the bit stream is compressed to achieve additional compression (the lossless step)

36

Representing Colour

• Colour is our perception of the various frequencies of light that reach the retinas of our eyes
• Our retinas have three types of colour photoreceptor cone cells that respond to different sets of frequencies
• These photoreceptor categories correspond to the colours of red, green, and blue

37

Representing Colour

• Colour is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours
• For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green, and minimizes the contribution of blue, which results in a bright yellow
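An RGB value is just three small integers, which the sketch below treats as a Python tuple. The hexadecimal notation and the 0-to-1 scaling shown here are common conventions added for illustration; the function names are hypothetical.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Write an RGB value with 0..255 components as a #RRGGBB string."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be between 0 and 255")
    return f"#{red:02X}{green:02X}{blue:02X}"


def rgb_to_unit(red: int, green: int, blue: int) -> tuple[float, float, float]:
    """Scale 0..255 components to the 0..1 range."""
    return red / 255, green / 255, blue / 255


yellow = (255, 255, 0)
print(rgb_to_hex(*yellow))   # #FFFF00
print(rgb_to_unit(*yellow))  # (1.0, 1.0, 0.0)
print(rgb_to_hex(0, 0, 0))   # #000000, black: no contribution from any colour
```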

38

Three-Dimensional Colour Space

[Figure: RGB colour space drawn as a cube, with corners labelled from (0,0,0) to (1,1,1)]

39

Representing Images and Graphics

• The amount of data that is used to represent a colour is called the colour depth
• HiColour is a term that indicates a 16-bit colour depth
  ◦ Five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency
• TrueColour indicates a 24-bit colour depth
  ◦ Each number in an RGB value gets eight bits
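The two colour depths can be compared by packing one RGB value into a single integer each way. This is a sketch only: the bit order used here (red in the highest bits, then green, then blue) and the function names are assumptions, and the extra HiColour bit is left at 0.

```python
def pack_truecolour(red: int, green: int, blue: int) -> int:
    """Pack 8-bit components into 24 bits: eight bits per component."""
    return (red << 16) | (green << 8) | blue


def pack_hicolour(red: int, green: int, blue: int) -> int:
    """Pack 8-bit components into 15 of 16 bits: five bits per component."""
    # Dropping the three lowest bits keeps the five most significant ones.
    return ((red >> 3) << 10) | ((green >> 3) << 5) | (blue >> 3)


yellow = (255, 255, 0)
print(format(pack_truecolour(*yellow), "024b"))  # 111111111111111100000000
print(format(pack_hicolour(*yellow), "016b"))    # 0111111111100000

# Number of distinct colours each depth can describe.
print(2 ** 15)  # 32768 with five bits per component
print(2 ** 24)  # 16777216 with eight bits per component
```

Reducing each component from eight bits to five discards fine differences in shade, which is why HiColour needs less storage per pixel.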

40

Indexed Colour

• A particular application such as a browser may support only a certain number of specific colours, creating a palette from which to choose.
• For example: [figure]

41

Digitized Images and Graphics

• Digitizing a picture is the act of representing it as a collection of individual dots called pixels
• The number of pixels used to represent a picture is called the resolution
• Storage of image information on a pixel-by-pixel basis is called a raster-graphics format
• Several popular raster file formats exist, including bitmap (BMP), GIF, and JPEG
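A raster image can be pictured as a grid of pixels, each holding one colour value. The sketch below builds a tiny 2 by 2 grid and estimates uncompressed storage as width times height times colour depth. The sample pixels, the 1024 by 768 size and the function name raw_image_bytes are assumptions for illustration; real BMP, GIF and JPEG files add headers and may compress the data.

```python
def raw_image_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed size in bytes of a raster image, ignoring any file header."""
    total_bits = width * height * colour_depth_bits
    return (total_bits + 7) // 8  # round up to whole bytes


# A 2 x 2 raster: rows of pixels, each pixel an (R, G, B) value.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 0)],
]
pixel_count = sum(len(row) for row in tiny_image)
print(pixel_count, "pixels")            # 4 pixels
print(raw_image_bytes(2, 2, 24))        # 12 bytes at 24 bits per pixel

# A larger picture at two colour depths.
print(raw_image_bytes(1024, 768, 24))   # 2359296
print(raw_image_bytes(1024, 768, 16))   # 1572864
```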

43

Digitized Images and Graphics

High Resolution

44

Digitized Images and Graphics

Low Resolution

45

Representing Video

• Video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network
• Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video
• The goal is not to lose information that affects the viewer's senses

46

Video Players

• QuickTime Player (Apple)
• Real Player
• VLC media player
• Windows Media Player (Microsoft)

47

Summary

• Distinguished between analogue and digital information
• Explained data compression and compression ratios
• Examined the binary formats for negative values
• Described the characteristics of the ASCII and Unicode character sets
• Explained the nature of sound and its representation
• Explained how RGB values define a colour
• Looked at representing audio information
• Looked at representing images and graphics
• Looked at representing video information