1
Lecture 3 Data Representation
Introduction to Information Technology
Dr. Ken Tsang 曾镜涛
Email: [email protected]
http://www.uic.edu.hk/~kentsang/IT/IT3.htm
Room E408 R9
2
Outline
• Distinguish between analogue and digital information
• Explain data compression and compression ratios
• Examine the binary formats for negative values
• Describe the characteristics of the ASCII and Unicode character sets
• Explain the nature of sound and its representation
• Explain how RGB values define a colour
• Look at representing Audio Information
• Look at representing Images & Graphics
• Look at representing Video Information
3
Data Representation
• Data comes in many forms
  • Numbers: 235, 11.01, -24, …
  • Text: “hello, world!” “你好!”
  • Audio: .mp3
  • Images and graphics: .bmp, .gif, .jpeg
  • Video: .avi
• All of this data is stored in computers as binary digits
• Data must be represented in a way that
  • captures the essence of the information
  • is in a form that is convenient for computer processing
4
Data Compression
• Data compression: a reduction in the amount of space needed to store a piece of data
• Compression ratio: the size of the compressed data divided by the size of the original data, so a smaller ratio means stronger compression
• Data compression techniques can be
  • lossless, which means the data can be retrieved without any loss of the original information
  • lossy, which means some information may be lost in the process of compaction
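A minimal sketch of the compression ratio defined above, using Python's standard zlib module as one example of a lossless technique. The sample data is hypothetical and was chosen to be repetitive so that it compresses well; real data will give different ratios.

```python
import zlib

# Hypothetical sample data: repetitive text compresses well.
original = b"data representation " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Compression ratio as defined above: compressed size / original size.
ratio = len(compressed) / len(original)

print(f"original:   {len(original)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.3f}")
# Lossless: decompressing gives back exactly the original bytes.
print("lossless round trip:", restored == original)
```

A lossy technique would fail the last check by design, because it gives up an exact round trip in exchange for a smaller ratio.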
5
WinRAR
• A widely used file archiver
• WinRAR Tutorial: http://users.pandora.be/soulmaniacs/winrar.html
6
Data about the world around us
• Is the physical world around us smooth and continuous?
• At the microscopic level, all materials are made up of molecules and atoms, and energy comes in discrete units called quanta.
• A smooth and continuous physical world is an illusion caused by our limited senses.
7
Analogue Data: an example
• Analogue: something that is analogous or similar to something else (Webster)
• Analogue data: the use of continuously changing quantities to represent data
• A mercury thermometer is an analogue device. The mercury rises and falls in a continuous flow in the tube, in direct proportion to the temperature.
• The mathematical idealization of this smooth change as a continuous function leads to “analogue data”, which in principle contains an infinite amount of data.
8
From Analogue to Digital data
• Data can be represented in one of two ways: analogue or digital
  • Analogue data: a continuous representation (using a mathematical function or smooth curve), analogous to the actual information it represents
  • Digital data: a discrete representation, breaking the information up into separate elements (data)
9
Digital data in computer
• Computer components are discrete in nature
• Computer memory and other hardware (e.g. the CPU) have only finite room to store and manipulate data
• The goal is to represent enough of the world to satisfy our computational needs and our senses of sight and sound
10
Digitized Information
• Digital computers cannot work directly with analogue information
• So we digitize information by breaking it into pieces and representing those pieces separately
• Why do we use binary?
  • Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values
11
Electronic Signals
• An analogue signal continually fluctuates in voltage up and down
• A digital signal has only a high or low state, corresponding to the two binary digits
• All electronic signals (both analogue and digital) degrade as they move down a line
• The voltage of the signal fluctuates due to environmental effects
12
Analogue and Digital Information
• Periodically, a digital signal is reclocked to regain its original shape
Figure: An analogue and a digital signal
Figure: Degradation of analogue and digital signals
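The idea of reclocking can be sketched as a small simulation. This is an illustrative model only: the voltage levels (1.0 for high, 0.0 for low), the noise bound of 0.4 and the 0.5 threshold are assumptions picked so that the noise never crosses the decision threshold.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

bits = [1, 0, 1, 1, 0, 0, 1, 0]

# Send each bit as an ideal voltage (1.0 = high, 0.0 = low), then add
# bounded noise to model degradation along the line.
# Assumption: the noise never exceeds 0.4 in either direction.
received = [b + random.uniform(-0.4, 0.4) for b in bits]

# Reclocking: decide "high" or "low" against a mid-point threshold,
# which restores the clean two-level shape.
regenerated = [1 if v >= 0.5 else 0 for v in received]

print("noisy voltages:", [round(v, 2) for v in received])
print("regenerated   :", regenerated)
print("recovered exactly:", regenerated == bits)
```

An analogue signal has no such threshold to fall back on: every small disturbance changes the value being represented, so the degradation cannot be undone in the same way.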
13
Binary Representation
• One bit can be either 0 or 1
• Therefore, one bit can represent only two things
• To represent more than two things, we need multiple bits
• Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, 11
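The four two-bit combinations can be listed mechanically. A short sketch, where bit_patterns is a helper name introduced here for illustration:

```python
from itertools import product

def bit_patterns(n):
    """Return every string of n binary digits, in counting order."""
    return ["".join(p) for p in product("01", repeat=n)]

print(bit_patterns(2))       # ['00', '01', '10', '11']
print(len(bit_patterns(3)))  # 8
```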
15
Binary Representation
• In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits
• Note that every time we increase the number of bits by 1, we double the number of things we can represent
• Questions:
  • How many bits are needed to represent 128 things?
  • How many bits are needed to represent 67 things?
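One way to check answers to questions like these is to find the smallest n with 2^n at least as large as the number of things. A minimal sketch, where bits_needed is a hypothetical helper name:

```python
def bits_needed(things):
    """Smallest n such that 2**n >= things (for things >= 1)."""
    return (things - 1).bit_length()

for count in (2, 4, 67, 128, 129):
    n = bits_needed(count)
    print(f"{count} things -> {n} bits (2**{n} = {2 ** n})")
```

Both 67 and 128 need 7 bits, because 2^6 = 64 is too few for 67 and 2^7 = 128 is exactly enough for 128. One more thing (129) forces an eighth bit.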
16
Representing Negative Values
• You have used the signed-magnitude representation of numbers before
  • The sign represents the ordering/direction
  • The digits represent the magnitude of the number
17
Representing Negative Values
• Problems with the signed-magnitude representation
  • There are two representations of zero (plus zero and minus zero, +0 and -0), which can cause unnecessary complexity
  • The negative sign itself has to be represented somehow
• If we allow only a fixed number of values (stored in n bits), we can represent numbers as just integer values, where half of them represent negative numbers
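The two-zeros problem is easy to see in binary. The sketch below assumes an 8-bit layout in which the leftmost bit is the sign and the remaining seven bits are the magnitude. That layout is an illustration, not something the slides specify.

```python
def decode_sign_magnitude(pattern):
    """Decode an 8-bit sign-magnitude pattern given as a string of 0s and 1s."""
    sign = -1 if pattern[0] == "1" else 1
    magnitude = int(pattern[1:], 2)
    return sign * magnitude

print(decode_sign_magnitude("00000101"))  # 5
print(decode_sign_magnitude("10000101"))  # -5

# Two different bit patterns decode to the same value, zero:
plus_zero = decode_sign_magnitude("00000000")
minus_zero = decode_sign_magnitude("10000000")
print(plus_zero == minus_zero)  # True
```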
18
Representing Negative Values
• For example, if the maximum number of decimal digits we can represent is two, we can let 1 through 49 be the positive numbers 1 through 49 and let 50 through 99 represent the negative numbers -50 through -1
• This representation of negative numbers is called the ten’s complement
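The two-digit mapping described above can be written as a pair of small functions. The names encode and decode are introduced here for illustration.

```python
MODULUS = 100  # two decimal digits give 100 possible codes, 00 through 99

def encode(value):
    """Map an integer in -50..49 to its two-digit ten's complement code."""
    if not -50 <= value <= 49:
        raise ValueError("value does not fit in two digits")
    return value % MODULUS

def decode(code):
    """Map a code in 0..99 back to the signed integer it represents."""
    return code if code < 50 else code - MODULUS

for v in (49, 1, 0, -1, -50):
    print(f"{v:4d} is stored as {encode(v):02d}")
```

Each negative number -k is stored as 100 - k, so -1 becomes 99 and -50 becomes 50.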
19
Advantages of Using 10’s Complement
• To perform addition within this scheme, you just add the numbers together and discard any carry
20
Advantages of Using 10’s Complement
• A - B = A + (-B): we can subtract one number from another by adding the negative of the second to the first
• Addition and subtraction become the same operation
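A sketch of addition and subtraction in the two-digit ten's complement scheme, where "discard any carry" becomes a remainder after division by 100. All function names are illustrative.

```python
MODULUS = 100  # two decimal digits

def to_code(value):
    """Signed integer in -50..49 -> code in 0..99."""
    return value % MODULUS

def to_value(code):
    """Code in 0..99 -> signed integer in -50..49."""
    return code if code < 50 else code - MODULUS

def add(code_a, code_b):
    """Add two codes and discard any carry out of the two digits."""
    return (code_a + code_b) % MODULUS

def negate(code):
    """Ten's complement of a code: the code of the negated value."""
    return (MODULUS - code) % MODULUS

def subtract(code_a, code_b):
    """A - B computed as A + (-B), reusing the same adder."""
    return add(code_a, negate(code_b))

# 5 + (-3): 05 + 97 = 102, discard the carry -> 02
print(to_value(add(to_code(5), to_code(-3))))      # 2
# 4 - 9 = 4 + (-9): 04 + 91 = 95, which represents -5
print(to_value(subtract(to_code(4), to_code(9))))  # -5
```

The results are only meaningful when the true answer lies in -50..49. Outside that range the scheme overflows.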
22
Overflow
• Overflow occurs when the value that we compute cannot fit into the number of bits we have allocated for the result
• For example, if each value is stored as a signed number using eight bits (range -128 to 127), adding 127 to 3 causes overflow
• Overflow is a classic example of the type of problems we encounter by mapping an infinite world onto a finite machine
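The 127 + 3 example can be simulated. Python integers do not overflow, so this sketch wraps the result by hand, assuming an 8-bit signed value stored in two's complement (range -128 to 127). Real hardware and languages differ in how they report or handle overflow.

```python
def add_int8(a, b):
    """Add two integers the way an 8-bit two's complement register would."""
    return (a + b + 128) % 256 - 128

print(127 + 3)            # 130: the mathematically correct sum
print(add_int8(127, 3))   # -126: 130 does not fit in -128..127 and wraps around
print(add_int8(100, 20))  # 120: fits, so the stored result is correct
```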
24
Representing Text
• A text document can be decomposed into chapters, paragraphs, sentences, words, and ultimately individual characters
• To represent a text document in digital form, we simply need to be able to represent every character that may appear
  • In English: “a, b, …, z, A, B, …, Z”
• The general approach for representing characters is to list them all and assign each a binary string
  • ‘a’ → (01100001)_2 = (97)_10 = 61h
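The code for ‘a’ given above can be checked with Python's built-in ord function, which returns a character's numeric code. The helper name describe is introduced here for illustration.

```python
def describe(ch):
    """Return the code of a character as (binary, decimal, hex) text."""
    code = ord(ch)
    return f"{code:08b}", code, f"{code:02x}h"

print(describe("a"))  # ('01100001', 97, '61h')

for ch in "AZz":
    print(ch, describe(ch))
```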
25
Character Set
• A character set is a list of characters and the codes used to represent them
• By agreeing to use a particular character set, computer manufacturers have made the processing of text data easier
• Examples: ASCII, Unicode, etc.
26
ASCII
• ASCII stands for American Standard Code for Information Interchange
• The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters
• Later, extended versions of ASCII used all eight bits, which allows for 256 characters
28
ASCII
• Note that the first 32 characters in the ASCII character chart (codes 0 through 31) are control characters: they do not have a simple character representation that you could print to the screen (unprintable)
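A quick way to see which 7-bit codes are unprintable is to ask Python's str.isprintable for each of the 128 codes. This uses Python's own definition of "printable", which treats the space character (code 32) as printable.

```python
# Split the 128 seven-bit ASCII codes by whether Python considers them printable.
unprintable = [code for code in range(128) if not chr(code).isprintable()]
printable = [chr(code) for code in range(128) if chr(code).isprintable()]

print(unprintable[:32] == list(range(32)))  # True: codes 0..31 are all unprintable
print(unprintable[32:])                     # [127]: DEL is also a control character
print("".join(printable))                   # space, digits, letters, punctuation
```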
29
Unicode Character Set
• The extended version of the ASCII character set is not enough for international use
• The Unicode character set was originally designed to use 16 bits per character
• With 16 bits, it can represent 2^16, or over 65 thousand, characters (later versions of Unicode extend beyond 16 bits, with code points up to U+10FFFF)
• Unicode was designed to be a superset of ASCII
• The first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set (ISO 8859-1, also called Latin-1)
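These properties can be checked with Python's built-in ord, which returns a character's Unicode code point. The sketch takes ISO 8859-1 (Python's "latin-1" codec) as the extended ASCII set being referred to, which is an assumption about the slide's intent.

```python
print(2 ** 16)                  # 65536 possible 16-bit codes

# ASCII characters keep their ASCII codes in Unicode.
print(ord("a"), hex(ord("a")))  # 97 0x61

# A Chinese character has a code point well beyond the 8-bit range.
print(hex(ord("你")))           # 0x4f60

# Decode all 256 byte values as Latin-1 and compare with code points 0..255.
latin1 = bytes(range(256)).decode("latin-1")
same = all(ord(ch) == i for i, ch in enumerate(latin1))
print(same)                     # True
```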
31
Representing Audio Information
• We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain
• A stereo sends an electrical signal to a speaker to produce sound
• This signal is an analogue representation of the sound wave
• The voltage in the signal varies in direct proportion to the sound wave
32
Representing Audio Information
• To digitize the signal, we periodically measure the voltage of the signal and record the appropriate numeric value
  • This process is called sampling
• In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction
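Sampling can be sketched by measuring a mathematical "voltage" at regular intervals and rounding each measurement to an integer. The 440 Hz test tone and the 16-bit sample size are assumptions made for this illustration. Only the 40,000 samples per second figure comes from the text above.

```python
import math

SAMPLE_RATE = 40_000             # samples per second
FREQUENCY = 440.0                # Hz; a hypothetical test tone
BITS = 16                        # assumed size of each stored sample
MAX_LEVEL = 2 ** (BITS - 1) - 1  # largest signed 16-bit value, 32767

def sample_tone(duration_s):
    """Sample a sine-wave 'voltage' and record each value as an integer."""
    count = round(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE                              # time of this sample
        voltage = math.sin(2 * math.pi * FREQUENCY * t)  # analogue value in [-1, 1]
        samples.append(round(voltage * MAX_LEVEL))       # digital value
    return samples

samples = sample_tone(0.001)  # one millisecond of sound
print(len(samples))           # 40
print(samples[:5])
```

A higher sampling rate or more bits per sample gives a closer approximation of the original wave, at the cost of more data to store.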
34
Representing Audio Information
• A compact disc (CD) stores audio information digitally
• On the surface of the CD are microscopic pits that represent binary digits
• A low-intensity laser is pointed at the disc
• The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted
35
Representing Audio Information
• Audio formats: WAV, AU, AIFF, VQF, and MP3
• MP3 is dominant
  • MP3 is short for MPEG (Moving Picture Experts Group) audio layer 3 file
  • MP3 employs both lossy and lossless compression
  • First it analyzes the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), then it discards information that can’t be heard by humans
  • Then the remaining bit stream is compressed losslessly to reduce its size further
36
Representing Colour
• Colour is our perception of the various frequencies of light that reach the retinas of our eyes
• Our retinas have three types of colour photoreceptor cone cells that respond to different sets of frequencies
• These photoreceptor categories correspond to the colours red, green, and blue
37
Representing Colour
• Colour is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours
• For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green, and minimizes the contribution of blue, which results in a bright yellow
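A small sketch of an RGB value as three numbers, assuming each contribution is stored in eight bits (0 to 255), as the example above implies. The function name rgb_to_hex is illustrative, and the "#RRGGBB" notation is the hexadecimal form commonly used on web pages.

```python
def rgb_to_hex(red, green, blue):
    """Format an RGB triple (each 0..255) as a '#RRGGBB' string."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in 0..255")
    return f"#{red:02X}{green:02X}{blue:02X}"

print(rgb_to_hex(255, 255, 0))    # #FFFF00 (bright yellow)
print(rgb_to_hex(0, 0, 0))        # #000000 (black: no contribution from any colour)
print(rgb_to_hex(255, 255, 255))  # #FFFFFF (white: full contribution from all three)
```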
39
Representing Images and Graphics
• The amount of data that is used to represent a colour is called the colour depth
• HiColour is a term that indicates a 16-bit colour depth
  • Five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency
• TrueColour indicates a 24-bit colour depth
  • Each number in an RGB value gets eight bits
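The effect of colour depth can be worked out directly: a depth of d bits allows 2^d distinct bit patterns. The packing function below assumes one possible HiColour layout (extra bit first, then five bits each for red, green and blue). The bit order is an illustration, since the slides do not specify one.

```python
def patterns(depth_bits):
    """Number of distinct bit patterns available at a given colour depth."""
    return 2 ** depth_bits

print(patterns(16))  # 65536
print(patterns(24))  # 16777216

def pack_hicolour(r5, g5, b5, extra=0):
    """Pack three 5-bit values plus one extra bit into a 16-bit number."""
    return (extra << 15) | (r5 << 10) | (g5 << 5) | b5

# Maximum red and green, no blue: yellow in a 5-5-5 layout.
yellow = pack_hicolour(31, 31, 0)
print(f"{yellow:016b}")  # 0111111111100000
```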
40
Indexed Colour
• A particular application such as a browser may support only a certain number of specific colours, creating a palette from which to choose.
• For example:
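A minimal sketch of indexed colour, using a hypothetical four-colour palette (not the one from the original slide). Each pixel stores a small palette index rather than a full RGB value.

```python
# Hypothetical palette: index -> RGB value.
PALETTE = [
    (0, 0, 0),        # 0: black
    (255, 255, 255),  # 1: white
    (255, 0, 0),      # 2: red
    (255, 255, 0),    # 3: yellow
]

# A tiny 3x2 image stored as palette indices (2 bits per pixel would suffice).
image = [
    [0, 1, 1],
    [2, 3, 0],
]

# Look up each index to recover the full RGB values for display.
rgb_image = [[PALETTE[index] for index in row] for row in image]
print(rgb_image[1][1])  # (255, 255, 0)
```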
41
Digitized Images and Graphics
• Digitizing a picture is the act of representing it as a collection of individual dots called pixels
• The number of pixels used to represent a picture is called the resolution
• Storage of image information on a pixel-by-pixel basis is called a raster-graphics format
• Several popular raster file formats exist, including bitmap (BMP), GIF, and JPEG
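Resolution and colour depth together determine how much raw data a raster image needs before any compression. A rough sketch, in which the 640 x 480 picture size is an arbitrary example and file headers are ignored:

```python
def raw_image_bytes(width, height, depth_bits):
    """Uncompressed size in bytes of a raster image, ignoring file headers."""
    return width * height * depth_bits // 8

print(raw_image_bytes(640, 480, 24))  # 921600 bytes at 24-bit colour depth
print(raw_image_bytes(640, 480, 16))  # 614400 bytes at 16-bit colour depth
```

Formats such as GIF and JPEG compress this raw data, so their files are usually much smaller than these figures.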
45
Representing Video
• A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network
• Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video
• The goal is to avoid losing information that affects the viewer's senses
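To see why video involves "huge amounts of data", one can estimate the uncompressed data rate. The frame size, colour depth and frame rate below are assumed example values, not figures from the lecture.

```python
def raw_video_bytes_per_second(width, height, depth_bits, frames_per_second):
    """Uncompressed video data rate in bytes per second."""
    bytes_per_frame = width * height * depth_bits // 8
    return bytes_per_frame * frames_per_second

rate = raw_video_bytes_per_second(640, 480, 24, 30)
print(rate)                      # 27648000 bytes every second
print(rate * 60 / 1_000_000)     # 1658.88 megabytes for one minute
```

At this rate even a short uncompressed clip is very large, which is why codecs accept some loss of information in exchange for far smaller files.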
47
Summary
• Distinguished between analogue and digital information
• Explained data compression and compression ratios
• Examined the binary formats for negative values
• Described the characteristics of the ASCII and Unicode character sets
• Explained the nature of sound and its representation
• Explained how RGB values define a colour
• Looked at representing Audio Information
• Looked at representing Images & Graphics
• Looked at representing Video Information
48
The links (for your website) to the glossary, PDF (single) and PDF (2x2) are here:
http://www.uic.edu.hk/~davetowey/teaching/CS/it1010/lectures/3.Glossary.pdf
http://www.uic.edu.hk/~davetowey/teaching/CS/it1010/lectures/3.Data.Representation.pdf
http://www.uic.edu.hk/~davetowey/teaching/CS/it1010/lectures/2x2_3.Data.Representation.pdf