
Introduction to Computer Networks

2004, 劉震昌

Review of Lab#2 and Homework#1

“Lab” means “Laboratory”, not “Label”. Algorithm steps must be executed in order; you cannot skip a step on your own. Why?

Please write the subject of your homework correctly.

No late homework.

Outline
- Origins of the Internet
- Origins of the WWW (World Wide Web)
- HTML (Hypertext Markup Language) guide
- Searching the Web: search engines (Web browser), Web directories

Origins of the Internet

Ref: Chapter 2 of Comer’s book

Origins of the Internet

In 1969, the US DoD’s ARPA (Advanced Research Projects Agency) built the ARPANET:
- Only 4 nodes
- Decentralized system
- Data transmission
(Reference website)

Origins of the Internet (cont.)

1974: TCP/IP was developed; it later became a standard in 1983.
- TCP (Transmission Control Protocol)
- IP (Internet Protocol)
The importance of network communication protocols

Growth: the ARPANET grew into the Internet through internetworking. No single organization owns or controls it.

Growth of the Internet

[Figure: number of computers on the Internet over time, plotted on a log scale]

1M = 1,000,000 (units of measurement: http://www.spes.tpc.edu.tw/handouts/B_Basic/ref.htm)

Almost exponential growth, recently fueled by the WWW and economic activities.
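Why a log scale helps: a minimal worked equation, assuming the number of computers N(t) doubles every T years. N_0 (the count at t = 0) and T are illustrative symbols, not values read from the figure.

```latex
N(t) = N_0 \cdot 2^{t/T}
\quad\Longrightarrow\quad
\log_{10} N(t) = \log_{10} N_0 + \frac{\log_{10} 2}{T}\, t
```

Taking the logarithm turns exponential growth into a straight line with slope (log10 2)/T ≈ 0.301/T, which is why almost exponential growth looks almost straight on a log-scale plot.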

IP Service

Where is your computer on the Internet? The current Internet uses IPv4:
- 32 bits represent an IP address, e.g. 163.22.20.129
- What is your computer’s IP address? Run ipconfig.

[Figure: example hosts 163.22.20.129, 163.22.20.118, and 163.22.22.119]
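A small sketch of what “32 bits to represent an IP address” means: the four dotted-decimal numbers are the four 8-bit bytes of one 32-bit integer. The function names are illustrative; the cross-check uses Python’s standard `ipaddress` module.

```python
import ipaddress


def dotted_to_int(dotted: str) -> int:
    """Pack a dotted-decimal IPv4 address into one 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each octet supplies 8 of the 32 bits
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-decimal notation."""
    if not 0 <= value < 2**32:
        raise ValueError("IPv4 addresses are 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


addr = "163.22.20.129"
packed = dotted_to_int(addr)
print(packed)                  # 2736133249
print(format(packed, "032b"))  # 10100011000101100001010010000001
print(int_to_dotted(packed))   # 163.22.20.129
# The standard library agrees with the hand-written conversion.
assert packed == int(ipaddress.IPv4Address(addr))
```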

Address Resolution Protocol (ARP)

An IP address is an abstraction: the physical network hardware does not know how to locate a computer from its IP address, so the IP address must be translated into the computer’s hardware address.

Techniques:
- Table lookup
- Closed-form computation
- Message exchange
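The three resolution techniques can be sketched side by side. This is a simulation under stated assumptions, not real ARP code: the hardware addresses are hypothetical (taken from the documentation MAC range), the closed-form variant assumes a network whose hardware addresses are configured to equal the host part of the IP address, and the “broadcast” is just a loop over simulated hosts.

```python
import ipaddress

# 1. Table lookup: a table of IP -> hardware address (hypothetical entries).
RESOLUTION_TABLE = {
    "163.22.20.129": "00:00:5e:00:53:01",
    "163.22.20.118": "00:00:5e:00:53:02",
}


def resolve_by_table(ip: str):
    return RESOLUTION_TABLE.get(ip)


# 2. Closed-form computation: assumes hardware addresses are configured to
#    equal the host part of the IP address, so they are computed, not stored.
def resolve_by_computation(ip: str, prefix_len: int = 24) -> int:
    host_mask = (1 << (32 - prefix_len)) - 1
    return int(ipaddress.IPv4Address(ip)) & host_mask


# 3. Message exchange (the ARP approach): ask every host "who has this IP?",
#    let only the owner reply, and cache the answer for later use.
class Host:
    def __init__(self, ip: str, mac: str):
        self.ip, self.mac = ip, mac

    def answer(self, wanted_ip: str):
        return self.mac if wanted_ip == self.ip else None


def resolve_by_exchange(ip: str, segment: list, cache: dict):
    if ip in cache:
        return cache[ip]
    for host in segment:  # simulated broadcast to every host on the segment
        reply = host.answer(ip)
        if reply is not None:
            cache[ip] = reply
            return reply
    return None  # nobody answered


segment = [Host("163.22.22.119", "00:00:5e:00:53:03")]
print(resolve_by_table("163.22.20.129"))        # 00:00:5e:00:53:01
print(resolve_by_computation("163.22.20.129"))  # 129
print(resolve_by_exchange("163.22.22.119", segment, {}))  # 00:00:5e:00:53:03
```

The cache in the third technique matters: without it, every packet would trigger a broadcast.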

Computers on the Net

Every Internet host has a unique IP address, but IP addresses are hard to remember, so hosts also have names. For example, arbor.watson.ibm.com is 9.2.13.20 and arbor.ee.ntu.edu.tw is 140.112.21.236. Try: nslookup

Domain Name Server

A host name has to be converted into an IP address.

The Domain Name System (DNS) uses domain name servers, each containing a database (look-up table) that maps host names to IP addresses. There are many domain name servers, e.g. for “.com”, “.gov”, “.edu”, and “.tw”.
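A toy sketch of how the lookup is split across many servers. The server names and the delegation structure are hypothetical; only the two name-to-address pairs come from the slides. Real DNS uses a different wire protocol and caching, so this only illustrates the “walk down the hierarchy” idea.

```python
# Hypothetical zone data. Each "server" maps the names it knows either to an
# IP address (an answer) or to another server (a delegation).
SERVERS = {
    "root": {"com": "com-server", "tw": "tw-server"},
    "com-server": {"ibm.com": "ibm-server"},
    "ibm-server": {"arbor.watson.ibm.com": "9.2.13.20"},
    "tw-server": {"edu.tw": "edu-tw-server"},
    "edu-tw-server": {"ntu.edu.tw": "ntu-server"},
    "ntu-server": {"arbor.ee.ntu.edu.tw": "140.112.21.236"},
}


def resolve(hostname: str) -> str:
    """Walk from the root server toward the host, one suffix at a time."""
    server = "root"
    labels = hostname.lower().split(".")
    # Try suffixes from shortest to longest: "tw", "edu.tw", "ntu.edu.tw", ...
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        answer = SERVERS[server].get(suffix)
        if answer is None:
            continue          # no entry here; try a longer suffix
        if answer in SERVERS:
            server = answer   # delegation: ask the next server down
        else:
            return answer     # an IP address
    raise KeyError(f"cannot resolve {hostname}")


print(resolve("arbor.watson.ibm.com"))  # 9.2.13.20
print(resolve("arbor.ee.ntu.edu.tw"))   # 140.112.21.236
```

On a real machine, `socket.gethostbyname(name)` from Python’s standard library asks the system resolver, much like nslookup does; today’s answers may differ from the 2004 values above.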

Lab#3: use the commands ipconfig and nslookup

Internet applications
- telnet: a terminal emulation program for TCP/IP networks such as the Internet
- ftp (File Transfer Protocol)

Example: telnet 163.22.22.119 (the host 163.22.22.119 runs a telnet server)

Origins of WWW

Ref: Chapter 32 of Comer’s book

Outline
- Origins of WWW (World Wide Web)
- Web browser
- HTML (HyperText Markup Language)
- HTTP (HyperText Transfer Protocol)

Origins of WWW

World Wide Web (WWW)
- Proposed in 1989 by Tim Berners-Lee at CERN (European Particle Research Center)
- A large-scale, online repository of information

Today the W3C (World Wide Web Consortium) develops interoperable technologies (specifications, guidelines, software, and tools) for the Web.

Origins of WWW (cont.)
- Data format: HTML (HyperText Markup Language), which allows hypertext links (URLs: Uniform Resource Locators) to other documents on the Web
- Protocol: HTTP (HyperText Transfer Protocol)

Together they are the data exchange standard of the Web: a common format and a common transfer protocol.

URL format: protocol://computer_name:port/document_name
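A short sketch that splits a URL into the four parts named above, using the standard library’s `urllib.parse.urlsplit`. The default-port table is an assumption added for illustration (80 for http, 443 for https, 21 for ftp are the well-known ports); the `:8080` and `index.html` in the example are hypothetical.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when a URL omits ":port".
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_url(url: str):
    """Return (protocol, computer_name, port, document_name)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path or "/"


print(split_url("http://www.csie.ncnu.edu.tw:8080/index.html"))
# ('http', 'www.csie.ncnu.edu.tw', 8080, '/index.html')
print(split_url("http://www.csie.ncnu.edu.tw"))
# ('http', 'www.csie.ncnu.edu.tw', 80, '/')
```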

Origins of WWW (cont.)

[Figure: the WWW, made of URLs, built on top of the Internet]

The WWW is like a large database distributed over the Internet.

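Web browser: a tool for reading HTML documents

[Figure: Web browser (client) and Web server (server), e.g. a server running IIS]

1. The user clicks a link; the browser sends a request.
2. The server finds the document.
3. The server returns the HTML document; the browser displays it.

The connection is terminated after all items have been received.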
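The request/response cycle above can be sketched with raw sockets. This is a minimal HTTP/1.0-style client under simplifying assumptions (no redirects, no chunked encoding, no error handling); the function names are illustrative. The server closing the connection is what tells the client it has received everything.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    # An HTTP/1.0 request: request line, headers, then an empty line.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def fetch(host: str, path: str = "/", port: int = 80):
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: all data received
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))


# Offline example with a canned response (fetch() needs a reachable server).
print(parse_response(b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<HTML></HTML>"))
```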

Web browser (cont.)
- Text-mode browser: lynx, e.g. lynx http://www.csie.ncnu.edu.tw
- Graphics-mode browsers: NCSA (National Center for Supercomputing Applications) Mosaic by Marc Andreessen; Netscape; IE

Web browser (cont.): browser architecture

Document representation
- Hypertext: textual information
- Hypermedia: additional information, such as images and graphics
- Both share one abstract idea: a set of documents in which a document can contain pointers to other documents
- Page: a hypermedia document on the Web

Hypertext Markup Language (HTML)

Markup language: hypertext is published in a less detailed format, so the display results of the same HTML document may differ between browsers.

HTML = text file + tags. Tags format the document: <Tagname>…text…</Tagname>

HTML layout

<HTML>
  <HEAD>
    <TITLE> ….title of the text…. </TITLE>
  </HEAD>
  <BODY>
    …body of the document…
  </BODY>
</HTML>

* Good indentation makes the document easier for people to read and edit.

HTML layout (cont.): the same document without indentation; a browser displays both versions the same way

<HTML><HEAD><TITLE>….title of the text….</TITLE></HEAD><BODY>…body of the document…</BODY></HTML>
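Both layouts have the same tag structure. A minimal nesting checker built on Python’s standard `html.parser.HTMLParser` makes that concrete: it pushes each opening tag on a stack and expects closing tags in reverse order. It is deliberately stricter than real browsers (which tolerate some omitted end tags), and the set of void tags is an illustrative subset.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.errors = []     # closing tags that did not match

    def handle_starttag(self, tag, attrs):  # tag names arrive lower-cased
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(tag)


def is_well_nested(document: str) -> bool:
    checker = NestingChecker()
    checker.feed(document)
    checker.close()
    return not checker.errors and not checker.open_tags


flat = "<HTML><HEAD><TITLE>title</TITLE></HEAD><BODY>body</BODY></HTML>"
print(is_well_nested(flat))                   # True
print(is_well_nested("<HTML><BODY></HTML>"))  # False: </BODY> is missing
```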

HTML examples
- Example1
- Example2
- Example3: embedding images
- Example4: hypertext link (anchor)

<a> ….anything…</a>: any item can have a hypertext link
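To see that “any item can have a hypertext link”, here is a sketch that collects link targets from anchors, including one wrapped around an image. It assumes the standard `href` attribute holds the target (the slide shows only `<a>`), and resolves relative links against a base URL with `urllib.parse.urljoin`; the page content and `lab4.htm` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of <a href="..."> anchors as absolute URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <A HREF> matches.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(document: str, base_url: str) -> list:
    parser = LinkExtractor(base_url)
    parser.feed(document)
    parser.close()
    return parser.links


page = ('<A HREF="lab4.htm"><IMG SRC="photo.gif"></A> '
        '<a href="http://www.csie.ncnu.edu.tw/">CSIE</a>')
print(extract_links(page, "http://www.csie.ncnu.edu.tw/course/"))
# ['http://www.csie.ncnu.edu.tw/course/lab4.htm', 'http://www.csie.ncnu.edu.tw/']
```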

Lab#4 (in the afternoon): http://www.csie.nctu.edu.tw/~jglee/teacher/content.htm

HTTP documents: see http://ftp.ics.uci.edu/pub/ietf/http/
- HTTP/1.0: RFC 1945, 1996
- HTTP/1.1: RFC 2068, 1997

Searching the Web

Ref: Chapter 13 in “Modern Information Retrieval” by Ricardo Baeza-Yates and Berthier Ribeiro-Neto

Outline
- Measuring the Web
- Methods for searching the Web: search engines, Web directories

Searching the Web
- The WWW started in 1989.
- The textual data alone is estimated to be on the order of one terabyte.
- Goal: how can we efficiently manage, retrieve, and filter information from the Web?

Challenges
- Distributed data: data spans many computers interconnected without a predefined topology
- High percentage of volatile data: 40% of the Web changes every month
- Large volume
- Unstructured and redundant data: 30% of Web pages are (near) duplicates
- Heterogeneous data: different languages

Measuring the Web

[Figure: Web servers hold the URLs that form the WWW on top of the Internet]

- 1998: 3M (3 million) servers
- No. of servers ≈ 1/10 of the no. of computers on the Internet

Measuring the Web (cont.), 1998
- 5 KB per Web page on average
- 300M (300 million) Web pages
- 300M × 5 KB = 1.5 terabytes
- Growing at a rate of 20M pages per month
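The size estimate above, worked out in code. It reads “5 KB” as 5,000 bytes (decimal units, which is what makes 300M × 5 KB come out to exactly 1.5 TB). The “months to double” line is an extra illustration that assumes growth stays at a constant 20M pages per month; it is not a figure from the slide.

```python
# Figures from the slide (1998); "5 KB" is read as 5,000 bytes.
pages = 300_000_000
avg_page_bytes = 5_000
new_pages_per_month = 20_000_000

total_bytes = pages * avg_page_bytes
added_bytes_per_month = new_pages_per_month * avg_page_bytes
months_to_double = pages / new_pages_per_month  # assumes constant linear growth

print(total_bytes / 10**12, "TB")           # 1.5 TB
print(added_bytes_per_month / 10**9, "GB")  # 100.0 GB
print(months_to_double, "months")           # 15.0 months
```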

Growth of the Web

[Figure: number of Web pages and Web sites (millions) per year, 1996 to 1998]

Methods for searching the Web

- Search engines: index Web documents as a full-text database, e.g. AltaVista, Google, …
- Web directories: classify selected Web documents by subject, e.g. Yahoo!

Search engines
- Model the Web as a database
- All queries must be answered without accessing the Web pages

[Figure: users send queries to the database]
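“Answered without accessing the Web pages” is what an inverted index makes possible: it maps each term to the pages containing it, built once at indexing time. This is a minimal sketch assuming a simple lowercase alphanumeric tokenizer and hypothetical page contents; production indexes also store positions, compress postings, and so on.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict) -> dict:
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def lookup(index: dict, term: str) -> set:
    # Answered from the index alone: no Web page is fetched here.
    return index.get(term.lower(), set())


PAGES = {  # hypothetical crawled pages
    "a.htm": "TCP/IP and the Internet",
    "b.htm": "The Internet and the Web",
    "c.htm": "HTML pages on the Web",
}
index = build_index(PAGES)
print(sorted(lookup(index, "Web")))       # ['b.htm', 'c.htm']
print(sorted(lookup(index, "Internet")))  # ['a.htm', 'b.htm']
```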

Search engines (cont.): AltaVista (www.altavista.com)
- 20 multi-processor machines
- 130 GB of RAM each
- Over 500 GB of disk space each
- 75% of the resources are used by the query engine

The top search engines
- International: Google (www.google.com), Yahoo! (www.yahoo.com), AltaVista (www.altavista.com), Inktomi (www.inktomi.com)
- Statistics on search engines: www.searchenginewatch.com, http://imt.net/~notess/search
- Taiwan: Yahoo!/Kimo (uses Google), Openfind (www.openfind.com.tw; Prof. 吳昇, National Chung Cheng University), Yam (www.yam.com.tw)

Search engines (cont.): centralized crawler-indexer architecture

[Figure: the crawler fetches pages from the Web; the indexer builds the index database; users query it through the user interface and the query engine]

User Interface

- Query interface: keywords, Boolean operators
- Answer interface: ranks the retrieved pages using
  - statistics about term occurrence within the document
  - popularity
  - hyperlink information
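A sketch of the query side: Boolean AND/OR over an inverted index, then ranking by term-occurrence statistics. Only the first ranking signal is modeled here (score = total occurrences of the query terms in the page); popularity and hyperlink information are left out. The pages and the scoring rule are illustrative assumptions, not any particular engine’s method.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """term -> {url: number of occurrences of the term in that page}"""
    index = {}
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            index.setdefault(term, {})[url] = count
    return index


def query_and(index, terms):
    postings = [set(index.get(t.lower(), {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def query_or(index, terms):
    return set().union(*(index.get(t.lower(), {}) for t in terms))


def rank(index, terms, urls):
    """Order pages by total occurrences of the query terms (ties by URL)."""
    def score(url):
        return sum(index.get(t.lower(), {}).get(url, 0) for t in terms)
    return sorted(urls, key=lambda url: (-score(url), url))


PAGES = {  # hypothetical pages
    "a.htm": "TCP/IP and the Internet. TCP is reliable.",
    "b.htm": "The Internet and the Web.",
    "c.htm": "HTML pages on the Web.",
}
index = build_index(PAGES)
print(query_and(index, ["internet", "web"]))  # {'b.htm'}
hits = query_or(index, ["internet", "tcp"])
print(rank(index, ["internet", "tcp"], hits))  # ['a.htm', 'b.htm']
```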


Crawler
- Also called robots, spiders, wanderers, walkers, and knowbots
- In spite of the names, a crawler runs on a local system and sends requests to remote Web servers
- Method: start with a set of URLs, and from there extract other URLs
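The crawling method can be sketched as a breadth-first traversal. To keep it runnable offline, a hypothetical in-memory link graph stands in for the Web, and `fetch_links` stands in for “send a request to the remote server and extract the URLs from the page”; a missing entry plays the role of an invalid link.

```python
from collections import deque

# Hypothetical link graph standing in for the Web: URL -> URLs it links to.
WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": ["http://e.example/missing"],  # a dead link
}


def fetch_links(url):
    # Stand-in for an HTTP request plus link extraction; None = invalid link.
    return WEB.get(url)


def crawl(seeds, max_pages=100):
    frontier = deque(seeds)  # URLs waiting to be visited (FIFO -> breadth-first)
    seen = set(seeds)        # never queue the same URL twice
    visited, dead = [], []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        links = fetch_links(url)
        if links is None:
            dead.append(url)
            continue
        visited.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited, dead


visited, dead = crawl(["http://a.example/"])
print(visited)  # a, b, c, d in breadth-first order
print(dead)     # ['http://e.example/missing']
```

The `seen` set is what keeps the cycle a → c → a from looping forever.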

Crawler (cont.)
- How the Web is traversed matters: pages are visited at different times, so the index of a search engine is analogous to the stars in the sky (what we see may no longer exist).
- Invalid links in search engines vary from 2% to 9%.
- The current fastest crawlers can traverse up to 10M Web pages per day, so covering 300M pages takes 300M / 10M = 30 days.

Web directories
- Classify Web pages by category
- Directories are hierarchical taxonomies that classify human knowledge
- Yahoo! has close to 1M pages classified

How are pages classified?
- Pages have to be submitted to the Web directory
- Classification is done manually by a few people
- Automatic classification is not yet mature
- Not every page is classified
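A hierarchical taxonomy is naturally a tree of categories. This sketch files pages under a category path (as a human editor would when a page is submitted) and lists everything under a subtree; the category names and the classification of each URL are hypothetical.

```python
def make_node():
    """A category: pages filed directly under it plus its subcategories."""
    return {"pages": [], "children": {}}


def add_page(root, path, url):
    """File url under the category path, creating categories as needed."""
    node = root
    for name in path:
        node = node["children"].setdefault(name, make_node())
    node["pages"].append(url)


def pages_under(node):
    """All pages in this category and, recursively, its subcategories."""
    found = list(node["pages"])
    for child in node["children"].values():
        found.extend(pages_under(child))
    return found


directory = make_node()
add_page(directory, ["Computers", "Internet", "WWW"], "http://www.w3.org/")
add_page(directory, ["Computers", "Internet"], "http://www.ietf.org/")
add_page(directory, ["Science", "Physics"], "http://www.cern.ch/")
print(pages_under(directory["children"]["Computers"]))
# ['http://www.ietf.org/', 'http://www.w3.org/']
```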

Some Web directories

Web directory    URL                 Web sites (K)   Categories
Yahoo!           www.yahoo.com       750             -
LookSmart        www.looksmart.com   300             24
Lycos Subjects   a2z.lycos.com       50              -
eBLAST           www.eblast.com      125             -
NewHoo           www.newhoo.com      100             23
Magellan         www.mckinley.com    60              -
Netscape         www.netscape.com    -               -
Snap             www.snap.com        -               -

The power of search engines

I have found a homepage that contains the solutions to the C textbook!!!

Whoever finds the homepage and emails me first will get a bonus point…