Laying the Foundation Mining the Web Fr. Jomar Legaspi

Upload: claire-carson

Post on 05-Jan-2016


Page 1: Laying the Foundation Mining the Web Fr. Jomar Legaspi

Laying the Foundation

Mining the Web

Fr. Jomar Legaspi


Learning Milestones

• Internet / World Wide Web

• Search Engines

• Fundamentals of Search Mathematics

• Search Strategies

• Evaluating Web Resources

• Citing Web Resources

• Web Search Exercise


Internet

• How did it happen?
  – the need to connect scientists / experts in diverse locations so they could share computing resources on US Defense Department research projects: ARPANET
  – explosive growth once the web browser arrived
• How big is the Internet?
  – approximate figures:
    • 40 million networks
    • 200 M users connected to it
    • 5 M websites, expected to quadruple by 2005
    • 1 billion web documents (IDC, International Data Corporation, 1998)
• Internet revolution:
  – democratization of information
  – convergence of technology


Search Engines – Mining the Internet

• Individual search engines: compile their own searchable databases
  – Index words or terms in web-based documents
  – Directories: classify web documents or locations in arbitrary classifications or taxonomies
    • e.g. Yahoo, Google, AltaVista
• Metasearch engines: gateways to the databases of multiple search engines
  – Advantages: fast and more relevant, but less comprehensive than individual search engines
    • e.g. MetaCrawler
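To make "index words or terms in web-based documents" concrete, here is a minimal sketch of an inverted index, the general data structure behind word lookup. The sample pages and the function names `build_index` and `lookup` are hypothetical, made up for illustration; this is not how any particular search engine is implemented.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def lookup(index, *words):
    """Return the ids of documents that contain every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


# Hypothetical sample pages, invented for this example.
pages = {
    "page1": "Search engines index words in web documents",
    "page2": "Subject directories classify web documents",
}
index = build_index(pages)
print(sorted(lookup(index, "web", "documents")))  # ['page1', 'page2']
print(sorted(lookup(index, "index")))             # ['page1']
```

A metasearch engine, by contrast, would forward the same query to several such indexes and merge the answers.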


Mining the Internet – Search Engines

• Subject Directories
  – Maintained by human editors rather than by spiders or web robots
  – Types:
    • General
    • Academic
    • Commercial
    • Portals = gateways
    • Vortals = subject-specific
  – Strengths and weaknesses
    • Cumbersome: the process entails going through several layers of categories / steps
    • High-quality content: fewer instances of out-of-context search results
    • Active links
  – When to use:
    • General search / general topic
  – Examples:
    • Yahoo
    • LookSmart
    • Magellan


Mining the Internet – Search Engines

• Gateways and Vortals
  – Gateways / portals: collections of databases and information websites categorized by subject and assembled, reviewed, and recommended by content specialists or experts. Excellent for academic research
    • Internet Public Library: www.ipl.org
    • Argus Clearinghouse: www.clearinghouse.net
    • WWW Virtual Library: www.vlib.org
  – Vortals (vertical portals): dedicated to a single subject
    • ERIC Clearinghouse: http://www.eric.ed.gov
    • The Big Hub: www.thebighub.com
    • Complete Planet: www.completeplanet.com


Mining the Internet – Search Engines

• Deep Web or the “Invisible Web”
  – Approximately 60% - 80% of the web remains invisible to search spiders / robots.
  – Information in secured private networks / databases
  – Gateways and vortals = the best way to gain access to and exploit the Deep Web / Invisible Web


Mathematics of Search Engines

• Use a + or - sign before a keyword to force its inclusion / exclusion in the search.
• “” (quotation marks): keywords are searched in exact order / sequence
  – “information technology strategies”
• Combination of all the symbols
  – “information technology strategies” -business -government +schools
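The three symbols above can be sketched in a few lines of code. The following is an illustration only, under stated assumptions: the names `parse_query` and `matches` are hypothetical, plain substring matching stands in for a real index, and terms without a sign are treated as required (actual search engines differ on that point). It expects straight double quotes around phrases.

```python
import re

# An optional + or - sign, followed by either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split a query into (required terms, excluded terms), all lowercased."""
    required, excluded = [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "-":
            excluded.append(term)
        else:
            # Assumption: unsigned terms are required, same as +terms.
            required.append(term)
    return required, excluded


def matches(text, query):
    """True if text contains every required term and no excluded term."""
    required, excluded = parse_query(query)
    text = text.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)


query = '"information technology strategies" -business -government +schools'
print(parse_query(query))
print(matches("Information technology strategies for schools", query))
print(matches("Information technology strategies for schools and business", query))
```

The first page matches because it contains the exact phrase and "schools"; the second is rejected because it mentions "business".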


Search Strategies

• Articulate what you need to search for. Formulate the key concepts as specifically as you can.
• Critical success factor: KEYWORDS
• Keywords = use NOUNS / OBJECTS rather than verbs and adjectives
• Avoid prepositions, conjunctions, and common verbs; most search engines will disregard them
• Most powerful keywords = a “phrase”
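As a rough illustration of the advice above, the sketch below strips common words from a question so that only candidate keywords remain. The stop-word list and the function name `keywords` are assumptions made for this example; real search engines keep their own lists, and this code does not tell nouns from verbs.

```python
import re

# A small, hypothetical stop-word list: prepositions, conjunctions, common verbs.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "with", "by", "at", "from",
    "and", "or", "but", "is", "are", "be", "do", "does", "can",
    "what", "how", "i", "my",
}


def keywords(question):
    """Return the words of the question that are not in STOP_WORDS, in order."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOP_WORDS]


print(keywords("What are the strategies for information technology in schools?"))
# ['strategies', 'information', 'technology', 'schools']
```

The surviving words can then be combined into a quoted phrase where their order matters.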


Separating diamonds from dirt…

• Tool: CARS by Robert Harris
• Credibility
  – Trustworthiness of the author = authority and credibility
    • Author’s name
    • Qualifications
    • Affiliations
    • Publisher / sponsor
    • Address, tel. nos.
    • Email address
• Accuracy
  – Objective, correct, up-to-date, comprehensive, exact. The information is appropriate to the audience it was intended for.
    • Date of publication
    • Date the site was last updated
    • Email address
    • Link for questions and comments
• Reasonableness
  – Balance, objectivity, and consistency; moderate tone of language, without motherhood statements or grandstanding
  – Watch out for who the sponsor is
• Support
  – Sources of information / knowledge
  – Corroboration
    • Citations of sources: bibliography
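One way to apply the checklist consistently across many sites is to record it as data and tick off what each site provides. The sketch below is only an illustration: the item labels are paraphrased from the slide, the function name `evaluate` and the sample site are hypothetical, and counting items is an assumption of this example rather than part of Harris's CARS tool.

```python
# Checklist items paraphrased from the slide; the grouping follows CARS.
CARS_CHECKLIST = {
    "Credibility": ["author name", "qualifications", "affiliations",
                    "publisher or sponsor", "contact details"],
    "Accuracy": ["publication date", "last updated date", "feedback link"],
    "Reasonableness": ["balanced tone", "sponsor identified"],
    "Support": ["sources listed", "corroboration"],
}


def evaluate(found):
    """Return {criterion: (items met, items total, missing items)}."""
    found = set(found)
    report = {}
    for criterion, items in CARS_CHECKLIST.items():
        missing = [item for item in items if item not in found]
        report[criterion] = (len(items) - len(missing), len(items), missing)
    return report


# Hypothetical observations about one website.
site = {"author name", "affiliations", "publication date", "sources listed"}
for criterion, (met, total, missing) in evaluate(site).items():
    print(f"{criterion}: {met}/{total} met; missing: {', '.join(missing) or 'none'}")
```

The counts are a prompt for judgment, not a score: a site that names no author may still be reliable, but the gap is worth noticing.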


Resources

• Ellen Chamberlain, Bare Bones 101: A Basic Tutorial on Searching the Web, University of South Carolina Beaufort Library, http://www.sc.edu/beaufort/library/bones.html, January 2000, accessed February 10, 2002
• Craig Branham, A Student’s Guide to Research in the WWW, Saint Louis University, Missouri, http://www.slu.edu/departments/english/research/, March 27, 1997, accessed February 10, 2002
• BrightPlanet Corp., Guide to Effective Searching of the Internet, http://www.brightplanet.com/deepcontent/tutorials/search/index.asp, 2000 - 2002, accessed March 1, 2002


Web Search Exercise

Your school recently subscribed to the services of a local Internet Service Provider. Initially, it was decided that Internet access would be available in the library, where 15 computers were installed. Your school principal understood that the Internet can greatly increase the number of learning resources available to students, which were previously limited to print media. The principal wrote a memo asking all teachers to develop an online resource center to help students search for quality information on the web.

Your task:
1. Define your audience
2. Define the subject area / content / discipline
3. Search the web for at least 10 online resources
4. Give a brief description of each site