Bibliometrics and Citation Analysis


Upload: han-woo-park

Post on 02-Dec-2014



DESCRIPTION

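This material combines presentations given in a graduate class at Yeungnam University in the second semester of 2013. Department of Media and Communication: 최성철, 박지원, 김종섭, Xanat V. Meza, 이준영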

TRANSCRIPT

  • 1. SCI ,, , Xanat V. Meza, 2013 2

2. 1 // 2 4 4 3. // 4. 5. 1885 6. , , , . 7. ? (Bibliometrics) (Scientometrics) (Informetrics) (Webometrics) (Netometrics) (Cybermetrics) 8. ? 1960 (Alan Pritchard) 9. ? ... . 10. ? 11. ? , , , 12. : (17) : 13. : 14. : (P. Gross) (E. M. Gross) 3,633 15. : (Paul Otlet) 16. , 17. SCI 1961 18. () ,, (COLLNET) ( ) 19. 20. SCI(SCIENCE CITATION INDEX) (Eugene Garfield) 21. Shepards Citation (Stare decisis) 1860 (Shepards Citation) 22. Shepards Citation(vs SCI) Shepards CitationSCI() 23. SCI : (Document) , . 24. . 25. (Subject heading) . 26. . (,,) . 27. , . . 28. (Full-text retrieval system) 29. (Inverted file) (Key punching) 30. IBM (Hans Peter Luhn) 1. . 2. (Word significance) . 31. SMART Saltons Magic Automatic Retriever of Text) & (Cranfield) (phrase) , . 32. Encyclopedia Encyclopaedia . 33. , , . (parser) . Transformational parser 34. . 35. SCI : . 36. SCI 37. SCI 1940 (Roberto Busa) IBM . 38. SCI 39. 1. . 40. 2. (PSI)Permuterm Subject Index (KWPSI) Key words/Phrase Subject Index 41. SCI 3. , . 42. SCI , . 43. SCI SCI 1972 SSCI 1978 A&HCI Web of Science & Current Contents -> ISI Web of Knowledge 44. : SCI SCI , -> 45. : SCI ISI 46. : SCI , 47. : SCI (?) 48. : SCI . 49. : SCI 1. 2. 50. : SCI 3. 4. 51. : SCI 1960 (Myer Kessler) -> : (Gerard salton) & (Michael Lesk) -> SMART 52. : SCI (M.E.Stevens) (G.H.Urban) SADSACT (W.A.Gray) (A.J.Harley) MEDLARS 53. : SCI 1980 SCI 54. SCI 3 _ , , , , 2 55. 3.1 1937 Birkveck College of London . X- X-ray crystallography . , . , ( ) 56. 3.1 Engel . , , . , . , , . 57. 3.1 . , . The Social Function of Science(1939) 19 20 , . thermodynamics . 1939 2 Red Science 58. 3.1 , , . 1940 . 59. 3.1 60. 3.1 20 , reprint . (SDI)Selective Dissemination of Information . SCI SCI . SCI , SCI . 61. 3.2 : 1942 . - , , ( ) ( ) ( ) (, . ) 62. 3.2 : . 63. 3.2 : , , , . . . . . 64. 3.2 : Matthew Effect" Jonathan Cole Stephen Cole . . ( ) Manfred Bonitz . . 65. 3.2 : , . , citation" . , SCI . . , ( ) .( )- . 66. 3.3 : 1955 Citation Indexes for Science" . . . 
. 67. 3.3 : . . . . 68. 3.3 : 1970 ISI . . . , . , . 69. 3.3 : Edmund Leach . [ Culture and Communication](1976) . , . Noam Chomsky . 70. 3.3 : , , , , , , , , , .... . "Signs" Symbols" (). , . a-p-p-l-e" . . 71. 3.3 : . . ---- .( , )-> .( ) , . standard symbols( ) 72. 3.4 : ? ?... .... . - [ , ] 73. 3.4 : 1959 , - . bibliometrics reductionism" . 1. . . . 2. ,, . . 74. 3.4 : [Philosophical Transactions of the Royal Society of London] 1961 , . , , , , . Square-Root Law , . 10 . 20 . , . , , . 75. 3.4 : , . . . Donald rquhart 1956 Science Museum SCI SCI . Scale-free network . 76. 3.4 : immediacy factor" . 1970 , , , Price's Index" , 5 . ( )invisible college" . , . 77. 3.4 : map of science . , . , ,, , . 78. 79. 1. . 1 80. 2. . 81. 3. . , . 82. : , , . . 83. : 1) 2) 3) 84. : - , (Power law) 85. : 86. : 87. : (Carl Friedrich Gauss) (Pierre Simon Laplace) . 88. : (Law of error) . 89. . . . (Pearson) r -1~1 . 90. . , 3 . . 91. . . . 92. , , . 93. (Stephen Bensman) . (probability tail) 94. (Vilfredo Pareto) 80/20 20% 80% , 95. , (;extreme value) (Outlier) 96. : . 97. : (Spearman) (Kendall ) . 98. : , , . . 99. t . (negative binominal distribution), (Waring distribution), (GIGPD, generalized inverseGaussian-Poisson distribution) 100. . . (stochastic birth) . 101. . (George Polya) (Florian Eggenberger) (Um model) (?) - (CAD, Cumulative Advantage Distribution) 102. . 100 60 100 15 100 7 . 103. (William Potter) 1981 . . 104. , . . 105. 1. 9 , 429 . 2. 59 , 499 . 3. 258 , 404 . 106. 1. 9 2. 9 x 5 (45 59 ) 3. 9 x 5 (225 258 ) 107. 9 : 9 x 5 : 9 x 5 -> 1 : m : m : M (multiplier) 108. 109. (Groos Drop) (Liwen Qiu) , 110. (Gini Index) (Pratt Index). 111. 1900 . 1977 (Allan Pratt) . (Lorenz curve) . 112. . 0 1 0 1 113. . . . 114. (Shannon) (Bruce Hill) (Bose-Einstein) .. 115. . . . 116. . . 117. (Obsolescence) (aging) . . 118. . (aging factor) . 119. 1. ( ) 2. ( ) 120. ; 1. : . 2. . 121. ; 3. 30 . 122. ; 4. . . 5. . 123. 
(2005) (Power Laws in the Information Production Process) 124. . 1. f . n=1,2,, n f(n) . f(n) n . 125. 2. f () , . f(n) = C/n 126. 1970 (Fourier) (Aurel Avramescu) . 127. 1990 (Georgi Stankov) (Ludmila Ivancheva) 128. (Michel Callon) (Bruno Latour) - (Rafael Bailon Moreno) 129. / 130. (resilience to ambiguity) . 131. $I_{ij} = \frac{C_{ij}}{\min(c_i, c_j)}$ 132. $P_{ij} = \frac{C_{ij}}{c_i c_j} \times N$

151. Chapter 6. Impact Factor and the Evaluation of Scientists: Bibliographic Citations at the Service of Science Policy and Management
By Xanat V. Meza. Presentation for the New Media class.

152. Introduction
Scientific progress rests on the recognition and legitimization of individual contributions by a research community sharing notions, methodologies, practices, and values. Scientific quality: the assessment of the novelty of a contribution through the evaluation of scientific publications.

153. Introduction
Peer reviewing: examination of the content of scientific publications by a group of acknowledged experts. It was born in seventeenth-century Europe as a response to the knowledge fragmentation caused by scientific specialties. The evaluation should be:
  • Encoded.
  • Impersonal.
  • Based on converging judgment criteria from different experts.

154. Introduction
Kuhn: there is a resistance to change, but conservatism is also the staple of scientific change (p. 182). From the 1970s, the use of citation analysis attracted the attention of politicians and science managers, as it is convenient, quickly understood, easily applied, and easy to calculate thanks to the ISI databases. The problem is with unpublished material, which has to rely on qualitative judgment of a local character.

155. Introduction
Scientometricians have adopted a dual strategy to make sure that quality equals citation impact:
1. Why the medicine is supposed to work: statistical evidence points to the fact that excellent scientists often publish citation classics.
2.
How to avoid rejections: by anticipating the ambiguous role of self-citations and delayed recognition.

156. Introduction
On self-citation: Tagliacozzo (1977). Self-citations are not an evil in themselves, but they are suspected of deceitfully inflating the citation impact of the unit under assessment. Nevertheless, they can be used as an impact-reinforcing mechanism by triggering a chain reaction. Another problem might be the historical recurrence of premature discoveries (p. 185).

157. The shortcut: The Journal Impact Factor
The Impact Factor (IF) is a journal citation measure devised in the early 1960s by Garfield and Sher for Current Contents and the SCI. It is an estimate of the average citation score of a journal's articles over a relatively short time span. It is computed for a given year by dividing a numerator by a denominator.

158. The shortcut: The Journal Impact Factor
The numerator is the number of citations received in the processing year by the items published in that journal during the previous two years. The denominator is the overall number of citable items (research articles, reviews, and notes) issued by the journal during the same two years. With X as the name of the journal:
$\mathrm{IF}(X) = \frac{100 + 150}{70} \approx 3.57$

159. The shortcut: The Journal Impact Factor
A proportional increase in the IF score became a prelude to marketing success, to increased commercial and symbolic visibility, and occasionally to a more profitable sale of advertising space. Arguments against the IF:
  • Skewness of citation distributions: the poor statistical correlation between the citedness of individual articles and the IF of the journals in which they appeared is well documented.
  • Several conceptual and technical limitations bear upon the significance of the IF.

160.
The shortcut: The Journal Impact Factor
Concerns about the stability and reliability of citation rankings were expressed in the 1970s within the American and the European (specifically Dutch) bibliometrics subcultures: Narin's Evaluative Bibliometrics and the 1976 paper by Dennis Dieks and Hans Chang (p. 189). Nancy Geller estimated the lifetime citation rate of a paper under a series of suppositions about the regularity of citation patterns and the growth rate of scientific literature.

161. The shortcut: The Journal Impact Factor
In 1980, Allison attempted to provide a scale-invariant measure of inequality in publication and citation counts. In 1983, Schubert and Glänzel designed a reliability test for determining the statistical significance of the observed differences in citation impact. Bensman performed an exploratory investigation of the probabilistic structure of the 2005 JCR (p. 190).

162. The shortcut: The Journal Impact Factor
The confirmations are:
  • Total citation count and IF capture different facets of journal importance.
  • The former is better than the latter as a global measure of importance, but the gap narrows if a better classification is introduced in the sample, sorting journals from research journals.
  • Both measures are surprisingly stable over time at the higher levels of the citation rankings.

163. The shortcut: The Journal Impact Factor
The objections to the IF:
  • Classification of citable items: the number of citable items does not take into account letters, editorials, or conference abstracts.
  • Accuracy issues: journals are complex entities that can change, split, merge, etc. The ISI doesn't combine citation data on the basis of lineage, nor for sections of the same journal (p. 192).
  • Density and age of cited references: the more one cites, the more one can be cited. Density and age of cited references emphasize the variability of citation cultures among disciplines and research fields.
  • The journal format and article type: the speed and intensity with which different types of articles attract citations affect the IF.

164. Fixing the accuracy of the IF
  • Modification of the time window for either the cited or the citing years.
  • Creation of a normalized measure, taking into account the (sub)field citation practices, the types of documents published by the journal, and the age of cited papers. In the 1970s, Graeme Hirst measured the number of times a journal is cited by the core literature of a single subfield. Pinski and Narin made a Google-like algorithm for journal ranking. The Journal to Field Impact Score was introduced by van Leeuwen and Moed.

165. Fixing the accuracy of the IF: The Journal to Field Impact Score
  • It counts the same items as the IF in both the numerator and the denominator.
  • It is field-specific, in the sense that the impact of the individual journal is compared to the world citation average in the corresponding research fields.
  • It differentiates the normalized impact for the various document types.
  • It employs variable citation and publication windows for the count, depending on the communication patterns of the research field under evaluation.

166. 6.2 Design and application of advanced scientometric indicators
In order to improve the ISI indexes, Leiden bibliometricians at the CWTS completed a pilot study in 2006. They expanded the ISI indexes with source papers from the refereed proceedings of international computer science conferences, with a view to developing field-specific bibliometric indicators. Grant Lewison envisioned the expansion of the analytical tools necessary to trace the routes along which biomedical research influences health decisions. This includes patents, clinical guidelines, and newspapers.

167. 6.2 Design and application of advanced scientometric indicators
The opponents of quantitative methods dismiss bibliometric indicators because nothing appears as reliable as an accurate peer review.
But some place so much trust in quantitative analysis that they claim that properly weighted indicators should be implemented by expert systems and computer-assisted procedures to help determine career progression and university chair assignment. Bibliometricians recognize the importance of both peer review and the implementation of additional, less subjective analytical tools.

168. 6.2.1 Evaluation of Individual Scientists: From Citation Counting to the Hirsch Index
Rarity is a structural property of the citation network. After the launch of the SCI, statistical surveys revealed that the ratio between the references processed each year and the number of unique items cited by those references was nearly constant and approximately equal to 1.7 (Garfield's constant). It means that in a single year, each cited paper was cited on average only 1.7 times, and 25% of the papers were never cited.

169. 6.2.1 Evaluation of Individual Scientists: From Citation Counting to the Hirsch Index
Citation data are not intended to replace informed peer review and, to be correctly interpreted, ought to be adjusted by taking into consideration the wide variability of citation practices across research fields and disciplines. Examination of the content and context of citations is also required.

170. 6.2.1 Evaluation of Individual Scientists: From Citation Counting to the Hirsch Index
Normalization is usually attained by relating the citedness of a set of papers to a conventional standard that may be either relative or absolute:
  • A relative standard is the citation score of a control group of papers allegedly similar to those under evaluation. Co-citation analysis or bibliographic coupling can be used to select it.
  • An absolute standard is the expected number of citations per paper in the research (sub)field encompassing the papers under scrutiny.

171.
6.2.1 Evaluation of Individual Scientists: From Citation Counting to the Hirsch Index
Schubert and Braun introduced a relative citation rate indicator for papers published in the same journal; it relates the number of citations they actually received to the mean citation rate of all papers appearing in that journal. The h-index was proposed by Jorge Hirsch in 2005. It is meant to provide a joint characterization of both productivity and cumulative research impact.

172. 6.2.1 Evaluation of Individual Scientists: From Citation Counting to the Hirsch Index
A scientist has index h if h of the papers he or she has (co)authored have at least h citations each, while the rest have no more than h citations each. For example, papers with 10, 8, 5, 4, and 3 citations give h = 4. The subset of medium-to-highly cited papers bearing on the calculation of h has been dubbed the "h-core" by Rousseau. The author of many low-cited papers will get as weak an h-index as the one who publishes only a few blockbusters. The Journal of Informetrics devoted a special issue to h-type indexes in 2007.

173. 6.2.1 Evaluation of Individual Scientists: From Citation Counting to the Hirsch Index
However:
  • h values cannot exceed the number of a scientist's publications, and they do not decrease for those who give up publishing or stop receiving citations from a certain point on. That is why Hirsch's seminal paper also suggested dividing h by the years of academic activity.
  • The index overlooks publication type and age, citation age, self-citation rate, and number of coauthors.
Variations and corrections to the h-index: p. 204.

174. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
Bibliometricians believe that citation analysis applied to the corpus of publications produced over a certain period of time by the members of a collective entity deals with a number of items large enough to allow a fairly safe application of standard statistical tools.

175. 6.2.2 Evaluations of Countries, Institutions, and Research Groups

176.
6.2.2 Evaluations of Countries, Institutions, and Research Groups
Issues: comparing output and impact data of research organizations with sharply dissimilar organizational profiles, missions, managerial cultures, financial resources, and research facilities; cross-country and cross-field comparison. Systematic errors:
  • Limitation of citation indexing to the first author in the case of multiauthored papers.
  • The decision not to provide unified citation counts for journals undergoing complex editorial changes.
  • The criteria applied to the selection of source journals.

177. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The ISI databases also pay little attention to non-Anglo-American journals, non-English-language journals, and nonjournal materials. Since the 1970s, local databases and in-house software for storing and processing ISI citation data have been created to address these issues:
  • The National Science Foundation's Science Literature Indicators Database.
  • ISSRU at the Library of the Hungarian Academy of Sciences.
  • CWTS at Leiden University.

178. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The most authoritative research on how field-specific and reliable bibliometric measures ought to be defined:
Googling citation networks: Pinski and Narin. The influence methodology introduced a journal ranking algorithm inspired by a basic principle of social networking: citations are not all equal, their weight being adjustable as a function of the prestige of the citers.

179. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The influence weight is a size-independent measure of the weighted number of citations a journal receives, normalized by the number of references it gives to other journals. The influence per publication for a journal is the weighted number of citations each of its articles receives from other journals. Related approaches: Google's PageRank, the Eigenfactor algorithm, and recently proposed variants of the IF.

180.
6.2.2 Evaluations of Countries, Institutions, and Research Groups
Big science bibliometrics: Martin and Irvine. The methodology of converging partial indicators combines several bibliometric and non-bibliometric indicators, including publication counts, citation analysis, and an extensive form of peer review fed by direct interviews with scientists. It is relative and comparative.
The Hungarian way: scientometricians devised a set of relative indicators of publication output and citation impact that allow cross-field comparisons among countries, research institutes, departments, and scientific societies in a mathematical fashion.

181. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The Hungarian way. Indicators for countries:
  • Activity Index (AI): the ratio between the country's share in the world's publication output in the field and the country's share in the world's publication output in all science fields combined.
  • Attractivity Index (AAI): the ratio between the country's share in citations attracted by publications in the field and the country's share in citations attracted by publications in all science fields combined.

182. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The Hungarian way. Indicators for countries:
  • Relative Citation Rate (RCR): the ratio between a summation of observed values and a summation of expected values for all the papers published by a country in a given research field:
$\mathrm{RCR} = \frac{\sum \text{observed citation rate}}{\sum \text{expected citation rate}}$

183. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The Leiden School. Their methodology disregards analysis at the macro level of the country, charged with being too generic to characterize research performance in a politically relevant fashion, and traces the roots of scientific excellence to the university and its operative units.
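The country-level indicators of the "Hungarian way" (Activity Index and Relative Citation Rate) reduce to simple ratios. The following is only a minimal sketch under stated assumptions: the function names are illustrative, and every count is made up rather than taken from the slides or from any real dataset.

```python
def activity_index(country_field, world_field, country_all, world_all):
    """Country's share of world output in one field, divided by its
    share of world output in all science fields combined."""
    field_share = country_field / world_field
    overall_share = country_all / world_all
    return field_share / overall_share


def relative_citation_rate(observed, expected):
    """Sum of observed citation rates divided by the sum of expected ones."""
    return sum(observed) / sum(expected)


# Hypothetical counts: 200 of 10,000 world papers in the field,
# 5,000 of 500,000 world papers in all fields.
ai = activity_index(200, 10_000, 5_000, 500_000)

# Hypothetical observed citations per paper and the matching expected rates.
rcr = relative_citation_rate([4, 0, 11], [3.0, 2.5, 4.5])

print(f"AI = {ai:.2f}, RCR = {rcr:.2f}")
```

The Attractivity Index has the same shape as the Activity Index, with citation counts passed in place of publication counts. By construction, a value above 1 means the country is more active (or more cited) in the field than its overall share would suggest.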
The hallmark of scientific interest is based on publishing highly cited papers.

184. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
The impact relative to the world average is the ratio between the average number of citations per publication (corrected for self-citations) and a field-specific world average based on the citation rate of all papers appearing in all journals belonging to the same field in which the unit under evaluation has been publishing. After a series of papers by Martin and Irvine in the 1980s regarding the decline of British science, other authors resorted to alternative versions of the ISI databases.

185. 6.2.2 Evaluations of Countries, Institutions, and Research Groups
Leydesdorff found a relative stability followed by a remarkable increase in British science. Braun, Glänzel, and Schubert argued that there were only random fluctuations. The issue was how to handle the raw data:
  • Using a fixed or a dynamic journal set.
  • Computing annual publication totals on the basis of tape-years (the date a publication entered the SCI) or the publishing date.
  • Limiting countable output to specific publication types.
  • Adopting a fractional author count in the case of multiauthored papers.

186. 6.3 Citations of patents between science, technology, and law
The introduction of the Internet and the World Wide Web was a sign of the emergence of a global and knowledge-based economy. Business competition is exercised through the control of natural resources, commodity markets, and low-cost manpower, and through the deployment of investable intellectual capital. Knowledge-driven innovation is now integral to commercial success. Products stimulate innovation but inhibit diffusion through intellectual property restrictions: patents.

187. 6.3 Citations of patents between science, technology, and law
Patent: legal document issued by the government.
In exchange for the public disclosure of the technical details of an invention, it grants the inventor, or any person or organization to whom the inventor's prerogatives have been transferred, a monopoly on its production and commercial exploitation. Since most inventions are built upon previous objects or techniques, the verification of patentability requires an in-depth analysis of the invention's technical specifications by a skilled examiner.

188. 6.3 Citations of patents between science, technology, and law
A typical U.S. patent is composed of:
  • A title page with bibliographic data and practical information to identify the document unambiguously.
  • The description of the invention, explaining how to make and use it.
  • The claims defining the scope or boundaries of the patent.
Only a small fraction of research output is patented. An invention should be novel, nontrivial, and commercially exploitable.

189. 6.3 Citations of patents between science, technology, and law

190. 6.3 Citations of patents between science, technology, and law
Patents are tough to manage, mainly because of the extent of their content's dependence on scientific knowledge, which bears on the basic issue of the relationship between technology and science. We could reduce technology to applied science, but it is not that easy. Bibliometrics is asked to provide factual evidence by extending to technological documents the same analytical techniques applied to scientific literature, both for quality assessment purposes and for mapping the formal connections between scientific and technological research areas.

191. 6.3 Citations of patents between science, technology, and law
The Gross Domestic Product simply counts and classifies patents, but does not tell us the weight of each patent's contribution to economic and technological advancement.
The idea of using citations as an aid to effective patent searches, as an alternative (or complement) to subject-based classification codes, had been circulating among American patent attorneys since the 1940s.

192. 6.3 Citations of patents between science, technology, and law
In 1957, Garfield tested a patent citation index to 4,000 chemical patents. The official version was published in the 1964 and 1965 editions of the SCI, including all U.S. patents. It was dropped due to lack of financial support. Reisner tested a machine-readable citation index to patents as a tool for monitoring the performance of classification systems. It was found that if many patents were built upon a specific citation, this citation was a significant technological spillover.

193. 6.3 Citations of patents between science, technology, and law
Interest in patent citation analysis has flourished since the 1980s, when large-scale computerized patent data became increasingly available for automatic processing. Narin's team extended the core of bibliometric techniques to the construction of technology indicators. Jaffe and Trajtenberg employed patent citations to quantify the market value of patents and the flows of technological knowledge at the heart of economic growth.

194. 6.3 Citations of patents between science, technology, and law
A wide range of indicators of technological prominence and diffusion is under design:
  • Knowledge diffusion.
  • Technology and science.
  • Evaluation studies: Narin.
  • Business intelligence: Narin.
In high-tech and fast-moving areas, there is a striking similarity between the referencing cycles of scientific articles and those of patents.

195. 6.3 Citations of patents between science, technology, and law
Patents have been found enmeshed in scale-free citation networks governed by a power law distribution that imposes an uneven allocation of symbolic wealth among units of supposedly different caliber.
Patent references are the result of a social process involving at least three actors: the inventor, the attorney or agent, and the patent examiner.

196. 7 - 3 197. , , , ( ) : , 2006 12 (HEFCE) ( ) , CWTS() (citations per paper) (central quality index) ? ( ) 198. 7.1 : ? 1960~70 : -> : -> : 7.1.1 : ( ) : 20 ( : - ) (1960), (1981) 199. 7.1.1 : 26 . , 1980 - 1. 2. : : ( )( )3. 4. : : ( , )5. : ( , , _or)6. : ( . ) 200. 7.1.2 ( )- > * (1975) , (41% ) , . SCI < . . (, ,) 201. 7.1.3 . 1970 - , - - - - - -> 202. 7.1.3 * 1. : ( ) 2. : ( ) 3. : ( ) 4. : ( )5. : ( ) 203. 7.1.4 : (1984) , 1. - 2. - - 2001 (1983) 3 - , - 204. 7.1.4 : 13 (310P ) . , 205. 7.2 : (1972) 1. . 2. . 3. . , , (1951) 206. 7.2 : . : . 207. 7.3 30 (2000) COLLNET . ( ) () 18 208. 7.3 (1968) . . . 3 (322P ) . (1. , 2. , 3./ ) (ICMJE) (324P )

209. Measuring Scientific Communication in the Twentieth Century: From Bibliometrics to Cybermetrics
Nicola De Bellis. Presented by Xanat V. Meza.

210. Introduction
The web exhibits a citation structure, links between web pages being similar to bibliographic citations. Thanks to markup languages, the information units composing a text can be marked and made recognizable by a label that facilitates their automatic connection with the full text of the cited document.

211. Introduction
Disciplinary databases:
  • Chemical Abstracts Service (CAS)
  • SAO/NASA Astrophysics Data System (ADS)
  • SPIRES HEP database
  • MathSciNet
  • CiteSeer
  • IEEE Xplore
  • Citebase
  • Citations in Economics

212. Introduction
Multidisciplinary databases:
  • Web of Science
  • Google Scholar
  • Scopus
The relevance of a webpage to a user query can be estimated by looking at the link rates and topology of the other pages pointing to it.

213. Introduction
PageRank: Google's ranking algorithm. It assigns different prestige scores to individual pages according to their position in the overall network. More weight is assigned to the pages receiving more links.
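The idea behind PageRank, that a page's prestige depends on the prestige of the pages linking to it, can be illustrated with a simplified power iteration. This is a sketch under stated assumptions, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the function name, and the four-page toy web are all illustrative choices.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: A, B, and D all link to C; C links back to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, C ranks highest because it receives the most links, and A ranks above B and D although each of them receives at most one link: A's single link comes from the prestigious page C, which is the sense in which links, like citations, are not all equal.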
An authority is a page that receives many links from quality hubs (like a citation classic). A quality hub is a page providing many links to authorities (like a good review paper).

214. Citations in e-journals and open archives
Advantages:
  • The immediacy of scientific literature implied an information revolution.
  • The web significantly helps to increase citation impact, and local online usage became one of the best predictors of future citations.
  • Less gate-keeping.
Disadvantages:
  • Fewer distinct articles are cited more.
  • Citations tend to concentrate on more recent publications.

215. Citations in e-journals and open archives
How to quantify the Web-wide cognitive and social life of scientific literature? The impact of a set of documents outside the ISI circuit can be estimated by:
  • Counting, by means of usage mining techniques, the number of document views or downloads over a certain period of time.
  • Interviewing a significant sample of readers.
  • Counting, by means of search engine facilities, the number of links to the website hosting the documents.

216. Citations in e-journals and open archives
Standards and protocols have been developed in the context of national and international projects to make the recording and reporting of online usage statistics uniform:
  • COUNTER (Counting Online Usage of Networked Electronic Resources)
  • SUSHI (Standardized Usage Statistics Harvesting Initiative)
  • MESUR (Metrics from Scholarly Usage of Resources)

217. Citations and open access
Peer-reviewed open access journals appeared in the 1980s, for example New Horizons in Adult Education, Psycoloquy, Postmodern Culture, and Surfaces. In the 1990s, RePEc (Research Papers in Economics), Medline/PubMed Central, and CogPrints were started or opened to the public. In 1991 Ginsparg set up arXiv, a preprint and postprint central repository initially o

218.
Citations and open access
Under the slogan "Public access to publicly funded research," the Open Access movement has published theoretical and business models, along with technical infrastructure, to support the free online dissemination of peer-reviewed

219. Citations and open access
There are two options for authors following this way of publication:
  • Submit a paper directly to an OA journal. It peer-reviews and makes freely available all of its contents for all users while shifting editorial costs onto the author or the funding institution. There are over 3,200 OA journals in the Directory of Open Access Journals (www.doaj.org).
  • Keep publishing in traditional journals, but archive a peer-reviewed version of the same content in an openly accessible repository.

220. Citations and open access
A goal of the OA movement has been to demonstrate that open access substantially increases research impact:
  • In 2001, Lawrence provided evidence that citation rates in a sample of computer science conference articles appeared significantly correlated with their level of accessibility.
  • In 2007, Harnad and Brody's team detected an OA citation advantage across all disciplines in a twelve-year sample of ISI articles (1992-2003). The citation impact was 25 to 25

221. Citations and open access
Counter-arguments:
  • Subjectivity factor in the selection of postable items
  • Increased visibility
  • Readership
  • Shelf-exposition
  • Best authors tend to be overrepresented
  • Self-selection bias postulate

222. Citations and open access
In 2007, a paper by Moed performed a citation analysis of papers posted to arXiv's condensed matter section before being published in scientific journals and compared the results with those of a parallel citation analysis for unposted articles published in the same journals. Articles posted to the preprint server are actually more cited than unposted ones, but the effect varies with the paper's age. The citation advantage of many OA papers fa

223.
Citations and open access
Two studies on the citation impact of OA journals indexed in the Web of Science appeared in 2004. The impact factor of ISI OA journals was lower than that of non-OA journals. Despite the evidence, there are important reasons to support OA journals:
  • Shortening the paths between invisible colleges and turning them into real-time collaboration networks will increase the speed and effectiveness of scientific communication.
  • In the non-big research areas, it increases the opportunity of pursuing research goals.

224. Citations and open access
Harnad is proposing a multidimensional, field-sensitive, and carefully validated open access scientometrics, taking advantage of open access materials. The key is metadata: a set of encoded data attached to information units processed by the automatic indexing system to help identify, retrieve, and manage them in an effective fashion. But there needs to be a metadata standar

225. Citebase, CiteSeer: the road toward an open access citation index
Citebase (www.citebase.org) is an indexing system for OA repositories. It was developed by Brody's team in the UK in 2001. It uses the OAI Protocol for Metadata Harvesting. The Citebase software parses the bibliographic references of the full-text papers hosted by the servers and, every time a reference matches the full text of another paper in the same repository, it creates a link. A Usage/Citation Impact Correlator produces a correlation table comparing the number of times an article has been cited with the approx

226. Citebase, CiteSeer: the road toward an open access citation index
CiteSeer, formerly ResearchIndex (citeseer.ist.psu.edu), is a digital library search and management system developed in the US. It gathers research article preprints and postprints from several distributed nodes of the open access Web through web crawling techniques. It extracts the context surrounding the citation in the body of the paper.

227.
Citebase, CiteSeer: the road toward an open access citation index. The new Web Citation Index, based on CiteSeer technology, was officially launched in 2005. It covers materials from OA repositories that meet quality criteria, such as: arXiv; the Caltech Collection of Open Digital Archives; the Australian National University Eprints Repository; the NASA Langley Technical Library Digital Repository; and the open access content in Digital Commons.
228. The citation as hyperlink and the current trends in quantitative web studies. The probability that a webpage is included in a search engine database increases as the web crawler fetches other pages linking to it. But links do not acknowledge intellectual debts, they lack peer review, and they are not indelible footprints in the landscape of recorded scholarly activity.
229. The citation as hyperlink and the current trends in quantitative web studies. The field is divided into: 1. Complex network analysis, which investigates the topological properties of the Internet and the Web as particular cases of an evolving complex network. 2. Hyperlink network analysis, which interprets the connections between websites as technological symbols of social ties among individuals, groups, organizations, and nations. 3. Webometrics, which extends to the web space concepts and methods originally develop
230. Bibliometric laws in the cyberworld: complex network analysis. The Web's topological structure, i.e. the number and distribution of links between the nodes, was initially considered crucial to understanding a wide range of issues: the way users surf the Web; the ease with which they gather information; the formation of Web communities as clusters of highly interacting nodes; and the spread of ideas, innovations, hacking attacks, and computer viruses.
231. Bibliometric laws in the cyberworld: complex network analysis.
Theoretical physicists have recently shifted attention to the dynamics of the structure, that is, how it changes through the progressive addition or removal of nodes and links. The key element in the modeling exercise is the graph: What kind of graph is the Web? What pattern, if any, is revealed by the distribution of hyperlinks among the nodes? Do the links tend to be evenly distributed?
232. Bibliometric laws in the cyberworld: complex network analysis. In the late 1950s, when Erdős and Rényi supplied graph theory with a coherent probabilistic foundation, the conviction gained ground that complex social and natural systems could be represented, in mathematical terms, by random graphs. Each node of a random graph has an equal probability of acquiring a link, and the frequency distribution of links among nodes is conveniently described by a probability
233. Bibliometric laws in the cyberworld: complex network analysis. In random graphs, there is a dominant average number of links per node, called the network's "scale". It acts as an upper threshold that prevents the system from having nodes with a disproportionately higher number of links. Nodes are not clustered and display statistically short distances between each other. Empirical evidence seemed to contradict this model, because the structure of complex networks was somewhere between a totally regular graph and a random graph.
234. Bibliometric laws in the cyberworld: complex network analysis. In 1998, Watts and Strogatz proposed a model of complex networks based on the small world. A small world is said to exist whenever members of any large group are connected to each other through short chains of intermediate acquaintances.
235. Bibliometric laws in the cyberworld: complex network analysis.
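A minimal sketch of the random-graph model described on slides 232-233, assuming a simple Erdős-Rényi construction in which every pair of nodes is linked with the same probability. It is an illustration, not material from the slides: the function name `random_graph_degrees` and its parameters (`num_nodes`, `link_prob`, `seed`) are hypothetical.

```python
import random


def random_graph_degrees(num_nodes, link_prob, seed=0):
    """Return the number of links held by each node of a random graph.

    Every pair of nodes is linked independently with probability
    link_prob (an assumed, illustrative parameter).
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    degrees = [0] * num_nodes
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < link_prob:
                # An undirected link adds one to both endpoints.
                degrees[i] += 1
                degrees[j] += 1
    return degrees


if __name__ == "__main__":
    degrees = random_graph_degrees(num_nodes=500, link_prob=0.02)
    mean_degree = sum(degrees) / len(degrees)
    print(f"mean links per node: {mean_degree:.2f}")
    print(f"smallest / largest:  {min(degrees)} / {max(degrees)}")
```

With settings like these, the links per node should cluster around the mean, with no node far above it. This is the "scale" behaviour that slide 233 attributes to random graphs.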
The path to small worlds: In the 1950s, Pool and Kochen produced mathematical descriptions of social contact based on statistical mechanics methods, encompassing graph-theoretic models and Monte Carlo simulations. In 1967, Milgram initiated a series of experiments to test the small world conjecture in real social networks. He found that, on average, the acquaintance chain required to connect two random individuals is composed of about six lin
236. Bibliometric laws in the cyberworld: complex network analysis. In 1998, Watts and Strogatz showed that a complex network is a small world displaying both the highly clustered sets of nodes typical of regular graphs and the short path lengths between any two nodes typical of random graphs. They computed the clustering coefficient and recognized the importance of shortcuts. Further experiments confirmed that documents on the Web are, on average, nineteen clicks away from each other.
237. Bibliometric laws in the cyberworld: complex network analysis. In 1999, Albert and Barabási proposed an alternative class of models for the large-scale properties of complex networks. Networks grow by the addition of new nodes linking to already existing ones. This addition follows a mechanism of preferential attachment that replicates the Matthew Effect: new nodes have a higher probability of linking to highly connected nodes than to poorly connected or isolated ones.
238. Bibliometric laws in the cyberworld: complex network analysis. $P(n) = \frac{1}{n^{a}}$, where $P(n)$ is the probability that a node has $n$ links and $a$ is the exponent of the power law. An experiment in 1999 confirmed that the World Wide Web is a scale-free network governed by the power law.
239. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators.
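The preferential attachment mechanism of slides 237-238 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' published model: each new node adds exactly one link, and the function name `preferential_attachment_degrees` and its parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment_degrees(num_nodes, seed=0):
    """Grow a network one node at a time and return each node's degree.

    Assumption for illustration: every new node creates a single link,
    choosing its target with probability proportional to the target's
    current number of links.
    """
    if num_nodes < 2:
        raise ValueError("num_nodes must be at least 2")
    rng = random.Random(seed)
    degrees = [1, 1]    # start from two nodes joined by one link
    endpoints = [0, 1]  # each node appears once per link it holds
    for new_node in range(2, num_nodes):
        # Uniform choice over link endpoints = choice weighted by degree.
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([target, new_node])
    return degrees


if __name__ == "__main__":
    degrees = preferential_attachment_degrees(5000)
    histogram = Counter(degrees)
    print("nodes with a single link:", histogram[1])
    print("largest number of links: ", max(degrees))
```

In runs of this sketch most nodes should keep a single link while a few early nodes accumulate many. That is the heavy-tailed pattern a power law describes, in contrast to a random graph, where link counts stay close to the average.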
Nowadays, the network has increasingly come to represent not simply a communication facility, but a tool for building online collaboration platforms where new knowledge can be created, modified, and negotiated, in a sort of virtual laboratory without walls. Sociologists have applied Social Network Analysis (SNA) to the hyperlink texture of the World Wide Web since 1997. This is called Hyperlink Network Analysis (HNA).
240. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. Objectives: Check whether the hyperlink network is organized around central websites that play the role of hubs. Centrality is measured by counting the number of ingoing and outgoing links for a given website (indegree and outdegree centrality). Centrality also has an aspect of "closeness", intended to single out the website with the shortest path to all others. Betweenness estimates a website's frequency
241. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. HNA techniques have been applied with promising results in case studies dealing with topics such as e-commerce; social movements; and interpersonal, interorganizational, and international communication. But can links be used as proxies for scientific communication flows and as building blocks of new, web-inclusive scientometric indicators of research prominence?
242. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. In 1995, Bossy suggested that the digital network layer offered an unprecedented source of information on the scholarly sociocognitive activities that predate publication output. This meant moving from bibliographic citations to webpages, websites, and links from the webpages of universities, departments, research institutes, and individual scientists. At first, AltaVista was used.
243.
Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. In 1995, an algorithm for co-word mapping by Prabowo and Thelwall was used by Leydesdorff and Curran to identify the connectivity patterns of the Triple Helix. The Web Impact Factor (WIF) of a site or area of the Web, introduced by Ingwersen in 1998, may be defined as a measure of the frequency with which the average webpage of the site has been linked at a certain time.
244. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. $\mathrm{WIF}(S) = \frac{I}{P} = \frac{100}{50} = 2$, where $S$ is the site, $I$ is the total number of pages linking to the site (including self-links), and $P$ is the number of webpages published in $S$ that are indexed by the search engine.
245. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. But where do link data come from? How reliable and valid are the tools for gathering them? Commercial search engines do not return a reliable and consistent picture of global and local connectivity rates over time because: Search engines crawl and index only a small portion of the World Wide Web; there is an "invisible web". Different search engines use distinct crawling algorithms.
246. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. The WIF is also not a very good bibliometric measure, due to content variability and structural instability: The number of links can be spuriously inflated by a huge number of unlinkable files, and a document can be formatted as a single webpage or split across several. Webpages also lack coding standardization, and their half-life is variable. For longitudinal studies, www.archive.org can be used.
247. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators.
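The Web Impact Factor ratio from slide 244 can be written as a short function. This is a minimal sketch: the function name `web_impact_factor` and its parameter names are hypothetical, and the counts are the slide's own example values rather than real search engine data.

```python
def web_impact_factor(inlink_pages, indexed_pages):
    """Return WIF(S) = I / P for a site S.

    inlink_pages  (I): pages linking to the site, self-links included.
    indexed_pages (P): pages of the site indexed by the search engine.
    """
    if indexed_pages <= 0:
        raise ValueError("indexed_pages must be positive")
    return inlink_pages / indexed_pages


if __name__ == "__main__":
    # Slide 244 example: I = 100 linking pages, P = 50 indexed pages.
    print(web_impact_factor(100, 50))  # 2.0
```

Because both counts come from whatever a search engine happens to have crawled, the same site can yield different values from different engines or on different dates, which is the reliability problem raised on slides 245-246.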
Since 2000, the Academic Web Link Database Project has been collecting link data on the academic web spaces of New Zealand, Australia, the UK, Spain, China, and Taiwan. Mike Thelwall's Alternative Document Models (ADMs) allow link analysis to be modulated by truncating the linking URLs at a higher level than that of the web page: Directory
248. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. The Webometrics Ranking of World Universities (www.webometrics.info) was launched in 2004 in Spain. It ranks the web domains of academic and research organizations according to the volume, visibility, and impact of their content. It applies the WIF to capture the ratio between visibility, measured by the inlink rates returned by commercial search engines, and size, measured by the number of hosted web pages
249. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. Two additional measures, dubbed the Rich File and Scholar indexes, capture the volume of potentially relevant academic output in standard formats: Adobe Portable Document Format (.pdf), Adobe PostScript (.ps), Microsoft Word Document (.doc), and Microsoft PowerPoint (.ppt); and the number of papers and citations for each academic domain in Google Scholar.
250. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators. The link analysis methodology of Thelwall and colleagues also investigates the patterns of connections between groups of academic sites at the national level. University websites have been found to be relatively more stable than other cyber-traces in longitudinal studies. But we have to remember that web visibility and academic performance are different affairs.
251. Citation analysis in the cyberworld: hyperlink network analysis, webometrics, and the promise of web scientometric indicators.
Bibliometricians usually resort to direct surveys of webmasters' reasons to link, or to hyperlink context and content analysis, to investigate the psychological side of the link generation process. Links are usually meant to facilitate navigation toward quarters of loosely structured and generically useful information, or to suggest related resources. But they alone are not sufficient to pin down communication patterns on the Web and their