Yoshikoder & General Inquirer Jonathan Simon Elizabeth Langdon COM 633, Fall 2010

Upload: jonah-horton

Post on 17-Dec-2015

TRANSCRIPT

  • Slide 1
  • Jonathan Simon Elizabeth Langdon COM 633, Fall 2010
  • Slide 2
  • The function of General Inquirer (GI) is to generate a count of words falling into various dictionary-supplied categories. It uses categories from the Harvard IV-4 dictionary and the Lasswell dictionary, as well as five categories based on the social cognition work of Semin and Fiedler, for 182 categories in all. Each category is a list of words and word senses.
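The counting idea described on this slide can be sketched in a few lines of Python. This is only an illustration, not GI's actual implementation: the two word lists are invented stand-ins for the real Harvard IV-4 entries, the function names are hypothetical, and word-sense disambiguation is ignored.

```python
import re
from collections import Counter

# Toy category lists invented for illustration; the real GI dictionaries
# hold many more entries per category and distinguish word senses.
TOY_CATEGORIES = {
    "Pstv": {"good", "support", "win"},
    "Ngtv": {"bad", "attack", "lose"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_categories(text, categories=TOY_CATEGORIES):
    """Return a Counter mapping each category name to its number of matching tokens."""
    counts = Counter({name: 0 for name in categories})
    for token in tokenize(text):
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return counts

counts = count_categories("Good news: supporters win, but critics attack.")
print(counts["Pstv"], counts["Ngtv"])
```

Note that "supporters" is not counted here because the toy list contains only the exact form "support"; handling inflected forms is one of the things a real dictionary has to address.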
  • Slide 3
  • Examples of Harvard IV-4 categories: Pstv: 1045 positive words, plus a subset of 557 words tagged Affil, for words indicating affiliation or supportiveness; Ngtv: 1160 negative words, plus a subset of 833 words tagged Hostile, for words indicating an attitude or concern with hostility or aggressiveness; Strong: 1902 words implying strength, plus a subset of 689 words tagged Power, indicating a concern with power, control or authority; Weak: 755 words implying weakness, plus a subset of 284 words tagged Submit, indicating submission to authority or power, dependence on others, vulnerability to others, or withdrawal.
  • Slide 4
  • Examples of Lasswell categories: PowGain = 65 words about power increasing; PowLoss = 109 words about power decreasing; PowEnds = 30 words about the goals of the power process; PowAren = 53 words referring to political places and environments; PowCon = 228 words for ways of conflicting.
  • Slide 5
  • For names and basic descriptions of each category: http://www.wjh.harvard.edu/~inquirer/homecat.htm For a list of all words contained in each of the 182 categories: http://www.webuse.umd.edu:9090/tags/
  • Slide 6
  • Users CAN add new categories. Considerations for adding categories: The task is somewhat comparable to producing a set of survey questions that everyone agrees has validity in measuring a well-specified construct. Mapping categories accurately requires attention to word use, word senses, and disambiguation routines.
  • Slide 7
  • Purpose: Analyze the content of news articles from three different sources. The articles are about the same Ted Strickland fundraiser. They include a newscast (via closed captioning) from WKYC, an online article from FOX8, and an online article from The Plain Dealer.
  • Slide 8
  • Beginning Screens:
  • Slide 9
  • Input: Select the content you wish to analyze. Use plain text format (.txt). Analyze a single file or multiple files at one time. To analyze multiple files simultaneously, save them to a directory (e.g. F:\NewsArticles). In the output, each file will have its own line of data within your Excel file (one row for a single file, multiple rows for multiple files).
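The "one row per file" convention can be illustrated with a short Python sketch. It does not reproduce GI's own file handling; the function names and the sample file names are hypothetical, and a temporary directory stands in for a folder of saved articles.

```python
import tempfile
from pathlib import Path

def load_text_files(directory):
    """Return {filename: text} for every .txt file in the directory, sorted by name."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }

def one_row_per_file(directory):
    """Build one output row per input file: (filename, total word count)."""
    return [(name, len(text.split()))
            for name, text in load_text_files(directory).items()]

# Demo with a throwaway directory standing in for a folder of saved articles.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "wkyc.txt").write_text("jobs and schools", encoding="utf-8")
    Path(folder, "fox8.txt").write_text("jobs", encoding="utf-8")
    Path(folder, "notes.doc").write_text("ignored", encoding="utf-8")  # not plain text
    rows = one_row_per_file(folder)

for row in rows:
    print(row)
```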
  • Slide 10
  • Output: Specify where you want the data output to be saved, name the file, and add the .xls extension. Dictionary: You will not need to change this! GI will analyze your content using all of its 182 categories.
  • Slide 11
  • Tags: Output is a matrix of counts and percentages of words falling into the dictionary's semantic categories. The Format column includes r (raw count, or simple count of words) and s (scaled count, or percentage of words in each category). The Wordcount column is the total number of words in the file. The Leftovers column shows words not found in any dictionary.
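The relationship between the r and s rows can be sketched as follows. This is a minimal illustration under stated assumptions, not GI's code: the function name and toy categories are invented, and "leftovers" is approximated as tokens that match none of the toy categories.

```python
def tag_rows(tokens, categories):
    """Return (wordcount, raw counts, scaled percentages, leftovers) for one document."""
    wordcount = len(tokens)
    # r row: simple count of tokens belonging to each category
    raw = {name: sum(1 for t in tokens if t in words)
           for name, words in categories.items()}
    # s row: the same counts expressed as a percentage of all words in the file
    scaled = {name: (100.0 * n / wordcount if wordcount else 0.0)
              for name, n in raw.items()}
    # Assumption: a leftover is a token found in none of the category lists
    known = set().union(*categories.values())
    leftovers = sum(1 for t in tokens if t not in known)
    return wordcount, raw, scaled, leftovers

toy = {"Pstv": {"good"}, "Ngtv": {"bad"}}
wordcount, raw, scaled, leftovers = tag_rows(["good", "jobs", "bad", "good"], toy)
print(wordcount, raw, scaled, leftovers)
```

Scaled counts make documents of different lengths comparable, which matters when a short broadcast transcript is set against longer print articles.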
  • Slide 12
  • Slide 13
  • Words: Output is a count of all words appearing in your file. Rows are words, columns are file names.
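A word-by-file table of this shape can be sketched in Python. The layout (rows are words, columns are files) follows the slide; the function name, the whitespace tokenization, and the sample file names are assumptions made for illustration.

```python
from collections import Counter

def word_by_file_matrix(docs):
    """docs: {filename: text}. Return (filenames, {word: [count in each file]})."""
    filenames = sorted(docs)
    per_file = {name: Counter(docs[name].lower().split()) for name in filenames}
    vocabulary = sorted(set().union(*per_file.values()))
    # A Counter returns 0 for words that never occur in a given file
    matrix = {word: [per_file[name][word] for name in filenames]
              for word in vocabulary}
    return filenames, matrix

filenames, matrix = word_by_file_matrix({
    "wkyc.txt": "jobs jobs schools",
    "fox8.txt": "jobs taxes",
})
print(filenames)
for word, row in matrix.items():
    print(word, row)
```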
  • Slide 14
  • Slide 15
  • Overall, the WKYC article can be viewed as more positive and affiliative than the FOX and PD articles. The WKYC story showed the highest percentages of all positively valenced categories. FOX or Plain Dealer showed higher percentages of all negatively valenced categories. The CATA / GI findings reflect the overall tone of the articles as experienced by readers (e.g., pulled quotes, emphasis on political / economic climates, etc.).
  • Slide 16
  • Slide 17
  • Yoshikoder provides a general word count, a custom dictionary word count, KWIC (keyword in context), and a reading highlight function. The program can handle multiple documents and analyze them individually or side by side. All dictionaries must be either custom built or downloaded from an external source; several dictionaries are available on the Yoshikoder website.
  • Slide 18
  • Dictionaries consist of 2 levels: Categories and Patterns. Categories are concept words that fall into a larger construct. Patterns are individual words or phrases that fall into a category and are actually searched for. Yoshikoder dictionaries allow wild cards (*).
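The category/pattern structure with wild cards can be sketched as below. This is not Yoshikoder's matching code: Python's `fnmatch` module is used as a stand-in for the `*` wild card (it also gives special meaning to `?` and `[...]`), the issue dictionary is invented, and only single-word patterns are handled, not phrases.

```python
from fnmatch import fnmatchcase

# Hypothetical two-level dictionary: category name -> list of patterns
ISSUE_DICTIONARY = {
    "Jobs": ["job*", "employ*"],
    "Education": ["school*", "teacher*"],
}

def count_patterns(tokens, dictionary=ISSUE_DICTIONARY):
    """Count, per category, the tokens matching at least one of its patterns."""
    counts = {category: 0 for category in dictionary}
    for token in tokens:
        word = token.lower()
        for category, patterns in dictionary.items():
            if any(fnmatchcase(word, pattern) for pattern in patterns):
                counts[category] += 1
    return counts

issue_counts = count_patterns(
    "jobs unemployment schools employers teacher".split()
)
print(issue_counts)
```

In this sketch "unemployment" is not counted under Jobs, because `employ*` only matches words that begin with "employ"; a pattern list needs to anticipate such variants explicitly.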
  • Slide 19
  • Purpose: Analyze the content of news articles from three different sources. The articles are about the same Ted Strickland fundraiser. They include a newscast (via closed captioning) from WKYC, an online article from FOX8, and an online article from The Plain Dealer. Given a list of predetermined possible issues, this analysis will identify which issues were most frequently mentioned in these stories.
  • Slide 20
  • Beginning Screen:
  • Slide 21
  • Add Document: Documents must be .TXT files
  • Slide 22
  • Multiple documents can be uploaded
  • Slide 23
  • 123 4
  • Slide 24
  • 567 8 9
  • Slide 25
  • It is important to make sure that the proper level is highlighted when adding a category or pattern. Yoshikoder can nest categories within one another.
  • Slide 26
  • Pre-made or downloaded dictionaries can be imported
  • Slide 27
  • A Yoshikoder concordance is a KWIC analysis. Menu path: Concordance > Make Concordance. Results can be exported to HTML or Excel.
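A KWIC (keyword in context) listing shows each occurrence of a keyword with a few words on either side. The sketch below illustrates the general technique only; the function name, the default window size, and the sample sentence are assumptions, and it does not reproduce Yoshikoder's concordance output.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines

sample = "the governor promised new jobs and said jobs matter".split()
concordance = kwic(sample, "jobs", window=2)
for left, word, right in concordance:
    print(f"{left:>15} | {word} | {right}")
```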
  • Slide 28
  • Report: Document Word Frequencies reports the frequencies of all words in an individual document. All Word Frequencies reports the frequencies of all words in all documents, sorted by document. Unified Word Frequencies reports the frequencies of all words in all selected documents.
  • Slide 29
  • Report: Dictionary Report shows the frequencies of dictionary words, by category or pattern, for an individual document. A unified dictionary report downloads the category frequencies into an Excel spreadsheet. Document Comparison will compare any two documents. Statistical Comparison Report will compare any two documents in terms of percent difference.
  • Slide 30
  • Slide 31
  • Slide 32
  • The Channel 3 newscast contained more issue keywords than the FOX 8 and PD stories, with the biggest difference in focus being in education issues. The Jobs issue was the most frequently mentioned; however, it was more emphasized in the FOX 8 and PD stories than in Channel 3's coverage. The remaining issue mentions were sporadic, with little overlap between the sources.