discussion 2: task 12agenda •task 12 (due 17th) –tokenize –word frequencies –2-grams...

16
Discussion 2: task 12 Jan 15 th , 2014 1 INF 141 / CS 121 Tao Wang

Upload: others

Post on 27-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Discussion 2: task 12

    Jan 15th, 2014

    1 INF 141 / CS 121 Tao Wang

  • – 王涛 • “Tao” means wave in Chinese

    – Originally from China

    – Undergrad in Computer Science • The University of Auckland, New Zealand

    – Worked as software engineer for six years in New Zealand

    – Moved to Irvine for PhD in 2011

    – Research interest • Ubiquitous computing

    • Assistive technology

    • Accessibility

    • Healthcare informatics

    INF 141 / CS 121 Tao Wang 2

  • Agenda

    • Task 12 (due 17th)

    – Tokenize

    – Word frequencies

    – 2-grams

    – Palindromes

    – How will it be graded

    – Your questions

    INF 141 / CS 121 Tao Wang 3

    • General

    – Regular expression

    – Junit

  • Tokenize – Multiple ways

    • Scanner: more flexible; works with regular expression; token can be converted to different types

    • StringTokenizer: legacy class; its use is discouraged. Doesn’t work with regular expression • String.split: works with regular expression

    – Scanner • public Scanner(File source) throws FileNotFoundException public Scanner(File source, String charsetName) throws FileNotFoundException

    – provide charsetName if not the underlying platform’s default charset (such as “US-ASCII”, “UTF-8”, “UTF-16”).

    • public Scanner useDelimiter(String pattern) Sets this scanner's delimiting pattern to a pattern constructed from the specified String. Its delimiter pattern matches whitespace by default • More info here

    http://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html#Scanner(java.io.File, java.lang.String)

    INF 141 / CS 121 Tao Wang 4

    http://docs.oracle.com/javase/7/docs/api/java/io/File.htmlhttp://docs.oracle.com/javase/7/docs/api/java/io/FileNotFoundException.htmlhttp://docs.oracle.com/javase/7/docs/api/java/io/File.htmlhttp://docs.oracle.com/javase/7/docs/api/java/lang/String.htmlhttp://docs.oracle.com/javase/7/docs/api/java/io/FileNotFoundException.htmlhttp://docs.oracle.com/javase/7/docs/api/java/util/Scanner.htmlhttp://docs.oracle.com/javase/7/docs/api/java/lang/String.htmlhttp://docs.oracle.com/javase/7/docs/api/java/util/Scanner.htmlhttp://docs.oracle.com/javase/7/docs/api/java/util/Scanner.htmlhttp://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html

  • Word frequencies

    – Counting frequency

    • Many ways to implement, for example, using a key/value pair based data structure

    – Sort

    • Create a subclass of Frequency and implement Comparable interface

    • Use Comparator and Collections.sort

    public static void sort(List list, Comparator

  • 2-grams

    – Basically the same as Word Frequencies

    – Additional step to compile all 2-grams

    INF 141 / CS 121 Tao Wang 6

  • Palindromes

    INF 141 / CS 121 Tao Wang 7

  • Palindromes

    INF 141 / CS 121 Tao Wang 8

  • Palindromes

    INF 141 / CS 121 Tao Wang 9

  • Palindromes

    INF 141 / CS 121 Tao Wang 10

  • Palindromes

    INF 141 / CS 121 Tao Wang 11

  • Palindromes

    – Non-white space, non punctuation

    – We only want palindromes >= 5

    – Longest palindromes possible

    • If a smaller palindrome is completely contained in a longer palindrome, we only count the longer palindrome, not the shorter one

    • If two palindromes overlap, but one is not contained by the other, we count both

    – Sorting the returned list by decreasing frequency, then alphabetically

    INF 141 / CS 121 Tao Wang 12

  • How will it be graded - Submit to checkmate - One single zip file matching skeleton code structure - Use README.txt to communicate any

    concerns/information - Late submission penality - Evaluation

    - Runnable - Correctness - Efficiency - Aesthetics - Junit will be used with additional (more complicated) test

    cases, extreme cases, etc

    INF 141 / CS 121 Tao Wang 13

  • Regular expression

    – Slides from last week http://www.ics.uci.edu/~djp3/classes/2014_01_INF141/Lectures/Discussion_01.pdf

    – Content adapted from http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#regexjava

    – RegexPlanet http://www.regexplanet.com/advanced/java/index.html

    INF 141 / CS 121 Tao Wang 14

    http://www.ics.uci.edu/~djp3/classes/2014_01_INF141/Lectures/Discussion_01.pdfhttp://www.ics.uci.edu/~djp3/classes/2014_01_INF141/Lectures/Discussion_01.pdfhttp://www.vogella.com/tutorials/JavaRegularExpressions/article.htmlhttp://www.vogella.com/tutorials/JavaRegularExpressions/article.htmlhttp://www.regexplanet.com/advanced/java/index.htmlhttp://www.regexplanet.com/advanced/java/index.html

  • JUnit

    – Tutorial http://www.vogella.com/tutorials/JUnit/article.html

    INF 141 / CS 121 Tao Wang 15

    http://www.vogella.com/tutorials/JUnit/article.htmlhttp://www.vogella.com/tutorials/JUnit/article.html

  • Your questions

    INF 141 / CS 121 Tao Wang 16