turning xml to xls on the jvm, without loosing your sanity, with groovy

23
 © Quanticate 2018 Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness Turning XML into XLS with Groovy Nick Burch

Upload: gagravarr

Post on 16-Mar-2018

24 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

 © Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Turning XML into XLS with GroovyNick Burch

Page 2: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Turning XML to XLS, on the JVM, without loosing your sanity, with Groovy!

2

Page 3: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

What is Groovy?

3

Groovy, now Apache Groovy, is a JVM based language Optionally Typed, Dynamic Language Many features inspired by Python, Ruby, SmallTalk Java Friendly – Can use Java classes & libraries, but also 

can donate classes back to be used in Java Seemless integration with Java, similar syntax A lot less boilerplate than Java! But Java is learning...

Page 4: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Fun with PDF XML

4

You can attach comments in a PDF These have text, along with colour, and optionally some 

standard Dublin Core­esque metadata Best done with Acrobat or open source tools Surprisingly popular with many business sectors, 

Quanticate’s included, mostly for good reason! Good news – Acrobat can export as XML as XFDF

Page 5: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

5

Page 6: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Groovy and XML

6

http://groovy­lang.org/processing­xml.html Very easy and light­weight way to start processing XML Can initially treat XML as a big map, and access elements 

just with dots Multiple children of the same type treated as a list Can access attributes as properties with the @ prefix For simple XML, very short, succinct code

Page 7: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

And in code....

7

XmlSlurper and XmlParser main 2 entry points

def response = new    XmlSlurper().parseText(books)

def firstAuthor = response.     value.books.book[0].author

assert firstAuthor.text() ==     'Manuel De Cervantes'

assert firstAuthor.@id == 1

Page 8: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Groovy and advanced XML

8

http://groovy­lang.org/processing­xml.html However... Still have a full XML DOM on hand Can run arbitrary DOM and XPath querys Feels a bit like JQuery etc – friendly, friendly, advanced 

selector, back to friendly again! Call selector method, then find or findAll with a closure

Page 9: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Groovy and lists

9

http://docs.groovy­lang.org/next /html/documentation/working­with­collections.html

Define simply inline, eg def stuff = [“1”,”b”,”Test”] Run something for each, eg stuff.each { l   println l }→ Filter and transform with grep and collect methods Can run any method on all entries with * syntax, eg

stuff*.trim().sort().unique()

Page 10: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Groovy and Strings

10

http://docs.groovy­lang.org/latest/html/documentation/#all­strings

Can be regular Java strings, or GStrings with extra methods and functionality, Groovy sorts this for you

Use tripple single quotes to force Java string Use tripple quotes for multi­line strings In GStrings, can interpolate with ${...} eg

“Hello ${world}” or “1+3 is ${1+3}” or “List is {l.size()} big”

Page 11: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

PoC to process the XML

11

 File xmlFile = new File(args[0])

 def xml = new XmlSlurper().parse(xmlFile)

 // Process all the Annotations

 // ­ Find everything under /xfdf/annots/freetext

 // ­ If that has no Subject, is the Domain

 // ­ If that does, grab text from /contents­richtext/body/p

 def freetext = xml.annots.freetext

 println "There are ${freetext.size()} annotations"

Page 12: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

PoC to process the XML

12

def lastDomain = "N/A"freetext.each { ft ­>   def body = ft."contents­richtext".body   def content = body.text() // Get all text of entries of P etc under the body

   if (ft['@page']) {      If (! ft['@subject'].isEmpty()) {println "P ${ft['@page']} ­ D ${lastDomain} ­ S ${ft['@subject']} ­ V ${content}"      } else {        lastDomain = content     }

Page 13: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Apache POI

13

https://poi.apache.org/ Pure Java library for reading and writing most Microsoft 

Office file formats Especially strong on SpreadSheets (XLS and XLSX) More closely alligned to the file formats than the 

applications, which can sometimes cause surprises (eg where Excel doesn’t store what you thought it did...)

Page 14: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Simple XLS Exporter

14

Workbook wb = new HSSFWorkbook()Sheet s = wb.createSheet(“Variables”)dvars.eachWithIndex { v, idx ­>            Row r = s.createRow(idx)            r.createCell(0).setCellValue(dname)            r.createCell(1).setCellValue(v.variable)            r.createCell(2).setCellValue(v.pages.join(" "))            r.createCell(3).setCellValue(v.comments.join(" : "))}(0..3).each { col   s.autoSizeColumn(col) }→(new File(“output.xls”)).withOutputStream { out ­> wb.write(out) }

Page 15: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Fancy filterable headers

15

public static Sheet headerSheet(Workbook wb, String name, List<String> headers) {      CellStyle csHeader = makeHeaderStyle(wb, headerHeight)      Sheet s = wb.createSheet(name)      Row r = s.createRow(0)      r.setHeightInPoints(headerHeight+1)      headers.eachWithIndex { col, idx ­>         Cell c = r.createCell(idx)         c.setCellValue(col)         c.setCellStyle(csHeader)      }      s.setAutoFilter(new CellRangeAddress(0, 0, 0, headers.size()­1))      return s}

Page 16: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

The Real Thing

16

Final Requirements were a bit more involved, and there were more edge cases than initially expected...

XML Processing code: ~150 lines XLS Export code: ~150 lines I’m sure a Groovy expert could get that shorter without 

affecting readability or maintainability! Uses Gradle to fetch Apache POI, do single Shadow Jar

Page 17: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

17

Page 18: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

18

Page 19: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

19

Page 20: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

20

Page 21: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Learning Groovy

21

Groovy In Action book http://groovy­lang.org/learn.html  Getting Started and Module Guides

http://groovy­lang.org/documentation.html  StackOverflow questions Groovy Docs, esp. for Java enhanced eg

http://docs.groovy­lang.org/docs/groovy­2.4.13/html/groovy­jdk/java/lang/String.html

Page 22: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

22

Page 23: Turning XML to XLS on the JVM, without loosing your Sanity, with Groovy

© Quanticate 2018Our Services: Biostatistics • Clinical Programming • Clinical Data Management • Medical Writing • Pharmacovigilance

Our Values: Relationships • Excellence • Accountability • Customer Focus • Happiness

Any Questions?

23