knime & bioinformatics
Post on 20-Feb-2017
Copyright © 2015 KNIME.com AG
A Bioinformatician in the Far Far Away Kingdom, or Two Programmers from the KNIME Magic Chest
Oleg Yasnev
KNIME.com
So you’ll even write the code for me?
Yep!
Still from the animated film “Vovka in the Far Far Away Kingdom” © Soyuzmultfilm
KNIME.com
• KNIME.com founded in 2008
• Offices in Zurich, San Francisco (Aug ’13), Berlin (May ’14) and Konstanz (October ’15)
• 15 open source releases, 10 product releases (in 2014)
• >2m lines of code
• 600k lines of community code
Broad Range of KNIME Application Areas
Advanced Analytics
• Pharma
• Health Care
• Finance
• Retail
• Customer Intelligence
• Manufacturing
The KNIME Analytics Platform
From Access to Visualization and Deployment
Data Access
• Databases
– MySQL, PostgreSQL
– any JDBC (Oracle, DB2, MS SQL Server)
• Files
– csv, txt
– Excel, Word, PDF
– SAS, SPSS
– XML
– PMML
– Images, texts, networks, chem
• Web, Cloud
– REST, Web services
– Twitter, Google
Big Data
• HDFS support
• Hive
• Impala
• HP Vertica
• In-database processing
Transformation
• Preprocessing
– Row, column, matrix based
• Data blending
– Join, concatenate, append
• Aggregation
– Grouping, pivoting, binning
• Feature Creation and Selection
Analyze & Data Mining
• Regression
– Linear, Logistic
• Classification
– Decision tree, ensembles, SVM, MLP, Naïve Bayes
• Clustering
– k-means, DBSCAN, hierarchical
• Validation
– Cross-validation, scoring, ROC
• Misc
– PCA, MDS, item set mining
• External
– R, Weka
Visualization
• Interactive
– Scatter plot, histogram, pie charts, box plot
– Highlighting (brushing)
• JFreeChart
• JavaScript
• Misc
– Tag cloud, open street map, networks, molecules
• External
– R
Deployment
• Database
• Files
– Excel, csv, txt
– XML
– PMML
– to: local, KNIME Server, SSH-, FTP-Server
• BIRT Reporting
Over 1000 native and embedded nodes included:
• Statistics; Data Mining; Machine Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; WEKA; R; Community / 3rd
• MySQL, Oracle, etc.; SAS, SPSS, etc.; Excel, Flat, etc.; Hive etc.; XML, PMML; Text, Doc, Image; Web Crawlers; Industry Specific; Community / 3rd
• ETL; Row, Column; Matrix; Text, Image; Time Series; Java; Python; Community / 3rd
• R; JFreeChart; Community / 3rd
• via BIRT; PMML; XML; Databases; Excel, Flat, etc.; Hive etc.; Text, Doc, Image; Industry Specific; Community / 3rd
KNIME: Integrating Data and Tools
Big Data. Pre-processing on Hadoop
In-Database Processing
Loads your pre-processed data into KNIME
Reader/Writer
• Table selection
• Load data into KNIME
• Create table as select
• Insert/append data
• Delete rows from table
• Update values in table
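The operations listed above map onto ordinary SQL statements executed inside the database. A minimal sketch, assuming an in-memory SQLite database as a stand-in for the real server; the table and column names are hypothetical and this is not the code KNIME generates:

```python
import sqlite3

# In-memory SQLite stands in for the remote database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table of meter readings.
cur.execute("CREATE TABLE readings (meter_id INTEGER, kwh REAL)")
cur.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 0.5), (1, 1.5), (2, 2.0), (3, 0.0)],
)

# Create table as select: materialize a query result inside the database.
cur.execute("CREATE TABLE active AS SELECT * FROM readings WHERE kwh > 0")

# Insert/append data.
cur.execute("INSERT INTO active VALUES (?, ?)", (4, 3.0))

# Update values in table.
cur.execute("UPDATE active SET kwh = kwh * 2 WHERE meter_id = 2")

# Delete rows from table.
cur.execute("DELETE FROM active WHERE meter_id = 1")
conn.commit()

# Load data: only now does the (already reduced) result leave the database.
rows = cur.execute("SELECT meter_id, kwh FROM active ORDER BY meter_id").fetchall()
print(rows)
```

All filtering and rewriting happens on the database side; only the final SELECT transfers rows to the client.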
Hive/Impala Loader
• Upload a KNIME data table to Hive/Impala
• Part of the commercial Big Data Extension
Manipulation
• Filter rows and columns
• Join tables/queries
• Sort your data
• Write your own query
• Aggregate your data
Database GroupBy – Manual Aggregation
Database GroupBy – Type Based Aggregation
Matches all cells
Matches all numeric cells
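The difference between manual and type-based aggregation can be sketched in plain SQL. The example below assumes an in-memory SQLite table with hypothetical columns; the way the numeric columns are discovered here (SQLite's `PRAGMA table_info`) is an illustration only, not how the Database GroupBy node works internally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (meter_id INTEGER, region TEXT, kwh REAL, temp REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [(1, "north", 1.0, 10.0), (1, "north", 3.0, 14.0), (2, "south", 2.0, 20.0)],
)

# Manual aggregation: one hand-picked function per column.
manual = conn.execute(
    "SELECT meter_id, SUM(kwh), MAX(temp) FROM readings "
    "GROUP BY meter_id ORDER BY meter_id"
).fetchall()

# Type-based aggregation: apply one function to every column of a given type.
group_col = "meter_id"
columns = conn.execute("PRAGMA table_info(readings)").fetchall()
# Each entry is (cid, name, declared type, notnull, default, pk).
numeric = [name for _, name, ctype, *_ in columns if ctype == "REAL"]
select_list = ", ".join(f"AVG({name})" for name in numeric)
type_based = conn.execute(
    f"SELECT {group_col}, {select_list} FROM readings "
    f"GROUP BY {group_col} ORDER BY {group_col}"
).fetchall()

print(manual)
print(type_based)
```

With type-based aggregation, a column added to the table later is picked up automatically as long as its type matches.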
Utility
• Drop table
– missing table handling
– cascade option
• Execute any SQL statement, e.g. DDL
• Manipulate existing queries
HDFS File Handling
• New nodes
– HDFS Connection
– HDFS File Permission
• Utilize the existing remote file handling nodes
– Upload/download files
– Create/list directories
– Delete files
Workflow 1: Prepare Data
Irish Smart Energy Meter Trials
• July 2009 – Dec 2010
• 6000 meters
• roughly 176m rows of data
~ 2 days
Import Data from Database into KNIME
< 30 min
Big Data. Machine Learning on Hadoop
Machine Learning on Hadoop
• Based on Spark MLlib
• Scalable machine learning library
• Runs on Hadoop
• Algorithms for
– Classification (decision tree, naïve bayes, …)
– Regression (logistic regression, linear regression, …)
– Clustering (k-means)
– Collaborative filtering (ALS)
– Dimensionality reduction (SVD, PCA)
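To make one of the listed algorithms concrete, here is a minimal single-machine sketch of k-means (Lloyd's algorithm) in plain Python. It is not Spark MLlib code and does not run on a cluster; the sample points and starting centers are made up for illustration:

```python
from math import dist


def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: assign points to the nearest center, then re-average."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

A distributed implementation parallelizes the assignment step across data partitions, which is why the algorithm scales well on Hadoop-style clusters.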
MLlib Integration
• Usage model and dialogs similar to existing nodes
• No coding required
MLlib Integration
• MLlib model ports for model transfer
• Native MLlib model learning and prediction
• Spark nodes start and manage Spark jobs
• Supports Spark job cancelation
Native MLlib model
MLlib Integration
• Spark RDDs as input/output format
• Data stays within your cluster
• No unnecessary data movements
• Several input/output nodes, e.g. Hive, HDFS files, …
Mass Learning – Fast Event Prediction
• Convert supported MLlib models to PMML
• Mass learning on Hadoop
• Fast event prediction based on compiled models
Mix and Match
• Combine with existing KNIME nodes
Modularize and Execute Your Own Spark Code
Spark Node Overview
So what about the rocket science?
Community Contributors
Technology Partners
Distribution & Consulting Partners
Community Contributors
Community User Base
Donated by Companies
Contributions from Research Institutions
Maintained by KNIME
Academic Institutions:
• Universität Tübingen (BALL, OpenMS)
• Freie Universität Berlin (SeqAn)
• MPI Dresden (ImgLib)
• Universität Dresden (Palladin)
• ETH Zürich (OpenBIS)
• Dublin University (OMERO)
• University of Wisconsin (ImageJ2)
• …
Commercial Contributors:
• Dymatrix Consulting Group (Uplift Nodes)
• Eli Lilly (ChemInf suite)
• Novartis (RDKit, Indigo)
• Vernalis (Proteomics)
• Cenix (REST Nodes)
• Böhringer-Ingelheim (various sponsored nodes)
• …
Bioinformatics
https://tech.knime.org/bioinformatics-and-next-generation-sequencing-extensions
OpenMS
Open-source software C++ library for liquid chromatography–mass spectrometry data management and analyses.
https://tech.knime.org/community/bioinf/openms
SeqAn
Open-source C++ library of efficient algorithms and data structures for the analysis of sequences with the focus on biological data.
https://tech.knime.org/seqan-nodes-for-knime
NGS
Nodes and workflows used for processing next generation sequencing results
https://tech.knime.org/community/next-generationsequencing
knime4bio
Set of custom nodes for analysing NGS data
https://code.google.com/p/knime4bio/
Image Processing
https://tech.knime.org/community/image-processing
Active Classification in Cell Assay Images
• Different modules for segmentation and feature extraction
• Active Learning
Active Classification in Cell Assay Images
CellMiner Nodes
Plate/Image Reading
– Plate Reader, Plate Editor, Plate View
Preprocessing
– Noise Filtering, Lowpass Filter
Segmentation
– Threshold based Segmentation, Voronoi Segmentation
Features
– Line, Histogram, Texture, RGB, Zernike Moments, Shape
Active Classification
Chemistry and Cheminformatics
https://tech.knime.org/cheminformatics-extensions
Selected Open Source extensions
Selected commercial extensions
Overview of types in KNIME
• Basic KNIME types
– string, integer, double
• KNIME core chemistry types
– smiles, sdf, mol, mol2
– Structures in these formats can be rendered in KNIME tables
Nodes for type manipulation
• Molecule Type Cast
– Casts any string as a chemical type (i.e. it tells KNIME “This is a smiles string”)
– Useful when reading data from a csv file or database
• Marvin MolConverter
– Provided by ChemAxon/Infocom
– Translates seamlessly between types (smiles, sdf, mrv)
Nodes for reading and writing files
Readers and writers provided for:
– sdf, smiles, mol, mol2
Sketching chemical structures – use Marvin
MarvinSketch
• Provided by ChemAxon/Infocom
• Sketch structures in the configuration dialog
• Execute node to inject structures into workflow
RDKit
• Open-source cheminformatics library in C++
• Wrappers for KNIME maintained by the open-source community
• Useful for:
– Descriptor calculation
– Cleaning structures
– InChI conversion
– Standardizing SMILES
– Fingerprints
– Scaffolds/substructures
– Reaction simulation
– and more…
Infocom JChem KNIME Nodes
• Extension of ChemAxon’s tools for KNIME workflows
• Implemented by Infocom with the support of ChemAxon
• Contains over 90% of ChemAxon’s cheminformatics functionality
ChEMBL
A public database of bioactive drug-like compounds
• ~1.3 million compounds
• ~9k targets
• ~12 million bioactivities
Provided by the European Bioinformatics Institute
Accessible online at www.ebi.ac.uk/chembl or via EBI-provided KNIME nodes…
New Node: ChEMBLdb Connector
• Access data in ChEMBL via a web service call (internet access required)
• Look up by ChEMBL ID or InChI Key
• Retrieve structure and bioactivity data
• Compound search using smiles: exact, similarity, or substructure
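For readers who want to see what such a web service call might look like outside KNIME, here is a rough sketch that only builds request URLs. The base URL and the `molecule`, `similarity` and `substructure` paths are assumptions based on the public ChEMBL REST interface and should be checked against the current EBI documentation; the deck does not say which endpoint the connector node uses. Exact-match search is left out because its URL form is not something this sketch can vouch for.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public ChEMBL REST API (not taken from the slides).
BASE = "https://www.ebi.ac.uk/chembl/api/data"


def lookup_url(identifier):
    """URL for a lookup by ChEMBL ID or InChI Key (assumed path layout)."""
    return f"{BASE}/molecule/{quote(identifier, safe='')}?format=json"


def search_url(smiles, mode, threshold=70):
    """URL for a similarity or substructure search by SMILES (assumed paths)."""
    encoded = quote(smiles, safe="")  # SMILES contain characters such as '=' and '('
    if mode == "similarity":
        return f"{BASE}/similarity/{encoded}/{threshold}?format=json"
    if mode == "substructure":
        return f"{BASE}/substructure/{encoded}?format=json"
    raise ValueError(f"unsupported search mode: {mode}")


def fetch(url):
    """Perform the call; requires internet access, so it is not run here."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


print(lookup_url("CHEMBL25"))
print(search_url("c1ccccc1", "substructure"))
```

Calling `fetch(lookup_url("CHEMBL25"))` would perform the actual request; it is kept separate so the URL construction can be inspected offline.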
Tool Integrations
Install KNIME
• Select the KNIME version for your computer
– (Mac, Win, Linux)
• Copy to your local machine
• Unpack the file in a “nice” place
Start KNIME
Go to the installation directory and launch KNIME.
The Workspace
• The workspace is the folder in which workflows (and potentially data files) for the current KNIME session are stored.
• Workspaces are portable (just like KNIME)
Starting KNIME for the first time
Install additional extensions
Goes straight to the KNIME workbench
The KNIME Workbench
A basic workflow
More on nodes…
A node can have 3 states:
• Idle: The node is not yet configured and cannot be executed with its current settings.
• Configured: The node has been set up correctly and may be executed at any time.
• Executed: The node has been successfully executed. Results may be viewed and used in downstream nodes.
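The three states form a small state machine. The sketch below is a conceptual illustration in Python; `ToyNode` and its methods are hypothetical names invented here and are not part of the KNIME API:

```python
from enum import Enum


class NodeState(Enum):
    IDLE = "idle"
    CONFIGURED = "configured"
    EXECUTED = "executed"


class ToyNode:
    """Hypothetical stand-in for a workflow node that extracts one column."""

    def __init__(self):
        self.state = NodeState.IDLE
        self.settings = None
        self.result = None

    def configure(self, settings):
        # Valid settings move the node to CONFIGURED and discard old results.
        self.settings = settings
        self.result = None
        self.state = NodeState.CONFIGURED

    def execute(self, rows):
        if self.state is NodeState.IDLE:
            raise RuntimeError("node is idle: configure it before executing")
        column = self.settings["column"]
        self.result = [row[column] for row in rows]
        self.state = NodeState.EXECUTED
        return self.result

    def reset(self):
        # Resetting drops the results but keeps the configuration.
        self.result = None
        self.state = NodeState.CONFIGURED if self.settings else NodeState.IDLE


node = ToyNode()
node.configure({"column": "kwh"})
values = node.execute([{"kwh": 1.5}, {"kwh": 2.0}])
print(node.state, values)
```

Reconfiguring an executed node drops its results, which is why downstream nodes have to be executed again after a settings change.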
Node configuration
• Most nodes require configuration
• To access a node configuration window:
• Double-click the node
• Right-click > Configure
Node execution
• Right-click node
• Select Execute in context menu
• If execution is successful, status shows green light
• If execution encounters errors, status shows red light
Node views
• Right-click node
• Select Views in context menu
• Select output port to inspect execution results
Hotkeys (for future reference)
A Peek under the Hood: KNIME (Node) Development
Node Architecture
• KNIME interacts only with a Node
• Node takes care of embedding the node in the infrastructure
• New nodes implement Model/View/Dialog
Class diagram:
• class Node (final)
• class NodeDialogPane (abstract)
• class NodeView (abstract)
• class NodeModel (abstract)
• class NodeFactory (abstract)
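The division of labour in this architecture can be sketched as follows. The real KNIME classes are Java; this Python version only mirrors the class roles named on the slide, and every method name in it (`execute`, `load_settings`, `render`, `create_*`, `run`) is hypothetical:

```python
from abc import ABC, abstractmethod


class NodeModel(ABC):
    @abstractmethod
    def execute(self, table, settings): ...


class NodeDialogPane(ABC):
    @abstractmethod
    def load_settings(self): ...


class NodeView(ABC):
    @abstractmethod
    def render(self, table): ...


class NodeFactory(ABC):
    @abstractmethod
    def create_model(self): ...

    @abstractmethod
    def create_dialog(self): ...

    @abstractmethod
    def create_view(self): ...


class Node:
    """Framework-side wrapper: the only object the platform talks to."""

    def __init__(self, factory):
        self._model = factory.create_model()
        self._dialog = factory.create_dialog()
        self._view = factory.create_view()

    def run(self, table):
        output = self._model.execute(table, self._dialog.load_settings())
        return self._view.render(output)


# A node author supplies only the three parts and a factory that wires them.
class RowFilterModel(NodeModel):
    def execute(self, table, settings):
        return [r for r in table if r[settings["column"]] >= settings["minimum"]]


class RowFilterDialog(NodeDialogPane):
    def load_settings(self):
        return {"column": "kwh", "minimum": 1.0}


class RowCountView(NodeView):
    def render(self, table):
        return f"{len(table)} rows"


class RowFilterFactory(NodeFactory):
    def create_model(self):
        return RowFilterModel()

    def create_dialog(self):
        return RowFilterDialog()

    def create_view(self):
        return RowCountView()


summary = Node(RowFilterFactory()).run([{"kwh": 0.5}, {"kwh": 2.0}, {"kwh": 1.0}])
print(summary)
```

Because the platform only ever sees `Node`, the model, dialog and view can be developed and replaced independently.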
Node Extension Wizard
• Included in the KNIME Developer Version
• Allows creation of plugin projects including functioning KNIME nodes (with sample code)
• Helpful to easily create all node classes
– Generates all Java classes
– Node is registered with the plugin project
– Launch KNIME and enjoy the new node working!
Node Extension Wizard
• Specify all settings to create a new KNIME node
– In a completely new plugin project, or
– Into an existing project
• Node type: Sink, Source, Learner, Predictor, Manipulator, Visualizer, Meta, or Other
• Include sample code or not
Node Extension Wizard
• Contains all Java classes (including sample code)
• Node is registered in the plugin.xml
• NodeDialog and NodeView class are also created and registered to the NodeFactory
Node Development
Resources
• KNIME pages (www.knime.org)
– APPLICATIONS for example workflows
– LEARNING HUB under RESOURCES: www.knime.org/learning-hub
• KNIME Tech pages (tech.knime.org)
– FORUM for questions and answers
– DOCUMENTATION for documentation, FAQ, changelogs, ...
– LABS where to find new experimental nodes
– COMMUNITY CONTRIBUTIONS for development instructions and third party nodes
• KNIME TV channel on
• KNIME on @KNIME