Data Search: Searching and Finding Information in Unstructured and Structured Data Sources

Upload: erik-fransen

Post on 22-Jan-2015

DESCRIPTION

IRM DWBI event in the UK 2009

TRANSCRIPT

  • 1. 1

2. Data Search: Searching and Finding information in Unstructured and Structured Data Sources. Erik Fransen, Senior Business Consultant, Centennium BI expertisehuis, The Hague, The Netherlands. IRM UK, DW/BI 2009, London; 11.00-12.00 P.M., November 3. [email protected]

3. Agenda: Introduction; Industry models; Combining structured & unstructured data (Pure Portal, Index it all, Structure it all); Summary.

4. Profile Erik Fransen: Background: Knowledge Engineering, Middlesex University; Expertise areas: Business Intelligence, Knowledge engineering, Knowledge & Content management, Data warehousing, Analytics; CBIP.

5. Introduction

6. Combining BI with unstructured data: Integrated access to relevant information (provide complete picture); Unstructured data such as documents provides valuable context to numerical data: customer complaints, competitors' press releases, marketing documents; Insurance fraud analysis (e.g. claim statistics and claim forms); Competitive Intelligence (e.g. market share data and competitor news); Customer retention (e.g. sales data and customer complaints); Data Search acts as a bridge between structured and unstructured data.

7. (Un)structured data keeps growing. [Chart: data volume in gigabytes over time, >80% unstructured. Milestones: cave paintings and bone tools (40,000 BC), writing (3500 BC), paper (105), printing (1450), electricity and telephone (1870), transistor (1947), computing (1950), Internet (DARPA, late 1960s), the Web (1993). Database labels: SQL-70, Oracle-79, SQL-89, SQL-92, SQL-99, SQL-03. Year labels: 1999, 2000, 2001, 2005, 2009.] Source: Forrester

8. Industry Model: Text Data. Bill Inmon's DW 2.0: Hold data at the lowest level of detail; Hold data to infinity; Have integrity of data and have online high-performance transaction processing; Tightly couple metadata to the data warehouse environment; Link structured data and unstructured data.

9. Industry Model: Information Access Architecture (Gartner)

10. Industry Model: Enterprise Search Platform (Forrester)

11. Data Search Scenarios: Searching and Finding information in Unstructured and Structured Data Sources

12. Global architecture. [Diagram labels: Master & Meta Data; Structured Data (OLTP, Financial Apps, ODS, DWH, Data Marts, OLAP Cubes, Data Mining, Reports); Middleware; Portal; Unstructured Data (Content Management System, Fileservers, Database, Email, Intranet/internet); Search (Search Index, Text Mining, Visualisation).]

13. Three data search scenarios. [Global architecture diagram of slide 12, with the three scenarios marked: Pure Portal, Index it all, Structure it all.]

14. Scenario 1: Pure Portal. Many portlets, one user interface; Business user may manually combine content from several independent sources; Risk: too complex for user.

15. 1: Pure Portal. [Global architecture diagram of slide 12, with Pure Portal marked.]

16. Integrate news with BI information. Source: Aruba

17. Structured BI info

18. and Photos, Files and Maps

19. Scenario 2: Index it all. Enterprise Search from one user interface; Business user knows what to look for and expects a complete picture as a result; Risk: many irrelevant search results due to the nature of document indexing.

20. 2: Index it all. [Global architecture diagram of slide 12, with Index it all marked.]

21. Scenario 2: Index it all. [Diagram labels: Unstructured data sources; Search index; Search application; User interface; Structured data sources; Data warehouse Architecture; BI application; Reports.] A BI report is indexed as if it were a document.

22. Example: IBM Cognos 8 Go! Search. Integration with enterprise search applications (IBM OmniFind, Google OneBox for Enterprise, Yahoo, Autonomy); Search results return all relevant structured content (reports, analyses, etc.) and unstructured content (Word documents, PDFs, etc.) within a single interface.

23. Example: IBM OmniFind

24. Example: IBM OmniFind

25. SAP BusinessObjects Intelligent Search

26. SAP BusinessObjects Intelligent Search

27. Scenario 3: Structure it all. Generate structure using document warehousing and text mining; Business user knows exactly what to look for; Risk: limited flexibility for user.

28. 3: Structure it all. [Global architecture diagram of slide 12, with Structure it all marked.]

29. Generating structure in document warehouse. Identify Sources: sources are not fixed; iterative process, sources lead to new sources. Retrieve Documents: internal sources retrieval (file servers, CMS/DMS); external source retrieval, using crawlers and spiders. Preprocess Documents: format documents in a consistent manner; files must be in suitable form for text analysis. Text Mining: linguistic analysis; key features are extracted; indexing documents; summarizing documents. Compile Metadata: carefully attach metadata to document; used for querying, matching, navigation support; store in document warehouse. Source: Dan Sullivan. [Diagram labels: Data warehouse Architecture; Document warehouse Architecture; Combine (meta)data.]

30. Document warehouse: Contains complete documents or URLs; Metadata about documents: summaries, authors' names, publication dates, titles, sources, keywords, etc.; Translations of documents; Thematic clustering of similar documents; Topical or thematic indexes; Extracted key features (structure); Dimensions and Facts, linked to documents, summaries, etc.; Combine with the data warehouse. [Diagram label: Document warehouse Architecture.]

31. BI reporting on dimensional model. [Diagram: dimensional model spanning the data warehouse and the document warehouse. Fact tables: Sales Facts, Call Facts. Dimensions: Action, Product, Customer, Competitor, Sales person, Time, Telco Term.]

32. Generate structure using text mining tools. Example taken from SPSS PASW Text Analytics; many other tools available: IBM, SAS, Oracle, SAP BO, Microsoft, etc.

33. Generating structure using UIMA: Unstructured Information Management Architecture; Originates from IBM, now Apache UIMA: http://incubator.apache.org/uima/; UIMA is supported by all main BI vendors. Source: IBM

34. Example: Generating structure using UIMA. Analyzed by a collection of text analytics; Detected semantic entities and relations highlighted; Represented in UIMA Common Analysis Structure (CAS).

35. Summary: Growing business need for combining BI with unstructured data; Data Search bridges the gap between both worlds; Scenario 1: Pure Portal; Scenario 2: Index it all; Scenario 3: Structure it all; Scenarios can be combined. Questions?
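
Scenario 2 (Index it all) and Scenario 3 (Structure it all) can be illustrated with a minimal Python sketch. It is not taken from the presentation: the sample texts, the identifiers, the helper names and the fixed keyword vocabulary are all hypothetical, and plain keyword matching only stands in for the linguistic analysis a real text mining tool would perform. The first part indexes a BI report as if it were a document; the second part extracts dimension members from documents so they can be linked to a dimensional model.

```python
import re
from collections import defaultdict

# Hypothetical sample content; none of it comes from the slides.
DOCUMENTS = {
    "complaint-17": "Customer Acme complains about roaming charges on product Alpha",
    "press-03": "Competitor Telstar announces lower roaming tariff",
}
REPORTS = {
    "sales-q3": "Quarterly sales report product Alpha revenue by customer",
}

# Assumed vocabulary: dimension name -> known members (lower case).
DIMENSIONS = {
    "product": {"alpha"},
    "customer": {"acme"},
    "competitor": {"telstar"},
    "telco_term": {"roaming", "tariff"},
}


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(*sources):
    """Scenario 2: one inverted index over documents and BI reports alike."""
    index = defaultdict(set)
    for source in sources:
        for item_id, text in source.items():
            for token in tokenize(text):
                index[token].add(item_id)
    return index


def search(index, query):
    """Return the ids of all items that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


def extract_facts(documents):
    """Scenario 3: one (document, dimension, member) row per vocabulary hit."""
    rows = []
    for doc_id, text in sorted(documents.items()):
        tokens = set(tokenize(text))
        for dimension, members in sorted(DIMENSIONS.items()):
            for member in sorted(members & tokens):
                rows.append((doc_id, dimension, member))
    return rows


def documents_for(facts, dimension, member):
    """Link a dimension member (e.g. from a BI report) back to its documents."""
    return [doc_id for doc_id, dim, m in facts if dim == dimension and m == member]


index = build_index(DOCUMENTS, REPORTS)
facts = extract_facts(DOCUMENTS)

print(sorted(search(index, "product alpha")))
print(documents_for(facts, "telco_term", "roaming"))
```

The search for "product alpha" returns both a complaint and a sales report, which shows the single result list of Scenario 2. The extracted rows of Scenario 3 can be joined to a data warehouse on shared dimension members, at the cost of only finding what the vocabulary anticipates.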