![Page 1: SureChem and ChEMBL ACS CINF webinar John P ......SureChEMBL Ligand structures from patent literature ~70M ChEMBL •The world’s largest primary public database of medicinal chemistry](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/1.jpg)
SureChem and ChEMBL
ACS CINF webinar
John P. Overington & Nicko Goncharoff
8th April 2014
![Page 2](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/2.jpg)
Bioactivity data
Compound
Assay/Target
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE
RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT
NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT
TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT
THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY
CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF
EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR
WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR
ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA
NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG
PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
ChEMBL – Data for Drug Discovery
1. Scientific facts
2. Organization, integration, curation and standardization of pharmacology data
3. Insight, tools and resources for translational drug discovery
Ki = 4.5 nM
APTT = 11 min.
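Slide 2 frames each measurement as a compound tested in an assay against a target, with a result such as Ki = 4.5 nM. The sketch below models one such record in Python; the class and field names are hypothetical and do not reflect ChEMBL's actual schema. The helper converts the value to the usual negative log scale (pKi = -log10 of Ki in molar), which makes potencies reported in different units comparable.

```python
import math
from dataclasses import dataclass

# Hypothetical record layout for illustration only; ChEMBL's real schema differs.
@dataclass(frozen=True)
class Bioactivity:
    compound_id: str     # hypothetical compound identifier
    target: str          # e.g. "Thrombin"
    activity_type: str   # e.g. "Ki"
    value: float
    units: str           # one of "nM", "uM", "M"

# Conversion factors from the supported units to molar.
_TO_MOLAR = {"nM": 1e-9, "uM": 1e-6, "M": 1.0}

def p_activity(record: Bioactivity) -> float:
    """Return -log10 of the activity value expressed in molar (e.g. pKi)."""
    molar = record.value * _TO_MOLAR[record.units]
    return -math.log10(molar)

ki = Bioactivity("CPD-1", "Thrombin", "Ki", 4.5, "nM")
print(round(p_activity(ki), 2))  # 8.35
```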
![Page 3](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/3.jpg)
Overview of EMBL-EBI Chemistry Resources
• UniChem – InChI-based resolver (full + relaxed ‘lenses’)
• ChEMBL – Bioactivity data from literature and depositions
• ChEBI – Structures and metadata for metabolites; chemical ontology
• Atlas – Ligand-induced transcript response
• PDBe – Ligand structures from structurally defined protein complexes
• SureChEMBL – Ligand structures from patent literature (~70M)
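UniChem resolves records across these resources by InChI. A minimal sketch of the full versus relaxed "lens" idea, assuming (not confirmed by the slide) that the full lens keys on the complete Standard InChIKey while a relaxed lens keys only on its first 14-character block, which encodes the connectivity skeleton and ignores the stereo and other layers in the second block. The identifiers and keys are hypothetical placeholders.

```python
from collections import defaultdict

def lens_key(inchikey: str, relaxed: bool) -> str:
    """Full lens: whole Standard InChIKey. Relaxed lens (assumed): first block only."""
    return inchikey.split("-")[0] if relaxed else inchikey

def group_by_lens(records, relaxed=False):
    """Group (source, source_id, inchikey) tuples that resolve to the same lens key."""
    groups = defaultdict(list)
    for source, source_id, inchikey in records:
        groups[lens_key(inchikey, relaxed)].append((source, source_id))
    return dict(groups)

# Hypothetical records: same skeleton block, different second block.
records = [
    ("chembl", "CHEMBL-X", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("surechembl", "SCHEMBL-Y", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
]
print(len(group_by_lens(records)))                # 2: full lens keeps them apart
print(len(group_by_lens(records, relaxed=True)))  # 1: relaxed lens merges them
```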
![Page 4](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/4.jpg)
ChEMBL
• The world’s largest primary public database of medicinal chemistry data
– ~1.4 million compounds, ~9,000 targets, ~12 million bioactivities
• Truly Open Data: CC-BY-SA license
• Many download/access formats
– Semantic Web: RDF download, SPARQL endpoint at http://rdf.ebi.ac.uk/chembl
– ChEMBL Appliances: myChEMBL (Linux VM) and ChEMpi (Raspberry Pi)
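The SPARQL endpoint can be queried over plain HTTP. The sketch below builds a SPARQL 1.1 Protocol "query via GET" request against the address given on the slide; that address dates from 2014 and may have moved, and the query is deliberately generic (counting `rdf:type` values) so that no ChEMBL-specific RDF vocabulary is assumed.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint address as given on the slide (2014); it may have moved since.
ENDPOINT = "http://rdf.ebi.ac.uk/chembl"

# Generic SPARQL 1.1 query: the ten most frequent rdf:type values.
QUERY = """SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 10"""

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol 'query via GET' URL."""
    return endpoint + "?" + urlencode({"query": query})

def run_query(endpoint: str, query: str) -> bytes:
    """Send the query and return the raw JSON results (needs network access)."""
    request = urllib.request.Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()

if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, QUERY))
```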
![Page 5](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/5.jpg)
SureChEMBL
• EMBL-EBI acquired the SureChem product from Digital Science
– State-of-the-art chemistry patent product
– 15 million chemical structures
– Chemical structures automatically extracted from full-text patents
• The research community wants open access to patent data
– Patent literature is 2-3 years ahead of the published literature
– Better competitive position
• Plan to provide an ongoing, free, Open resource to the entire community
![Page 6](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/6.jpg)
SureChEMBL Overview
• Patent offices: WO; EP (applications & granted); US (applications & granted); JP (abstracts)
• Processed patents: entity recognition; Name to Structure (five methods); Image to Structure (one method); molfiles in patent
• Storage: database, chemistry database, patent PDFs
• Application server, serving users and an API
• SureChem System – Amazon Web Services
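The pipeline starts by recognising chemical names in patent text before converting them to structures. The slide does not describe how SureChEMBL's entity recognition works, so the sketch below is a toy, dictionary-based tagger: the `NAME_TO_SMILES` lookup is a hypothetical stand-in for the real name-to-structure converters. It returns character offsets so that hits could be annotated back onto the full text.

```python
import re

# Hypothetical dictionary: chemical name -> SMILES. A real name-to-structure
# step would use dedicated converters rather than a lookup table.
NAME_TO_SMILES = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "sildenafil": "CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4",
}

def tag_chemicals(text, dictionary=NAME_TO_SMILES):
    """Return (start, end, name, smiles) for each dictionary name found as a whole word."""
    # Longest names first so that a longer name wins over a shorter prefix.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        name = match.group(1).lower()
        hits.append((match.start(), match.end(), name, dictionary[name]))
    return hits

text = "Compositions comprising Sildenafil and aspirin are claimed."
for hit in tag_chemicals(text):
    print(hit)
```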
![Page 7](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/7.jpg)
Immediate Priorities
• Migrate working pipeline across to EMBL-EBI servers
• Establish new account system
• Migrate current user accounts
• Offer GUI access at a level equivalent to SureChem Pro
• Turn off API access and refactor a new API within the OpenPHACTS framework
– Partners in OpenPHACTS will get early test access and input into the development pipeline
– Build an RDF version of SureChEMBL
![Page 8](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/8.jpg)
Future Plans
• Dependent on funding and interest!
– Add sequence searching
– Add indexing of disease terms, animal disease models, etc.
– KNIME/Pipeline Pilot nodes
– Add links to/from Europe PMC
– Extend image extraction retrospectively from 2006, using spot-priced compute from AWS
– Provide a weekly/monthly feed of patent structures to PubChem and ChemSpider
– Add chemical structure tagging & search to the full-text content of Europe PMC
– Develop a UniChem VM for in-house private patent alerting using a feed of SureChEMBL data
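The last item pairs a structure feed with local matching, so a private compound list never has to leave the company. A minimal sketch of that alerting step, assuming a hypothetical feed of (patent number, InChIKey) pairs and exact InChIKey matching; all data below are placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple

def patent_alerts(feed: Iterable[Tuple[str, str]], watchlist: Set[str]) -> Dict[str, List[str]]:
    """Map each watched InChIKey to the sorted patent numbers in the feed that contain it.

    The feed format (patent_number, inchikey) is an assumption; matching
    happens locally, so the watchlist itself is never sent anywhere."""
    alerts: Dict[str, Set[str]] = {}
    for patent_number, inchikey in feed:
        if inchikey in watchlist:
            alerts.setdefault(inchikey, set()).add(patent_number)
    return {key: sorted(patents) for key, patents in alerts.items()}

# Hypothetical placeholder data.
weekly_feed = [
    ("XX0000001A1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("XX0000002A1", "CCCCCCCCCCCCCC-DDDDDDDDSA-N"),
    ("XX0000003A1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
]
in_house = {"AAAAAAAAAAAAAA-BBBBBBBBSA-N"}
print(patent_alerts(weekly_feed, in_house))
```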
![Page 9](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/9.jpg)
The search interface (http://www.surechembl.org/)
• Keyword search
• Patent number search
• Structure sketch
• Paste SMILES, MOL, name
• Types of chemistry search
• Filter by authority, document section and date
• Help
![Page 10](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/10.jpg)
Keyword-based search
Example searches:
• roche OR novartis
• C07D048704
• sterili?e
• kinase*
• Pfizer C07D "kinase inhibitor"
• pn:WO2011058149A1
• pa:(bayer OR astra OR Genentech OR merck) AND desc:(chemotherap* AND (Phosphoinositide kinases~3 OR Pi3K))
Lucene query field names and examples: http://support.surechem.com/knowledgebase/articles/92016-lucene-query-field-names-and-examples
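Two of the examples use wildcards: in Lucene query syntax, `?` matches exactly one character and `*` matches zero or more. The sketch below translates such a term into a Python regular expression to show what `sterili?e` and `kinase*` would match; case-insensitive matching is an assumption standing in for a lowercasing analyzer, and this is an illustration, not how Lucene evaluates queries internally.

```python
import re

def wildcard_to_regex(term: str) -> "re.Pattern[str]":
    """Translate a Lucene-style wildcard term: '?' = one character, '*' = any run."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)

def matches(term: str, word: str) -> bool:
    """True if the whole word matches the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

for word in ["sterilise", "sterilize", "sterilisation"]:
    print(word, matches("sterili?e", word))
```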
![Page 11](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/11.jpg)
Fielded keyword search
• Keyword search
• Filter by document section
• Logical operators
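Fielded queries combine `field:value` clauses with the logical operators AND and OR. The helpers below are hypothetical and only assemble query strings; the field names (`pa`, `desc`, `ic`, `pnctry`) are taken from the example searches in this webinar. Building queries programmatically keeps parentheses balanced when the same patterns are reused.

```python
def group(expr: str) -> str:
    """Wrap an expression in parentheses."""
    return f"({expr})"

def any_of(*terms: str) -> str:
    """OR together terms inside parentheses."""
    return group(" OR ".join(terms))

def all_of(*clauses: str) -> str:
    """AND together clauses."""
    return " AND ".join(clauses)

def field(name: str, expr: str) -> str:
    """Restrict an expression to a Lucene field, e.g. pa:(...)."""
    return f"{name}:{expr}"

antimalarial_us = all_of(field("ic", "C07D"), field("ic", "A61P003306"), field("pnctry", "US"))
pi3k_query = all_of(
    field("pa", any_of("bayer", "astra", "Genentech", "merck")),
    field("desc", group(all_of("chemotherap*", any_of("Phosphoinositide kinases~3", "Pi3K")))),
)
print(antimalarial_us)
print(pi3k_query)
```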
![Page 12](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/12.jpg)
Patent number search
![Page 13](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/13.jpg)
Patent number search
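A publication number such as WO2011058149A1 packs three parts together: the issuing authority (WO), the serial number, and the kind code (A1). The sketch below splits one apart with a deliberately simplified pattern (two-letter code, digits, optional letter plus digit); real numbering formats vary by office and era, so this is an assumption rather than a complete parser.

```python
import re
from typing import NamedTuple

class PatentNumber(NamedTuple):
    authority: str  # two-letter office code, e.g. WO, EP, US, JP
    serial: str     # publication serial number
    kind: str       # kind code, e.g. A1 (empty if absent)

# Simplified pattern: country code + digits + optional kind code.
_PN = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")

def parse_patent_number(pn: str) -> PatentNumber:
    """Split a publication number, tolerating spaces and hyphens."""
    cleaned = pn.strip().upper().replace(" ", "").replace("-", "")
    match = _PN.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised patent number: {pn!r}")
    return PatentNumber(match.group(1), match.group(2), match.group(3) or "")

print(parse_patent_number("WO2011058149A1"))
```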
![Page 14](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/14.jpg)
Chemistry-based search
• Structure sketch
• Paste SMILES, MOL, name
• Types of search
• Filter by MW range
• Filter by document section
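A molecular-weight range filter needs an MW for each hit. The sketch below computes average MW from a simple molecular formula using rounded standard atomic weights and then applies a range check; it handles only plain formulas (no brackets, charges or hydrates) and only the elements listed, which is an assumption for illustration. The formula shown is that of sildenafil, the compound used in the similarity example on the next slide.

```python
import re

# Rounded standard atomic weights for the elements handled here.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

def formula_mass(formula: str) -> float:
    """Average molecular weight of a plain formula such as 'C22H30N6O4S'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total

def in_mw_range(formula: str, low: float, high: float) -> bool:
    """True if the formula's MW lies within [low, high]."""
    return low <= formula_mass(formula) <= high

sildenafil = "C22H30N6O4S"
print(round(formula_mass(sildenafil), 2))  # 474.58
print(in_mw_range(sildenafil, 400, 500))   # True
```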
![Page 15](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/15.jpg)
Example searches
• Retrieve all antimalarial small molecule US patents
– ic:C07D AND ic:A61P003306 AND pnctry:US
• Retrieve a specific patent
– pn:WO2011058149A1
• Similarity search (sildenafil nearest neighbours)
– Paste CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
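Similarity searches of this kind are commonly scored with the Tanimoto coefficient between structural fingerprints: shared on-bits divided by the total distinct on-bits. The webinar does not say which fingerprint or cut-off SureChEMBL uses, so the sketch below works on toy fingerprints given as sets of bit indices, and the 0.7 default threshold is an arbitrary assumption.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def nearest_neighbours(query: Set[int], library: Dict[str, Set[int]],
                       threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (id, score) pairs scoring at or above threshold, best first."""
    scored = [(cid, tanimoto(query, fp)) for cid, fp in library.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Toy fingerprints (hypothetical bit sets, not real structures).
library = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9}}
print(nearest_neighbours({1, 2, 3}, library, threshold=0.5))
```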
![Page 16](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/16.jpg)
Example search
![Page 17](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/17.jpg)
Review the hits
![Page 18](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/18.jpg)
Review the hits
![Page 19](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/19.jpg)
Select a subset of hits
![Page 20](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/20.jpg)
Export hits (Pro user)
Property range filters
Count filters
![Page 21](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/21.jpg)
Select a subset of hits
![Page 22](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/22.jpg)
Review patent documents
![Page 23](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/23.jpg)
Retrieve patent families
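A patent family collects publications of the same invention filed with different authorities. The slide does not say which family definition SureChEMBL uses; the sketch below applies the common "simple family" rule, grouping publications whose sets of priority claims are identical. Publication and priority identifiers are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def simple_families(documents: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group publication numbers that claim exactly the same set of priorities."""
    families = defaultdict(list)
    for pub_number, priorities in documents.items():
        families[frozenset(priorities)].append(pub_number)
    # Sort members and families for a deterministic result.
    return sorted(sorted(members) for members in families.values())

# Hypothetical publications and priority claims.
documents = {
    "WO-PUB-1": ["PRIORITY-1"],
    "EP-PUB-1": ["PRIORITY-1"],
    "US-PUB-1": ["PRIORITY-1", "PRIORITY-2"],
}
print(simple_families(documents))
```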
![Page 24](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/24.jpg)
Review patent documents
![Page 25](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/25.jpg)
Retrieve chemistry (Pro user)
Property range filters
Count filters
![Page 26](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/26.jpg)
Summary
• Searching capabilities
– Free text keywords and Lucene fields
– Patent IDs & bibliographic information
– Patent authority & date
– Structure
• Retrieving capabilities
– Retrieve chemistry (with additional filters)
– Retrieve patent family information
– Retrieve annotated full patent text
![Page 27](https://reader034.vdocuments.pub/reader034/viewer/2022050519/5fa2a0b2ed2a803a3c62cfcc/html5/thumbnails/27.jpg)
Any questions?
• http://chembl.blogspot.co.uk/
• http://chembl.blogspot.co.uk/search/label/Webinar