Data Hub Performance Optimization
13 June 2019 © MARKLOGIC CORPORATION
JAMES CLIPPINGER, VP Strategic Accounts
ERIN MILLER, Senior Manager, Performance and Reliability Engineering

Code or Infrastructure?
- Finding infrastructure problems
Agenda
If there is no bottleneck:
- Increase workload
- Isolate workload
Yes, upgrading can be a pain, but:
- Performance issues are getting fixed all the time
- Metrics and performance monitoring are improving all the time
Do you want to spend time chasing a problem that’s been fixed or that could be detected?
Don’t kick it old school
Five key resources used by MarkLogic
- Disk bandwidth, disk space, CPU, RAM, network bandwidth
Resource needs of ingest and harmonization generally easy to predict based on design
Can infrastructure meet the resource needs? Test and do the math.
Don’t be surprised by ingest, harmonization, or reindexing requirements
Reasonable performance expectations
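To make "test and do the math" concrete, here is a minimal sketch of the arithmetic, assuming you have already measured sustained throughput for each resource. The data volume, the throughput figures, and the function name are hypothetical placeholders, not numbers from the talk.

```python
def estimated_hours(total_gb: float, throughput_mb_per_sec: float) -> float:
    """Hours needed to move total_gb at a sustained throughput in MB/s."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    total_mb = total_gb * 1024
    return total_mb / throughput_mb_per_sec / 3600


# Hypothetical measurements (MB/s) from your own infrastructure tests.
measured = {"disk_write": 200.0, "network": 110.0}

# The slowest resource bounds the whole job.
limiting = min(measured, key=measured.get)

# Hypothetical ingest volume: 2 TB.
hours = estimated_hours(2048, measured[limiting])
print(f"Limited by {limiting}: about {hours:.1f} hours")
```

If the estimate is far from what the schedule allows, that is an infrastructure conversation to have before the ingest starts, not after.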
Recent releases of MarkLogic tell you much more about infrastructure performance issues
- “Slow” messages: write, read, fsync, and background
- “Memory” messages and warnings
- “Hung” messages
Check ErrorLog.txt
2018-12-12 09:00:14.510 Info: Forest f1 state changed from open to error
2018-12-12 09:00:14.510 Info: Database DB-1 is offline
2018-12-12 09:00:14.512 Alert: XDMP-FORESTERR: Error in merge of forest f1: XDMP-MERGESPACE: Not merging due to disk space limitations, need=17039MB, have=14696MB
Analyzing logs
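The alert above already contains the numbers you need. As a rough sketch (the regular expression and function name are illustrative assumptions, written only against the one message shown here), the disk space shortfall can be pulled out like this:

```python
import re

# Alert line taken from the ErrorLog.txt excerpt above.
LINE = (
    "2018-12-12 09:00:14.512 Alert: XDMP-FORESTERR: Error in merge of forest f1: "
    "XDMP-MERGESPACE: Not merging due to disk space limitations, "
    "need=17039MB, have=14696MB"
)

# Assumed pattern; tolerant of a missing space after the comma.
MERGESPACE = re.compile(r"XDMP-MERGESPACE:.*?need=(\d+)MB,\s*have=(\d+)MB")


def merge_space_shortfall_mb(line):
    """Return need minus have in MB, or None if the line is not a MERGESPACE alert."""
    match = MERGESPACE.search(line)
    if match is None:
        return None
    need, have = int(match.group(1)), int(match.group(2))
    return need - have


shortfall = merge_space_shortfall_mb(LINE)
print(f"Merge needs {shortfall} MB more free disk space")
```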
2017-07-06 12:11:59.734 Warning: Slow utime /var/opt/MarkLogic/Forests/doc-stress-F10/Label, 2.006 sec
Analyzing logs
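A small filter can surface "Slow" warnings like the one above from a large log. This is a sketch only: the pattern is inferred from that single sample line, so other "Slow" message variants may need a different expression, and the threshold is a hypothetical default.

```python
import re

# Assumed shape: "<date> <time> Warning: Slow <operation> <path>, <seconds> sec"
SLOW = re.compile(r"^(\S+ \S+) Warning: Slow (\w+) (.+), ([\d.]+) sec$")


def slow_operations(lines, min_seconds=1.0):
    """Return (timestamp, operation, path, seconds) for each slow warning found."""
    hits = []
    for line in lines:
        match = SLOW.match(line.strip())
        if match and float(match.group(4)) >= min_seconds:
            hits.append(
                (match.group(1), match.group(2), match.group(3), float(match.group(4)))
            )
    return hits


# Sample line taken from the log excerpt above.
sample = [
    "2017-07-06 12:11:59.734 Warning: Slow utime "
    "/var/opt/MarkLogic/Forests/doc-stress-F10/Label, 2.006 sec"
]
hits = slow_operations(sample)
for timestamp, operation, path, seconds in hits:
    print(timestamp, operation, path, seconds)
```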
….
- Cloud: check expected bandwidth
- On prem: test using fio to establish a storage performance baseline (tough for shared storage)
CPU
Resource bottlenecks visible in Meters
Crank up the workload
- Isolate workloads: ingest, harmonize, and user queries separately
If request rate does not go up and there’s no bottleneck, contact Support
If there is no bottleneck
MarkLogic Data Hub Platform
What are your SLAs?
But the ingest is so fast! Managing expectations
It’s just a black box—how do I figure this out?!
Challenge: Harmonization is slow
USE methodology (Gregg, http://www.brendangregg.com/usemethod.html)
- Utilization
- Saturation
- Errors
- Disk space/storage; IOPS; CPU; RAM
Code or Infrastructure?
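One way to apply USE is as a per-resource checklist: record utilization, saturation, and errors for each resource and flag anything that stands out. The sketch below assumes the readings come from your own monitoring; the field names, the 0.8 utilization threshold, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseReading:
    resource: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # amount of queued or waiting work
    errors: int         # error events seen in the sampling window


def suspects(readings, max_utilization=0.8):
    """Return the resources that look like infrastructure bottlenecks."""
    return [
        r.resource
        for r in readings
        if r.utilization >= max_utilization or r.saturation > 0 or r.errors > 0
    ]


# Hypothetical readings for the resources listed above.
readings = [
    UseReading("disk space", utilization=0.55, saturation=0.0, errors=0),
    UseReading("IOPS", utilization=0.97, saturation=12.0, errors=0),
    UseReading("CPU", utilization=0.40, saturation=0.0, errors=0),
    UseReading("RAM", utilization=0.70, saturation=0.0, errors=0),
]
flagged = suspects(readings)
print(flagged)
```

If nothing is flagged, the infrastructure is probably not the problem and the code deserves a closer look.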
- New in 9.0-7
- https://docs.marklogic.com/guide/performance/request_monitoring
Challenge: Slow transformations
I just want to capture stats about all requests that are > 1 second
Configure request monitoring on the data-hub-FINAL and STAGING app servers
- In the root directory of the data-hub-MODULES database, place a .api file that contains info about the metrics you want to capture and the constraints (if any)
Request Monitoring with thresholds
Monitoring on a DHF endpoint: Output
In DHF 4.x, harmonization code is in plugins
- Main, Collector, Content, Header, Triples all run in Query mode
- Writer runs in Update mode
Thought experiment—what happens if I do a search in my Writer plugin?
Good news: in DHF 5.0, we give you better guardrails!
DHF and transactions
My writer plugin
Characterizing the problem
- Lots of possible reasons: data growth over time, resource bottlenecks, un-optimized code
- When did it start? What’s the pattern?
Different ingest/harmonization flows impacting database search performance
“When I run ingest and transform, my search application slows down”
Challenge: Production application slows down over time
This assumes that you’ve followed USE and Clip’s suggestions and found an infrastructure bottleneck
Look for hockey stick—try to provision more infrastructure resources before you get there
- What to expand when you’re expanding?
- IOPS, CPU, RAM? Forests/hosts?
Solution: Cluster Expansion
Remember, all your data uses resources—memory and storage, impacts on search, term list sizes, etc. If you’re not using the data and don’t need it, archive
Use Tiered Storage for less frequently accessed data
- HDFS
- S3 and Azure BLOB storage
- Even if you are an on-prem customer, this is a cheap and effective storage mechanism
Solution: Archive strategy
Keep requests independent
- Isolate your workload by request. DHF does this for you
Watch your locks
Avoid unnecessary bottlenecks
Putting it together: writing scalable code
Limit the request’s resources
- Using SQL or SPARQL? Use cts or Optic search clauses to limit scope
- Big result set? Paginate
- Don’t write queries where result set grows with size of data
- e.g., "give me all the trades in the database": what happens as the DB grows?
- If you need to do this, batch!
Putting it together: writing scalable code
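The pagination and batching advice can be sketched as a loop that never holds more than one page in memory. This is an illustration, not DHF or MarkLogic API code: fetch_page is a hypothetical callable standing in for whatever search call you use, and the 1-based start plus page length convention is an assumption of the sketch.

```python
def paginate(fetch_page, page_length=100):
    """Yield one page of results at a time until the source is exhausted."""
    start = 1  # 1-based offset, an assumption of this sketch
    while True:
        page = fetch_page(start, page_length)
        if not page:
            return
        yield page
        if len(page) < page_length:
            return  # short page means there is nothing left
        start += len(page)


# Hypothetical in-memory stand-in for a database of trades.
trades = [f"trade-{n}" for n in range(1, 8)]


def fake_fetch(start, page_length):
    return trades[start - 1 : start - 1 + page_length]


pages = list(paginate(fake_fetch, page_length=3))
for page in pages:
    print(page)
```

The work per request stays bounded by the page length, so the request costs the same whether the database holds seven trades or seven billion.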
Realistically, bottlenecks are often a combination of un-optimized code and resource limits
To figure out what’s what, use the Utilization, Saturation, Errors (USE) methodology
Best process to efficiently scale:
- First, optimize your code as best you can
- Then, look at expansion—add RAM, add hosts, scale out and/or up
Putting it together: USE to resolve bottlenecks
Email Support. Really. You can also email the presenters directly, but Support is monitored 24/7
Trying to figure out what those logs mean? https://help.marklogic.com/Knowledgebase/
Oh look! Erin wrote a whitepaper about this:
- https://www.marklogic.com/resources/performance-testing-marklogic/
Resources