Oracle performance on Linux x86 systems
Linux Performance for Oracle on x86 Systems
Baruch Osoveskiy, Senior Consultant, Brillix
Who Am I
• Senior Consultant at Brillix
• Unix/Linux system administrator since 1997
• Oracle and MySQL DBA since 2000
• Linux and security consultant
• Blogger at ildba.co.il
• Enterprise Distributions
• Kernel
• Packages
• Drivers
The Software
◦ Use Unbreakable Enterprise Kernel (UEK), Red Hat, or SUSE
Why?
◦ Better optimized for large systems and workloads
◦ Better hardware support
◦ Modern Linux features
◦ Patches required for correct Oracle product operation
Distribution
• Bug fixes from upstream
  ◦ Most bug fixes originate upstream and are backported to Enterprise Distributions
  ◦ More code change from upstream means more time before patches are backported, if it is even possible to do so
  ◦ More time for security patches to be backported to the Enterprise versions
  ◦ Bug fixes in upstream apply cleanly to UEK
• Better testing
  ◦ Code is tested by the whole Linux community, not dependent on one OS vendor and their customers
  ◦ You run the same code that is in upstream
  ◦ No backporting/scaffolding to use the latest Linux kernel features (e.g. NFSv4, TCP Fast Open)
• Better contributions
  ◦ Largest number of developer and company contributions
  ◦ Major backports not required to provide cutting-edge features
  ◦ New features seamlessly used by Oracle products
Stay closer to mainline Linux
• Updates contain critical security and bug fixes.
• Most Enterprise Distribution updates contain new features.
• "Security Errata" can contain new features.
• Updates often contain thousands of lines of new code from upstream.
• This includes features, bug fixes, enhancements and other tweaks.
• Install only security and bug fixes to avoid downtime from new and untested features.
• The DB server is not your laptop: do not install unknown or new software.
How to USE Security Errata
• Use Oracle validated configurations
  ◦ http://www.oracle.com/technetwork/server-storage/linux/validated-configurations-085828.html
• Install only base + Oracle validated packages
• Do not install games or desktop applications on production servers
Software Install
# cd /etc/yum.repos.d
# wget http://public-yum.oracle.com/public-yum-ol6.repo
# yum list
# yum install oracle-rdbms-server-11gR2-preinstall
Oracle public yum
• BIOS
• CPU
• Memory type, swap
• Disk (the more disks the better)
• Virtualization
The Hardware
Motherboard
Handle 0x0003, DMI type 2, 16 bytes
Base Board Information
    Manufacturer: Intel
    Product Name: S5000PAL0
Processor
Processor Information
    Version: Intel(R) Xeon(R) CPU X5355
Memory
Handle 0x0034, DMI type 17, 27 bytes
Memory Device
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: 1
    Locator: ONBOARD DIMM_A1
    Bank Locator: Not Specified
    Type: DDR2
    Type Detail: Synchronous
    Speed: 667 MHz (1.5 ns)
DMIDECODE
• Motherboard
  ◦ 4 memory channels (S5000PAL0)
  ◦ 8 slots (A1/A2/B1/B2/D1/D2)
• CPU
  ◦ Intel Clovertown CPUs
  ◦ 1333 MHz (Dual Independent FSB)
  ◦ Bandwidth 10666 MB/s per FSB
  ◦ 21 GB/s maximum FSB bandwidth
• Memory
  ◦ DDR2 667 = PC2-5300
  ◦ 4 memory channels at 5.3 GB/s each
  ◦ Memory bandwidth of 21 GB/s from all 4 channels
  ◦ 16 GB memory in total
http://ark.intel.com/
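The memory bandwidth figures above can be re-derived with a little shell arithmetic. This is a minimal sketch: the function name mem_bandwidth_mbs is hypothetical, and it assumes peak bandwidth is simply transfers per second times the bus width reported by dmidecode.

```shell
# Peak bandwidth in MB/s = transfer rate (MT/s) x bus width in bytes x channels.
# mem_bandwidth_mbs is an illustrative helper, not part of any tool.
mem_bandwidth_mbs() {
    mts=$1         # transfer rate, e.g. 667 for DDR2-667
    width_bits=$2  # "Data Width" from dmidecode, e.g. 64
    channels=$3    # number of memory channels
    echo $(( mts * width_bits / 8 * channels ))
}
# Usage:
#   mem_bandwidth_mbs 667 64 1   # one DDR2-667 channel
#   mem_bandwidth_mbs 667 64 4   # all four channels
```

With these inputs one channel comes out at roughly 5.3 GB/s and four channels at roughly 21 GB/s, matching the slide.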
• Always check for appropriate BIOS settings. Look out for:
• CPU features
  ◦ Enable Maximum Performance in the BIOS
• Memory
  ◦ Enable NUMA
• Power Management
BIOS settings
• Hyper-Threading (HT) can give about 35% better performance (tested on OLTP).
• SMT: Simultaneous Multi-Threading
  ◦ Runs 2 threads at the same time per core
• Do I have HT?
  ◦ Ensure that HT is enabled in the BIOS.
  ◦ grep -e "model name" /proc/cpuinfo
  ◦ http://ark.intel.com/
• Do not enable HT on an I/O-bound server; it will only make things worse.
CPU: Hyper-threading
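Beyond looking up the model name, /proc/cpuinfo itself usually shows whether Hyper-Threading is active: the "siblings" field counts logical CPUs per socket and "cpu cores" counts physical cores. The sketch below assumes both fields are present; ht_status is a hypothetical helper name.

```shell
# ht_status is a hypothetical helper: it compares "siblings" (logical CPUs per
# socket) with "cpu cores" (physical cores per socket) in a cpuinfo-style file.
ht_status() {
    awk -F: '
        /^siblings/  { gsub(/[ \t]/, "", $2); siblings = $2 }
        /^cpu cores/ { gsub(/[ \t]/, "", $2); cores = $2 }
        END {
            if (siblings == "" || cores == "") print "unknown"
            else if (siblings + 0 > cores + 0) print "enabled"
            else print "disabled"
        }' "${1:-/proc/cpuinfo}"
}
# Usage: ht_status              # reads /proc/cpuinfo
#        ht_status /tmp/sample  # reads a saved copy
```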
cpufreq
You can dynamically scale processor frequencies through the CPUfreq subsystem.
◦ Enable Maximum Performance in the BIOS
◦ /sys/devices/system/cpu/cpu<n>/cpufreq/scaling_governor
◦ On Red Hat 5.x the default is performance
◦ On Red Hat 6.x the default is normal
◦ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
CPU: cpufreq on kernel 2.6.x
sudo modprobe cpufreq_conservative
sudo modprobe cpufreq_ondemand
sudo modprobe cpufreq_powersave
sudo modprobe cpufreq_stats
sudo modprobe cpufreq_userspace
/etc/init.d/cpuspeed status
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
grep -i mhz /proc/cpuinfo
cpu MHz : 2367.330
CPU: cpufreq on kernel 2.6.x
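The echo command above changes only cpu0. A minimal sketch for applying the same governor to every CPU follows; set_governor is a hypothetical helper, and the optional second argument (an alternate sysfs root) exists only so the function can be tried against a scratch directory.

```shell
# set_governor is a hypothetical helper. It writes the requested governor to
# every cpuN/cpufreq/scaling_governor file under the given sysfs root
# (default /sys/devices/system/cpu). On a real system it must run as root.
set_governor() {
    governor=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue   # glob did not match: no cpufreq support
        echo "$governor" > "$f"
    done
}
# Usage: set_governor performance
```

The setting does not survive a reboot, so it still has to be reapplied at boot or fixed in the BIOS.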
• A load of more than n on an n-CPU server is bad
• A load of n on an n-CPU server is good
Load
[Figure: three example load levels: 1.00, 0.50, and 1.70]
• Use load to find out whether the server is CPU bound
• Load is reported as one, five, and fifteen minute averages
• On Linux, being CPU bound can impact I/O
CPU bound
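The rule of thumb above (load greater than the number of CPUs is bad) can be checked directly from /proc/loadavg. This is a sketch only; cpu_bound is a hypothetical helper and it looks at the one-minute average alone.

```shell
# cpu_bound is a hypothetical helper: it reads the 1-minute load average from a
# loadavg-style file and compares it with the CPU count passed as argument 1.
cpu_bound() {
    ncpu=$1
    awk -v n="$ncpu" '{
        if ($1 + 0 > n + 0) print "cpu-bound"; else print "ok"
        exit
    }' "${2:-/proc/loadavg}"
}
# Usage: cpu_bound "$(getconf _NPROCESSORS_ONLN)"
```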
cat /proc/meminfo
◦ Free = Cached + Free
◦ All free space on Linux is used for the page cache.
◦ This behavior can be controlled by cgroups.
◦ If PageTables is large, use HugePages.
/dev/shm
◦ Implementation of traditional shared memory (tmpfs)
◦ Used by Automatic Memory Management (AMM, MEMORY_TARGET)
◦ Does not work with HugePages (Doc ID 1134002.1)
Memory
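The "Free = Cached + Free" rule can be applied to /proc/meminfo with a one-line awk filter. A minimal sketch, assuming the usual MemFree and Cached fields in kB; reclaimable_kb is a hypothetical helper name.

```shell
# reclaimable_kb is a hypothetical helper: it sums MemFree and Cached (both in
# kB) from a meminfo-style file, following the "Free = Cached + Free" rule.
reclaimable_kb() {
    awk '$1 == "MemFree:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}
# Usage: reclaimable_kb
#        grep PageTables /proc/meminfo   # a large value suggests HugePages
```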
• Oracle recognizes NUMA systems and adjusts memory and scheduling operations accordingly. NUMA allows faster communication between distributed memory in a multi-processor server (Doc ID 759565.1).
• !! Disabling or enabling NUMA will change application performance. !!
• 8 sockets and beyond may see gains of approximately 5%
• Enable it at the BIOS level, and remove numa=off from grub.conf
• _enable_NUMA_optimization=TRUE (look for bugs on your version before enabling)
• dmesg | grep -i numa
NUMA: Initialized distance table, cnt=2
NUMA: Node 0 [0,c0000000) + [100000000,1040000000) -> [0,1040000000)
pci_bus 0000:00: on NUMA node 0 (pxm 0)
pci_bus 0000:80: on NUMA node 1 (pxm 1)
Non-Uniform Memory Access (NUMA)
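Besides dmesg, the running kernel's command line shows whether numa=off was passed at boot. The sketch below is an assumption-laden convenience: numa_boot_setting is a hypothetical helper that only inspects the command line, not the BIOS setting.

```shell
# numa_boot_setting is a hypothetical helper: it reports whether numa=off
# appears on a kernel command line (default /proc/cmdline).
numa_boot_setting() {
    if grep -qw 'numa=off' "${1:-/proc/cmdline}"; then
        echo "off"
    else
        echo "default"
    fi
}
# Usage: numa_boot_setting
```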
• Without HugePages, the memory of the system is divided into 4K pages
• With HugePages, the page size is increased to 2MB (configurable to 1G)
• HugePages reduce the total number of pages to be managed by the kernel
• This reduces the amount of memory required to hold the page table in memory.
Hugepages:
• Use HugePages (Oracle Doc ID 749851.1)
• Reduces the footprint of individual Oracle database connections.
• Increases performance and scalability with fewer TLB misses.
• Requires manual tuning after SGA changes, and does not work with AMM (/dev/shm).
Hugepages:
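Because HugePages need manual tuning after every SGA change, it helps to recompute the page count each time. The following is a rough lower-bound sketch, assuming a single SGA and the default 2 MB page size; hugepages_needed is a hypothetical helper, and the real total must cover every SGA on the host.

```shell
# hugepages_needed is a hypothetical helper: SGA size in MB and huge page size
# in kB (default 2048) go in, the page count, rounded up, comes out.
hugepages_needed() {
    sga_mb=$1
    page_kb=${2:-2048}
    echo $(( (sga_mb * 1024 + page_kb - 1) / page_kb ))
}
# Usage:
#   grep Hugepagesize /proc/meminfo     # page size on this system
#   hugepages_needed 13210              # roughly a 12.9 GB SGA, 2 MB pages
#   sysctl -w vm.nr_hugepages=<count>   # as root, with the computed count
```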
HugePages benchmark from Greg Marsden (Oracle)
Without HugePages
o 200 connections to a 12.9GB SGA
o Before DB startup, PageTables: 7400 kB
o After DB startup, PageTables: 652900 kB
o After 200 PQ slaves run a query
o PageTables: 6189248 kB
o Time to complete: 00:10:23.60
With HugePages
o 200 connections to a 12.9GB SGA
o Before DB startup, PageTables: 7748 kB
o After DB startup, PageTables: 21288 kB
o After 200 PQ slaves run a query
o PageTables: 80564 kB
o Time to complete: 00:00:18.77
Use HugePages with VMs for non-swappable, shared page tables.
HugePages must be allocated in both the guest VM and the hypervisor.
Oracle VM 3.2.6 contains support for pv-hugepages.
HugePages: Virtualization
What about swap?
◦ Modern Linux distributions rarely use swap (swappiness is very low)
◦ Swap is for OS services only. I do not recommend swap = RAM.
◦ Check vmstat output: ensure swap
◦ Do not use swap as memory; buy more memory
◦ If you have free memory, lower swappiness:
echo 10 > /proc/sys/vm/swappiness
To make the setting persistent, set vm.swappiness in /etc/sysctl.conf
Swap
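As a complement to watching vmstat, the cumulative swap counters in /proc/vmstat show whether the server has swapped at all since boot. This is a sketch under that assumption; swap_counters is a hypothetical helper name.

```shell
# swap_counters is a hypothetical helper: it prints "pswpin pswpout" (pages
# swapped in and out since boot) from a vmstat-style file.
swap_counters() {
    awk '$1 == "pswpin"  { in_pages = $2 }
         $1 == "pswpout" { out_pages = $2 }
         END { print in_pages + 0, out_pages + 0 }' "${1:-/proc/vmstat}"
}
# Usage: swap_counters   # two values that keep growing mean active swapping
```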
• Disks: the more the better
• Do not mix.
• Use RAID
• Use hardware RAID
• RAID 1+0 is best for write performance (logs).
• RAID 5 is best for read performance.
I/O (Disk)
RAID level  Configuration    Total array capacity  Fault tolerance  Read speed (4k)  Write speed (4k)
RAID-1+0    500GB x 4 disks  1000 GB               1 disk           2X               2X
RAID-5      500GB x 3 disks  1000 GB               1 disk           3X
Speed of a RAID 5 depends on controller
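The capacity column follows from simple arithmetic: RAID 1+0 keeps half of the raw space, and RAID 5 gives up one disk's worth of space to parity. A minimal sketch, with raid_capacity_gb as a hypothetical helper covering only these two levels:

```shell
# raid_capacity_gb is a hypothetical helper: usable capacity in GB for
# RAID 1+0 (level "10") and RAID 5, given disk count and per-disk size.
raid_capacity_gb() {
    level=$1; disks=$2; size_gb=$3
    case "$level" in
        10) echo $(( disks * size_gb / 2 )) ;;     # mirrored pairs: half the raw space
        5)  echo $(( (disks - 1) * size_gb )) ;;   # one disk's worth of parity
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}
# Usage:
#   raid_capacity_gb 10 4 500
#   raid_capacity_gb 5 3 500
```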
• High "log file sync" event time.
• Do not use RAID 5 for redo logs (low write performance).
• Upgrading the CPU can increase redo throughput (LGWR also requires CPU).
• Reducing the overall number of commits by batching transactions can have a very beneficial effect.
• See if any of the processing can use the COMMIT NOWAIT option.
• See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
• Enlarge the redo logs so that log switches occur every 15 to 20 minutes.
• Doc ID 34592.1
I/O Redo Logs
On Linux, use ASM (block/raw device, O_DIRECT)
Raw devices are deprecated by OUI for Oracle 11.2 (Doc ID 357492.1)
Raw devices may still bring benefits for intensive redo and large redo log files
Use udev or asmlib to control devices
I/O Best Practice ASM
If using a file system:
Bypass journaling when you create the file system: use ext2, or ext4 with journaling turned off.
Turning journaling off eliminates double writes.
The "noatime" option eliminates the need for the system to write to the file system when objects are only being read.
To create a partition with DOS compatibility disabled:
fdisk -c -u /dev/sda1
To turn off journaling, execute:
tune4fs -O ^has_journal /dev/sda1
mount -t ext4 -o noatime /dev/sda1 /oradata
I/O: Data File on File System
SSD Benchmark from Intel (8 disks)
SSD
Device  w/s       wMB/s   avgrq-sz  avgqu-sz  await  svctm  %util
sdb1    21357.33  167.86  16.10     1.51      0.07   0.02   44.53
HDD
Device  w/s       wMB/s   avgrq-sz  avgqu-sz  await  svctm  %util
sdd1    3343.00   130.68  80.06     3.25      0.97   0.25   83.97
iostat information recorded during the ASM tests SSD/RAW, HDD/RAW, 50GB over a 5 minute period
The redo on the 8 x SSD drives writes 1.28X more data per second and does 6.4X the writes/second, although avgrq-sz shows that the HDD configuration writes more data per operation. However, the wait time (await), svctm and %util show that the HDD configuration is busier and responds more slowly.
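The 1.28X and 6.4X figures are just the SSD value divided by the HDD value for wMB/s and w/s. A small sketch for reproducing them from the iostat numbers; ratio is a hypothetical helper name.

```shell
# ratio is a hypothetical helper: prints a / b rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
# Usage:
#   ratio 167.86 130.68     # wMB/s, SSD vs HDD
#   ratio 21357.33 3343.00  # w/s, SSD vs HDD
```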
• Top 5 Timed Events (AWR) looked as follows:
SSD Benchmark
SSD
Event          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
DB CPU                    19,832                  78.42
log file sync  6,700,242  4,059    1              16.05      Commit
HDD
Event          Waits      Time(s)  Avg wait (ms)  % DB time  Wait Class
DB CPU                    14,255                  52.53
log file sync  5,366,376  12,709   2              46.83      Commit
• DB Smart Flash Cache is a new (11.2) extension of the buffer cache area.
• It extends the SGA as an L2 cache (Doc ID 1317950.1).
db_flash_cache_file = <+FLASH/filename>
db_flash_cache_size = <flash pool size>
alter [table|index] object_name storage (flash_cache keep);
db_flash_cache_file
calibrate_io
• Look for high I/O wait (%wa in top, await in iostat)
• Look at %util for disk saturation.
• In the AWR, most of DB Time is I/O.
I/O bound
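Checking %util by eye gets tedious on servers with many devices. The sketch below filters iostat -x style output for devices above a threshold; saturated_devices is a hypothetical helper, and the default limit of 80 is an assumption, not a figure from the slides.

```shell
# saturated_devices is a hypothetical helper: it reads `iostat -x` style text
# on stdin, locates the %util column from the header row, and prints the
# devices whose %util is above the threshold (default 80, an assumed value).
saturated_devices() {
    awk -v limit="${1:-80}" '
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit + 0 { print $1, $col }
    '
}
# Usage: iostat -x 1 2 | saturated_devices 80
```

Looking the column up by name rather than by position keeps the filter working across sysstat versions that add or reorder columns.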
Virtualization performance is proportional to native performance.
VM drivers vs. native drivers have ~16% overhead.
Virtualization notes
• top
• iostat -Nx 1 100
• sar
• kSar
• Oracle Orion Calibration Tool
  http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#PFGRF95244
Performance tools
• Available from Red Hat 6.x (6.2 is best) and UEK 3
• cgroups: Control Groups for Linux Containers
• Provide fine-grained control over system resources
• Can be used to throttle page cache use by backup processes, which is often the reason why systems are slower after overnight backups
cgroup
Cgroup How To Use
yum install libcgroup
/etc/init.d/cgconfig start
/etc/cgconfig.conf:
mount {
    cpu = /cgroup/cpu;
    memory = /cgroup/memory;
}
group http {
    memory {
        memory.limit_in_bytes = 10M;
    }
}
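The same kind of limit can be applied at run time by writing to the memory controller's files, which is one way to cap a backup process. This is a sketch for the cgroup v1 layout mounted under /cgroup/memory as in the configuration above; limit_group_memory, the group name "backup", the 512M value, and $BACKUP_PID are all illustrative assumptions.

```shell
# limit_group_memory is a hypothetical helper for cgroup v1. It creates the
# group directory, sets memory.limit_in_bytes, and moves one PID into the
# group by writing it to the group's tasks file. Run as root on a real system.
limit_group_memory() {
    root=$1; group=$2; limit=$3; pid=$4
    mkdir -p "$root/$group" || return 1
    echo "$limit" > "$root/$group/memory.limit_in_bytes" || return 1
    echo "$pid" > "$root/$group/tasks"
}
# Usage (as root; names and values are examples only):
#   limit_group_memory /cgroup/memory backup 512M "$BACKUP_PID"
```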