s1: InnoDB AIO Internals and Related Bug Analysis

InnoDB AIO Internals and Related Bug Analysis

Xiyu (希羽), Taobao

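Agenda

• InnoDB AIO parameter settings

• How InnoDB simulated AIO works

• How the read/write threads issue IO, and how to diagnose them

• What the master thread helps with

• Analysis and fix of the table-lost-after-DDL problem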

InnoDB AIO parameter settings

• innodb_file_io_threads

– Deprecated since the 5.1 plugin and 5.5

– Defaults to 4 in the built-in version, which means:

• innodb_read_io_threads=1

• innodb_write_io_threads=1

• one insert buffer thread

• one log thread

• On SSD, where more IO capacity is available, typical settings are:

– innodb_thread_concurrency=64

– innodb_read_io_threads=8

– innodb_write_io_threads=8

– innodb_io_capacity=2000

http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html
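As a rough illustration, the SSD-oriented values quoted on this slide could be placed in a my.cnf fragment like the one below. The numbers are simply the ones from the slide, not tuned recommendations, and putting them under the [mysqld] section is the usual convention assumed here.

```ini
[mysqld]
# Values quoted on the slide for an SSD host; treat them as a starting point.
innodb_thread_concurrency = 64
innodb_read_io_threads    = 8
innodb_write_io_threads   = 8
innodb_io_capacity        = 2000
```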

InnoDB IO workflow

InnoDB simulated AIO: initialization

• srv/srv0start.c: innobase_start_or_create_for_mysql

InnoDB simulated AIO: key data structures

• struct os_aio_array_t, 4 instances

– mutex

– not_full/is_empty (os_event_t)

– n_slots/n_segments/n_reserved

– slots (os_aio_slot_t)

• struct os_aio_slot_t

[Figure: layout of the slot array. n_slot = n_[read|write]_segs * n_per_seg; the slots run from slot[0] to slot[n_slot], each holding pos | reserved | ..]
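The relationship between segments and slots can be sketched in C as follows. This is a simplified stand-in, not the InnoDB definition: the sketch_* names are hypothetical, the mutex and the not_full/is_empty events are left out, and the rule that a slot belongs to segment pos / n_per_seg is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for os_aio_slot_t; only two fields are shown. */
typedef struct {
    size_t pos;       /* index of the slot inside the array */
    int    reserved;  /* non-zero while a request occupies the slot */
} sketch_slot_t;

/* Simplified stand-in for os_aio_array_t (mutex and events omitted). */
typedef struct {
    size_t         n_slots;
    size_t         n_segments;
    size_t         n_reserved;
    sketch_slot_t *slots;
} sketch_array_t;

/* Allocate n_segments * n_per_seg slots; returns NULL on allocation failure. */
sketch_array_t *sketch_array_create(size_t n_segments, size_t n_per_seg)
{
    assert(n_segments > 0 && n_per_seg > 0);

    sketch_array_t *a = malloc(sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    a->n_segments = n_segments;
    a->n_slots    = n_segments * n_per_seg;
    a->n_reserved = 0;
    a->slots      = calloc(a->n_slots, sizeof *a->slots);
    if (a->slots == NULL) {
        free(a);
        return NULL;
    }
    for (size_t i = 0; i < a->n_slots; i++) {
        a->slots[i].pos = i;
    }
    return a;
}

/* Segment (one handler thread per segment) that a slot position falls into. */
size_t sketch_slot_segment(const sketch_array_t *a, size_t pos)
{
    assert(pos < a->n_slots);
    return pos / (a->n_slots / a->n_segments);
}

void sketch_array_free(sketch_array_t *a)
{
    if (a != NULL) {
        free(a->slots);
        free(a);
    }
}
```

With 8 segments of 32 slots each (illustrative numbers), the array holds 256 slots and slot 33 falls into segment 1.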

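InnoDB simulated AIO: worker-thread handler and wake-up handler

• Worker-thread handler: fil/fil0fil.c: fil_aio_wait

– os_aio_simulated_handle

• Handler that wakes up the AIO threads (called from many places)

– os_aio_simulated_wake_handler_thread

– Typical case: a forced wake-up when no free slot can be found

[Figure: slots slot[0] slot[1] … slot[k] … slot[n-1]. A handler thread scans them: if it finds a slot that needs read/write IO, it calls os_file_read/os_file_write; otherwise it waits for an event. The wake handler broadcasts that event.]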

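InnoDB simulated AIO: the core function os_aio_simulated_handle

- Get the slots of the segment.

- If a slot is reserved and io_already_done is set, goto slot_io_done (release the slot and clear io_already_done).

- If any slot has been reserved for more than 2s, pick the oldest one to prevent starvation.

- If no slot meets that condition, pick the reserved slot with the smallest offset.

- If neither condition yields a (reserved) slot, goto wait_for_io.

- Otherwise a slot has been found. Keep looking for a reserved slot' whose IO is contiguous with the slot found, then for a slot'' with the same relation to slot', and so on (up to 64 slots).

- memcpy the bufs of the slots found into combined_buf.

- Call os_file_read/os_file_write to complete the read or write.

Possible room for optimization: increase the number of slots and the number of slots written in one batch.

With native AIO, the number of slots is 1/8 of that used by simulated AIO, and os_aio_linux_handle is called instead.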
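The selection and merging steps listed above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not the InnoDB code: the sim_* names and the double-valued clock are hypothetical, every request is assumed to have len > 0, and the memcpy into combined_buf and the actual os_file_read/os_file_write call are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_MERGE   64   /* merge limit quoted on the slide */
#define SKETCH_STARVE_SECS 2.0  /* age beyond which the oldest request wins */

typedef struct {
    bool          reserved;
    bool          io_already_done;
    double        reservation_time;  /* seconds on a hypothetical clock */
    unsigned long offset;            /* file offset of the request */
    unsigned long len;               /* request length, assumed > 0 */
} sim_slot_t;

/* A slot still needs IO if it is reserved and its IO has not been done. */
static bool sim_pending(const sim_slot_t *s)
{
    return s->reserved && !s->io_already_done;
}

/* Pick the first slot to serve: the oldest request waiting longer than
   2 s if there is one, else the pending request with the lowest offset.
   Returns -1 when nothing is pending (the "wait_for_io" case). */
int sim_pick_slot(const sim_slot_t *slots, int n, double now)
{
    int oldest = -1;
    int lowest = -1;

    for (int i = 0; i < n; i++) {
        if (!sim_pending(&slots[i])) {
            continue;
        }
        if (now - slots[i].reservation_time > SKETCH_STARVE_SECS
            && (oldest < 0
                || slots[i].reservation_time < slots[oldest].reservation_time)) {
            oldest = i;
        }
        if (lowest < 0 || slots[i].offset < slots[lowest].offset) {
            lowest = i;
        }
    }
    return oldest >= 0 ? oldest : lowest;
}

/* Starting from `first`, chain pending slots whose offset continues exactly
   where the previous one ends. `out` must hold SKETCH_MAX_MERGE entries.
   Returns the number of slot indexes stored in `out`. */
int sim_collect_consecutive(const sim_slot_t *slots, int n, int first, int *out)
{
    assert(first >= 0 && first < n);

    int           count = 0;
    unsigned long next  = slots[first].offset + slots[first].len;
    bool          found = true;

    out[count++] = first;
    while (found && count < SKETCH_MAX_MERGE) {
        found = false;
        for (int i = 0; i < n; i++) {
            if (sim_pending(&slots[i]) && slots[i].offset == next) {
                out[count++] = i;
                next += slots[i].len;
                found = true;
                break;
            }
        }
    }
    return count;
}
```

A caller would then copy the buffers of the collected slots into one combined buffer and issue a single read or write for the whole range, which is where the batching benefit comes from.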

How the read/write threads issue IO: everything starts with a call to fil_io

• fil_io

– fil_mutex_enter_and_prepare_for_io

– fil_node_prepare_for_io

– os_aio

• os_aio_simulated_handle

– os_file_pwrite (lseek && write && [flush])

– fil_node_complete_io

• fil_flush

• Reads are synchronous, writes are asynchronous
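The "lseek && write && [flush]" sequence noted next to os_file_pwrite can be sketched with plain POSIX calls. This is only an illustration of that sequence: sketch_write_at is a hypothetical name, not an InnoDB function, and the sketch ignores partial-write retries and the locking a multi-threaded caller would need around the seek and the write.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Write n bytes of buf at byte offset off, optionally flushing to disk.
   Returns 0 on success, -1 on error or on a short write. */
int sketch_write_at(int fd, const void *buf, size_t n, off_t off, int do_flush)
{
    assert(fd >= 0 && buf != NULL);

    if (lseek(fd, off, SEEK_SET) == (off_t) -1) {
        return -1;                       /* seek failed */
    }
    if (write(fd, buf, n) != (ssize_t) n) {
        return -1;                       /* error or short write */
    }
    if (do_flush && fsync(fd) != 0) {
        return -1;                       /* flush failed */
    }
    return 0;
}
```

Because the seek and the write are two separate system calls, two threads sharing one file descriptor can interleave them; pwrite(2) avoids that by taking the offset as an argument.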

How the read/write threads issue IO: diagnostic information for file IO

• fil_node_prepare_for_io

– removes the node from fil_system->LRU (it is now in use)

– n_pending++

• fil_node_complete_io

– n_pending--

– modification_counter++ (set to flush_counter when freed)

– adds node->space to fil_system->unflushed_spaces

– puts the node back into fil_system->LRU

• fil_flush

– space->n_pending_flushes++

– n_pending_flushes++

– os_file_flush

– n_pending_flushes--

– space->n_pending_flushes--
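The counter bookkeeping above can be modelled with a small C sketch. It is a deliberately reduced model: the sketch_* names are hypothetical, all counters are folded into one struct although InnoDB keeps them on the node, the space and the file system object, the LRU and unflushed_spaces lists are omitted, and recording flush_counter at flush time is an assumption made for illustration.

```c
#include <assert.h>

typedef struct {
    unsigned long n_pending;             /* IOs in flight */
    unsigned long n_pending_flushes;     /* flushes in flight */
    unsigned long modification_counter;  /* bumped by every completed write */
    unsigned long flush_counter;         /* modification_counter at last flush */
} sketch_node_t;

/* Mirrors fil_node_prepare_for_io: one more IO is in flight. */
void sketch_prepare_for_io(sketch_node_t *node)
{
    node->n_pending++;
}

/* Mirrors fil_node_complete_io: the IO is done; a write dirties the file. */
void sketch_complete_io(sketch_node_t *node, int is_write)
{
    assert(node->n_pending > 0);
    node->n_pending--;
    if (is_write) {
        node->modification_counter++;
    }
}

/* Mirrors fil_flush: the flush is counted while it runs. */
void sketch_flush(sketch_node_t *node)
{
    node->n_pending_flushes++;
    /* os_file_flush would run here */
    node->flush_counter = node->modification_counter;
    node->n_pending_flushes--;
}

/* True when no IO and no flush is in flight. */
int sketch_is_quiescent(const sketch_node_t *node)
{
    return node->n_pending == 0 && node->n_pending_flushes == 0;
}

/* True when writes completed since the last flush. */
int sketch_needs_flush(const sketch_node_t *node)
{
    return node->modification_counter > node->flush_counter;
}
```

When diagnosing a hang, the useful question is which of these counters stays non-zero: a stuck n_pending points at an IO that was prepared but never completed.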

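What the master thread helps with

• Calls buf_flush_batch to flush dirty pages and wake up AIO

– Flushes dirty pages at different rates (IO capacity is allotted according to how busy IO currently is)

– Flushes page blocks from flush_list (neighboring pages are flushed as well)

• buf_flush_list->buf_flush_batch(BUF_FLUSH_LIST)

– Ends up calling buf_flush_buffered_writes

• Flushes whatever is buffered in the in-memory doublewrite buffer to storage

• If simulated AIO is in use, wakes up the AIO threads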

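flush_list and LRU_list

• Characteristics of the two ways of flushing pages

– All pages are first flushed from the buffer pool into the in-memory doublewrite buffer

• Triggered periodically by the master thread (modified-page ratio > dirty-page ratio limit), flushing from flush_list

• Triggered by worker threads on demand (no free blocks available), flushing from LRU_list

– Doublewrite layout: 128 pages of 16 KB each, 2 MB in total

– Starts at buf_flush_page: writes a flushable page from the buffer pool to a file

[Figure: call chain. buf_flush_page → buf_flush_write_block_low → buf_flush_post_to_doublewrite_buf, or fil_io directly (no trx_dwb) → buf_flush_buffered_writes]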
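The buffering behaviour of the doublewrite area can be sketched in C as follows. The sizes come from the slide (128 pages of 16 KB); everything else is a simplification under assumptions: the sketch_dblwr_* names are hypothetical, the flush only resets the buffer and counts itself, and the real writes to the doublewrite area and to the final page locations are reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DW_PAGE_SIZE 16384  /* 16 KB per page */
#define DW_N_PAGES   128    /* 128 pages, 2 MB in total */

typedef struct {
    unsigned char buf[DW_N_PAGES * DW_PAGE_SIZE];
    int           first_free;  /* next unused page position in buf */
    int           n_flushes;   /* how many times the buffer was flushed */
} sketch_dblwr_t;

/* Stand-in for buf_flush_buffered_writes: empty the in-memory buffer. */
void sketch_dblwr_flush(sketch_dblwr_t *dw)
{
    if (dw->first_free == 0) {
        return;  /* nothing buffered */
    }
    /* A real implementation would write buf to the doublewrite area,
       sync it, write each page to its final location, and wake the
       simulated AIO handler threads. */
    dw->first_free = 0;
    dw->n_flushes++;
}

/* Stand-in for buf_flush_post_to_doublewrite_buf: copy one page in,
   flushing first if the buffer is already full. */
void sketch_dblwr_post(sketch_dblwr_t *dw, const unsigned char *page)
{
    if (dw->first_free == DW_N_PAGES) {
        sketch_dblwr_flush(dw);
    }
    assert(dw->first_free < DW_N_PAGES);
    memcpy(dw->buf + (size_t) dw->first_free * DW_PAGE_SIZE, page, DW_PAGE_SIZE);
    dw->first_free++;
}
```

Posting 128 pages fills the buffer without flushing; the 129th post forces a flush first, which is the point where a writer may have to wait on the doublewrite mutex.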

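Background of the table-lost-after-DDL problem

• #62100

• The table is lost after a failed DDL

• Hit 5 times in production operations over the past year

• Has been in the bug list since 2008

• Patch submitted in the second half of 2011

• In early 2012, Mark & Inaam discussed improvements to the patch

• The patch was merged in MySQL 5.5.22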

http://bugs.mysql.com/bug.php?id=62100

Table loss after DDL: basic information

• fil_rename_tablespace

– sets stop_ios=TRUE

– waits until n_pending==0 && n_pending_flushes==0

– sets stop_ios=FALSE

• fil_mutex_enter_and_prepare_for_io

– waits until stop_ios = FALSE

• Stays blocked until the timeout, and mysqld eventually aborts

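Table loss after DDL: the key backtrace

Table loss after DDL: questions and answers

• Why is n_pending > 0? – There is no active write thread

• Why was no write thread woken up? – It is blocked on doublewrite->mutex

• Who holds doublewrite->mutex? – The srv_master thread holds it but is itself blocked

• Why can't the DDL roll back correctly on failure? – The parameter that informs the caller is not set correctly

• Other risks? – Whether writes to the intermediate temporary table of the DDL are delayed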

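Fixing the table loss after DDL

• Force a wake-up of the AIO threads when the wait has lasted too long

• Set the rollback flag correctly

• Move the max_open_files check to the correct place in the logic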
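The first two points of the fix can be illustrated with a bounded wait loop in C. This is a sketch of the idea only and does not reproduce the actual patch for #62100: the sketch_* names, the retry limits and the callback that stands in for waking the simulated AIO handler threads are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool          stop_ios;           /* new IOs must wait while this is set */
    unsigned long n_pending;          /* IOs in flight */
    unsigned long n_pending_flushes;  /* flushes in flight */
} sketch_space_t;

/* Callback standing in for waking the simulated AIO handler threads. */
typedef void (*sketch_wake_fn)(sketch_space_t *space);

/* Demo callback: the woken handlers complete all pending IO. */
void sketch_drain_on_wake(sketch_space_t *space)
{
    space->n_pending = 0;
}

/* Demo callback: the wake-up has no effect (handlers are still stuck). */
void sketch_noop_wake(sketch_space_t *space)
{
    (void) space;
}

/* Block new IOs and wait for pending ones to drain. After wake_after
   unsuccessful retries, force a wake-up on every further retry. Returns
   true when the rename may proceed (stop_ios stays set for the caller to
   clear afterwards), or false when the caller has to roll back. */
bool sketch_wait_for_quiescence(sketch_space_t *space, int max_retries,
                                int wake_after, sketch_wake_fn wake)
{
    assert(wake != NULL);

    space->stop_ios = true;
    for (int retry = 0; retry < max_retries; retry++) {
        if (space->n_pending == 0 && space->n_pending_flushes == 0) {
            return true;
        }
        if (retry >= wake_after) {
            wake(space);  /* the wait is taking too long: force a wake-up */
        }
        /* a real implementation would sleep briefly here */
    }
    space->stop_ios = false;  /* give up and let normal IO resume */
    return false;             /* explicit failure so the caller can roll back */
}
```

The explicit false return is the part that corresponds to "set the rollback flag correctly": the caller learns that the rename did not happen instead of assuming success.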

Finally

• Time for discussion
