SoC Idling for unconf COSCUP 2016
Runtime PM for CPU Idling in Linux Kernel
“freedom” Koan-Sin Tan
freedom_at_computer.org
unconf, COSCUP 2016
Most of the materials are from Kevin, Ulf, and Lina’s past Linaro Connect presentations
SoC Idling & CPU Cluster PM
● Idle management of devices via Runtime PM and the Generic PM Domain (genpd)
  ○ A proven concept
● Idle management of CPUs and clusters (cpuidle)
● Goal: One “idle” to rule them all!
● http://lwn.net/Articles/696712/
  ○ [PATCH v3 00/15] PM: SoC idle support using PM domains, Thu, 4 Aug 2016 17:04:47 -0600
https://en.wikipedia.org/wiki/One_Ring
Background
● Runtime PM
● System PM
● Dev PM QoS
  ○ CPU_DMA_LATENCY
  ○ https://www.kernel.org/doc/Documentation/power/pm_qos_interface.txt
● Generic PM domain
https://events.linuxfoundation.org/images/stories/pdf/lceu2012_wysocki.pdf
Users of genpd
[Figure: timeline of genpd users at Linux 3.1, 3.4, 3.18, and 4.5, growing from SH-Mobile alone to SH-Mobile, S3C64xx, Exynos, IMX, Ux500, ZX, Qcom, BCM, Dove, MediaTek, Rockchip, and Tegra]
Highlight
● Genpd: maintained by Ulf, Kevin, and Rafael
● Various consolidation, fixing issues and regressions
  ○ Genpd: Removing intermediate states from the power off sequence (4.3)
    ■ http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/base?id=ba2bbfbf63075850bb523e2adb815d45e3509995
  ○ Genpd: Enable the runtime PM centric approach
  ○ Genpd: Support multiple retention states (4.6)
Next steps
● Add genpd power statistics
● Avoid needless wakeup in System PM
CPU Cluster PM: background and motivation
More
● CPUs
● integrated devices
● power domains
● micro controllers
● firmware
Kernel needs to evolve
Idle management today
● The Linux kernel has two distinct ways of managing idle:
  ○ the CPUidle framework for CPUs
  ○ runtime PM combined with generic power domains (genpd) for all other devices
● CPUidle does not scale well for multi-cluster SMP systems and heterogeneous systems like big.LITTLE
● To better manage idle for modern SoCs with a hierarchical structure, we are exploring extending runtime PM and genpd to CPUs, so that there is a unified framework for managing idle across all devices
Two separate worlds
CPU
● cpuidle framework
● cpu_(cluster_)pm_*()
● Not scaling well for SMP or multi-cluster (cf. coupled idle states)
[1] https://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/cpuidle-cluster.pdf
IO devices
● Runtime PM
● Auto suspend
● PM domains
● Generic PM domains (genpd)
[1] https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_wysocki.pdf
Runtime PM + genpd SoC Idle
Including:
● Use runtime PM for CPUs
● And CPU-connected “stuff”
  ○ Interrupt controllers (e.g., ARM GIC)
  ○ FPUs
  ○ CPU-local cache (L1 $)
● Model cluster with genpd
  ○ CPUs are just “devices” in the genpd
  ○ Genpd includes shared resources (e.g., L2 $)
SoC/Cluster Idle - Solution
● Use genpd and Runtime PM
● Describe CPUs and domains in DT
  ○ CPU: power-domains = <&CPU_PD0>;
  ○ power-controller: #power-domain-cells
● Initialize genpd PM domains
● Attach CPU devices to genpd
● Add Runtime PM support for CPUidle and Hotplug
● Provide platform callbacks
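The idea of treating CPUs as devices inside a hierarchy of power domains can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `PowerDomain` and `Cpu` are hypothetical names, not kernel structures, and the real genpd code handles locking, governors, and platform callbacks that are left out here. What it shows is the reference counting: a domain powers off when its last active member suspends, and that in turn releases the parent domain.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PowerDomain:
    """Toy stand-in for a genpd domain (hypothetical, not the kernel struct)."""
    name: str
    parent: Optional["PowerDomain"] = None
    active: int = 0            # active devices plus powered-on sub-domains
    powered: bool = False
    log: List[str] = field(default_factory=list)

    def get(self) -> None:
        """A device or sub-domain needs this domain powered."""
        if self.active == 0:
            if self.parent is not None:
                self.parent.get()      # the parent powers on first
            self.powered = True
            self.log.append("on")
        self.active += 1

    def put(self) -> None:
        """A device or sub-domain no longer needs this domain."""
        self.active -= 1
        if self.active == 0:           # last man: nothing left running here
            self.powered = False
            self.log.append("off")
            if self.parent is not None:
                self.parent.put()


@dataclass
class Cpu:
    """A CPU modelled as just another device attached to a domain."""
    cpu_id: int
    domain: PowerDomain

    def runtime_resume(self) -> None:
        """Called on idle exit in this model."""
        self.domain.get()

    def runtime_suspend(self) -> None:
        """Called on idle entry in this model."""
        self.domain.put()
```

With two clusters under one coherency domain, the coherency domain stays on until the last CPU of the last cluster suspends, which is the "last man" behaviour the slides rely on.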
SoC Idle: Today
[Figure: each CPU in each cluster is managed by its own cpuidle instance; cluster and coherency idle are handled by platform hacks]
SoC Idle: With CPU PM
[Figure: the per-CPU cpuidle instances sit on top of Runtime PM, GenPD, and CPU PM, backed by a platform driver]
Recipe
● Started with upstream kernel (latest kernel I tested is 4.7)
● Prep
  ○ Extend genpd domains to support multiple idle levels
  ○ Extend genpd to support IRQ-safe PM domains
● Add
  ○ Add CPU PM domain framework
  ○ Call Runtime PM from cpuidle
● Sample
  ○ Using CPU PM domains: OS Initiated
  ○ DT changes for SoC
https://github.com/freedomtan/linux/commits/v4.7-rc1-mt8173-soc-idle
CPU PM framework
● Init
  ○ Read domain topology from DT
  ○ Setup genpd PM domains
● Genpd
  ○ Last man determination
● Genpd gov:
  ○ Determine cluster idle state
[Figure: blocks labelled Init, CPU PM, GenPD, GenPD Governor, and Platform Driver]
CPU PM domains
● Add CPU PM domain framework
  ○ Define CPUs as IRQ-safe devices
    ■ https://patches.linaro.org/patch/63350/
  ○ Read CPU topology from DT
  ○ Register domains and set up sub-domains
  ○ Add online CPUs to the respective CPU PM domains
  ○ Register for hotplug notifications to understand online CPUs
    ■ https://patches.linaro.org/patch/63352/
  ○ Add CPU Runtime PM API for ease of use
  ○ Add a new genpd governor for CPU domains that takes PM QoS into consideration
● Call CPU Runtime PM from ARM cpuidle driver
  ○ Use CPU PM domain runtime suspend and resume API
  ○ Record the next wake time of this CPU in the device’s genpd data for use by governor
https://github.com/freedomtan/linux/blob/v4.7-rc1-mt8173-soc-idle/drivers/base/power/cpu_domains.c
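How a governor might combine the recorded wake times with PM QoS can be sketched as follows. This is a minimal illustration, not the code in cpu_domains.c: `DomainState`, `pick_domain_state`, and the field names are hypothetical, and it assumes each domain idle state is described by a minimum residency and an exit latency in microseconds. The domain can sleep until the earliest CPU wakeup, and a state qualifies only if it pays off within that time and can be exited within the QoS latency bound.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DomainState:
    """Hypothetical description of one domain idle state."""
    name: str
    residency_us: int      # minimum time in the state for it to pay off
    exit_latency_us: int   # time needed to bring the domain back


def pick_domain_state(states: List[DomainState],
                      next_wakeup_us: Dict[int, int],
                      now_us: int,
                      qos_latency_us: int) -> Optional[DomainState]:
    """Return the deepest state that fits the sleep time and the QoS bound.

    next_wakeup_us maps CPU id -> absolute time of that CPU's next wakeup.
    Returns None when no state qualifies (the domain should stay on).
    """
    # The domain wakes up when the first of its CPUs wakes up.
    sleep_us = min(next_wakeup_us.values()) - now_us
    best: Optional[DomainState] = None
    for state in states:
        fits_sleep = state.residency_us <= sleep_us
        fits_qos = state.exit_latency_us <= qos_latency_us
        if fits_sleep and fits_qos:
            # "Deepest" is taken here to mean the largest residency.
            if best is None or state.residency_us > best.residency_us:
                best = state
    return best
```

Treating the state with the largest residency as the deepest one is a simplification chosen for this sketch; a real governor could rank states by a separate power figure instead.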
CPU PM framework
● Better than coupled cpuidle
  ○ CPUs are not woken up when ready to enter coupled state
  ○ Handles multi-level CPU-domain topology
● Is not MCPM
  ○ MCPM handles low-level races between CPUs
  ○ Some v7 SoCs may still need MCPM with the CPU PM framework
PSCI: PC vs. OSI
Platform Coordinated (PC)
● Default PSCI mode
● FW decides on the CPU-domain idle state when all CPUs are idle
● FW does not know of Linux CPU QoS requirements, latency, and next CPU wakeup
● FW decision to power off a domain may be detrimental to power and performance
OS Initiated (OSI)
● Optional from PSCI v1.0 onward
● Linux decides the CPU-domain idle state
● The last CPU in a domain provides the idle state of the cluster, coherency
● Linux can make a wise choice of CPU-domain idle state knowing QoS, latency, predicted CPU wakeup
OSI using CPU PM
● Query PSCI_FEATURES in F/W for OSI support
● Setup CPU PM Domains
  ○ Reads State-IDs for Cluster and Coherency idle levels from DT
● Callbacks for Domain ON/OFF
  ○ State-IDs passed from CPU PM
  ○ Aggregate State-IDs against the CPU
● CPU calls F/W with Composite State-ID
● No SoC-specific drivers needed for ARMv8
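The aggregation step can be illustrated with a small sketch. It assumes, as a labelled guess, that each level owns a 4-bit field of the StateID (CPU in bits 3:0, cluster in 7:4, system in 11:8, last-in-level in 15:12); the slides name the fields but not their widths, and a platform is free to use another layout. The function names are hypothetical. Each domain that powers off contributes its own State-ID, and the last CPU ORs them into the value it passes to firmware.

```python
from typing import Iterable

# Assumed field positions within the 16-bit StateID (4 bits per level).
CPU_SHIFT = 0
CLUSTER_SHIFT = 4
SYSTEM_SHIFT = 8
LAST_LEVEL_SHIFT = 12
FIELD_MASK = 0xF


def level_state_id(shift: int, local_state: int) -> int:
    """Place one level's local idle state into its field of the StateID."""
    if not 0 <= local_state <= FIELD_MASK:
        raise ValueError("local state must fit in 4 bits")
    return local_state << shift


def composite_state_id(cpu_state_id: int,
                       domain_state_ids: Iterable[int]) -> int:
    """OR the State-IDs of all powered-off domains into the CPU's State-ID."""
    composite = cpu_state_id
    for domain_id in domain_state_ids:
        composite |= domain_id
    return composite
```

For example, a CPU state of 2 combined with a cluster state of 3 gives 0x32 under this assumed layout; if no domain powers off, the CPU's own State-ID is passed unchanged.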
PSCI: Composite State-ID
● CPUidle: CPU state
● CPU PM Gov: Cluster state
● CPU PM Gov: Coherency state
cpu_suspend(uint32 power_state, ....)
Original Format
Bit field   Description
31:26       Reserved. Must be zero.
25:24       PowerLevel
23:17       Reserved. Must be zero.
16          StateType
15:0        StateID

Extended StateID Format
Bit field   Description
31          Reserved. Must be zero.
30          StateType
29:28       Reserved. Must be zero.
27:0        StateID
Original Format
[Figure: 32-bit power_state layout. Bits 31:16 hold Reserved (must be 0), Power Level, Reserved (must be 0), and State type. Bits 15:0 are drawn as the fields Last in level, System, Cluster, and CPU.]
CPU idle: 0x00010000
Cluster idle: 0x01010000
StateID: not used

Extended ID
[Figure: 32-bit power_state layout. Bit 31 is 0, followed by State type, Reserved (must be 0), and the upper part of the StateID. Bits 15:0 are drawn as the fields Last in level, System, Cluster, and CPU.]
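The two power_state layouts in the tables can be checked with a short sketch. The helper names are hypothetical; the bit positions come straight from the tables (original format: PowerLevel in 25:24, StateType in bit 16, StateID in 15:0; extended format: StateType in bit 30, StateID in 27:0).

```python
from typing import Tuple

ORIG_RESERVED_MASK = 0xFCFE0000   # bits 31:26 and 23:17 must be zero
EXT_RESERVED_MASK = 0xB0000000    # bits 31 and 29:28 must be zero


def pack_original(power_level: int, state_type: int, state_id: int = 0) -> int:
    """Build a power_state value in the original format."""
    if not 0 <= power_level <= 0x3:
        raise ValueError("PowerLevel is a 2-bit field")
    if state_type not in (0, 1):
        raise ValueError("StateType is a 1-bit field")
    if not 0 <= state_id <= 0xFFFF:
        raise ValueError("StateID is a 16-bit field")
    return (power_level << 24) | (state_type << 16) | state_id


def unpack_original(power_state: int) -> Tuple[int, int, int]:
    """Return (power_level, state_type, state_id) from the original format."""
    if power_state & ORIG_RESERVED_MASK:
        raise ValueError("reserved bits must be zero")
    return ((power_state >> 24) & 0x3,
            (power_state >> 16) & 0x1,
            power_state & 0xFFFF)


def pack_extended(state_type: int, state_id: int) -> int:
    """Build a power_state value in the extended StateID format."""
    if state_type not in (0, 1):
        raise ValueError("StateType is a 1-bit field")
    if not 0 <= state_id <= 0x0FFFFFFF:
        raise ValueError("StateID is a 28-bit field")
    return (state_type << 30) | state_id


def unpack_extended(power_state: int) -> Tuple[int, int]:
    """Return (state_type, state_id) from the extended format."""
    if power_state & EXT_RESERVED_MASK:
        raise ValueError("reserved bits must be zero")
    return ((power_state >> 30) & 0x1, power_state & 0x0FFFFFFF)
```

With StateType set to 1 and StateID left at 0, `pack_original(0, 1)` gives 0x00010000 and `pack_original(1, 1)` gives 0x01010000, matching the CPU idle and Cluster idle values on the slide.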
Changes for Vendors: ARM64
● DT
  ○ Domain hierarchy
  ○ Domain idle states
● Driver
  ○ None if F/W supports OS Initiated mode
Changes for Vendors: ARMv7
● DT
  ○ Domain hierarchy
  ○ Domain idle states
● Driver
  ○ Handle power ON/OFF callbacks
  ○ MCPM or any race-avoiding last man logic
● Revisit SoC-specific CPUidle hacks
Genpd: IRQ-Safe Restriction
● IRQ-safe domains can only have IRQ-safe sub-domains
● Other cases with domains and devices remain the same
Status of patches
● PM / Domains: Multiple genpd states: Merged
● PM / Domains: IRQ safe domains: Under review
  ○ http://www.spinics.net/lists/linux-arm-msm/msg19666.html
● CPU PM domains: new framework for CPU cluster: Under review
  ○ http://www.spinics.net/lists/linux-arm-msm/msg19667.html
● PSCI 1.0 OS Initiated: Support for domain hierarchy: Under review
  ○ http://www.spinics.net/lists/linux-arm-msm/msg19673.html
  ○ http://www.spinics.net/lists/linux-arm-msm/msg19674.html
RFC submission on ML:
[1] Patch v3: http://lwn.net/Articles/696712/
Sample ARM v7a implementation:
[2] https://git.linaro.org/people/lina.iyer/linux-next.git/shortlog/refs/heads/genpd-psci-8084
Results
● SoC idle saves power when CPUs are online
  ○ Critical power saving comes from powering off caches and peripheral h/w
  ○ ~20 mA @ 800 MHz of power saving at idle
● ~5 μs addition to idle enter path
  ○ Exit latency depends on what happens at domain OFF
● Implemented and tested on DB410c / 96Boards
● Implemented and tested on MT8173 EVB and Chromebook OAK rev-5
● Some results measured with Servo board
INFO: CPU Node : MPID 0x0, parent_node 1, State 0x0
INFO: CPU Node : MPID 0x1, parent_node 1, State 0x0
INFO: CPU Node : MPID 0xffffffffffffffff, parent_node 1, State 0x2
INFO: CPU Node : MPID 0xffffffffffffffff, parent_node 1, State 0x2
INFO: CPU Node : MPID 0x100, parent_node 2, State 0x0
INFO: CPU Node : MPID 0x101, parent_node 2, State 0x0
INFO: mode = 0x1
What changes are needed
● Linux kernel
  ○ Runtime PM
  ○ Generic PM domain (genpd)
  ○ Some patches (likely to be merged into mainline kernel when good enough)
  ○ dts changes
● PSCI (ATF)
  ○ OS Initiated: not supported in ATF
  ○ Extended ID: enable it
  ○ When I started working on it, the code in plat/mediatek/ still used the backward-compatible API (ENABLE_PLAT_COMPAT)
    ■ Cannot enable Extended ID
Links
● Linux Kernel
  ○ genpd + runtime pm SoC idle
    ■ Patches V3, http://lwn.net/Articles/696712/
  ○ Ulf’s linux-pm next branch
    ■ https://git.linaro.org/people/ulf.hansson/linux-pm.git
    ■ https://git.linaro.org/people/ulf.hansson/linux-pm.git/shortlog/refs/heads/next
  ○ My dts changes for MT8173 (based on Ulf’s next branch)
    ■ https://git.linaro.org/people/freedom.tan/linux-8173.git/shortlog/refs/heads/v4.6-rc6-mt8173-soc-idle
● ARM Trusted Firmware (ATF)
  ○ https://github.com/freedomtan/arm-trusted-firmware/tree/hacks-for-8173evb-osi-0518
1. PM / Domains: Abstract genpd locking
2. PM / Domains: Support IRQ safe PM domains
3. PM / cpu_domains: Setup PM domains for CPUs/clusters
4. ARM: cpuidle: Add runtime PM support for CPUs
5. timer: Export next wake up of a CPU
6. PM / cpu_domains: Record CPUs that are part of the domain
7. PM / cpu_domains: Add PM Domain governor for CPUs
8. Documentation / cpu_domains: Describe CPU PM domains setup and governor
9. drivers: firmware: psci: Allow OS Initiated suspend mode
10. ARM64: psci: Support cluster idle states for OS-Initiated
11. ARM64: dts: Add PSCI cpuidle support for MSM8916
12. ARM64: dts: Define CPU power domain for MSM8916
 Documentation/power/cpu_domains.txt   |  79 ++++++++
 Documentation/power/devices.txt       |  12 +-
 arch/arm64/boot/dts/qcom/msm8916.dtsi |  49 +++++
 arch/arm64/kernel/psci.c              |  46 ++++-
 drivers/base/power/Makefile           |   1 +
 drivers/base/power/cpu_domains.c      | 365 ++++++++++++++++++++++++++++++++++
 drivers/base/power/domain.c           | 210 +++++++++++++++----
 drivers/cpuidle/cpuidle-arm.c         |  55 +++++
 drivers/firmware/psci.c               |  45 ++++-
 include/linux/cpu_domains.h           |  37 ++++
 include/linux/pm_domain.h             |  14 +-
 include/linux/psci.h                  |   2 +
 include/linux/tick.h                  |  10 +
 include/uapi/linux/psci.h             |   5 +
 kernel/time/tick-sched.c              |  13 ++
 15 files changed, 896 insertions(+), 47 deletions(-)
 create mode 100644 Documentation/power/cpu_domains.txt
 create mode 100644 drivers/base/power/cpu_domains.c
 create mode 100644 include/linux/cpu_domains.h
Description of Patches
● Patches [1, 2] - Genpd changes. Set up generic PM domains to be called from cpuidle. Genpd uses mutexes for synchronization. This has to be changed to spinlocks for domains that may be called from IRQ-safe contexts.
● Patch [3] - CPU PM domains. Parses DT, sets up PM domains for CPUs, and builds up the hierarchy of domains and devices. These are a bunch of helper functions.
● Patch [4] - ARM cpuidle driver. Enables the ARM cpuidle driver to call runtime PM. Even though this has been done for ARM, there is nothing architecture-specific about it. Currently, all idle states other than ARM clock gating call into runtime PM. This could also be state-specific, i.e., call into runtime PM only after a certain state. The changes may be made part of the cpuidle framework, but that needs discussion.
● Patches [5, 6, 7] - PM domain governor for CPUs. Introduce a new genpd governor that looks into the per-CPU tick device to identify the next CPU wakeup and determine the available sleep time for the domain. This, along with QoS, is used to determine the best idle state for the domain. A domain's wake-up is determined by the first CPU in that domain to wake up. A coherency-level domain's (parent of a domain containing CPU devices) wake-up is determined by the first CPU amongst all the CPUs to wake up. Identifying the CPUs and their wakeups is part of these patches.
● Patches [9, 10] - ARM64 PSCI platform driver. ARM64 PSCI v1.0 specific. PSCI OS Initiated mode supports powering off CPU clusters (caches etc., by configuring separate power controllers). These patches enable Linux to determine whether the f/w supports this mode and, if so, use CPU PM domain helper functions to create PM domains and handle the power_on/power_off callbacks. The resulting cluster state is passed as an argument to the f/w along with the CPU state.
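The wake-up rule described for patches [5, 6, 7] can be written down as a short recursive sketch. It is only an illustration under stated assumptions: the topology is a plain dictionary mapping a domain name to its members, where integers are CPU ids and strings are sub-domain names, and `domain_next_wakeup` is a hypothetical helper, not a function from the patch series.

```python
from typing import Dict, List, Union

# Domain name -> members; an int is a CPU id, a str is a sub-domain name.
Topology = Dict[str, List[Union[int, str]]]


def domain_next_wakeup(domain: str,
                       topology: Topology,
                       cpu_wakeup_us: Dict[int, int]) -> int:
    """Return the earliest wakeup time among all CPUs under a domain.

    A cluster domain wakes with its first CPU; a coherency-level domain
    wakes with the first CPU of any of its sub-domains.
    """
    times = []
    for member in topology[domain]:
        if isinstance(member, int):
            times.append(cpu_wakeup_us[member])
        else:
            times.append(domain_next_wakeup(member, topology, cpu_wakeup_us))
    return min(times)
```

For a coherency domain with two clusters, the result for the parent is simply the smaller of the two cluster results, so a single early-waking CPU limits how deep every domain above it can go.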
RFCv3 patches
[1] https://patches.linaro.org/patch/63349/
[2] https://patches.linaro.org/patch/63350/
[3] https://patches.linaro.org/patch/63351/
[4] https://patches.linaro.org/patch/63352/
[5] https://patches.linaro.org/patch/63353/
[6] https://patches.linaro.org/patch/63354/
[7] https://patches.linaro.org/patch/63355/
[8] https://patches.linaro.org/patch/63356/
[9] https://patches.linaro.org/patch/63357/
[10] https://patches.linaro.org/patch/63358/
[11] https://patches.linaro.org/patch/63359/
[12] https://patches.linaro.org/patch/63360/