CVE-2024-53169
Vulnerability Scoring
Status: Received on 27 Dec 2024, 14:15 UTC
Published on: 27 Dec 2024, 14:15 UTC
CVSS Release:
CVE-2024-53169: In the Linux kernel, the following vulnerability has been resolved: nvme-fabrics: fix kernel crash while shutting down controller The nvme keep-alive operation, which executes at a periodic interval, could potentially sneak in while shutting down a fabric controller. This may lead to a race between the fabric controller admin queue destroy code path (invoked while shutting down controller) and hw/hctx queue dispatcher called from the nvme keep-alive async request queuing operation. This race could lead to the kernel crash shown below: Call Trace: autoremove_wake_function+0x0/0xbc (unreliable) __blk_mq_sched_dispatch_requests+0x114/0x24c blk_mq_sched_dispatch_requests+0x44/0x84 blk_mq_run_hw_queue+0x140/0x220 nvme_keep_alive_work+0xc8/0x19c [nvme_core] process_one_work+0x200/0x4e0 worker_thread+0x340/0x504 kthread+0x138/0x140 start_kernel_thread+0x14/0x18 While shutting down fabric controller, if nvme keep-alive request sneaks in then it would be flushed off. The nvme_keep_alive_end_io function is then invoked to handle the end of the keep-alive operation which decrements the admin->q_usage_counter and assuming this is the last/only request in the admin queue then the admin->q_usage_counter becomes zero. If that happens then blk-mq destroy queue operation (blk_mq_destroy_ queue()) which could be potentially running simultaneously on another cpu (as this is the controller shutdown code path) would forward progress and deletes the admin queue. So, now from this point onward we are not supposed to access the admin queue resources. However the issue here's that the nvme keep-alive thread running hw/hctx queue dispatch operation hasn't yet finished its work and so it could still potentially access the admin queue resource while the admin queue had been already deleted and that causes the above crash. The above kernel crash is regression caused due to changes implemented in commit a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()"). Ideally we should stop keep-alive before destroyin g the admin queue and freeing the admin tagset so that it wouldn't sneak in during the shutdown operation. However we removed the keep alive stop operation from the beginning of the controller shutdown code path in commit a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()") and added it under nvme_uninit_ctrl() which executes very late in the shutdown code path after the admin queue is destroyed and its tagset is removed. So this change created the possibility of keep-alive sneaking in and interfering with the shutdown operation and causing observed kernel crash. To fix the observed crash, we decided to move nvme_stop_keep_alive() from nvme_uninit_ctrl() to nvme_remove_admin_tag_set(). This change would ensure that we don't forward progress and delete the admin queue until the keep- alive operation is finished (if it's in-flight) or cancelled and that would help contain the race condition explained above and hence avoid the crash. Moving nvme_stop_keep_alive() to nvme_remove_admin_tag_set() instead of adding nvme_stop_keep_alive() to the beginning of the controller shutdown code path in nvme_stop_ctrl(), as was the case earlier before commit a54a93d0e359 ("nvme: move stopping keep-alive into nvme_uninit_ctrl()"), would help save one callsite of nvme_stop_keep_alive().
The exploitability of CVE-2024-53169 depends on two key factors: attack complexity (the level of effort required to execute an exploit) and privileges required (the access level an attacker needs).
No exploitability data is available for CVE-2024-53169.
A lower complexity and fewer privilege requirements make exploitation easier. Security teams should evaluate these aspects to determine the urgency of mitigation strategies, such as patch management and access control policies.
Attack Complexity (AC) measures the difficulty in executing an exploit. A high AC means that specific conditions must be met, making an attack more challenging, while a low AC means the vulnerability can be exploited with minimal effort.
Privileges Required (PR) determine the level of system access necessary for an attack. Vulnerabilities requiring no privileges are more accessible to attackers, whereas high privilege requirements limit exploitation to authorized users with elevated access.
Above is the CVSS Sub-score Breakdown for CVE-2024-53169, illustrating how Base, Impact, and Exploitability factors combine to form the overall severity rating. A higher sub-score typically indicates a more severe or easier-to-exploit vulnerability.
Below is the Impact Analysis for CVE-2024-53169, showing how Confidentiality, Integrity, and Availability might be affected if the vulnerability is exploited. Higher values usually signal greater potential damage.
The EPSS score estimates the probability that this vulnerability will be exploited in the near future.
EPSS Score: 0.045% (probability of exploit)
EPSS Percentile: 18.4%
(lower percentile = lower relative risk)
This vulnerability is less risky than approximately 81.6% of others.
Unknown
Stay updated with real-time CVE vulnerabilities and take action to secure your systems. Enhance your cybersecurity posture with the latest threat intelligence and mitigation techniques. Develop the skills necessary to defend against CVEs and secure critical infrastructures. Join the top cybersecurity professionals safeguarding today's infrastructures.