Mysql鎖超時&死鎖定位及解決

語言: CN / TW / HK

一起養成寫作習慣!這是我參與「掘金日新計劃 · 4 月更文挑戰」的第24天,點選檢視活動詳情

前言

除了slow sql的處理,常見的還有鎖超時和死鎖的定位。

鎖超時一般發生在兩個update對同一行或者存在間隙鎖的交集情況下,在業務中一個事務持有鎖之後因為發生了IO阻塞或者等待或者處理其他邏輯耗時導致事務一直沒有被提交,鎖得不到釋放。

死鎖時一般發生在兩個事務分別先後持有兩個不同的鎖。比如事務1先持有A鎖,事務2先持有B鎖,事務1後準備持有B鎖,事務2後持有A鎖,那麼這兩個事務後面打算持有的鎖都被對方先持有了,那麼就出現了死鎖。

下面我們將對鎖超時和死鎖解釋下怎麼定位和解決,mysql版本為: ``` mysql> select version();

+-----------+

| version() |

+-----------+

| 8.0.27    |

+-----------+

1 row in set (0.00 sec) ```

測試表結構為: | Table | Create Table | +-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '主鍵', `name` char(32) NOT NULL COMMENT '使用者姓名', `num` int DEFAULT NULL, `phone` char(11) DEFAULT '' COMMENT '手機號', PRIMARY KEY (`id`), KEY `idx_name_phone` (`name`,`phone`) ) ENGINE=InnoDB AUTO_INCREMENT=106 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='test表' | +-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.04 sec) | Table | Create Table | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | user | CREATE TABLE `user` ( `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT '主鍵', `create_time` datetime(3) DEFAULT CURRENT_TIMESTAMP(3) COMMENT '建立時間', `update_time` datetime(3) DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3) COMMENT '更新時間', `is_delete` tinyint NOT NULL DEFAULT '0' COMMENT '是否刪除 0:否 1:是', `phone` char(11) DEFAULT '' COMMENT '手機號', `user_name` char(64) NOT NULL COMMENT '使用者名稱', `head` varchar(255) DEFAULT NULL COMMENT '使用者頭像', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='使用者表' | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)

提前需要了解的

關於Mysql InnoDB的各類鎖請參考《Mysql InnoDB之各類鎖》,裡面有介紹怎麼檢視持有鎖和請求鎖的情況可以使用select * from performance_schema.data_locks進行檢視這裡就不做過多描述。檢視有哪些執行緒正在執行:show processlist。檢視當前執行的所有事務:select * from information_schema.INNODB_TRX

performance_schema:你可以查詢performance_schema資料庫中的表,以檢視有關伺服器及其執行的應用程式的效能特徵的實時資訊。有關詳細資訊,請參閱 第 27 章,MySQL 效能模式

information_schema:您可以使用InnoDB information_schema表提取有關InnoDB管理的架構物件的元資料。

鎖超時

事務1: mysql> begin;update test set phone='123' where id=1; Query OK, 0 rows affected (0.00 sec) 事務2: ``` mysql> begin;update test set phone='123' where id=1; Query OK, 0 rows affected (0.00 sec)

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction ```

show processlist: mysql> show processlist; +----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 13118 | Waiting on empty queue | NULL | | 9 | root | localhost | my_test | Sleep | 197 | | NULL | | 10 | root | localhost | my_test | Query | 6 | updating | update test set phone='123' where id=1 | | 11 | root | localhost | my_test | Query | 0 | init | show processlist | +----+-----------------+-----------+---------+---------+-------+------------------------+----------------------------------------+ 4 rows in set (0.00 sec) 可以看到執行緒id為10的處於update阻塞中,同時info展示了正在阻塞的語句。

information_schema.INNODB_TRX: mysql> select * from information_schema.INNODB_TRX; +--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+ | trx_id | trx_state | trx_started | trx_requested_lock_id | trx_wait_started | trx_weight | trx_mysql_thread_id | trx_query | trx_operation_state | trx_tables_in_use | trx_tables_locked | trx_lock_structs | trx_lock_memory_bytes | trx_rows_locked | trx_rows_modified | trx_concurrency_tickets | trx_isolation_level | trx_unique_checks | trx_foreign_key_checks | trx_last_foreign_key_error | trx_adaptive_hash_latched | trx_adaptive_hash_timeout | trx_is_read_only | trx_autocommit_non_locking | trx_schedule_weight | +--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+ | 45579 | LOCK WAIT | 2022-04-23 15:38:19 | 5652286592:2:4:20:4471239192 | 2022-04-23 15:38:19 | 2 | 10 | update test set phone='123' where id=1 | starting index read | 1 | 1 | 2 | 1128 | 1 | 0 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 0 | 0 | 0 | 1 | | 45577 | RUNNING | 2022-04-23 15:35:08 | NULL | NULL | 3 | 9 | NULL | NULL | 0 | 1 | 2 | 1128 | 1 | 1 | 0 | REPEATABLE READ | 1 | 1 | NULL | 0 | 0 | 0 | 0 | NULL | +--------+-----------+---------------------+------------------------------+---------------------+------------+---------------------+----------------------------------------+---------------------+-------------------+-------------------+------------------+-----------------------+-----------------+-------------------+-------------------------+---------------------+-------------------+------------------------+----------------------------+---------------------------+---------------------------+------------------+----------------------------+---------------------+ 2 rows in set (0.00 sec) 可以看到trx_id為45579、trx_mysql_thread_id為10的事務處於鎖等待中,同時也展示了執行的語句。我們來看看鎖的持有和請求情況:

performance_schema.data_locks: mysql> select * from performance_schema.data_locks; +--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+ | ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA | +--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+ | INNODB | 5652286592:1063:4688481592 | 45579 | 51 | 16 | my_test | test | NULL | NULL | NULL | 4688481592 | TABLE | IX | GRANTED | NULL | | INNODB | 5652286592:2:4:20:4471239192 | 45579 | 51 | 16 | my_test | test | NULL | NULL | PRIMARY | 4471239192 | RECORD | X,REC_NOT_GAP | WAITING | 1 | | INNODB | 5652287384:1063:4688483544 | 45577 | 50 | 20 | my_test | test | NULL | NULL | NULL | 4688483544 | TABLE | IX | GRANTED | NULL | | INNODB | 5652287384:2:4:20:4471243800 | 45577 | 50 | 20 | my_test | test | NULL | NULL | PRIMARY | 4471243800 | RECORD | X,REC_NOT_GAP | GRANTED | 1 | +--------+------------------------------+-----------------------+-----------+----------+---------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+ 4 rows in set (0.00 sec) 可以看到trx_id為45579正在等待id為1的行鎖,行鎖被trx_id為45577的事務持有。

解決

筆者一般遇到鎖超時的時候是介面響應時間明顯超常,一般遇到這樣的問題是因為另外一個事務出現了IO阻塞或者等待或者處理其他邏輯耗時導致事務一直沒有被提交。遇到這種異常情況可以先把佔用鎖超時的執行緒給kill掉,避免下次還會影響其他正常的事務。後續需要去分析佔用超常時間的事務程式碼,是什麼原因導致的,是一個大事務?事務中間有慢查詢?事務中間進行了其他請求,對應的資源出現了請求超時?等等。

kill的話就找到長期佔用鎖的事務對應的執行緒id,如上面的事務trx_id為45577,執行緒id為9。當然這個只是臨時解決,可以對這個事務對應的功能進行降級,處理完後在重新上線。

死鎖

先後執行 事務1: ``` mysql> begin;update test set phone='123' where id=1; Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 **事務2:** mysql> begin;update user set phone ='123' where id=1; Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0 **事務1:** mysql> update user set phone ='123' where id=1; Query OK, 1 row affected (5.67 sec) Rows matched: 1 Changed: 1 Warnings: 0 **事務2:** mysql> update test set phone='123' where id=1; ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction ``` 馬上就監測出來死鎖並回滾了。

解決

上面的例子其實就可以對映於實際開發中,需要儘量保證sql執行順序的一致,首先就是方法的抽出,當然一些情況無法容易注意,最簡單且不容易出錯的方式就是加分散式鎖。