How to Analyze and Resolve Android ANRs
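1. What is an ANR
ANR stands for Application Not Responding: the application has stopped responding.
2. ANR types and how to avoid each of them
There are generally three types of ANR:
1) KeyDispatchTimeout (5 seconds): a key or touch event gets no response within the allowed time.
2) BroadcastTimeout (10 seconds): a BroadcastReceiver does not finish its work within the allowed time.
3) ServiceTimeout (20 seconds or 200 seconds): a Service does not finish its work within the allowed time. The timeout is 20 s for a foreground service and 200 s for a background service.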
2.1 KeyDispatchTimeout
A key or touch event was not dispatched within the specified time.
The exact timeout is defined in the framework:
java
// ActivityManagerService.java
// How long we wait until we timeout on key dispatching.
static final int KEY_DISPATCHING_TIMEOUT = 5*1000;
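Why does the timeout happen?
The timer generally starts when the key event is dispatched to the app. There are two common reasons for a timeout:
1) The current event never gets a chance to be handled, because the UI thread is still busy with a previous event that did not finish in time, or because the looper is blocked for some reason.
2) The current event is being handled but does not finish in time.
How to avoid KeyDispatchTimeout?
1) Keep the UI thread for UI-related work only.
2) Move time-consuming work (database operations, I/O, network access, or anything else that may block the UI thread) to a separate thread.
3) Prefer a Handler for communication between the UI thread and other threads.
The same reasoning applies to Service and BroadcastReceiver, so it is not repeated here.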
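The three rules above can be sketched in plain Java. This is only an illustration under stated assumptions: `OffMainThread`, `UI`, `WORKER` and `runInBackground` are hypothetical names, and a single-threaded executor stands in for the Android main thread so that the example runs on a desktop JVM.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical plain-Java stand-in for "slow work in the background, result on the UI thread". */
final class OffMainThread {
    // Daemon threads, so this sketch never keeps the JVM alive on its own.
    private static ThreadFactory daemon(String name) {
        return r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        };
    }

    // Assumption: a single thread that runs tasks in order plays the role of the main thread.
    static final ExecutorService UI = Executors.newSingleThreadExecutor(daemon("fake-ui"));
    // Background pool for database, file and network work.
    static final ExecutorService WORKER = Executors.newFixedThreadPool(2, daemon("fake-worker"));

    /** Runs slowWork on WORKER, then hands its result to onUi on the UI thread. */
    static <T> CompletableFuture<Void> runInBackground(Supplier<T> slowWork, Consumer<T> onUi) {
        return CompletableFuture.supplyAsync(slowWork, WORKER).thenAcceptAsync(onUi, UI);
    }
}
```

In a real app the role of `UI` would typically be played by a `Handler` bound to `Looper.getMainLooper()`; the point of the sketch is only that the slow part never runs on the thread that dispatches input events.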
3. How to analyze an ANR log
Let's start with a log:
java
04-01 13:12:11.572 I/InputDispatcher( 220): Application is not responding: Window{2b263310 com.android.email/com.android.email.activity.SplitScreenActivity paused=false}. 5009.8ms since event, 5009.5ms since wait started
04-01 13:12:11.572 I/WindowManager( 220): Input event dispatching timed out sending to com.android.email/com.android.email.activity.SplitScreenActivity
04-01 13:12:14.123 I/Process( 220): Sending signal. PID: 21404 SIG: 3 --- when the ANR occurred and when traces.txt was generated
04-01 13:12:14.123 I/dalvikvm(21404): threadid=4: reacting to signal 3
……
04-01 13:12:15.872 E/ActivityManager( 220): ANR in com.android.email (com.android.email/.activity.SplitScreenActivity)
04-01 13:12:15.872 E/ActivityManager( 220): Reason: keyDispatchingTimedOut
04-01 13:12:15.872 E/ActivityManager( 220): Load: 8.68 / 8.37 / 8.53
04-01 13:12:15.872 E/ActivityManager( 220): CPU usage from 4361ms to 699ms ago ---- CPU usage before the ANR
04-01 13:12:15.872 E/ActivityManager( 220): 5.5% 21404/com.android.email: 1.3% user + 4.1% kernel / faults: 10 minor
04-01 13:12:15.872 E/ActivityManager( 220): 4.3% 220/system_server: 2.7% user + 1.5% kernel / faults: 11 minor 2 major
04-01 13:12:15.872 E/ActivityManager( 220): 0.9% 52/spi_qsd.0: 0% user + 0.9% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 0.5% 65/irq/170-cyttsp-: 0% user + 0.5% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 0.5% 296/com.android.systemui: 0.5% user + 0% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 100% TOTAL: 4.8% user + 7.6% kernel + 87% iowait
04-01 13:12:15.872 E/ActivityManager( 220): CPU usage from 3697ms to 4223ms later: -- CPU usage after the ANR
04-01 13:12:15.872 E/ActivityManager( 220): 25% 21404/com.android.email: 25% user + 0% kernel / faults: 191 minor
04-01 13:12:15.872 E/ActivityManager( 220): 16% 21603/__eas(par.hakan: 16% user + 0% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 7.2% 21406/GC: 7.2% user + 0% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 1.8% 21409/Compiler: 1.8% user + 0% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 5.5% 220/system_server: 0% user + 5.5% kernel / faults: 1 minor
04-01 13:12:15.872 E/ActivityManager( 220): 5.5% 263/InputDispatcher: 0% user + 5.5% kernel
04-01 13:12:15.872 E/ActivityManager( 220): 32% TOTAL: 28% user + 3.7% kernel
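In the ANR log we first find the keyword `ANR in com.android.email`, followed by `Reason: keyDispatchingTimedOut`, which means the timeout happened while responding to an input event. The log does not show where exactly. What it does show is whether the ANR came from high CPU usage, where tasks could not get a time slice to run, or from heavy IO. This matters because it gives us a direction for the trace analysis that follows. Here the total before the ANR is 87% iowait, which points to IO.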
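The CPU-versus-IO triage described above can be sketched as a small parser for the `TOTAL` line. This is a hedged illustration, not part of any Android tool: `AnrCpuSummary` and `classify` are hypothetical names, and the 50% thresholds are arbitrary assumptions chosen only to separate the clear-cut cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that classifies the "TOTAL" CPU line of an ANR log. */
final class AnrCpuSummary {
    // Accepts both layouts that appear in this article:
    //   "100% TOTAL: 4.8% user + 7.6% kernel + 87% iowait"
    //   "TOTAL: 100% = 98% user + 1% kernel"
    // The iowait part is optional because it is absent when there was none.
    private static final Pattern TOTAL = Pattern.compile(
            "TOTAL:\\s*(?:[\\d.]+%\\s*=\\s*)?([\\d.]+)%\\s*user\\s*\\+\\s*([\\d.]+)%\\s*kernel"
                    + "(?:\\s*\\+\\s*([\\d.]+)%\\s*iowait)?");

    /** Returns "IO_BOUND", "CPU_BOUND" or "UNCLEAR"; the 50% cut-offs are assumptions. */
    static String classify(String totalLine) {
        Matcher m = TOTAL.matcher(totalLine);
        if (!m.find()) {
            return "UNCLEAR";
        }
        double user = Double.parseDouble(m.group(1));
        double kernel = Double.parseDouble(m.group(2));
        double iowait = m.group(3) == null ? 0.0 : Double.parseDouble(m.group(3));
        if (iowait >= 50.0) {
            return "IO_BOUND";
        }
        if (user + kernel >= 50.0) {
            return "CPU_BOUND";
        }
        return "UNCLEAR";
    }
}
```

A result of `UNCLEAR` only means that this line alone does not settle the question; the trace still has to be read.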
Besides the log, resolving an ANR also requires the traces.txt file. It can be captured with the following commands:
java
# On the device (adb shell):
$ chmod 777 /data/anr
$ rm /data/anr/traces.txt
$ ps
$ kill -3 PID
# On the host:
adb pull /data/anr/traces.txt ./mytraces.txt
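The entry you will see most often in traces.txt looks like this:
```java
----- pid 21404 at 2011-04-01 13:12:14 -----
Cmd line: com.android.email

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
"main" prio=5 tid=1 NATIVE
  | group="main" sCount=1 dsCount=0 obj=0x2aad2248 self=0xcf70
  | sysTid=21404 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=1876218976
  at android.os.MessageQueue.nativePollOnce(Native Method)
  at android.os.MessageQueue.next(MessageQueue.java:119)
  at android.os.Looper.loop(Looper.java:110)
  at android.app.ActivityThread.main(ActivityThread.java:3688)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:507)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:866)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:624)
  at dalvik.system.NativeStart.main(Native Method)
```
Here the main thread is parked in `nativePollOnce`. As covered in my earlier walkthrough of the Handler and MessageQueue source, this means the message queue is empty and the main thread is waiting for the next message to be enqueued and wake it up. Note that this is also what an idle main thread looks like: the log above shows the dump signal being sent about 2.5 s after the timeout was reported, so the blocking work had most likely already finished by the time the trace was written. A stack like this confirms the timeout but does not pinpoint its cause by itself.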
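Summary: how to investigate and resolve an ANR
1) Start with the log to get a rough idea of the cause. Pinpointing it directly would be ideal, but usually the log alone is not enough.
2) Inspect the call stacks in traces.txt.
3) Go through each thread's entry and compare it against your own code.
4) Throughout, keep one main thread of inquiry and check carefully what caused the ANR (iowait? block? memory leak?).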
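Steps 2 and 3 can be partly automated. The sketch below pulls the state and stack frames of the "main" thread out of a Dalvik-style dump and finds the first frame that belongs to a given package. `MainThreadTrace`, `parse` and `firstAppFrame` are hypothetical names, and the parser assumes a cleanly formatted dump with one frame per line, as in the examples in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the "main" thread entry from a Dalvik thread dump. */
final class MainThreadTrace {
    // Thread header, e.g.: "main" prio=5 tid=1 NATIVE
    private static final Pattern HEADER = Pattern.compile(
            "^\"([^\"]+)\"\\s+(?:daemon\\s+)?prio=\\d+\\s+tid=\\d+\\s+(\\w+)");

    final String state;        // e.g. NATIVE, WAIT, VMWAIT
    final List<String> frames; // stack frames without the leading "at "

    private MainThreadTrace(String state, List<String> frames) {
        this.state = state;
        this.frames = frames;
    }

    /** Returns the parsed "main" entry, or null if the dump has no such thread. */
    static MainThreadTrace parse(String dump) {
        String state = null;
        List<String> frames = new ArrayList<>();
        boolean inMain = false;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                if (inMain) {
                    break; // the next thread entry starts here
                }
                inMain = m.group(1).equals("main");
                if (inMain) {
                    state = m.group(2);
                }
            } else if (inMain && line.startsWith("at ")) {
                frames.add(line.substring(3));
            }
        }
        return state == null ? null : new MainThreadTrace(state, frames);
    }

    /** First frame whose class name starts with packagePrefix, or null if there is none. */
    String firstAppFrame(String packagePrefix) {
        for (String frame : frames) {
            if (frame.startsWith(packagePrefix)) {
                return frame;
            }
        }
        return null;
    }
}
```

The first frame inside your own package is usually the best place to start reading code, because everything above it is framework or library code called from there.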
4. Case studies
4.1 IO wait example
java
Process: com.android.email
Activity: com.android.email/.activity.MessageView
Subject: keyDispatchingTimedOut
CPU usage from 2550ms to -2814ms ago:
  5% 187/system_server: 3.5% user + 1.4% kernel / faults: 86 minor 20 major
  4.4% 1134/com.android.email: 0.7% user + 3.7% kernel / faults: 38 minor 19 major
  4% 372/com.android.eventstream: 0.7% user + 3.3% kernel / faults: 6 minor
  1.1% 272/com.android.phone: 0.9% user + 0.1% kernel / faults: 33 minor
  0.9% 252/com.android.systemui: 0.9% user + 0% kernel
  0% 409/com.android.eventstream.telephonyplugin: 0% user + 0% kernel / faults: 2 minor
  0.1% 632/com.android.devicemonitor: 0.1% user + 0% kernel
100% TOTAL: 6.9% user + 8.2% kernel + 84% iowait

----- pid 1134 at 2010-12-17 17:46:51 -----
Cmd line: com.android.email

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0 hwl=0 hwll=0)
"main" prio=5 tid=1 WAIT
  | group="main" sCount=1 dsCount=0 obj=0x2aaca180 self=0xcf20
  | sysTid=1134 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=1876218976
  at java.lang.Object.wait(Native Method)
  - waiting on <0x2aaca218> (a java.lang.VMThread)
  at java.lang.Thread.parkFor(Thread.java:1424)
  at java.lang.LangAccessImpl.parkFor(LangAccessImpl.java:48)
  at sun.misc.Unsafe.park(Unsafe.java:337)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:157)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:808)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:841)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1171)
  at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:200)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:261)
  at android.database.sqlite.SQLiteDatabase.lock(SQLiteDatabase.java:378)
  at android.database.sqlite.SQLiteCursor.<init>(SQLiteCursor.java:222)
  at android.database.sqlite.SQLiteDirectCursorDriver.query(SQLiteDirectCursorDriver.java:53)
  at android.database.sqlite.SQLiteDatabase.rawQueryWithFactory(SQLiteDatabase.java:1356)
  at android.database.sqlite.SQLiteDatabase.queryWithFactory(SQLiteDatabase.java:1235)
  at android.database.sqlite.SQLiteDatabase.query(SQLiteDatabase.java:1189)
  at android.database.sqlite.SQLiteDatabase.query(SQLiteDatabase.java:1271)
  at com.android.email.provider.EmailProvider.query(EmailProvider.java:1098)
  at android.content.ContentProvider$Transport.query(ContentProvider.java:187)
  at android.content.ContentResolver.query(ContentResolver.java:268)
  at com.android.email.provider.EmailContent$Message.restoreMessageWithId(EmailContent.java:648)
  at com.android.email.Controller.setMessageRead(Controller.java:658)
  at com.android.email.activity.MessageView.onMarkAsRead(MessageView.java:700)
  at com.android.email.activity.MessageView.access$2500(MessageView.java:98)
  at com.android.email.activity.MessageView$LoadBodyTask.onPostExecute(MessageView.java:1290)
  at com.android.email.activity.MessageView$LoadBodyTask.onPostExecute(MessageView.java:1255)
  at android.os.AsyncTask.finish(AsyncTask.java:417)
  at android.os.AsyncTask.access$300(AsyncTask.java:127)
  at android.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:429)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:123)
  at android.app.ActivityThread.main(ActivityThread.java:3652)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:507)
  at com.android.internal.os.ZygoteIn
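The log contains the keyword `84% iowait`, while user and kernel CPU usage are both low, so the likely cause is heavy IO. The stack below it involves a View, a ContentProvider, SQLite and a lock, which suggests a bold first guess: is a database operation being performed on the main thread?
Reading the trace in detail, our code should contain a place where the view ends up calling ContentResolver. A quick search of the code confirms that such a call exists: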
java
final Message message = Message.restoreMessageWithId(mProviderContext, messageId);
if (message == null) {
    return;
}
Account account = Account.restoreAccountWithId(mProviderContext, message.mAccountKey);
if (account == null) {
    return; // isMessagingController returns false for null, but let's make it clear.
}
if (isMessagingController(account)) {
    new Thread() {
        @Override
        public void run() {
            mLegacyController.processPendingActions(message.mAccountKey);
        }
    }.start();
}
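Spot the problem? This code runs on the main thread and queries data through `Message.restoreMessageWithId(mProviderContext, messageId)` (the call visible in the trace) and `Account.restoreAccountWithId(mProviderContext, message.mAccountKey)`.
When system resources are not under pressure, these calls usually cause no real trouble. But if the data is large or the system is busy with IO, they become slow and the main thread's event handling times out.
So the code was changed as follows, after which the problem no longer occurred.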
```java
new Thread() {
    @Override
    public void run() {
        // Both provider queries now run off the main thread.
        final Message message = Message.restoreMessageWithId(mProviderContext, messageId);
        if (message == null) {
            return;
        }
        Account account = Account.restoreAccountWithId(mProviderContext, message.mAccountKey);
        if (account == null) {
            return; // isMessagingController returns false for null, but let's make it clear.
        }
        if (isMessagingController(account)) {
            mLegacyController.processPendingActions(message.mAccountKey);
        }
    }
}.start();
```
4.2 Memory leak / Thread leak
```java
11-16 21:41:42.560 I/ActivityManager( 1190): ANR in process: android.process.acore (last in android.process.acore)
11-16 21:41:42.560 I/ActivityManager( 1190): Annotation: keyDispatchingTimedOut
11-16 21:41:42.560 I/ActivityManager( 1190): CPU usage:
11-16 21:41:42.560 I/ActivityManager( 1190): Load: 11.5 / 11.1 / 11.09
11-16 21:41:42.560 I/ActivityManager( 1190): CPU usage from 9046ms to 4018ms ago:
11-16 21:41:42.560 I/ActivityManager( 1190):   d.process.acore: 98% = 97% user + 0% kernel / faults: 1134 minor
11-16 21:41:42.560 I/ActivityManager( 1190):   system_server: 0% = 0% user + 0% kernel / faults: 1 minor
11-16 21:41:42.560 I/ActivityManager( 1190):   adbd: 0% = 0% user + 0% kernel
11-16 21:41:42.560 I/ActivityManager( 1190):   logcat: 0% = 0% user + 0% kernel
11-16 21:41:42.560 I/ActivityManager( 1190): TOTAL: 100% = 98% user + 1% kernel

Cmd line: android.process.acore

DALVIK THREADS:
"main" prio=5 tid=3 VMWAIT
  | group="main" sCount=1 dsCount=0 s=N obj=0x40026240 self=0xbda8
  | sysTid=1815 nice=0 sched=0/0 cgrp=unknown handle=-1344001376
  at dalvik.system.VMRuntime.trackExternalAllocation(Native Method)
  at android.graphics.Bitmap.nativeCreate(Native Method)
  at android.graphics.Bitmap.createBitmap(Bitmap.java:468)
  at android.view.View.buildDrawingCache(View.java:6324)
  at android.view.View.getDrawingCache(View.java:6178)
  at android.view.ViewGroup.drawChild(ViewGroup.java:1541)
  at com.android.internal.policy.impl.PhoneWindow$DecorView.draw(PhoneWindow.java:1830)
  at android.view.ViewRoot.draw(ViewRoot.java:1349)
  at android.view.ViewRoot.performTraversals(ViewRoot.java:1114)
  at android.view.ViewRoot.handleMessage(ViewRoot.java:1633)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loop(Looper.java:123)
  at android.app.ActivityThread.main(ActivityThread.java:4370)
  at java.lang.reflect.Method.invokeNative(Native Method)
  at java.lang.reflect.Method.invoke(Method.java:521)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:868)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:626)
  at dalvik.system.NativeStart.main(Native Method)

"Thread-408" prio=5 tid=329 WAIT
  | group="main" sCount=1 dsCount=0 s=N obj=0x46910d40 self=0xcd0548
  | sysTid=10602 nice=0 sched=0/0 cgrp=unknown handle=15470792
  at java.lang.Object.wait(Native Method)
  - waiting on <0x468cd420> (a java.lang.Object)
  at java.lang.Object.wait(Object.java:288)
  at com.android.dialer.CallLogContentHelper$UiUpdaterExecutor$1.run(CallLogContentHelper.java:289)
  at java.lang.Thread.run(Thread.java:1096)
```
In this trace the main thread is stopped inside VMRuntime, and the stack contains the keywords Bitmap, createBitmap, nativeCreate, ViewRoot and ActivityThread#main. A bold first guess: is a large image being loaded in the main thread's view drawing path, or are bitmaps being allocated and never released? Looking closely, the top frame is `at dalvik.system.VMRuntime.trackExternalAllocation(Native Method)`: the bitmap asked for memory, there was not enough left, and the thread blocked at that point.
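The way forward is straightforward: use the thread, process and stack details that follow in the dump to work backwards, form a hypothesis, and search the related code for places that may leak memory.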
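For the thread-leak half of this section, one quick check is to count the threads in the dump by name pattern: a large number of anonymous `Thread-N` entries, like the `Thread-408` above, is a hint worth following up. The sketch below is only an illustration; `ThreadLeakHint` and `countByPrefix` are hypothetical names, and what counts as "too many" depends on the app.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that groups the threads of a Dalvik dump by name pattern. */
final class ThreadLeakHint {
    // Thread header at the start of a line, e.g.: "Thread-408" prio=5 tid=329 WAIT
    private static final Pattern HEADER = Pattern.compile(
            "(?m)^\\s*\"([^\"]+)\"\\s+(?:daemon\\s+)?prio=\\d+\\s+tid=\\d+");

    /** Counts threads per name, with every run of digits replaced by '#'. */
    static Map<String, Integer> countByPrefix(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = HEADER.matcher(dump);
        while (m.find()) {
            String pattern = m.group(1).replaceAll("\\d+", "#");
            counts.merge(pattern, 1, Integer::sum);
        }
        return counts;
    }
}
```

If one pattern dominates the result, the stack of any thread in that group (here `CallLogContentHelper$UiUpdaterExecutor$1.run`) shows which code keeps creating the threads.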