App Keep-Alive from the Framework's Point of View


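Background

Recently I saw members of a chat group discussing app keep-alive. It reminded me of a similar requirement I ran into while building a sports app, so my interest was rekindled and I jumped into what turned out to be a heated discussion.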

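Keep-alive

Many in the group thought what I used to think: keep-alive is entirely at the system's whim. The system decides which process lives, and an app developer can do nothing about it. But is that really true?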

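Keep-alive approaches

First, I collected the keep-alive techniques app developers have used over the years, including those still in use. They follow two main ideas: staying alive and coming back to life.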

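Techniques for staying alive:
  • The 1-pixel Activity trick
  • Silent background audio
  • Foreground service
  • Heartbeat mechanism
  • Long-lived socket connection
  • Accessibility service
  • ......

Techniques for coming back to life:
  • Dual-process guarding (Java layer and native layer)
  • JobScheduler periodic jobs
  • Push messages / apps waking each other up
  • ......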

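As you can see, app developers have racked their brains to keep their apps alive a little longer. Even so, as Android has evolved, and especially since 8.0, the system has placed ever stricter limits on apps, and the traditional techniques no longer work. With developers at a loss, a more cooperative approach emerged:

  • Guide the user to add the app to the phone's whitelist

This is what the vast majority of apps do today. Compared with the traditional black-magic tricks it is less rogue, and users accept it more readily.

Still, its effect falls well short of a national-level app such as WeChat. So how does WeChat stay alive? Or, to return to the opening question: is an app's life or death really decided by system scheduling alone, or can developers intervene?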

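Process scheduling principles

Before answering that, we need to understand how Android schedules processes, and in particular how the framework processes that host the four major components adjust their own state as component state changes. Two state values matter most for a process:

  • oom_adj, defined in frameworks/base/services/core/java/com/android/server/am/ProcessList.java

  • procState, defined in frameworks/base/core/java/android/app/ActivityManager.java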

OOM_ADJ

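Taking the Android 10 source as an example, oom_adj has 20 levels with values in the range [-10000, 1001]. In Android 6.0 and earlier the range was [-17, 16].

  • The larger the oom_adj value, the lower the priority.

  • Processes with oom_adj < 0 are all system processes.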

```java
public final class ProcessList {
static final String TAG = TAG_WITH_CLASS_NAME ? "ProcessList" : TAG_AM;

// The minimum time we allow between crashes, for us to consider this
// application to be bad and stop and its services and reject broadcasts.
static final int MIN_CRASH_INTERVAL = 60 * 1000;

// OOM adjustments for processes in various states:

// Uninitialized value for any major or minor adj fields
static final int INVALID_ADJ = -10000;

// Adjustment used in certain places where we don't know it yet.
// (Generally this is something that is going to be cached, but we
// don't know the exact value in the cached range to assign yet.)
static final int UNKNOWN_ADJ = 1001;

// This is a process only hosting activities that are not visible,
// so it can be killed without any disruption.
static final int CACHED_APP_MAX_ADJ = 999;
static final int CACHED_APP_MIN_ADJ = 900;

// This is the oom_adj level that we allow to die first. This cannot be equal to
// CACHED_APP_MAX_ADJ unless processes are actively being assigned an oom_score_adj of
// CACHED_APP_MAX_ADJ.
static final int CACHED_APP_LMK_FIRST_ADJ = 950;

// Number of levels we have available for different service connection group importance
// levels.
static final int CACHED_APP_IMPORTANCE_LEVELS = 5;

// The B list of SERVICE_ADJ -- these are the old and decrepit
// services that aren't as shiny and interesting as the ones in the A list.
static final int SERVICE_B_ADJ = 800;

// This is the process of the previous application that the user was in.
// This process is kept above other things, because it is very common to
// switch back to the previous app.  This is important both for recent
// task switch (toggling between the two top recent apps) as well as normal
// UI flow such as clicking on a URI in the e-mail app to view in the browser,
// and then pressing back to return to e-mail.
static final int PREVIOUS_APP_ADJ = 700;

// This is a process holding the home application -- we want to try
// avoiding killing it, even if it would normally be in the background,
// because the user interacts with it so much.
static final int HOME_APP_ADJ = 600;

// This is a process holding an application service -- killing it will not
// have much of an impact as far as the user is concerned.
static final int SERVICE_ADJ = 500;

// This is a process with a heavy-weight application.  It is in the
// background, but we want to try to avoid killing it.  Value set in
// system/rootdir/init.rc on startup.
static final int HEAVY_WEIGHT_APP_ADJ = 400;

// This is a process currently hosting a backup operation.  Killing it
// is not entirely fatal but is generally a bad idea.
static final int BACKUP_APP_ADJ = 300;

// This is a process bound by the system (or other app) that's more important than services but
// not so perceptible that it affects the user immediately if killed.
static final int PERCEPTIBLE_LOW_APP_ADJ = 250;

// This is a process only hosting components that are perceptible to the
// user, and we really want to avoid killing them, but they are not
// immediately visible. An example is background music playback.
static final int PERCEPTIBLE_APP_ADJ = 200;

// This is a process only hosting activities that are visible to the
// user, so we'd prefer they don't disappear.
static final int VISIBLE_APP_ADJ = 100;
static final int VISIBLE_APP_LAYER_MAX = PERCEPTIBLE_APP_ADJ - VISIBLE_APP_ADJ - 1;

// This is a process that was recently TOP and moved to FGS. Continue to treat it almost
// like a foreground app for a while.
// @see TOP_TO_FGS_GRACE_PERIOD
static final int PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ = 50;

// This is the process running the current foreground app.  We'd really
// rather not kill it!
static final int FOREGROUND_APP_ADJ = 0;

// This is a process that the system or a persistent process has bound to,
// and indicated it is important.
static final int PERSISTENT_SERVICE_ADJ = -700;

// This is a system persistent process, such as telephony.  Definitely
// don't want to kill it, but doing so is not completely fatal.
static final int PERSISTENT_PROC_ADJ = -800;

// The system process runs at the default adjustment.
static final int SYSTEM_ADJ = -900;

// Special code for native processes that are not being managed by the system (so
// don't have an oom adj assigned by the system).
static final int NATIVE_ADJ = -1000;

// Memory pages are 4K.
static final int PAGE_SIZE = 4 * 1024;

// (some code omitted)

}
```

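| ADJ level | Value | Description (see the source comments) |
| :------------------------------------ | ------ | -------------------------------------------------- |
| INVALID_ADJ | -10000 | Default for adj fields that have not been initialized |
| UNKNOWN_ADJ | 1001 | Used where the exact value is not yet known (generally a process that is going to be cached) |
| CACHED_APP_MAX_ADJ | 999 | Upper bound for processes hosting only non-visible activities |
| CACHED_APP_MIN_ADJ | 900 | Lower bound for processes hosting only non-visible activities |
| CACHED_APP_LMK_FIRST_ADJ | 950 | Level that lowmemorykiller is allowed to kill first |
| SERVICE_B_ADJ | 800 | Older services on the "B list", less interesting than those on the A list |
| PREVIOUS_APP_ADJ | 700 | The previous app the user was in, common when switching apps |
| HOME_APP_ADJ | 600 | Process hosting the home app |
| SERVICE_ADJ | 500 | Process hosting an application service |
| HEAVY_WEIGHT_APP_ADJ | 400 | Heavy-weight background process; value set in system/rootdir/init.rc |
| BACKUP_APP_ADJ | 300 | Process hosting a backup operation |
| PERCEPTIBLE_LOW_APP_ADJ | 250 | Process bound by the system or another app; more important than services, but not immediately noticeable if killed |
| PERCEPTIBLE_APP_ADJ | 200 | Process hosting components the user can perceive, such as background music playback |
| VISIBLE_APP_ADJ | 100 | Process hosting visible activities |
| PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ | 50 | Process that was recently TOP and has moved to a foreground service |
| FOREGROUND_APP_ADJ | 0 | Foreground process, the app the user is interacting with |
| PERSISTENT_SERVICE_ADJ | -700 | Process that the system or a persistent process has bound to |
| PERSISTENT_PROC_ADJ | -800 | System persistent process, such as telephony |
| SYSTEM_ADJ | -900 | The system process |
| NATIVE_ADJ | -1000 | Native process not managed by the system |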
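
To make the table easier to use when reading raw values, here is a minimal sketch that maps an adj value to the closest level at or below it. The class name `AdjLevels` and the method `levelFor` are made up for illustration and are not part of the framework; the numbers are copied from the Android 10 constants quoted above, and the "floor" lookup is my own simplification of how values between two levels should be read.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative helper (not framework code): names the level an adj value falls under. */
final class AdjLevels {
    // Keys are the Android 10 ProcessList constants quoted above.
    private static final NavigableMap<Integer, String> LEVELS = new TreeMap<>();

    static {
        LEVELS.put(-1000, "NATIVE_ADJ");
        LEVELS.put(-900, "SYSTEM_ADJ");
        LEVELS.put(-800, "PERSISTENT_PROC_ADJ");
        LEVELS.put(-700, "PERSISTENT_SERVICE_ADJ");
        LEVELS.put(0, "FOREGROUND_APP_ADJ");
        LEVELS.put(50, "PERCEPTIBLE_RECENT_FOREGROUND_APP_ADJ");
        LEVELS.put(100, "VISIBLE_APP_ADJ");
        LEVELS.put(200, "PERCEPTIBLE_APP_ADJ");
        LEVELS.put(250, "PERCEPTIBLE_LOW_APP_ADJ");
        LEVELS.put(300, "BACKUP_APP_ADJ");
        LEVELS.put(400, "HEAVY_WEIGHT_APP_ADJ");
        LEVELS.put(500, "SERVICE_ADJ");
        LEVELS.put(600, "HOME_APP_ADJ");
        LEVELS.put(700, "PREVIOUS_APP_ADJ");
        LEVELS.put(800, "SERVICE_B_ADJ");
        LEVELS.put(900, "CACHED_APP (CACHED_APP_MIN_ADJ..CACHED_APP_MAX_ADJ)");
        LEVELS.put(1001, "UNKNOWN_ADJ");
    }

    /** Returns the name of the highest level whose value is <= adj. */
    static String levelFor(int adj) {
        Map.Entry<Integer, String> entry = LEVELS.floorEntry(adj);
        // Anything below NATIVE_ADJ is treated as uninitialized in this sketch.
        return entry == null ? "INVALID_ADJ" : entry.getValue();
    }

    public static void main(String[] args) {
        for (int adj : new int[] {-800, 0, 100, 200, 935}) {
            System.out.println(adj + " -> " + levelFor(adj));
        }
    }
}
```

A sorted map keeps the lookup to a single `floorEntry` call, so a value such as 935 lands in the cached range without a long chain of comparisons.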

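You can view a process's oom_adj value with cat /proc/<pid>/oom_score_adj. For example, here is the adj of the phone (dialer) app:

[Screenshot: dialer_oom_adj]

The value is 935, which falls in the range for non-visible (cached) processes. After I launched the phone app and checked again:

[Screenshot: dialer_oom_adj_open]

The adj value is now 0, which marks the process the user is interacting with.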
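
The same check can be scripted. The sketch below reads the single integer stored in /proc/<pid>/oom_score_adj; the class `OomScoreAdjReader` and its methods are hypothetical names for illustration. I assume it runs somewhere that is allowed to read the target's /proc entry (for example a root shell); on recent Android versions an ordinary app can generally read only its own entry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative helper (not framework code): reads a process's oom_score_adj. */
final class OomScoreAdjReader {

    /** Builds the per-process path exposed by the Linux kernel. */
    static Path pathFor(int pid) {
        return Paths.get("/proc", Integer.toString(pid), "oom_score_adj");
    }

    /** Reads the file and parses its content as a (possibly negative) integer. */
    static int read(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
        return Integer.parseInt(text.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: OomScoreAdjReader <pid>");
            return;
        }
        int pid = Integer.parseInt(args[0]);
        System.out.println("oom_score_adj of " + pid + " = " + read(pathFor(pid)));
    }
}
```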

ProcessState

process_state is divided into 23 states, with values in the range [-1, 21].

```java
@SystemService(Context.ACTIVITY_SERVICE)
public class ActivityManager {
// (some code omitted)

/** @hide Not a real process state. */
public static final int PROCESS_STATE_UNKNOWN = -1;

/** @hide Process is a persistent system process. */
public static final int PROCESS_STATE_PERSISTENT = 0;

/** @hide Process is a persistent system process and is doing UI. */
public static final int PROCESS_STATE_PERSISTENT_UI = 1;

/** @hide Process is hosting the current top activities.  Note that this covers
 * all activities that are visible to the user. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_TOP = 2;

/** @hide Process is hosting a foreground service with location type. */
public static final int PROCESS_STATE_FOREGROUND_SERVICE_LOCATION = 3;

/** @hide Process is bound to a TOP app. This is ranked below SERVICE_LOCATION so that
 * it doesn't get the capability of location access while-in-use. */
public static final int PROCESS_STATE_BOUND_TOP = 4;

/** @hide Process is hosting a foreground service. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_FOREGROUND_SERVICE = 5;

/** @hide Process is hosting a foreground service due to a system binding. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_BOUND_FOREGROUND_SERVICE = 6;

/** @hide Process is important to the user, and something they are aware of. */
public static final int PROCESS_STATE_IMPORTANT_FOREGROUND = 7;

/** @hide Process is important to the user, but not something they are aware of. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_IMPORTANT_BACKGROUND = 8;

/** @hide Process is in the background transient so we will try to keep running. */
public static final int PROCESS_STATE_TRANSIENT_BACKGROUND = 9;

/** @hide Process is in the background running a backup/restore operation. */
public static final int PROCESS_STATE_BACKUP = 10;

/** @hide Process is in the background running a service.  Unlike oom_adj, this level
 * is used for both the normal running in background state and the executing
 * operations state. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_SERVICE = 11;

/** @hide Process is in the background running a receiver.   Note that from the
 * perspective of oom_adj, receivers run at a higher foreground level, but for our
 * prioritization here that is not necessary and putting them below services means
 * many fewer changes in some process states as they receive broadcasts. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_RECEIVER = 12;

/** @hide Same as {@link #PROCESS_STATE_TOP} but while device is sleeping. */
public static final int PROCESS_STATE_TOP_SLEEPING = 13;

/** @hide Process is in the background, but it can't restore its state so we want
 * to try to avoid killing it. */
public static final int PROCESS_STATE_HEAVY_WEIGHT = 14;

/** @hide Process is in the background but hosts the home activity. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_HOME = 15;

/** @hide Process is in the background but hosts the last shown activity. */
public static final int PROCESS_STATE_LAST_ACTIVITY = 16;

/** @hide Process is being cached for later use and contains activities. */
@UnsupportedAppUsage
public static final int PROCESS_STATE_CACHED_ACTIVITY = 17;

/** @hide Process is being cached for later use and is a client of another cached
 * process that contains activities. */
public static final int PROCESS_STATE_CACHED_ACTIVITY_CLIENT = 18;

/** @hide Process is being cached for later use and has an activity that corresponds
 * to an existing recent task. */
public static final int PROCESS_STATE_CACHED_RECENT = 19;

/** @hide Process is being cached for later use and is empty. */
public static final int PROCESS_STATE_CACHED_EMPTY = 20;

/** @hide Process does not exist. */
public static final int PROCESS_STATE_NONEXISTENT = 21;
// (some code omitted)

}
```

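| State level | Value | Description (see the source comments) |
| ----------------------------------------- | ---- | -------------------------------------------------- |
| PROCESS_STATE_UNKNOWN | -1 | Not a real process state |
| PROCESS_STATE_PERSISTENT | 0 | Persistent system process |
| PROCESS_STATE_PERSISTENT_UI | 1 | Persistent system process that is doing UI |
| PROCESS_STATE_TOP | 2 | Process hosting the current top activities |
| PROCESS_STATE_FOREGROUND_SERVICE_LOCATION | 3 | Process hosting a foreground service of location type |
| PROCESS_STATE_BOUND_TOP | 4 | Process bound to a TOP app |
| PROCESS_STATE_FOREGROUND_SERVICE | 5 | Process hosting a foreground service |
| PROCESS_STATE_BOUND_FOREGROUND_SERVICE | 6 | Process hosting a foreground service due to a system binding |
| PROCESS_STATE_IMPORTANT_FOREGROUND | 7 | Process important to the user, and something they are aware of |
| PROCESS_STATE_IMPORTANT_BACKGROUND | 8 | Process important to the user, but not something they are aware of |
| PROCESS_STATE_TRANSIENT_BACKGROUND | 9 | Process temporarily in the background that the system tries to keep running |
| PROCESS_STATE_BACKUP | 10 | Background process running a backup/restore operation |
| PROCESS_STATE_SERVICE | 11 | Background process running a service |
| PROCESS_STATE_RECEIVER | 12 | Background process running a broadcast receiver |
| PROCESS_STATE_TOP_SLEEPING | 13 | Same as PROCESS_STATE_TOP, but while the device is sleeping |
| PROCESS_STATE_HEAVY_WEIGHT | 14 | Background process that cannot restore its own state |
| PROCESS_STATE_HOME | 15 | Background process hosting the home activity |
| PROCESS_STATE_LAST_ACTIVITY | 16 | Background process hosting the last shown activity |
| PROCESS_STATE_CACHED_ACTIVITY | 17 | Cached process that contains activities |
| PROCESS_STATE_CACHED_ACTIVITY_CLIENT | 18 | Cached process that is a client of another cached process containing activities |
| PROCESS_STATE_CACHED_RECENT | 19 | Cached process with an activity that corresponds to an existing recent task |
| PROCESS_STATE_CACHED_EMPTY | 20 | Cached process that is empty, kept for later use |
| PROCESS_STATE_NONEXISTENT | 21 | Process does not exist |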
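
Because the states are ordered, with a smaller value meaning a more important process, comparisons on the raw integers are enough to group them. The following is a small sketch of that idea. `ProcStates` and its helper methods are hypothetical and written for this article; the cut-off points (background from TRANSIENT_BACKGROUND onward, cached from CACHED_ACTIVITY to CACHED_EMPTY) are my own reading of the table, not something the source above states.

```java
/** Illustrative helper (not framework code): groups procState values by range. */
final class ProcStates {
    // Values copied from the Android 10 ActivityManager constants quoted above.
    static final int PROCESS_STATE_TOP = 2;
    static final int PROCESS_STATE_FOREGROUND_SERVICE = 5;
    static final int PROCESS_STATE_TRANSIENT_BACKGROUND = 9;
    static final int PROCESS_STATE_SERVICE = 11;
    static final int PROCESS_STATE_CACHED_ACTIVITY = 17;
    static final int PROCESS_STATE_CACHED_EMPTY = 20;

    /** True when state a ranks as more important than state b (smaller wins). */
    static boolean isMoreImportant(int a, int b) {
        return a < b;
    }

    /** Assumed grouping: TRANSIENT_BACKGROUND and everything after it is background. */
    static boolean isBackground(int procState) {
        return procState >= PROCESS_STATE_TRANSIENT_BACKGROUND;
    }

    /** Assumed grouping: the four CACHED_* states. */
    static boolean isCached(int procState) {
        return procState >= PROCESS_STATE_CACHED_ACTIVITY
                && procState <= PROCESS_STATE_CACHED_EMPTY;
    }

    public static void main(String[] args) {
        System.out.println(isMoreImportant(PROCESS_STATE_TOP, PROCESS_STATE_SERVICE));
        System.out.println(isBackground(PROCESS_STATE_FOREGROUND_SERVICE));
        System.out.println(isCached(PROCESS_STATE_CACHED_ACTIVITY));
    }
}
```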

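Process scheduling algorithm

frameworks/base/services/core/java/com/android/server/am/OomAdjuster.java has three core methods for computing and updating a process's oom_adj value:

  • updateOomAdjLocked(): updates the adj. Returns false if the target process is null or was killed, true otherwise.
  • computeOomAdjLocked(): computes the adj. Returns true if the computation succeeds, false otherwise.
  • applyOomAdjLocked(): applies the adj. Returns false if the target process needs to be killed, true otherwise.

When adj is updated

This is when updateOomAdjLocked() gets called. Put simply, an adj update is triggered whenever one of the four major components is created or changes state, or when the current process binds to another process. The call sites are numerous and can be found in the source, so I won't list them here.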

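How adj is computed

computeOomAdjLocked() is quite complex, close to 1,000 lines, so I won't paste it here; read it yourself if you're interested. The overall idea is to assign the adj value that matches the process's current state. Because there are many states, the method runs through many if checks, one per state, to work out which state the process is in.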
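
To give a feel for that "chain of if checks" shape, here is a heavily simplified toy version. It is not the real computeOomAdjLocked(): the class `ToyOomAdj`, the `ProcessSnapshot` fields, and the order of the checks are all assumptions made for illustration, and the real method also handles bindings, providers, procState, grace periods, and much more. Only the adj constants come from the source quoted earlier.

```java
/** Toy model (not framework code) of deriving an adj from a process's state. */
final class ToyOomAdj {
    // Constants copied from the Android 10 ProcessList source quoted above.
    static final int FOREGROUND_APP_ADJ = 0;
    static final int VISIBLE_APP_ADJ = 100;
    static final int PERCEPTIBLE_APP_ADJ = 200;
    static final int SERVICE_ADJ = 500;
    static final int HOME_APP_ADJ = 600;
    static final int PREVIOUS_APP_ADJ = 700;
    static final int CACHED_APP_MIN_ADJ = 900;

    /** Hypothetical, flattened view of what a process is currently hosting. */
    static final class ProcessSnapshot {
        boolean isTopApp;
        boolean hasVisibleActivity;
        boolean hasForegroundService;
        boolean hasStartedService;
        boolean isHome;
        boolean isPreviousApp;
    }

    /** Checks the most important states first, so the lowest matching adj wins. */
    static int computeAdj(ProcessSnapshot p) {
        if (p.isTopApp) return FOREGROUND_APP_ADJ;
        if (p.hasVisibleActivity) return VISIBLE_APP_ADJ;
        if (p.hasForegroundService) return PERCEPTIBLE_APP_ADJ;
        if (p.hasStartedService) return SERVICE_ADJ;
        if (p.isHome) return HOME_APP_ADJ;
        if (p.isPreviousApp) return PREVIOUS_APP_ADJ;
        // Nothing keeps the process important: treat it as cached.
        return CACHED_APP_MIN_ADJ;
    }

    public static void main(String[] args) {
        ProcessSnapshot music = new ProcessSnapshot();
        music.hasForegroundService = true;
        music.isPreviousApp = true;
        System.out.println("adj = " + computeAdj(music));
    }
}
```

In this toy model a process that is both the previous app and the host of a foreground service gets the lower (better) of the two values, which is the general pattern the real method follows as well.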

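How adj is applied

The computed adj value is sent to lowmemorykiller (lmk for short), which decides whether the process lives or dies. The lmk algorithm varies slightly between vendors. Here is how the source describes lmk: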

    /* drivers/misc/lowmemorykiller.c
     *
     * The lowmemorykiller driver lets user-space specify a set of memory thresholds
     * where processes with a range of oom_score_adj values will get killed. Specify
     * the minimum oom_score_adj values in
     * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
     * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
     * separated list of numbers in ascending order.
     *
     * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
     * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
     * processes with a oom_score_adj value of 8 or higher when the free memory
     * drops below 4096 pages and kill processes with a oom_score_adj value of 0 or
     * higher when the free memory drops below 1024 pages.
     *
     * The driver considers memory used for caches to be free, but if a large
     * percentage of the cached memory is locked this can be very inaccurate
     * and processes may not get killed until the normal oom killer is triggered.
     *
     * Copyright (C) 2007-2008 Google, Inc.
     *
     * This software is licensed under the terms of the GNU General Public
     * License version 2, as published by the Free Software Foundation, and
     * may be copied, distributed, and modified under those terms.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     * GNU General Public License for more details.
     *
     */
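
The threshold rule in that comment can be sketched in a few lines. The class `LmkThresholds` and the method `minAdjToKill` are hypothetical names, and the sketch models only the two ascending lists and the free-page comparison described in the comment; the actual driver also accounts for cache pages and then picks a victim, which is left out here.

```java
import java.util.OptionalInt;

/** Illustrative model (not driver code) of the adj/minfree threshold lookup. */
final class LmkThresholds {
    private final int[] adj;      // ascending, as written to .../parameters/adj
    private final int[] minfree;  // ascending, in pages, as written to .../parameters/minfree

    LmkThresholds(int[] adj, int[] minfree) {
        if (adj.length != minfree.length) {
            throw new IllegalArgumentException("adj and minfree must have the same length");
        }
        this.adj = adj.clone();
        this.minfree = minfree.clone();
    }

    /**
     * Returns the smallest oom_score_adj that may be killed at this level of
     * free memory, or empty when free memory is above every threshold.
     */
    OptionalInt minAdjToKill(int freePages) {
        for (int i = 0; i < minfree.length; i++) {
            // The first (tightest) threshold that free memory has dropped below wins.
            if (freePages < minfree[i]) {
                return OptionalInt.of(adj[i]);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // The example from the driver comment: adj "0,8" and minfree "1024,4096".
        LmkThresholds lmk = new LmkThresholds(new int[] {0, 8}, new int[] {1024, 4096});
        for (int free : new int[] {500, 2000, 5000}) {
            System.out.println(free + " free pages -> " + lmk.minAdjToKill(free));
        }
    }
}
```

With the values from the comment, fewer than 1024 free pages makes processes with adj 0 or higher eligible, fewer than 4096 makes adj 8 or higher eligible, and anything above that kills nothing. This is why a lower adj keeps a process safe for longer as memory pressure grows.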

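The core idea of keep-alive

From the scheduling principles above, we want to lower the app process's adj value as far as possible to reduce the chance of being killed by lmk. That is also the ultimate goal of the traditional keep-alive techniques. The adj level table suggests that, from the application layer, adj can be lowered to the 100~200 range at best. I tested WeChat, Alipay, and Kugou Music by launching each app, returning to the home screen, and turning the screen off. The results follow.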

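WeChat results:

[Screenshot: weixin_oom_adj]

WeChat created two processes, both with an adj of 100, which corresponds to VISIBLE_APP_ADJ in the adj level table. That was on a test device with WeChat not logged in. When I switched to my own Xiaomi 8, WeChat in the logged-in state had three processes running:

[Screenshot: weixin_login_oom_adj]

I later read that the process named com.tencent.soter.soterserver handles WeChat fingerprint payment. Its adj is, surprisingly, -800. As noted above, processes with adj below 0 are system processes, so how does WeChat manage to create one? My friends and I were stunned. For comparison, I looked at the Alipay results.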

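Alipay results:

[Screenshot: alipay_oom_adj]

Alipay created six processes. Apart from one at 915, all of them had an adj of 0. What is going on? An adj of 0 means a foreground process the user is interacting with. My world was falling apart. The only explanation I could think of is that Alipay lowers its adj through some unknown black magic.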

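Kugou results:

[Screenshot: kugou_oom_adj.png]

Kugou created two processes with adj values of 700 and 200, corresponding to PREVIOUS_APP_ADJ and PERCEPTIBLE_APP_ADJ in the adj level table. Fortunately, that is what I expected.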

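Reflections on the tests

The three tests suggest that WeChat and Alipay must be using some keep-alive technique to push their adj as low as possible. WeChat in particular can apparently create a system process, which is astonishing. That cannot be done from the application layer, so it must happen in the native layer, but exactly what black magic is involved I can't say; decompiling is not my strong suit.

While I was brooding over this, I remembered an article I had read a couple of days earlier, 《當 App 有了系統權限,真的可以為所欲為?》 ("When an app has system privileges, can it really do whatever it wants?"). It describes how a third-party app exploited CVE vulnerabilities to obtain system privileges and then quietly did some unbelievable things. That was an eye-opener: perhaps these big-vendor apps stay alive by exploiting system vulnerabilities, because otherwise it is hard to explain. Once an app has system privileges, creating a system process takes no effort at all, and a vendor whitelist is no longer needed.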

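Summary

Process keep-alive is a double-edged sword. A longer-lived app comes at the cost of the user's battery, memory, CPU, and other resources, and even the user's patience. Developers need to weigh the trade-off sensibly and not pursue keep-alive for its own sake. If keep-alive really is needed, prefer legitimate ("white") techniques, and don't turn the user's phone into a brick only to complain about the backlash afterward.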

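References:

探討Android6.0及以上系統APP常駐內存(保活)實現-爭寵篇 (Exploring resident-memory (keep-alive) implementations for apps on Android 6.0 and above: competing for priority)

探討Android6.0及以上系統APP常駐內存(保活)實現-復活篇 (Exploring resident-memory (keep-alive) implementations for apps on Android 6.0 and above: coming back to life)

探討一種新型的雙進程守護應用保活 (Exploring a new kind of dual-process guarding for app keep-alive)

史上最強Android保活思路:深入剖析騰訊TIM的進程永生技術 (The strongest Android keep-alive approach ever: an in-depth look at Tencent TIM's immortal-process technique)

當 App 有了系統權限,真的可以為所欲為? (When an app has system privileges, can it really do whatever it wants?)

「 深藍洞察 」2022 年度最“不可赦”漏洞 (DarkNavy Insight: the most "unforgivable" vulnerability of 2022)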