Artificial Intelligence | ShowMeAI Daily Digest #2022.06.27



The ShowMeAI daily digest series has been fully upgraded! It covers AI topics across Tools & Frameworks | Projects & Code | Blogs & Sharing | Data & Resources | Research & Papers. Click to view the full article archive, and subscribe to the #ShowMeAI資訊日報 topic in our WeChat official account to receive the latest daily updates. Click Topic Collections & Monthly E-zines to quickly browse complete collections by topic.

1. Tools & Frameworks

Library: ClearML - an open-source machine learning toolkit with a clean, polished visualization UI

tags: [machine learning, modeling, visualization, toolkit]

ClearML streamlines machine learning development and MLOps workflows, automatically tracking experiments and logging results, and offers flexible options for data management.

GitHub: https://github.com/allegroai/clearml
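
For a sense of the workflow, here is a minimal usage sketch based on ClearML's documented Python API (project and task names are invented; a configured ClearML server or clearml.conf is assumed):

```python
from clearml import Task

# Register this run as an experiment (names here are placeholders).
task = Task.init(project_name="demo-project", task_name="baseline-run")
task.connect({"lr": 1e-3, "batch_size": 64})   # record hyperparameters

logger = task.get_logger()
for step in range(3):
    # Stand-in for a training loop: report a scalar so it appears in the web UI.
    logger.report_scalar(title="loss", series="train", value=1.0 / (step + 1), iteration=step)
```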

Library: Movenet.Pytorch - a PyTorch reimplementation of MoveNet for human keypoint detection

tags: [keypoint detection, pytorch, MoveNet]

'Movenet.Pytorch - A Pytorch implementation of MoveNet from Google. Include training code and pre-trained model.' by Mr.Fire

GitHub: https://github.com/fire717/movenet.pytorch

Tool: Beekeeper Studio - a cross-platform SQL editor

tags: [SQL editor, tool]

Beekeeper Studio offers SQL syntax highlighting, auto-completion, filtering of table contents, connections to remote databases, and saved query history. It supports mainstream databases such as SQLite, MySQL, MariaDB, and Postgres, and runs on Windows, macOS, and Linux desktops.

GitHub: https://github.com/beekeeper-studio/beekeeper-studio

Tool: Think (雲策文件) - an open-source knowledge management tool

tags: [knowledge management, tool]

Think bundles knowledge bases, mind maps, document templates, an online editor, and more. Independent knowledge-base spaces let teams organize collaborative online documents in a structured way, building up a body of knowledge and making it easier to reuse and share.

GitHub: https://github.com/fantasticit/think

Tool: dashy - a highly customizable, self-hosted server start-page builder

tags: [server start page, customization, self-hosted]

dashy ships with a visual editor, a status-checking system, and a rich set of widgets and themes. It lets you quickly assemble a management dashboard for your services and customize it with widgets, icons, and themes; built-in features include authentication, status monitoring, search, backups, visual configuration, and multi-language support.

GitHub: https://github.com/Lissy93/dashy

2. Projects & Code

Code: PyTorch implementations of various attention mechanisms

tags: [attention mechanism, pytorch]

'External-Attention-pytorch - Pytorch implementation of various Attention Mechanism' by xmu-xiaoma666

GitHub: https://github.com/xmu-xiaoma666/External-Attention-pytorch
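
As background, most of the attention variants collected there build on scaled dot-product attention; a minimal PyTorch sketch of that core operation (my own illustration, not code from the repo):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (batch, heads, seq_len, head_dim)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # query-key similarity
    weights = F.softmax(scores, dim=-1)             # attention weights over the sequence
    return weights @ v                              # weighted sum of values

q = k = v = torch.randn(2, 8, 16, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 8, 16, 64])
```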

3. Blogs & Sharing

Resource: Machine Learning Interview - a machine learning interview question bank

tags: [machine learning, interview, question bank]

A collection of machine learning interview questions from major tech companies around the world, covering probability and statistics, big data, A/B testing, cheat sheets for machine learning and deep learning, interview preparation, study guides, project use cases, and interview experience write-ups.

GitHub: https://github.com/khangich/machine-learning-interview

Resource: Python, Java, and C++ solutions for 《劍指 Offer》, plus the companion code for the LeetBook 《圖解演算法資料結構》 (Illustrated Algorithms and Data Structures)

tags: [data structures, algorithms, 劍指offer, LeetCode]

GitHub: https://github.com/krahets/LeetCode-Book

4. Data & Resources

Dataset: internet-dataset - assorted datasets collected via a search engine

tags: [internet data, search engine, dataset]

The full collection is close to 50 GB and includes domain names, web pages, and inverted-index data.

GitHub: https://github.com/RimoChan/internet-dataset
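
For readers unfamiliar with the inverted-index portion, here is a tiny sketch (not from the repo) of how an inverted index maps each term to the documents containing it:

```python
from collections import defaultdict

docs = {
    0: "open source machine learning tools",
    1: "search engine index data",
    2: "machine learning for search",
}

# Build the inverted index: term -> set of document ids that contain it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

print(sorted(inverted_index["machine"]))  # [0, 2]
print(sorted(inverted_index["search"]))   # [1, 2]
```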

5. Research & Papers

Reply with the keyword 日報 in the WeChat official account to get the curated June paper collection for free.

Paper: EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Title: EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Venue/date: CVPR 2022

Field: Computer Vision

Tasks: 3D Object Detection, 6D Pose Estimation using RGB, Object Detection

Paper link: https://arxiv.org/abs/2203.13254

Code: https://github.com/tjiiv-cprg/epro-pnp

Authors: Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li

Summary: The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.

Abstract: Locating 3D objects from a single RGB image via Perspective-n-Points (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, so that 2D-3D point correspondences can be partly learned by backpropagating the gradient w.r.t. object pose. Yet, learning the entire set of unrestricted 2D-3D points from scratch fails to converge with existing approaches, since the deterministic pose is inherently non-differentiable. In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose on the SE(3) manifold, essentially bringing categorical Softmax to the continuous domain. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution. The underlying principle unifies the existing approaches and resembles the attention mechanism. EPro-PnP significantly outperforms competitive baselines, closing the gap between PnP-based method and the task-specific leaders on the LineMOD 6DoF pose estimation and nuScenes 3D object detection benchmarks.

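The training signal described in the abstract is a KL divergence between a predicted pose distribution and a target distribution. A schematic PyTorch sketch over a discrete set of pose hypotheses (my own simplification, not the paper's continuous SE(3) formulation):

```python
import torch
import torch.nn.functional as F

# Schematic: scores over a discrete set of pose hypotheses stand in for the
# continuous pose distribution; in the paper these come from 2D-3D weights.
num_hypotheses = 128
pred_scores = torch.randn(num_hypotheses, requires_grad=True)   # predicted (unnormalized) log-weights
target_scores = torch.randn(num_hypotheses)                     # target distribution, peaked at the true pose

pred_log_prob = F.log_softmax(pred_scores, dim=0)
target_prob = F.softmax(target_scores, dim=0)

# KL(target || predicted); gradients flow back into whatever produced pred_scores.
loss = F.kl_div(pred_log_prob, target_prob, reduction="sum")
loss.backward()
print(float(loss))
```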

Paper: EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Title: EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

Venue/date: 21 Jun 2022

Field: Computer Vision

Tasks: Image Classification, Object Detection, Semantic Segmentation

Paper link: https://arxiv.org/abs/2206.10589

Code: https://github.com/mmaaz60/EdgeNeXt

Authors: Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan

Summary: Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% along with a 28% reduction in FLOPs.

Abstract: In the pursuit of achieving ever-increasing accuracy, large and complex neural networks are usually developed. Such models demand high computational resources and therefore cannot be deployed on edge devices. It is of great interest to build resource-efficient general purpose networks due to their usefulness in several application areas. In this work, we strive to effectively combine the strengths of both CNN and Transformer models and propose a new efficient hybrid architecture EdgeNeXt. Specifically in EdgeNeXt, we introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups and utilizes depth-wise convolution along with self-attention across channel dimensions to implicitly increase the receptive field and encode multi-scale features. Our extensive experiments on classification, detection and segmentation tasks, reveal the merits of the proposed approach, outperforming state-of-the-art methods with comparatively lower compute requirements. Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2% with 28% reduction in FLOPs. Further, our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/mmaaz60/EdgeNeXt

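A rough sketch of the SDTA idea as described in the abstract: split channels into groups, apply depth-wise convolutions, then run self-attention across the channel dimension. This is a simplified reading, not the official implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDTASketch(nn.Module):
    """Simplified sketch: channel split + depth-wise convs, then attention across channels."""
    def __init__(self, dim, groups=4):
        super().__init__()
        assert dim % groups == 0
        g = dim // groups
        self.groups = groups
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(g, g, kernel_size=3, padding=1, groups=g) for _ in range(groups)
        )
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        # 1) channel split + per-group depth-wise convolution
        chunks = x.chunk(self.groups, dim=1)
        x = torch.cat([conv(c) for conv, c in zip(self.dwconvs, chunks)], dim=1)

        # 2) "transpose" attention: tokens are channels, features are spatial positions
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).flatten(2).chunk(3, dim=1)     # each: (B, C, H*W)
        attn = F.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (B, C, C)
        out = attn @ v                                        # (B, C, H*W)
        return out.reshape(b, c, h, w) + x                    # residual connection

x = torch.randn(1, 64, 32, 32)
print(SDTASketch(64)(x).shape)   # torch.Size([1, 64, 32, 32])
```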

Paper: RegionCLIP: Region-based Language-Image Pretraining

Title: RegionCLIP: Region-based Language-Image Pretraining

Venue/date: CVPR 2022

Field: Computer Vision

Tasks: Image Classification, Object Detection, Transfer Learning

Paper link: https://arxiv.org/abs/2112.09106

Code: https://github.com/microsoft/regionclip

Authors: Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

Summary: However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans.

Abstract: Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to the open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets, respectively. Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at https://github.com/microsoft/RegionCLIP

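The region-text alignment described above amounts to a contrastive objective between region features and caption features. A generic sketch of such a loss (not RegionCLIP's actual code):

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_feats, text_feats, temperature=0.07):
    """region_feats, text_feats: (N, D); row i of each is a matched region-text pair."""
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = region_feats @ text_feats.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(len(logits))                    # matched pairs lie on the diagonal
    # symmetric InfoNCE: region -> text and text -> region
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

regions = torch.randn(8, 512)
captions = torch.randn(8, 512)
print(float(region_text_contrastive_loss(regions, captions)))
```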

Paper: Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world

Title: Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world

Venue/date: 20 Jun 2022

Tasks: Imitation Learning

Paper link: https://arxiv.org/abs/2206.09889

Code: https://github.com/facebookresearch/nocturne

Authors: Eugene Vinitsky, Nathan Lichtlé, Xiaomeng Yang, Brandon Amos, Jakob Foerster

Summary: We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability.

Abstract: We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability. The focus of Nocturne is to enable research into inference and theory of mind in real-world multi-agent settings without the computational overhead of computer vision and feature extraction from images. Agents in this simulator only observe an obstructed view of the scene, mimicking human visual sensing constraints. Unlike existing benchmarks that are bottlenecked by rendering human-like observations directly using a camera input, Nocturne uses efficient intersection methods to compute a vectorized set of visible features in a C++ back-end, allowing the simulator to run at 2000+ steps-per-second. Using open-source trajectory and map data, we construct a simulator to load and replay arbitrary trajectories and scenes from real-world driving data. Using this environment, we benchmark reinforcement-learning and imitation-learning agents and demonstrate that the agents are quite far from human-level coordination ability and deviate significantly from the expert trajectories.

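The central ingredient highlighted in the abstract is partial observability: each agent only sees an obstructed view of the scene. As a generic illustration of that concept (using the standard Gym wrapper API, not Nocturne's own interface), here is an observation wrapper that hides part of each observation:

```python
import gym
import numpy as np

class PartialObsWrapper(gym.ObservationWrapper):
    """Zero out a fraction of the observation to mimic a restricted field of view."""
    def __init__(self, env, visible_fraction=0.5):
        super().__init__(env)
        n = int(np.prod(env.observation_space.shape))
        self.mask = np.zeros(n, dtype=np.float32)
        self.mask[: int(n * visible_fraction)] = 1.0   # only the first half stays visible

    def observation(self, obs):
        flat = np.asarray(obs, dtype=np.float32).ravel()
        return (flat * self.mask).reshape(np.shape(obs))

env = PartialObsWrapper(gym.make("CartPole-v1"))
print(env.observation(np.array([1.0, 2.0, 3.0, 4.0])))  # second half masked to zero
```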

Paper: Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds

Title: Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds

Venue/date: 20 Jun 2022

Field: Computer Vision

Tasks: 3D Object Detection, Object Detection, Self-Supervised Learning

Paper link: https://arxiv.org/abs/2206.09900

Code: https://github.com/chaytonmin/voxel-mae

Authors: Chen Min, Dawei Zhao, Liang Xiao, Yiming Nie, Bin Dai

Summary: As the point clouds in 3D object detection is large-scale, it is impossible to reconstruct the input point clouds.

Abstract: Mask-based pre-training has achieved great success for self-supervised learning in image, video and language, without manually annotated supervision. However, as information redundant data, it has not yet been studied in the field of 3D object detection. As the point clouds in 3D object detection is large-scale, it is impossible to reconstruct the input point clouds. In this paper, we propose a mask voxel classification network for large-scale point clouds pre-training. Our key idea is to divide the point clouds into voxel representations and classify whether the voxel contains point clouds. This simple strategy makes the network to be voxel-aware of the object shape, thus improving the performance of 3D object detection. Extensive experiments show great effectiveness of our pre-trained model with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on three popular datasets (KITTI, Waymo, and nuScenes). Codes are publicly available at https://github.com/chaytonmin/voxel-mae.

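The pre-training task quoted above, voxelizing the point cloud and classifying whether each (masked) voxel contains points, can be illustrated with a small occupancy-labelling sketch (my own illustration, not the paper's code):

```python
import numpy as np

# Toy point cloud inside a 10 m x 10 m x 4 m volume.
points = np.random.rand(1000, 3) * np.array([10.0, 10.0, 4.0])

voxel_size = np.array([0.5, 0.5, 0.5])
grid = np.ceil(np.array([10.0, 10.0, 4.0]) / voxel_size).astype(int)   # (20, 20, 8)

# Occupancy label per voxel: does the voxel contain any points?
occupancy = np.zeros(grid, dtype=bool)
idx = np.clip(np.floor(points / voxel_size).astype(int), 0, grid - 1)
occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = True

# Randomly mask voxels; the pre-training task is to predict their occupancy.
mask = np.random.rand(*grid) < 0.7
masked_targets = occupancy[mask]
print(occupancy.mean(), masked_targets.shape)
```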

Paper: Global Context Vision Transformers

Title: Global Context Vision Transformers

Venue/date: 20 Jun 2022

Field: Computer Vision

Tasks: Image Classification, Inductive Bias, Instance Segmentation, Object Detection, Semantic Segmentation

Paper link: https://arxiv.org/abs/2206.09959

Code: https://github.com/nvlabs/gcvit

Authors: Ali Hatamizadeh, Hongxu Yin, Jan Kautz, Pavlo Molchanov

Summary: We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization.

Abstract: We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization. Our method leverages global context self-attention modules, joint with local self-attention, to effectively yet efficiently model both long and short-range spatial interactions, without the need for expensive operations such as computing attention masks or shifting local windows. In addition, we address the issue of lack of the inductive bias in ViTs via proposing to use a modified fused inverted residual blocks in our architecture. Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks. On ImageNet-1K dataset for classification, the base, small and tiny variants of GC ViT with 28M, 51M and 90M parameters achieve 83.2%, 83.9% and 84.4% Top-1 accuracy, respectively, surpassing comparably-sized prior art such as CNN-based ConvNeXt and ViT-based Swin Transformer by a large margin. Pre-trained GC ViT backbones in downstream tasks of object detection, instance segmentation, and semantic segmentation using MS COCO and ADE20K datasets outperform prior work consistently, sometimes by large margins. Code available at https://github.com/nvlabs/gcvit

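As described above, GC ViT mixes local window self-attention with attention to global context tokens. A heavily simplified sketch of combining a local pass and a global pass (my own reading using stock PyTorch attention, not the official code):

```python
import torch
import torch.nn as nn

class LocalGlobalAttentionSketch(nn.Module):
    """Simplified: window-local self-attention followed by attention to a pooled global context."""
    def __init__(self, dim, num_heads=4, window=4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                      # x: (B, N, C) with N divisible by window
        b, n, c = x.shape
        # Local pass: self-attention inside non-overlapping windows of tokens.
        w = x.reshape(b * (n // self.window), self.window, c)
        w, _ = self.local_attn(w, w, w)
        x = w.reshape(b, n, c) + x
        # Global pass: every token attends to a pooled "global context" token.
        g = x.mean(dim=1, keepdim=True)        # (B, 1, C) global summary
        y, _ = self.global_attn(x, g, g)
        return y + x

x = torch.randn(2, 64, 96)
print(LocalGlobalAttentionSketch(96)(x).shape)   # torch.Size([2, 64, 96])
```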

Paper: EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Title: EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Venue/date: 21 Jun 2022

Field: Reinforcement Learning

Tasks: Reinforcement Learning

Paper link: https://arxiv.org/abs/2206.10558

Code: https://github.com/sail-sg/envpool , https://github.com/vwxyzjn/envpool-cleanrl , https://github.com/vwxyzjn/cleanrl , https://github.com/Denys88/rl_games

Authors: Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, Shuicheng Yan

Summary: On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments.

Abstract: There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Apex, Seed RL, Sample Factory, and others aim to improve the system's overall throughput. In this paper, we try to address a common bottleneck in the RL training system, i.e., parallel environment execution, which is often the slowest part of the whole system but receives little attention. With a curated design for paralleling RL environments, we have improved the RL environment simulation speed across different hardware setups, ranging from a laptop, and a modest workstation, to a high-end machine like NVIDIA DGX-A100. On a high-end machine, EnvPool achieves 1 million frames per second for the environment execution on Atari environments and 3 million frames per second on MuJoCo environments. When running on a laptop, the speed of EnvPool is 2.8 times of the Python subprocess. Moreover, great compatibility with existing RL training libraries has been demonstrated in the open-sourced community, including CleanRL, rl_games, DeepMind Acme, etc. Finally, EnvPool allows researchers to iterate their ideas at a much faster pace and has the great potential to become the de facto RL environment execution engine. Example runs show that it takes only 5 minutes to train Atari Pong and MuJoCo Ant, both on a laptop. EnvPool has already been open-sourced at https://github.com/sail-sg/envpool.

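EnvPool exposes a batched, Gym-like interface. A minimal usage sketch following the project's README (assumes the Atari extras are installed; the step return signature varies with the installed Gym version):

```python
import numpy as np
import envpool

# 16 Atari Pong environments executed in parallel by the C++ backend.
env = envpool.make("Pong-v5", env_type="gym", num_envs=16)
obs = env.reset()                                   # batched observations for all 16 envs
for _ in range(10):
    actions = np.random.randint(0, env.action_space.n, size=16)
    # Older Gym-style 4-tuple return; recent versions may return a 5-tuple instead.
    obs, rewards, dones, info = env.step(actions)
print(obs.shape, rewards.shape)
```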

Paper: How Well Do Sparse Imagenet Models Transfer?

Title: How Well Do Sparse Imagenet Models Transfer?

Venue/date: CVPR 2022

Field: Computer Vision

Tasks: Transfer Learning

Paper link: https://arxiv.org/abs/2111.13445

Code: https://github.com/neuralmagic/deepsparse

Authors: Eugenia Iofinova, Alexandra Peste, Mark Kurtz, Dan Alistarh

Summary: Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets.

Abstract: Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned - that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, re-growth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods.

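One of the pruning families compared in the paper is magnitude-based unstructured pruning; PyTorch's built-in pruning utilities give a minimal way to produce such a sparse backbone before downstream fine-tuning (a generic sketch, not the authors' pipeline):

```python
import torch
import torchvision
import torch.nn.utils.prune as prune

# An untrained ResNet-50 stands in here; the paper prunes ImageNet-trained models.
model = torchvision.models.resnet50()

# Magnitude-based (L1) unstructured pruning: zero out 90% of each conv layer's weights.
for module in model.modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")          # make the sparsity permanent in the tensor

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"overall sparsity: {zeros / total:.2%}")
# ...the sparse backbone would then be fine-tuned on a downstream dataset.
```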

We are ShowMeAI, dedicated to sharing high-quality AI content and industry solutions, using knowledge to accelerate every step of technical growth! Click to view the full article archive, and subscribe to the #ShowMeAI資訊日報 topic in our WeChat official account to receive the latest daily updates. Click Topic Collections & Monthly E-zines to quickly browse complete collections by topic.
