Artificial Intelligence | ShowMeAI Daily Digest #2022.06.30



The ShowMeAI Daily series has been fully upgraded! It covers AI topics across Tools & Frameworks | Projects & Code | Blogs & Sharing | Data & Resources | Research & Papers. Click to view the article archive, and subscribe to the topic #ShowMeAI資訊日報 in our official account to receive the latest daily digest. Click Topic Collections & Monthly E-magazine to quickly browse the full collection for each topic.

1. Tools & Frameworks

Tool/framework: flair - a simple framework integrating state-of-the-art NLP techniques (Python)

tags: [NLP, NLP applications]

'flair - A very simple framework for state-of-the-art NLP' by Zalando Research

GitHub: http://github.com/flairNLP/flair

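A minimal usage sketch of the framework's API, following the project README; the downloadable "ner" model name is assumed to still be available:

```python
# Tag named entities with flair (sketch based on the README; model name "ner" is an assumption).
from flair.data import Sentence
from flair.models import SequenceTagger

# Load a pre-trained named entity recognition model (downloads on first use)
tagger = SequenceTagger.load("ner")

# Wrap raw text in a Sentence and run prediction
sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)

# Inspect the predicted entity spans
for entity in sentence.get_spans("ner"):
    print(entity)
```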
Tool library: cleanlab - a toolkit for automatically finding and fixing label errors in machine learning datasets

tags: [machine learning, dataset errors, error correction]

'cleanlab - The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.'

GitHub: http://github.com/cleanlab/cleanlab

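A minimal sketch of how the package flags likely label errors, following its documented find_label_issues API; the toy labels and probabilities below are illustrative only:

```python
# Flag examples whose given labels disagree with a model's predictions (cleanlab >= 2.0 API).
import numpy as np
from cleanlab.filter import find_label_issues

labels = np.array([0, 1, 1, 0, 2])          # observed (possibly noisy) labels
pred_probs = np.array([                      # out-of-sample predicted probabilities per class
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.85, 0.10, 0.05],                      # labeled 1, but the model strongly predicts 0
    [0.80, 0.10, 0.10],
    [0.10, 0.10, 0.80],
])

# Indices of examples whose labels look inconsistent, ranked by severity
issue_indices = find_label_issues(labels, pred_probs,
                                  return_indices_ranked_by="self_confidence")
print(issue_indices)
```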
Tool library: OpenFold - an open-source PyTorch reproduction of AlphaFold 2

tags: [AlphaFold 2, PyTorch]

'OpenFold - Trainable PyTorch reproduction of AlphaFold 2' by AQ Laboratory

GitHub: http://github.com/aqlaboratory/openfold

Tool library: darts - a Python library for easy time series manipulation and forecasting

tags: [time series]

'darts - A python library for easy manipulation and forecasting of time series.' by Unit8 SA

GitHub: http://github.com/unit8co/darts

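A minimal forecasting sketch following the library's quickstart; the AirPassengersDataset helper and ExponentialSmoothing model are assumed to ship with the installed version:

```python
# Fit a simple model on a built-in series and forecast the held-out horizon.
from darts.datasets import AirPassengersDataset
from darts.models import ExponentialSmoothing

series = AirPassengersDataset().load()      # monthly passenger counts as a TimeSeries
train, val = series[:-36], series[-36:]     # hold out the last 36 months

model = ExponentialSmoothing()
model.fit(train)
forecast = model.predict(len(val))          # forecast 36 steps ahead

print(forecast.values()[:5])
```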
Tool library: RapidOCR - a cross-platform OCR library based on PaddleOCR & OnnxRuntime

tags: [OCR, cross-platform]

'RapidOCR (捷智OCR) - A cross platform OCR Library based on PaddleOCR & OnnxRuntime'

GitHub: http://github.com/RapidAI/RapidOCR

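A minimal sketch of calling the library from Python; the rapidocr_onnxruntime module name and the (result, elapse) return format follow the project README and may differ between releases:

```python
# Run OCR on an image file (sketch; return format is an assumption based on the README).
from rapidocr_onnxruntime import RapidOCR

engine = RapidOCR()                       # loads the bundled detection/recognition models
result, elapse = engine("receipt.png")    # a file path, bytes, or ndarray is accepted

# Each item is roughly (text box coordinates, recognized text, confidence score)
if result:
    for box, text, score in result:
        print(text, score)
```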
2. Blogs & Sharing

Sharing: a guide to PhD applications

'Tutorial on PhD Application' by Lijin Zhang

GitHub: http://github.com/zhanglj37/Tutorial-on-PhD-Application

Course: an introductory-to-advanced Go programming course

'Go Course - Master the fundamentals and advanced features of the Go programming language' by Karan Pratap Singh

GitHub: http://github.com/karanpratapsingh/go-course

3. Data & Resources

Resource list: a comprehensive collection of AI-for-time-series resources

'AI for Time Series (AI4TS) Papers, Tutorials, and Surveys - A professional list of Papers, Tutorials, and Surveys on AI for Time Series in top AI conferences and journals.' by Qingsong Wen

GitHub: http://github.com/qingsongedu/awesome-AI-for-time-series-papers

Resource list: a compilation of weakly supervised semantic segmentation papers

'Awesome Weakly Supervised Semantic Segmentation - A comprehensive list of weakly supervised semantic segmentation (WSSS) works from 2014 to 2022.' by Xiaojian Zhong

GitHub: http://github.com/xiaojianzhong/awesome-weakly-supervised-semantic-segmentation

4. Research & Papers

Reply with the keyword 日報 in the official account to receive the curated June paper collection for free.

Paper: POGEMA: Partially Observable Grid Environment for Multiple Agents

Paper title: POGEMA: Partially Observable Grid Environment for Multiple Agents

Published: 22 Jun 2022

Paper link: http://arxiv.org/abs/2206.10944

Code: http://github.com/airi-institute/pogema

Authors: Alexey Skrynnik, Anton Andreychuk, Konstantin Yakovlev, Aleksandr I. Panov

Summary: We introduce POGEMA (http://github.com/AIRI-Institute/pogema), a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems.

Abstract: We introduce POGEMA (http://github.com/AIRI-Institute/pogema), a sandbox for challenging partially observable multi-agent pathfinding (PO-MAPF) problems. This is a grid-based environment that was specifically designed to be a flexible, tunable and scalable benchmark. It can be tailored to a variety of PO-MAPF, which can serve as an excellent testing ground for planning and learning methods, and their combination, which will allow us to move towards filling the gap between AI planning and learning.


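A hedged sketch of interacting with POGEMA as a gym-style environment; the "Pogema-v0" registration id and the GridConfig fields follow the repository README and are assumptions that may not match every release:

```python
# Random-policy rollout in a partially observable multi-agent grid (illustrative sketch).
import gym
from pogema import GridConfig

grid_config = GridConfig(num_agents=4,     # number of agents on the grid
                         size=16,          # grid side length
                         density=0.3,      # obstacle density
                         obs_radius=5)     # partial-observability radius

env = gym.make("Pogema-v0", grid_config=grid_config)
observations = env.reset()

done = [False]
while not all(done):
    # one action per agent; a learned policy would replace the random sampling
    actions = [env.action_space.sample() for _ in range(grid_config.num_agents)]
    observations, rewards, done, info = env.step(actions)
```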
Paper: Plotly-Resampler: Effective Visual Analytics for Large Time Series

Paper title: Plotly-Resampler: Effective Visual Analytics for Large Time Series

Published: 17 Jun 2022

Field: Time Series

Tasks: Data Visualization, Time Series, Time Series Analysis

Paper link: http://arxiv.org/abs/2206.08703

Code: http://github.com/predict-idlab/plotly-resampler

Authors: Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost, Sofie Van Hoecke

Summary: We observe that open source Python visualization toolkits empower data scientists in most visual analytics tasks, but lack the combination of scalability and interactivity to realize effective time series visualization.

Abstract: Visual analytics is arguably the most important step in getting acquainted with your data. This is especially the case for time series, as this data type is hard to describe and cannot be fully understood when using for example summary statistics. To realize effective time series visualization, four requirements have to be met; a tool should be (1) interactive, (2) scalable to millions of data points, (3) integrable in conventional data science environments, and (4) highly configurable. We observe that open source Python visualization toolkits empower data scientists in most visual analytics tasks, but lack the combination of scalability and interactivity to realize effective time series visualization. As a means to facilitate these requirements, we created Plotly-Resampler, an open source Python library. Plotly-Resampler is an add-on for Plotly's Python bindings, enhancing line chart scalability on top of an interactive toolkit by aggregating the underlying data depending on the current graph view. Plotly-Resampler is built to be snappy, as the reactivity of a tool qualitatively affects how analysts visually explore and analyze data. A benchmark task highlights how our toolkit scales better than alternatives in terms of number of samples and time series. Additionally, Plotly-Resampler's flexible data aggregation functionality paves the path towards researching novel aggregation techniques. Plotly-Resampler's integrability, together with its configurability, convenience, and high scalability, allows to effectively analyze high-frequency data in your day-to-day Python environment.


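A minimal sketch of the FigureResampler wrapper described above, following the project README; the hf_x/hf_y keyword arguments and the show_dash call are assumed to match the released API:

```python
# Plot a million-point series that is re-aggregated on every zoom/pan interaction.
import numpy as np
import plotly.graph_objects as go
from plotly_resampler import FigureResampler

x = np.arange(1_000_000)
y = np.sin(x / 300) + np.random.randn(1_000_000) * 0.1

fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name="signal"), hf_x=x, hf_y=y)

# Serves an interactive Dash app that resamples the underlying data per view
fig.show_dash(mode="inline")
```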
Paper: Sequencer: Deep LSTM for Image Classification

Paper title: Sequencer: Deep LSTM for Image Classification

Published: 4 May 2022

Field: Computer Vision

Tasks: Classification, Domain Generalization, Image Classification, Inductive Bias, Natural Language Processing

Paper link: http://arxiv.org/abs/2205.01972

Code: http://github.com/rwightman/pytorch-image-models , http://github.com/okojoalg/sequencer , http://github.com/timeseriesAI/tsai , http://github.com/liuruiyang98/Jittor-MLP

Authors: Yuki Tatsunami, Masato Taki

Summary: Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues.

Abstract: In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of Sequencer module, where an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, several experiments demonstrate that Sequencer performs impressively well: Sequencer2D-L, with 54M parameters, realizes 84.6% top-1 accuracy on only ImageNet-1K. Not only that, we show that it has good transferability and the robust resolution adaptability on double resolution-band.


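A hedged PyTorch sketch of the core idea (vertical and horizontal bidirectional LSTMs replacing self-attention); layer sizes and the fusion step are illustrative assumptions, not the authors' exact Sequencer2D module:

```python
# Mix information across a feature map with column-wise and row-wise BiLSTMs.
import torch
import torch.nn as nn

class BiLSTM2D(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.v_lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.h_lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.fuse = nn.Linear(4 * hidden, dim)   # concat both axes/directions, project back

    def forward(self, x):                         # x: (B, H, W, C)
        b, h, w, c = x.shape
        v, _ = self.v_lstm(x.permute(0, 2, 1, 3).reshape(b * w, h, c))  # scan each column
        r, _ = self.h_lstm(x.reshape(b * h, w, c))                       # scan each row
        v = v.reshape(b, w, h, -1).permute(0, 2, 1, 3)                   # (B, H, W, 2*hidden)
        r = r.reshape(b, h, w, -1)                                       # (B, H, W, 2*hidden)
        return self.fuse(torch.cat([v, r], dim=-1))                      # (B, H, W, C)

x = torch.randn(2, 14, 14, 192)
print(BiLSTM2D(dim=192, hidden=48)(x).shape)      # torch.Size([2, 14, 14, 192])
```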
Paper: Re-parameterizing Your Optimizers rather than Architectures

Paper title: Re-parameterizing Your Optimizers rather than Architectures

Published: 30 May 2022

Field: Machine Learning

Tasks: Optimization algorithms

Paper link: http://arxiv.org/abs/2205.15242

Code: http://github.com/dingxiaoh/repoptimizers

Authors: Xiaohan Ding, Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Jungong Han, Guiguang Ding

Summary: In this paper, we propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.

Abstract: The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers (e.g., SGD). In this paper, we propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models. As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters, which is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For the extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with the recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and training efficiency. Compared to Structural Re-parameterization, which adds priors into models via constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. The code and models are publicly available at http://github.com/dingxiaoh/repoptimizers


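A hedged sketch of the Gradient Re-parameterization idea: rescale each parameter's gradient with model-specific constants before a plain SGD update. The scale values below are placeholders, not the paper's derived hyper-parameters:

```python
# One SGD step where every gradient is element-wise rescaled by a model-specific prior.
import torch

def repopt_sgd_step(params, grad_scales, lr=0.1, weight_decay=1e-4):
    """Apply scaled-gradient SGD: g <- scale * grad + wd * p, then p <- p - lr * g."""
    with torch.no_grad():
        for p, scale in zip(params, grad_scales):
            if p.grad is None:
                continue
            g = p.grad * scale            # inject the model-specific prior into the gradient
            g = g + weight_decay * p
            p -= lr * g

# toy usage: a single conv weight with a placeholder "prior" scale
w = torch.randn(8, 3, 3, 3, requires_grad=True)
loss = (w ** 2).sum()
loss.backward()
scales = [torch.full_like(w, 2.0)]        # illustrative constant, not a derived value
repopt_sgd_step([w], scales)
```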
Paper: WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery

Paper title: WALT: Watch and Learn 2D Amodal Representation From Time-Lapse Imagery

Published: CVPR 2022

Field: Computer Vision

Tasks: Amodal Instance Segmentation, Object Detection

Paper link: http://openaccess.thecvf.com/content/CVPR2022/papers/Reddy_WALT_Watch_and_Learn_2D_Amodal_Representation_From_Time-Lapse_Imagery_CVPR_2022_paper.pdf

Code: http://github.com/dineshreddy91/WALT

Authors: N. Dinesh Reddy, Robert Tamburo, Srinivasa G. Narasimhan

Summary: Labeled real data of occlusions is scarce (even in large datasets) and synthetic data leaves a domain gap, making it hard to explicitly model and learn occlusions.

Abstract: Current methods for object detection, segmentation, and tracking fail in the presence of severe occlusions in busy urban environments. Labeled real data of occlusions is scarce (even in large datasets) and synthetic data leaves a domain gap, making it hard to explicitly model and learn occlusions. In this work, we present the best of both the real and synthetic worlds for automatic occlusion supervision using a large readily available source of data: time-lapse imagery from stationary webcams observing street intersections over weeks, months, or even years. We introduce a new dataset, Watch and Learn Time-lapse (WALT), consisting of 12 (4K and 1080p) cameras capturing urban environments over a year. We exploit this real data in a novel way to automatically mine a large set of unoccluded objects and then composite them in the same views to generate occlusions. This longitudinal self-supervision is strong enough for an amodal network to learn object-occluder-occluded layer representations. We show how to speed up the discovery of unoccluded objects and relate the confidence in this discovery to the rate and accuracy of training occluded objects. After watching and automatically learning for several days, this approach shows significant performance improvement in detecting and segmenting occluded people and vehicles, over human-supervised amodal approaches.


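A hedged illustration of the compositing step described in the abstract: paste a previously mined, unoccluded object crop back into the same view to synthesize an occlusion and its mask. This is a simplification for intuition, not the released WALT pipeline:

```python
# Composite a segmented object crop onto a frame to generate an artificial occlusion.
import numpy as np

def composite_occlusion(image, occluder_crop, occluder_mask, top_left):
    """Paste `occluder_crop` (masked by `occluder_mask`) onto `image`; return the
    composited image and the binary map of newly occluded pixels."""
    out = image.copy()
    y, x = top_left
    h, w = occluder_mask.shape
    region = out[y:y + h, x:x + w]
    region[occluder_mask] = occluder_crop[occluder_mask]   # overwrite the occluded pixels
    occluded = np.zeros(image.shape[:2], dtype=bool)
    occluded[y:y + h, x:x + w] = occluder_mask
    return out, occluded

frame = np.zeros((256, 256, 3), dtype=np.uint8)            # toy background frame
crop = np.full((64, 64, 3), 255, dtype=np.uint8)           # toy object crop
mask = np.ones((64, 64), dtype=bool)
new_frame, occlusion_mask = composite_occlusion(frame, crop, mask, (100, 120))
```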
Paper: AiTLAS: Artificial Intelligence Toolbox for Earth Observation

Paper title: AiTLAS: Artificial Intelligence Toolbox for Earth Observation

Published: 21 Jan 2022

Field: Computer Vision

Tasks: Semantic Segmentation, Type Prediction

Paper link: http://arxiv.org/abs/2201.08789

Code: http://github.com/biasvariancelabs/aitlas

Authors: Ivica Dimitrovski, Ivan Kitanovski, Panče Panov, Nikola Simidjievski, Dragi Kocev

Summary: The AiTLAS toolbox (Artificial Intelligence Toolbox for Earth Observation) includes state-of-the-art machine learning methods for exploratory and predictive analysis of satellite imagery as well as a repository of AI-ready Earth Observation (EO) datasets.

Abstract: The AiTLAS toolbox (Artificial Intelligence Toolbox for Earth Observation) includes state-of-the-art machine learning methods for exploratory and predictive analysis of satellite imagery as well as repository of AI-ready Earth Observation (EO) datasets. It can be easily applied for a variety of Earth Observation tasks, such as land use and cover classification, crop type prediction, localization of specific objects (semantic segmentation), etc. The main goal of AiTLAS is to facilitate better usability and adoption of novel AI methods (and models) by EO experts, while offering easy access and standardized format of EO datasets to AI experts which further allows benchmarking of various existing and novel AI methods tailored for EO data.


Paper: MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

Paper title: MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction

Published: 17 Apr 2022

Field: Computer Vision

Tasks: Image Restoration, Spectral Reconstruction, Spectral Super-Resolution

Paper link: http://arxiv.org/abs/2204.07908

Code: http://github.com/caiyuanhao1998/MST-plus-plus

Authors: Yuanhao Cai, Jing Lin, Zudi Lin, Haoqian Wang, Yulun Zhang, Hanspeter Pfister, Radu Timofte, Luc van Gool

Summary: Existing leading methods for spectral reconstruction (SR) focus on designing deeper or wider convolutional neural networks (CNNs) to learn the end-to-end mapping from the RGB image to its hyperspectral image (HSI).

Abstract: Existing leading methods for spectral reconstruction (SR) focus on designing deeper or wider convolutional neural networks (CNNs) to learn the end-to-end mapping from the RGB image to its hyperspectral image (HSI). These CNN-based methods achieve impressive restoration performance while showing limitations in capturing the long-range dependencies and self-similarity prior. To cope with this problem, we propose a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++), for efficient spectral reconstruction. In particular, we employ Spectral-wise Multi-head Self-attention (S-MSA) that is based on the HSI spatially sparse while spectrally self-similar nature to compose the basic unit, Spectral-wise Attention Block (SAB). Then SABs build up Single-stage Spectral-wise Transformer (SST) that exploits a U-shaped structure to extract multi-resolution contextual information. Finally, our MST++, cascaded by several SSTs, progressively improves the reconstruction quality from coarse to fine. Comprehensive experiments show that our MST++ significantly outperforms other state-of-the-art methods. In the NTIRE 2022 Spectral Reconstruction Challenge, our approach won the First place. Code and pre-trained models are publicly available at http://github.com/caiyuanhao1998/MST-plus-plus


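A hedged sketch of spectral-wise self-attention (S-MSA), where attention is computed between spectral channels rather than between spatial positions; the dimensions and normalization below are illustrative, not the authors' exact module:

```python
# Channel-to-channel ("spectral-wise") self-attention over flattened spatial tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralWiseAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.to_qkv = nn.Linear(channels, channels * 3, bias=False)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x):                       # x: (B, H*W, C)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # transpose so the attention map is C x C (between spectral channels)
        q, k, v = q.transpose(-2, -1), k.transpose(-2, -1), v.transpose(-2, -1)
        attn = F.softmax(F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).transpose(-2, -1), dim=-1)
        out = (attn @ v).transpose(-2, -1)      # back to (B, H*W, C)
        return self.proj(out)

x = torch.randn(2, 32 * 32, 31)                 # 31 spectral bands
print(SpectralWiseAttention(31)(x).shape)       # torch.Size([2, 1024, 31])
```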
Paper: Ensembling Off-the-shelf Models for GAN Training

Paper title: Ensembling Off-the-shelf Models for GAN Training

Published: CVPR 2022

Field: Computer Vision

Tasks: Image Generation

Paper link: http://arxiv.org/abs/2112.09130

Code: http://github.com/nupurkmr9/vision-aided-gan

Authors: Nupur Kumari, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

Summary: Can the collective "knowledge" from a large bank of pretrained vision models be leveraged to improve GAN training?

Abstract: The advent of large-scale training has produced a cornucopia of powerful visual recognition models. However, generative models, such as GANs, have traditionally been trained from scratch in an unsupervised manner. Can the collective "knowledge" from a large bank of pretrained vision models be leveraged to improve GAN training? If so, with so many models to choose from, which one(s) should be selected, and in what manner are they most effective? We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators. Notably, the particular subset of selected models greatly affects performance. We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings, choosing the most accurate model, and progressively adding it to the discriminator ensemble. Interestingly, our method can improve GAN training in both limited data and large-scale settings. Given only 10k training samples, our FID on LSUN Cat matches the StyleGAN2 trained on 1.6M images. On the full dataset, our method improves FID by 1.5x to 2x on cat, church, and horse categories of LSUN.


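A hedged sketch of the selection mechanism described above: probe how linearly separable real and generated samples are in each pretrained backbone's embedding space, then add the most accurate backbone to the discriminator ensemble. The probe below is a simplification using toy features:

```python
# Rank candidate pretrained backbones by the linear separability of real vs. fake embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def linear_separability(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """Cross-validated accuracy of a linear probe separating real from fake embeddings."""
    X = np.concatenate([real_feats, fake_feats])
    y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(fake_feats))])
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean()

# toy embeddings standing in for features extracted by different pretrained models
rng = np.random.default_rng(0)
candidates = {
    "backbone_a": (rng.normal(0.0, 1, (300, 64)), rng.normal(0.5, 1, (300, 64))),
    "backbone_b": (rng.normal(0.0, 1, (300, 64)), rng.normal(0.1, 1, (300, 64))),
}
scores = {name: linear_separability(r, f) for name, (r, f) in candidates.items()}
best = max(scores, key=scores.get)   # this backbone would be added to the ensemble next
print(scores, best)
```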
We are ShowMeAI, committed to spreading high-quality AI content, sharing industry solutions, and using knowledge to accelerate every step of technical growth! Click to view the article archive, and subscribe to the topic #ShowMeAI資訊日報 in our official account to receive the latest daily digest. Click Topic Collections & Monthly E-magazine to quickly browse the full collection for each topic.
