How Much Do Data Scientists Earn? A Full Data Analysis and Visualization ⛵
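- 💡 Author: 韓信子@ShowMeAI
- 📘 Data Analysis in Practice series: https://www.showmeai.tech/tutorials/40
- 📘 AI Careers & Guides series: https://www.showmeai.tech/tutorials/47
- 📘 This article: https://www.showmeai.tech/article-detail/402
- 📢 Notice: All rights reserved. To republish, contact the platform and the author and credit the source.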
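💡 Introduction
Data science continues to grow in popularity across fields such as the internet, healthcare, telecom, retail, sports, aviation, and the arts. On 📘Glassdoor's list of the best jobs in America, data scientist ranks third, with about 10,071 job openings in 2022.
Beyond the appeal of data itself, salaries for data science roles attract a lot of attention. In this article, ShowMeAI uses data to answer the following questions: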
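- Which jobs are most common in data science?
- Which country pays the most and offers the most opportunities?
- What is the typical salary range?
- How much does experience level matter for data scientists?
- Full-time vs. freelance work in data science
- Which data science job has the highest salary?
- Which data science job has the highest average salary?
- What are the lowest and highest salaries among data science professionals?
- What size are the companies that hire data science professionals?
- Is salary related to company size?
- What is the ratio of WFH (remote work) to WFO?
- How do data science salaries grow from year to year?
- If someone is looking for a data science job, what would you suggest they search for online?
- If you have a few years of experience as a junior employee, what size of company should you consider moving to?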
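💡 About the Data
The dataset used here is the 🏆Data Science Job Salaries dataset, which you can download from ShowMeAI's Baidu Netdisk link.
🏆 Dataset download (Baidu Netdisk): reply 『實戰』 to the WeChat official account 『ShowMeAI研究中心』, or follow the link in the original post, to get the 『ds_salaries dataset』 for article [37] "Data scientist salary analysis and visualization with pandasql and plotly".
⭐ ShowMeAI official GitHub: https://github.com/ShowMeAI-Hub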
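The dataset contains 11 columns, with the following names and meanings:

| Column | Meaning |
| :---- | :---- |
| work_year | Year in which the salary was paid |
| experience_level | Experience level at the time the salary was paid |
| employment_type | Type of employment |
| job_title | Job title |
| salary | Total gross salary paid |
| salary_currency | Currency in which the salary was paid |
| salary_in_usd | Salary normalized to US dollars |
| employee_residence | Employee's primary country of residence |
| remote_ratio | Overall share of work done remotely |
| company_location | Country/region of the employer's main office |
| company_size | Company size, based on number of employees |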
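This analysis uses pandas and SQL. To learn both systematically and practice hands-on, see ShowMeAI's data analysis tutorials and the matching tool cheat sheets listed in the references at the end of this article.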
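💡 Importing Libraries
We start by importing the libraries we need: pandas to read the data, and Plotly, seaborn, and matplotlib for visualization. We also run SQL queries in this article, using the 📘pandasql library.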
```python
# For loading data
import pandas as pd
import numpy as np

# For SQL queries
import pandasql as ps

# For plotting graphs / visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.offline import iplot, init_notebook_mode
import plotly.figure_factory as ff
import plotly.io as pio
import seaborn as sns
import matplotlib.pyplot as plt

# To show graphs below the code, in the same notebook
init_notebook_mode(connected=True)

# To convert country codes to country names
import country_converter as coco

# Suppress warnings to keep the notebook output clean
import warnings
warnings.filterwarnings('ignore')
```
💡 Loading the Dataset
The dataset we downloaded is in CSV format, so we can read it with the `read_csv` method.
```python
# Load the data
salaries = pd.read_csv('ds_salaries.csv')
```
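To view the first five records, we can use the `salaries.head()` method. The same task with pandasql looks like this:
```python
# Helper that runs a SQL query against the DataFrames in the notebook's global scope
def query(sql):
    return ps.sqldf(sql, globals())

# Show the first 5 rows of the data
query("""
    SELECT *
    FROM salaries
    LIMIT 5
""")
```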
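To make the idea of "SQL on a DataFrame" more concrete, here is a minimal sketch that gets a similar effect with only pandas and the standard-library `sqlite3` module. It is not how pandasql is implemented; the helper name `run_sql` and the toy table are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the salaries table (values are made up for illustration).
toy_salaries = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Engineer", "Data Analyst"],
    "salary_in_usd": [120000, 110000, 80000],
})

def run_sql(sql, frames):
    """Hypothetical helper: load DataFrames into in-memory SQLite and run a query."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, frame in frames.items():
            frame.to_sql(name, conn, index=False)  # one SQL table per DataFrame
        return pd.read_sql_query(sql, conn)
    finally:
        conn.close()

first_two = run_sql("SELECT * FROM salaries LIMIT 2", {"salaries": toy_salaries})
print(first_two)
```

Passing the tables explicitly as a dictionary avoids any dependence on which variables happen to be in scope when the query runs.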
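💡 Data Preprocessing
The first column of the dataset, "Unnamed: 0", carries no information, so we drop it before the analysis:
```python
salaries = salaries.drop('Unnamed: 0', axis=1)
```
Next, we check the dataset for missing values:
```python
salaries.isna().sum()
```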
Output:
```
work_year             0
experience_level      0
employment_type       0
job_title             0
salary                0
salary_currency       0
salary_in_usd         0
employee_residence    0
remote_ratio          0
company_location      0
company_size          0
dtype: int64
```
The dataset has no missing values, so no missing-value handling is needed. The `employee_residence` and `company_location` columns use short country codes; we map them to full country names to make them easier to read:
```python
# Convert country codes to country names
salaries["employee_residence"] = coco.convert(names=salaries["employee_residence"], to="name")
salaries["company_location"] = coco.convert(names=salaries["company_location"], to="name")
```
The `experience_level` column encodes the experience levels with the following abbreviations:
- EN: Entry Level
- MI: Mid level
- SE: Senior Level
- EX: Expert Level
To make them easier to read, we replace these abbreviations with the full names.
```python
# Replace values in column - experience_level
salaries['experience_level'] = query("""
    SELECT REPLACE(
             REPLACE(
               REPLACE(
                 REPLACE(experience_level, 'MI', 'Mid level'),
                 'SE', 'Senior Level'),
               'EN', 'Entry Level'),
             'EX', 'Expert Level')
    FROM salaries
""")
```
In the same way, we replace the employment type codes with their full names:
- FT: Full Time
- PT: Part Time
- CT: Contract
- FL: Freelance
```python
# Replace values in column - employment_type
salaries['employment_type'] = query("""
    SELECT REPLACE(
             REPLACE(
               REPLACE(
                 REPLACE(employment_type, 'PT', 'Part Time'),
                 'FT', 'Full Time'),
               'FL', 'Freelance'),
             'CT', 'Contract')
    FROM salaries
""")
```
The company size column is handled the same way:
- S: Small
- M: Medium
- L: Large
```python
# Replace values in column - company_size
salaries['company_size'] = query("""
    SELECT REPLACE(
             REPLACE(
               REPLACE(company_size, 'M', 'Medium'),
               'L', 'Large'),
             'S', 'Small')
    FROM salaries
""")
```
We also relabel the remote ratio column so it is easier to read:
```python
# Replace values in column - remote_ratio
salaries['remote_ratio'] = query("""
    SELECT REPLACE(
             REPLACE(
               REPLACE(remote_ratio, '100', 'Fully Remote'),
               '50', 'Partially Remote'),
             '0', 'Non Remote Work')
    FROM salaries
""")
```
This gives us the final preprocessed dataset.
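The nested `REPLACE` calls work, but they depend on the order of the replacements and on no label containing another code. As an alternative sketch (not the approach used in this article), the same relabeling can be done with a dictionary and `Series.map`; the dictionary names and toy data below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical lookup tables; the labels mirror the ones used in this article.
EXPERIENCE_LABELS = {
    "EN": "Entry Level",
    "MI": "Mid level",
    "SE": "Senior Level",
    "EX": "Expert Level",
}
REMOTE_LABELS = {0: "Non Remote Work", 50: "Partially Remote", 100: "Fully Remote"}

# Toy rows standing in for the raw dataset.
toy = pd.DataFrame({
    "experience_level": ["SE", "EN", "MI", "EX"],
    "remote_ratio": [100, 0, 50, 100],
})

# map() looks each value up in the dictionary; values without an entry become NaN.
toy["experience_level"] = toy["experience_level"].map(EXPERIENCE_LABELS)
toy["remote_ratio"] = toy["remote_ratio"].map(REMOTE_LABELS)
print(toy)
```

Because unmapped values turn into `NaN`, a quick `isna().sum()` after mapping shows whether any unexpected code slipped through.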
💡 Data Analysis & Visualization
💦 What are the 10 most common data science jobs?
```python
top10_jobs = query("""
    SELECT job_title,
           COUNT(*) AS job_count
    FROM salaries
    GROUP BY job_title
    ORDER BY job_count DESC
    LIMIT 10
""")
```
We draw a bar chart to make the result easier to read:
```python
data = go.Bar(
    x=top10_jobs['job_title'],
    y=top10_jobs['job_count'],
    text=top10_jobs['job_count'],
    textposition='inside',
    textfont=dict(size=12, color='white'),
    marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                line_color='black', line_width=1),
)

layout = go.Layout(
    title={'text': "Top 10 Data Science Jobs", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Job Title', tickmode='array'),
    yaxis=dict(title='Total'),
    width=900, height=600,
)

fig = go.Figure(data=data, layout=layout)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Market share of data science positions
```python
fig = px.pie(top10_jobs, values='job_count', names='job_title',
             color_discrete_sequence=px.colors.qualitative.Alphabet)

fig.update_layout(
    title={'text': "Distribution of job positions", 'x': 0.5, 'xanchor': 'center'},
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Countries with the most data science jobs
```python
top10_com_loc = query("""
    SELECT company_location AS company,
           COUNT(*) AS job_count
    FROM salaries
    GROUP BY company
    ORDER BY job_count DESC
    LIMIT 10
""")

data = go.Bar(
    x=top10_com_loc['company'],
    y=top10_com_loc['job_count'],
    textfont=dict(size=12, color='white'),
    marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                line_color='black', line_width=1),
)

layout = go.Layout(
    title={'text': "Top 10 Data Science Countries", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Countries', tickmode='array'),
    yaxis=dict(title='Total'),
    width=900, height=600,
)

fig = go.Figure(data=data, layout=layout)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
The chart above shows that the United States has the most data science job openings. Next, let's look at salaries around the world; run the code below to see the visualization.
```python
# `df` is the same object as `salaries`, so the new column is added to `salaries` too
df = salaries
df["company_country"] = coco.convert(names=salaries["company_location"], to='name_short')

# Total salary per country, log10-scaled so the largest totals do not dominate the color range
temp_df = df.groupby('company_country')['salary_in_usd'].sum().reset_index()
temp_df['salary_scale'] = np.log10(temp_df['salary_in_usd'])

fig = px.choropleth(
    temp_df,
    locationmode='country names',
    locations="company_country",
    color="salary_scale",
    hover_name="company_country",
    hover_data=['salary_in_usd'],
    color_continuous_scale='Jet',
)

fig.update_layout(title={'text': 'Salaries across the World', 'xanchor': 'center', 'x': 0.5})
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
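A per-country sum mixes two effects: how many jobs a country has and how well they pay. The sketch below, on made-up toy data, shows how total, mean, and job count can be computed side by side, with a log10 scale applied to the aggregated totals. The column names follow the article; the numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the salaries table (values are made up).
toy = pd.DataFrame({
    "company_country": ["United States", "United States", "United States",
                        "Germany", "India"],
    "salary_in_usd": [150000, 120000, 90000, 80000, 30000],
})

# One row per country with three different aggregates.
by_country = (
    toy.groupby("company_country")["salary_in_usd"]
       .agg(total="sum", mean="mean", jobs="count")
       .reset_index()
)

# Log-scale the aggregated totals so a few very large countries
# do not flatten the color range for everyone else.
by_country["log10_total"] = np.log10(by_country["total"])
print(by_country)
```

Coloring a map by `mean` rather than `total` would answer "where is pay highest" instead of "where is the most salary paid overall".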
💦 Average salary by currency
```python
df = (
    salaries.groupby('salary_currency', as_index=False)['salary_in_usd']
            .mean()
            .sort_values('salary_in_usd', ascending=False)
)

# Select the top 14
df = df.iloc[:14]
fig = px.bar(df, x='salary_currency', y='salary_in_usd', color='salary_currency',
             color_discrete_sequence=px.colors.qualitative.Safe)

fig.update_layout(
    title={'text': 'Average salary as a function of currency', 'xanchor': 'center', 'x': 0.5},
    xaxis_title='Currency', yaxis_title='Mean Salary',
)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
Salaries paid in US dollars are the highest on average, followed by Swiss francs and Singapore dollars.
```python
df = (
    salaries.groupby('company_country', as_index=False)['salary_in_usd']
            .mean()
            .sort_values('salary_in_usd', ascending=False)
)

# Select the top 14
df = df.iloc[:14]
fig = px.bar(df, x='company_country', y='salary_in_usd', color='company_country',
             color_discrete_sequence=px.colors.qualitative.Dark2)

fig.update_layout(
    title={'text': "Average salary as a function of company location", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Company Location', tickmode='array'),
    yaxis=dict(title='Mean Salary'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Distribution of experience levels in data science jobs
```python
job_exp = query("""
    SELECT experience_level,
           COUNT(*) AS job_count
    FROM salaries
    GROUP BY experience_level
    ORDER BY job_count ASC
""")

data = go.Bar(
    x=job_exp['job_count'],
    y=job_exp['experience_level'],
    orientation='h',
    text=job_exp['job_count'],
    marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                line_color='white', line_width=2),
)

layout = go.Layout(
    title={'text': "Jobs on Experience Levels", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Total', tickmode='array'),
    yaxis=dict(title='Experience lvl'),
    width=900, height=600,
)

fig = go.Figure(data=data, layout=layout)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
The chart above shows that most data science positions are senior level, while expert-level positions are rare.
💦 Distribution of employment types in data science jobs
```python
job_emp = query("""
    SELECT employment_type,
           COUNT(*) AS job_count
    FROM salaries
    GROUP BY employment_type
    ORDER BY job_count ASC
""")

data = go.Bar(
    x=job_emp['job_count'],
    y=job_emp['employment_type'],
    orientation='h',
    text=job_emp['job_count'],
    textposition='outside',
    marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                line_color='white', line_width=2),
)

layout = go.Layout(
    title={'text': "Jobs on Employment Type", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Total', tickmode='array'),
    yaxis=dict(title='Emp Type lvl'),
    width=900, height=600,
)

fig = go.Figure(data=data, layout=layout)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
The chart above shows that most data scientists work full time, while contract and freelance workers are far fewer.
💦 Trend in the number of data science jobs
```python
# Order by year so the line is drawn chronologically
job_year = query("""
    SELECT work_year,
           COUNT(*) AS job_count
    FROM salaries
    GROUP BY work_year
    ORDER BY work_year
""")

data = go.Scatter(
    x=job_year['work_year'],
    y=job_year['job_count'],
    marker=dict(size=20, line_width=1.5, line_color='white',
                color=px.colors.qualitative.Alphabet),
    line=dict(color='#ED7D31', width=4),
    mode='lines+markers',
)

layout = go.Layout(
    title={'text': "Data Science jobs Growth (2020 to 2022)", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Year'),
    yaxis=dict(title='Jobs'),
    width=900, height=600,
)

fig = go.Figure(data=data, layout=layout)
fig.update_xaxes(tickvals=[2020, 2021, 2022])
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Salary distribution of data science jobs
```python
salary_usd = query("""
    SELECT salary_in_usd
    FROM salaries
""")

# Set the theme before creating the figure so the background colors apply to it
sns.set(rc={'axes.facecolor': '#f1e7d2', 'figure.facecolor': '#f1e7d2'})
plt.figure(figsize=(20, 8))

p = sns.histplot(salary_usd["salary_in_usd"], kde=True, alpha=1, fill=True,
                 edgecolor='black', linewidth=1)
p.axes.lines[0].set_color("orange")  # color the KDE curve
plt.title("Data Science Salary Distribution \n", fontsize=25)
plt.xlabel("Salary", fontsize=18)
plt.ylabel("Count", fontsize=18)
plt.show()
```
💦 Top 10 highest-paid data science jobs
```python
# Group by job title so MAX() returns the highest salary for each title
salary_hi10 = query("""
    SELECT job_title,
           MAX(salary_in_usd) AS salary
    FROM salaries
    GROUP BY job_title
    ORDER BY salary DESC
    LIMIT 10
""")

data = go.Bar(
    x=salary_hi10['salary'],
    y=salary_hi10['job_title'],
    orientation='h',
    text=salary_hi10['salary'],
    textposition='inside',
    insidetextanchor='middle',
    textfont=dict(size=13, color='black'),
    marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                line_color='black', line_width=1),
)

layout = go.Layout(
    title={'text': "Top 10 Highest paid Data Science Jobs", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='salary', tickmode='array'),
    yaxis=dict(title='Job Title'),
    width=900, height=600,
)
fig = go.Figure(data=data, layout=layout)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
Principal Data Engineer is the highest-paid job in data science.
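Ranking by the maximum salary and ranking by the average salary can disagree, because a single outlier is enough to top the first list. The sketch below illustrates this on made-up toy data using a pandas `groupby` with named aggregations; it is an illustration under these assumptions, not a result from the real dataset.

```python
import pandas as pd

# Toy rows (values are made up): one Data Scientist outlier at 300k.
toy = pd.DataFrame({
    "job_title": ["Data Scientist", "Data Scientist",
                  "Data Engineer", "Data Engineer", "Data Analyst"],
    "salary_in_usd": [60000, 300000, 190000, 200000, 90000],
})

# Highest and average salary per job title, sorted by the highest salary.
summary = (
    toy.groupby("job_title")["salary_in_usd"]
       .agg(max_salary="max", mean_salary="mean")
       .sort_values("max_salary", ascending=False)
)
print(summary)

top_by_max = summary["max_salary"].idxmax()
top_by_mean = summary["mean_salary"].idxmax()
print(top_by_max, top_by_mean)
```

In this toy table the title with the single highest salary is not the title with the highest average, which is why it helps to look at both rankings.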
💦 Average salary by job title, ranked
```python
salary_av10 = query("""
    SELECT job_title,
           ROUND(AVG(salary_in_usd)) AS salary
    FROM salaries
    GROUP BY job_title
    ORDER BY salary DESC
    LIMIT 10
""")

data = go.Bar(
    x=salary_av10['salary'],
    y=salary_av10['job_title'],
    orientation='h',
    text=salary_av10['salary'],
    textposition='inside',
    insidetextanchor='middle',
    textfont=dict(size=13, color='white'),
    marker=dict(color=px.colors.qualitative.Alphabet, opacity=0.9,
                line_color='white', line_width=2),
)

layout = go.Layout(
    title={'text': "Top 10 Average paid Data Science Jobs", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='salary', tickmode='array'),
    yaxis=dict(title='Job Title'),
    width=900, height=600,
)
fig = go.Figure(data=data, layout=layout)
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Data science salary trend
```python
# Order by year so the line is drawn chronologically
salary_year = query("""
    SELECT ROUND(AVG(salary_in_usd)) AS salary,
           work_year AS year
    FROM salaries
    GROUP BY year
    ORDER BY year
""")

data = go.Scatter(
    x=salary_year['year'],
    y=salary_year['salary'],
    marker=dict(size=20, line_width=1.5, line_color='black', color='#ED7D31'),
    line=dict(color='black', width=4),
    mode='lines+markers',
)

layout = go.Layout(
    title={'text': "Data Science Salary Growth (2020 to 2022)", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Year'),
    yaxis=dict(title='Salary'),
    width=900, height=600,
)

fig = go.Figure(data=data, layout=layout)
fig.update_xaxes(tickvals=[2020, 2021, 2022])
fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Experience level & salary
```python
salary_exp = query("""
    SELECT experience_level AS 'Experience Level',
           salary_in_usd AS Salary
    FROM salaries
""")

fig = px.violin(salary_exp, x='Experience Level', y='Salary',
                color='Experience Level', box=True)

fig.update_layout(
    title={'text': "Salary on Experience Level", 'xanchor': 'center', 'x': 0.5},
    xaxis=dict(title='Experience level'),
    yaxis=dict(title='salary'),
    width=900, height=600,
)

fig.update_layout(paper_bgcolor='#f1e7d2', plot_bgcolor='#f1e7d2', showlegend=False)
fig.show()
```
💦 Salary trend by experience level
```python
# Median salary per year and experience level (only the numeric salary column is aggregated)
tmp_df = (
    salaries.groupby(['work_year', 'experience_level'], as_index=False)['salary_in_usd']
            .median()
)

fig = px.line(tmp_df, x='work_year', y='salary_in_usd',
              color='experience_level', symbol="experience_level")

fig.update_layout(
    title={'text': "Median Salary Trend By Experience Level", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Working Year', tickvals=[2020, 2021, 2022], tickmode='array'),
    yaxis=dict(title='Salary'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
Observations:
1. During the COVID-19 pandemic (2020 to 2021), expert-level salaries were very high but showed some decline.
2. After 2021, salaries for expert-level and senior-level staff rose.
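Observations like these are easier to check against numbers than against a line chart. As a minimal sketch on made-up toy data, a pivot table gives the median salary per year and experience level, and `pct_change` gives the year-over-year change; the values below are assumptions and do not come from the real dataset.

```python
import pandas as pd

# Toy rows (values are made up for illustration).
toy = pd.DataFrame({
    "work_year": [2020, 2020, 2021, 2021, 2022, 2022, 2020, 2021, 2022],
    "experience_level": ["Senior Level"] * 6 + ["Entry Level"] * 3,
    "salary_in_usd": [100000, 120000, 110000, 130000, 140000, 150000,
                      50000, 55000, 66000],
})

# Rows: years, columns: experience levels, cells: median salary.
median_table = toy.pivot_table(index="work_year", columns="experience_level",
                               values="salary_in_usd", aggfunc="median")

# Relative change of each median versus the previous year (first year is NaN).
yoy_change = median_table.pct_change()

print(median_table)
print(yoy_change.round(3))
```

A negative entry in `yoy_change` marks a year in which the median for that level fell, which is the kind of dip the first observation describes.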
💦 Year & salary distribution
```python
year_gp = salaries.groupby('work_year')
hist_data = [
    year_gp.get_group(2020)['salary_in_usd'],
    year_gp.get_group(2021)['salary_in_usd'],
    year_gp.get_group(2022)['salary_in_usd'],
]
group_labels = ['2020', '2021', '2022']

fig = ff.create_distplot(hist_data, group_labels, show_hist=False)

fig.update_layout(
    title={'text': "Salary Distribution By Working Year", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Salary'),
    yaxis=dict(title='Kernel Density'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Employment type & salary
```python
salary_emp = query("""
    SELECT employment_type AS 'Employment Type',
           salary_in_usd AS Salary
    FROM salaries
""")

fig = px.box(salary_emp, x='Employment Type', y='Salary', color='Employment Type')

fig.update_layout(
    title={'text': "Salary by Employment Type", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Employment Type'),
    yaxis=dict(title='Salary'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Company size distribution
```python
comp_size = query("""
    SELECT company_size,
           COUNT(*) AS count
    FROM salaries
    GROUP BY company_size
""")

data = go.Pie(
    labels=comp_size['company_size'],
    values=comp_size['count'].values,
    hoverinfo='label',
    hole=0.5,
    textfont_size=16,
    textposition='auto',
)
fig = go.Figure(data=data)

fig.update_layout(
    title={'text': "Company Size", 'x': 0.5, 'xanchor': 'center'},
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Share of experience levels by company size
```python
categories = ['Entry Level', 'Expert Level', 'Mid level', 'Senior Level']

# Count rows per (company size, experience level); missing combinations become 0
counts = (
    salaries.groupby(['company_size', 'experience_level'])
            .size()
            .unstack(fill_value=0)
            .reindex(columns=categories, fill_value=0)
)
# Share of each experience level within each company size
shares = counts.div(counts.sum(axis=1), axis=0).round(2)

fig = go.Figure()
for size, label in [('Small', 'S'), ('Medium', 'M'), ('Large', 'L')]:
    fig.add_trace(go.Scatterpolar(r=shares.loc[size].values, theta=categories,
                                  fill='toself', name=f'Company Size {label}'))

fig.update_layout(polar=dict(radialaxis=dict(range=[0, 0.6])), showlegend=True)

fig.update_layout(
    title={'text': "Proportion of Experience Level In Different Company Sizes",
           'x': 0.5, 'xanchor': 'center'},
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
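The proportions behind the radar chart can also be read as a plain table. A minimal sketch, on made-up toy data, uses `pd.crosstab` with `normalize="index"` so that each company-size row sums to 1; the rows below are assumptions for illustration.

```python
import pandas as pd

# Toy rows (made up): 2 small, 4 medium, and 2 large company records.
toy = pd.DataFrame({
    "company_size": ["Small", "Small", "Medium", "Medium",
                     "Medium", "Medium", "Large", "Large"],
    "experience_level": ["Entry Level", "Mid level", "Senior Level", "Senior Level",
                         "Mid level", "Entry Level", "Senior Level", "Expert Level"],
})

# normalize="index" divides every row by its own total,
# so each cell is the share of that level within that company size.
shares = pd.crosstab(toy["company_size"], toy["experience_level"], normalize="index")
print(shares.round(2))
```

Combinations that never occur show up as 0 rather than being dropped, which keeps every row aligned with the same set of experience levels.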
💦 Company size & salary
```python
salary_size = query("""
    SELECT company_size AS 'Company size',
           salary_in_usd AS Salary
    FROM salaries
""")

fig = px.box(salary_size, x='Company size', y='Salary', color='Company size')

fig.update_layout(
    title={'text': "Salary by Company size", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Company size'),
    yaxis=dict(title='Salary'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Ratio of WFH (remote work) to WFO
```python
rem_type = query("""
    SELECT remote_ratio,
           COUNT(*) AS total
    FROM salaries
    GROUP BY remote_ratio
""")

data = go.Pie(
    labels=rem_type['remote_ratio'],
    values=rem_type['total'].values,
    hoverinfo='label',
    hole=0.4,
    textfont_size=18,
    textposition='auto',
)

fig = go.Figure(data=data)

fig.update_layout(
    title={'text': "Remote Ratio", 'x': 0.5, 'xanchor': 'center'},
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 How much remote type affects salary
```python
salary_remote = query("""
    SELECT remote_ratio AS 'Remote type',
           salary_in_usd AS Salary
    FROM salaries
""")

fig = px.box(salary_remote, x='Remote type', y='Salary', color='Remote type')

fig.update_layout(
    title={'text': "Salary by Remote Type", 'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Remote type'),
    yaxis=dict(title='Salary'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
💦 Experience level & remote ratio
```python
# Number of records for each (experience level, remote ratio) pair
exp_remote = (
    salaries.groupby(['experience_level', 'remote_ratio'])
            .size()
            .reset_index(name='count')
)

fig = px.histogram(exp_remote, x='experience_level', y='count', color='remote_ratio',
                   barmode='group', text_auto=True)

fig.update_layout(
    title={'text': "Respondent Count In Different Experience Level Based on Remote Ratio",
           'x': 0.5, 'xanchor': 'center'},
    xaxis=dict(title='Experience Level'),
    yaxis=dict(title='Number of Respondents'),
    width=900, height=600,
)

fig.update_layout(plot_bgcolor='#f1e7d2', paper_bgcolor='#f1e7d2')
fig.show()
```
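💡 Conclusions
- The three most common positions in data science are data scientist, data engineer, and data analyst.
- Data science jobs are growing in popularity: the share of employees in the dataset rose from 11.9% in 2020 to 52.4% in 2022.
- The United States has the most data science companies.
- The interquartile range (IQR) of salaries runs from 62.7k to 150k.
- Most data science employees are senior level, while expert-level employees are rarer.
- Most data science employees work full time; contract workers and freelancers are few.
- Principal Data Engineer is the highest-paid data science job.
- The lowest data science salary (entry-level experience) is USD 4,000, and the highest salary with expert-level experience is USD 600,000.
- Company mix: 53.7% medium-sized, 32.6% large, and 13.7% small data science companies.
- Salary is also affected by company size: larger companies pay more.
- 62.8% of data science jobs are fully remote, 20.9% are non-remote, and 16.3% are partially remote.
- Data science salaries grow with time and with experience.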
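Figures such as the IQR bounds and the per-year shares can be recomputed with a few lines of pandas. The sketch below shows the calculation on made-up toy data, so its numbers are not the ones quoted above; only the column names are taken from the article.

```python
import pandas as pd

# Toy rows (values are made up for illustration).
toy = pd.DataFrame({
    "work_year": [2020, 2021, 2021, 2022, 2022, 2022, 2022, 2022],
    "salary_in_usd": [40000, 60000, 80000, 100000, 120000, 140000, 160000, 180000],
})

# First and third quartile of the salary column; the IQR is their difference.
q1, q3 = toy["salary_in_usd"].quantile([0.25, 0.75])
iqr = q3 - q1

# Percentage of records that fall in each year.
year_share = (
    toy["work_year"].value_counts(normalize=True)
                    .mul(100)
                    .round(1)
                    .sort_index()
)

print(q1, q3, iqr)
print(year_share)
```

Running the same lines on the `salaries` DataFrame from this article would give the values to compare against the conclusions.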
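References
- 📘 Glassdoor
- 📘 pandasql
- 📘 Data Science Job Salaries dataset (Kaggle)
- 📘 Illustrated Data Analysis: From Beginner to Mastery tutorial series: https://www.showmeai.tech/tutorials/33
- 📘 Programming language cheat sheets | SQL cheat sheet: https://www.showmeai.tech/article-detail/99
- 📘 Data science library cheat sheets | Pandas cheat sheet: https://www.showmeai.tech/article-detail/101
- 📘 Data science library cheat sheets | Matplotlib cheat sheet: https://www.showmeai.tech/article-detail/103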
Recommended Reading
- 🌍 Data Analysis in Practice series: https://www.showmeai.tech/tutorials/40
- 🌍 Machine Learning in Practice series: https://www.showmeai.tech/tutorials/41
- 🌍 Deep Learning in Practice series: https://www.showmeai.tech/tutorials/42
- 🌍 TensorFlow in Practice series: https://www.showmeai.tech/tutorials/43
- 🌍 PyTorch in Practice series: https://www.showmeai.tech/tutorials/44
- 🌍 NLP in Practice series: https://www.showmeai.tech/tutorials/45
- 🌍 CV in Practice series: https://www.showmeai.tech/tutorials/46
- 🌍 AI Interview Question Bank series: https://www.showmeai.tech/tutorials/48