2  Analysis of French PhD theses in Julia

Author

Samuel Ortion

Published

July 29, 2023

2.1 Load packages

using CSV
using DataFrames
using Dates
using Plots
using StatsPlots

2.2 English prevalence in French PhD theses since 1985

df = CSV.read("../tmp/english_prevalence.csv", DataFrame)
38×4 DataFrame
 Row │ year   english  french  missing
     │ Int64  Int64    Int64   Int64
─────┼─────────────────────────────────
   1 │  1985        1    1138        0
   2 │  1986        5    3408        0
   3 │  1987        4    4573        0
   4 │  1988        4    6297        0
   5 │  1989        2    6262        0
   6 │  1990        7    6326        0
   7 │  1991        1    6645        0
   8 │  1992        7    7431        1
   9 │  1993       10    7892        1
  10 │  1994       15    8097        0
  11 │  1995       14    6349        0
  12 │  1996       16    6883        0
  13 │  1997       20    7126        2
  ⋮  │   ⋮       ⋮       ⋮        ⋮
  27 │  2011     1418    9042        0
  28 │  2012     1890    9592        0
  29 │  2013     2356    9560        0
  30 │  2014     1877    7360        0
  31 │  2015     1507    5777        0
  32 │  2016     1422    5357        0
  33 │  2017     1474    4929        0
  34 │  2018     1371    3949        0
  35 │  2019     1585    3398        0
  36 │  2020     1526    2922        0
  37 │  2021     1841    2940        0
  38 │  2022      745     865        0
                        13 rows omitted
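
For readers without access to the file, the sketch below shows the layout the CSV presumably has. The header line is an assumption inferred from the column names in the output above; the three data rows are copied from the printed table. The text is parsed from an in-memory buffer instead of "../tmp/english_prevalence.csv".

```julia
using CSV
using DataFrames

# Assumed excerpt of english_prevalence.csv: the header is inferred from the
# column names printed above, the rows are copied from the table.
csv_text = """
year,english,french,missing
1985,1,1138,0
1992,7,7431,1
2022,745,865,0
"""

# CSV.read accepts an IO object as well as a file path.
sample = CSV.read(IOBuffer(csv_text), DataFrame)

# Column names and element types, to compare with the header of the output above.
names(sample)
eltype.(eachcol(sample))
```

The `missing` column can still be accessed as `sample.missing`, although the name shadows Julia's `missing` value only as a property, not as a variable.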
# Compute the english / (french + english) ratio and store it in a new column
df.ratio = df.english ./ (df.french + df.english)
38-element Vector{Float64}:
 0.000877963125548727
 0.001464986815118664
 0.0008739348918505571
 0.0006348198698619266
 0.00031928480204342275
 0.0011053213327017212
 0.0001504664459825459
 0.0009411132024737832
 0.0012655024044545685
 0.0018491124260355029
 0.0022002200220022
 0.0023191766922742428
 0.00279876854184159
 ⋮
 0.1355640535372849
 0.16460546943041282
 0.19771735481705272
 0.20320450362671863
 0.206891817682592
 0.20976545213158282
 0.23020459159768858
 0.2577067669172932
 0.3180814770218744
 0.3430755395683453
 0.3850658857979502
 0.46273291925465837
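
As a quick check of the arithmetic, the sketch below recomputes the ratio for three rows copied from the table (1985, 2011 and 2022). It uses `transform!` with `ByRow`, which is one possible DataFrames idiom and not necessarily the one used in the notebook; the names `sample` and `percent` are introduced here for illustration only.

```julia
using DataFrames

# Three rows copied from the table above (1985, 2011, 2022).
sample = DataFrame(
    year = [1985, 2011, 2022],
    english = [1, 1418, 745],
    french = [1138, 9042, 865],
)

# Same computation as df.ratio, written as a row-wise transformation.
transform!(sample, [:english, :french] => ByRow((en, fr) -> en / (en + fr)) => :ratio)

# Express the share as a percentage, rounded to two digits for display.
sample.percent = round.(100 .* sample.ratio; digits = 2)
# percent is 0.09, 13.56 and 46.27 for the three years
```

The theses counted in the `missing` column are left out of the denominator here, as they are in the original computation.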

Plot:

plot(df.year, df.ratio, label="ratio", legend=:topleft, xlabel="Year", ylabel="Share of theses in English", title="English in French PhD theses since 1985", ylim=(0, 1))

Rate of French PhD theses in English since 1985
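
Since the ratio is a share between 0 and 1, it may read more naturally with percentage ticks on the y-axis. The following is a minimal sketch, not the notebook's own code: it uses four ratio values copied from the output above, a hypothetical helper `percent_label`, and a temporary output file.

```julia
using Plots

# Ratio values copied from the output above for 1985, 1997, 2011 and 2022.
years = [1985, 1997, 2011, 2022]
ratio = [0.000877963125548727, 0.00279876854184159,
         0.1355640535372849, 0.46273291925465837]

# Hypothetical helper (not in the original notebook):
# turn a 0-1 share into a percentage tick label.
percent_label(y) = string(round(Int, 100 * y), "%")

p = plot(years, ratio;
    label = "English share",
    legend = :topleft,
    xlabel = "Year",
    ylabel = "Theses in English",
    title = "English in French PhD theses since 1985",
    ylim = (0, 1),
    yformatter = percent_label,  # applies the helper to every y tick
)

# Written to a temporary file so the sketch does not touch ../media/plots.
out_file = tempname() * ".png"
savefig(p, out_file)
```

With the full data, `years` and `ratio` would simply be `df.year` and `df.ratio`.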

Save:

savefig("../media/plots/english_prevalence.png")
"/home/sortion/Documents/Projects/bio/bioinfo-fr/bioinfo-theses-analyses/media/plots/english_prevalence.png"