Category;Name;Synthetic;Several datasets;Link to dataset;Links to articles;Access;Number of rows;Number of columns;Datasize [MB];Targets;Quality (1 poor, 5 excellent);Comments;Contact;
P&C Pricing;French Motor Third-Party Liability Claims;No;No;https://www.openml.org/d/41214;https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3164764, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3226852, https://scikit-learn.org/dev/auto_examples/linear_model/plot_tweedie_regression_insurance_claims.html#sphx-glr-auto-examples-linear-model-plot-tweedie-regression-insurance-claims-py, https://scikit-learn.org/stable/auto_examples/linear_model/plot_poisson_regression_non_normal_loss.html, https://github.com/fpechon/SummerSchool;Free;678013;12;35;Number of claims (ClaimNb, integer);4;Several scikit-learn tutorials use this dataset;philipp.probst@actuarialdatascience.org;
Vehicle data;TLC Trip Record Data;No;Yes;https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page;https://www.kaggle.com/datasets/elemento/nyc-yellow-taxi-trip-data;Free;2964624;19;48 (per monthly data);-;5;Relevant for insurance?;philipp.probst@actuarialdatascience.org;
P&C Pricing Motor;Autoseg - Sistema de Estatísticas de Automóveis da SUSEP;No;Yes;https://www2.susep.gov.br/menuestatistica/Autoseg/principal.aspx;https://ratemake.com, https://github.com/kasaai/explain-ml-pricing;Free;31940157;46;1200;?;3;"Download takes a long time; variables are difficult to understand, especially without knowledge of Brazilian insurance";philipp.probst@actuarialdatascience.org;
Vehicle data;Victoria Road Crash Data;No;Yes;https://discover.data.vic.gov.au/dataset/victoria-road-crash-data;https://www.monash.edu/__data/assets/pdf_file/0004/216418/muarc060.pdf;Free;169877;23;28;-;5;Several more datasets can be joined;philipp.probst@actuarialdatascience.org;
P&C Pricing Motor;Driver Telematics;Yes;No;https://www2.math.uconn.edu/~valdez/data.html;https://www.mdpi.com/2227-9091/9/4/58;Free;100000;52;36;PCOC;4;Created from a similar real dataset;philipp.probst@actuarialdatascience.org;
P&C Pricing Motor;Dataset of an actual motor vehicle insurance portfolio;No;No;https://data.mendeley.com/datasets/5cxyb5fp4f/1;https://scholar.google.es/scholar?oi=bibs&hl=es&cites=16214827629935489049;Free;105555;30;14;PCOC;5;;philipp.probst@actuarialdatascience.org;
L&H Health Surveys;IPUMS NHIS;No;Yes;https://nhis.ipums.org/nhis/;https://nhis.ipums.org/nhis/articles_and_presentations.shtml;Free;10+ million;500+;1000+;Mainly mortality;4;Data collected since 1963, several structural changes over time;daniel.meier@actuarialdatascience.org;
L&H Lapse;SOA Lapse Study;No;No;https://github.com/kevinykuo/insurance/tree/master/data;https://www.soa.org/resources/experience-studies/2014/research-2014-post-level-shock/, https://cellar.kasa.ai/dataset/lapse_study/;Free;345627;15;3;Lapse (rates);4;Dataset should be joined with macroeconomic time series, since these affect lapse rates;daniel.meier@actuarialdatascience.org;
L&H Mortality;Human Mortality Database (HMD);No;Yes;https://mortality.org/;https://mortality.org/File/GetDocument/Public/HMD-Publist.pdf, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3441030, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3656210;Free;Depending on country;10;Depending on country;Mortality;5;The go-to place to compare mortality rates and their development over time in many countries;daniel.meier@actuarialdatascience.org;
L&H Mortality;Weekly mortality data (STMF);No;Yes;https://mortality.org/Data/STMF;https://www.nature.com/articles/s41597-021-01019-1;Free;Depending on country;19;Depending on country;Mortality by calendar week;5;Triggered by COVID-19, weekly mortality data from several countries is provided;daniel.meier@actuarialdatascience.org;
L&H Mortality;Causes of death;No;Yes;https://mortality.org/Data/HCD;https://mortality.org/File/GetDocument/hcd/docs/HCD_Method_Explanatory_Notes.pdf;Free;Depending on country;32;Depending on country;Mortality by cause of death;5;Causes of death time series at various granularities and for several countries;daniel.meier@actuarialdatascience.org;
L&H Mortality;SOA Disability Study;No;No;https://www.soa.org/49382f/globalassets/assets/files/research/projects/2017-gltd-recovery-mortality-tree-data.zip;https://www.soa.org/49382f/globalassets/assets/files/research/projects/2017-gltd-recovery-mortality-tree.pdf;Free;509532;11;58;Mortality;4;Mortality experience of about 25 companies from Long Term Disability insurance;daniel.meier@actuarialdatascience.org;
L&H Disability Recovery;SOA Disability Study;No;No;https://www.soa.org/49382f/globalassets/assets/files/research/projects/2017-gltd-recovery-mortality-tree-data.zip;https://www.soa.org/49382f/globalassets/assets/files/research/projects/2017-gltd-recovery-mortality-tree.pdf, https://github.com/DeutscheAktuarvereinigung/Data_Science_Challenge_2020_Berufsunfaehigkeit;Free;818942;14;128;Recovery (rates);4;Recovery experience of about 25 companies from Long Term Disability insurance;daniel.meier@actuarialdatascience.org;
L&H Medical Expenses;MEPS;No;Yes;https://meps.ahrq.gov//mepsweb/;https://meps.ahrq.gov/mepsweb/about_meps/survey_back.jsp;Free;Depending on dataset;Depending on dataset;Depending on dataset;Mainly medical expenses;4;Survey data starting in 1996;daniel.meier@actuarialdatascience.org;
L&H Health Surveys;NHANES;No;No;https://cran.r-project.org/web/packages/NHANES/index.html;https://cran.r-project.org/web/packages/NHANES/NHANES.pdf;Free;10000;76;45354;Health status, diabetes;4;BMI, socioeconomics, smoking behavior, etc.;daniel.meier@actuarialdatascience.org;
L&H Clinical Health Data;MIMIC;No;Yes;https://physionet.org/static/published-projects/mimiciii-demo/mimic-iii-clinical-database-demo-1.4.zip;https://physionet.org/content/mimiciii-demo/1.4/;;Depending on dataset;Depending on dataset;98 in total;Health diagnoses;4;40k patients of the Beth Israel Deaconess Medical Center;daniel.meier@actuarialdatascience.org;
L&H Health Data;SAMHDA;No;Yes;https://www.datafiles.samhsa.gov/data-sources;https://www.datafiles.samhsa.gov/sites/default/files/field-uploads-protected/studies/MH-CLD-2022/MH-CLD-2022-DS0001/MH-CLD-2022-DS0001-info/MH-CLD-2022-DS0001-info-codebook.pdf;Free;Depending on dataset;Depending on dataset;Depending on dataset;Health diagnoses;4;Collection of datasets for substance use and mental health;daniel.meier@actuarialdatascience.org;
L&H Medical Literature;PubMed;No;Yes;https://datadiscovery.nlm.nih.gov/Literature/PubMed-Central-Open-Access-Subset-PMC-OA-/qbit-hxvw/about_data;https://arxiv.org/pdf/2304.14454, https://www.nature.com/articles/s41597-023-02134-x;Free;Non-tabular;Non-tabular;Several million articles;Structure, text, clusters, etc.;4;Text mining, NLP, LLMs;daniel.meier@actuarialdatascience.org;
L&H Health and Retirement Study;HRS;No;Yes;https://hrs.isr.umich.edu/;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5798643/;Free;Depending on dataset;Depending on dataset;Depending on dataset;Mortality, morbidity;4;Survey data of pensioners;daniel.meier@actuarialdatascience.org;
P&C Pricing;Fire incidents Toronto;No;No;https://open.toronto.ca/dataset/fire-incidents/;https://www.cambridge.org/core/journals/astin-bulletin-journal-of-the-iaa/article/abs/geographic-ratemaking-with-spatial-embeddings/FE5AF1B2DD96B0D6B2684A775A847013;Free;32939;43;24;Losses, casualties;5;Fire incidents in Toronto;daniel.meier@actuarialdatascience.org;
P&C Pricing;National Motor Vehicle Crash Causation Study;No;Yes;https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/NASS/NMVCCS/;https://www.casact.org/sites/default/files/presentation/affiliates_sccac_0515_borba.pdf;Free;Depending on dataset;Depending on dataset;Depending on dataset;Accidents;4;Only available in SAS data format;daniel.meier@actuarialdatascience.org;
P&C Pricing;Accidents Switzerland;No;No;https://data.geo.admin.ch/ch.astra.unfaelle-personenschaeden_alle/unfaelle-personenschaeden_alle/unfaelle-personenschaeden_alle_2056.csv.zip;https://opendata.swiss/de/dataset/strassenverkehrsunfallorte;Free;232303;36;127;Accidents;5;Accident data in Switzerland since 2011;daniel.meier@actuarialdatascience.org;
P&C Pricing;Accidents UK;No;Yes;https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data;;Free;Depending on dataset;Depending on dataset;Depending on dataset;Accidents;5;Accident data in the UK;daniel.meier@actuarialdatascience.org;
P&C Pricing;CASDatasets;Both;Yes;http://cas.uqam.ca/;http://cas.uqam.ca/pub/web/CASdatasets-manual.pdf;Free;Depending on dataset;Depending on dataset;Depending on dataset;Many different topics;4;Large variety of actuarial datasets;philipp.probst@actuarialdatascience.org;
P&C Pricing;insuranceData;Both;Yes;https://cran.r-project.org/web/packages/insuranceData/insuranceData.pdf;;Free;Depending on dataset;Depending on dataset;Depending on dataset;Many different topics;4;Large variety of actuarial datasets;philipp.probst@actuarialdatascience.org;
P&C Pricing;insurancerating;No;Yes;https://cran.r-project.org/web/packages/insurancerating/insurancerating.pdf;;Free;Depending on dataset;Depending on dataset;Depending on dataset;Motor Third-Party Liability;4;Two datasets for MTPL;philipp.probst@actuarialdatascience.org;
P&C Pricing;raw;No;Yes;https://cran.r-project.org/web/packages/raw/raw.pdf;;Free;Depending on dataset;Depending on dataset;Depending on dataset;Many different topics;4;Large variety of actuarial datasets;philipp.probst@actuarialdatascience.org;
P&C Pricing;Porto Seguro’s Safe Driver Prediction;No;No;https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/data;https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/overview;Free Registration;595212 (training_data);119;300;Claim occurrence;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;Allstate Claim Prediction Challenge;No;No;https://www.kaggle.com/c/ClaimPredictionChallenge/data;https://www.kaggle.com/c/ClaimPredictionChallenge/overview;Free Registration;13184290;35;700;Claims;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;Belgian motor third-party liability pricing data;No;No;https://github.com/katrienantonio/hands-on-machine-learning-R-module-3/blob/main/data/PC_data.txt;https://katrienantonio.github.io/hands-on-machine-learning-R-module-3/sheets/ML_part3.html#85;Free Registration;163231;18;16;Claims;4;Tutorial;philipp.probst@actuarialdatascience.org;
P&C Pricing;Motor Insurance Market Simulation;No;No;https://www.aicrowd.com/challenges/insurance-pricing-game?utm_source=p&utm_medium=c&utm_campaign=i#dataset;https://www.aicrowd.com/challenges/insurance-pricing-game?utm_source=p&utm_medium=c&utm_campaign=i;Free Registration;240000;?;?;Claims, Market Competition;4;AIcrowd competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;Actuarial loss prediction;No;No;https://www.kaggle.com/c/actuarial-loss-estimation/overview;https://www.kaggle.com/c/actuarial-loss-estimation/;Free Registration;90000;31;13;Claims, NLP;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Reserving;ChainLadder;Both;Yes;https://cran.r-project.org/web/packages/ChainLadder/ChainLadder.pdf;https://cran.r-project.org/web/packages/ChainLadder/ChainLadder.pdf;Free;Depending on dataset;Depending on dataset;Depending on dataset;Many different topics;4;Large variety of triangle datasets;philipp.probst@actuarialdatascience.org;
P&C Reserving;Individual Claims Generator: Monthly Cash Flows;Yes;Yes;https://people.math.ethz.ch/~wmario/simulation.html;https://github.com/actuarial-data-science/PackageIndividualClaimsSimulator;Free;flexible;flexible;flexible;Claims development;4;Simulation tool;philipp.probst@actuarialdatascience.org;
P&C Reserving;SynthETIC;Yes;No;https://cran.r-project.org/web/packages/SynthETIC/SynthETIC.pdf;https://cran.r-project.org/web/packages/SynthETIC/SynthETIC.pdf;Free;flexible;flexible;flexible;Claims and their development;4;Simulation tool;philipp.probst@actuarialdatascience.org;
P&C Reserving;DeepTriangle;Yes;No;https://github.com/kevinykuo/deeptriangle/;https://arxiv.org/pdf/1804.09253;Free;flexible;flexible;flexible;Claims development;4;Simulation tool;philipp.probst@actuarialdatascience.org;
P&C Lapse;eudirectlapse;No;No;http://cas.uqam.ca/pub/web/CASdatasets-manual.pdf;http://cas.uqam.ca/pub/web/CASdatasets-manual.pdf;Free;23060;19;CASDatasets package;Lapse/Renewal;4;Lapse (renewal) data from an unknown insurer;philipp.probst@actuarialdatascience.org;
P&C Pricing;Car Insurance Claim Prediction;No;No;https://www.kaggle.com/datasets/ifteshanajnin/carinsuranceclaimprediction-classification;https://www.kaggle.com/datasets/ifteshanajnin/carinsuranceclaimprediction-classification;Free Registration;58592;89;23;Claim occurrence;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;The Insurance Company (TIC) Benchmark;No;No;https://www.rdocumentation.org/packages/ISLR/versions/1.4/topics/Caravan;https://liacs.leidenuniv.nl/~puttenpwhvander/library/cc2000/data.html;Free;5822;86;ISLR package;Policy holding;4;Prediction of number of mobile home policies;philipp.probst@actuarialdatascience.org;
P&C Pricing;BNP Paribas Cardif Claims Management;No;No;https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/overview;https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/overview;Free Registration;114321;133;104;Claim category;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;Auto Insurance Claims Data;No;No;https://www.kaggle.com/datasets/buntyshah/auto-insurance-claims-data/data;https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4954928053318020/1058911316420443/167703932442645/latest.html;Free Registration;1000;40;1;Fraud Detection;4;Kaggle Dataset;philipp.probst@actuarialdatascience.org;
Finance;Credit Card Fraud Detection;No;No;https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud;https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud;Free Registration;284000;31;69;Fraud Detection;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;Vehicle Insurance Claim Fraud Detection;No;No;https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection;https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection;Free Registration;15400;33;4;Fraud Detection;4;Kaggle competition;philipp.probst@actuarialdatascience.org;
P&C Pricing;Actuarial Applications of Natural Language Processing Using Transformers;No;No;https://github.com/JSchelldorfer/ActuarialDataScience/blob/master/12%20-%20NLP%20Using%20Transformers/NHTSA_NMVCCS_extract.parquet.gzip;https://arxiv.org/abs/2206.02014, https://github.com/actuarial-data-science/Tutorials/tree/master/12%20-%20NLP%20Using%20Transformers;Free;7000;16;8;NLP;4;Using text features in actuarial applications;philipp.probst@actuarialdatascience.org;
L&H Health Data;Actuarial loss prediction;Yes;No;https://www.openml.org/search?type=data&sort=runs&id=42876&status=active;https://www.openml.org/search?type=data&sort=runs&id=42876&status=active;Free;100000;14;12;NLP, Claims;4;OpenML dataset, similar to but distinct from the one on Kaggle;philipp.probst@actuarialdatascience.org;
Telematics;Insurance Data Science: Use and Value of Unusual Data;Both;Yes;https://egallic.fr/lausanne/;https://egallic.fr/lausanne/;Free;Depending on dataset;Depending on dataset;Depending on dataset;Telematics, Other;4;Tutorial with several datasets;philipp.probst@actuarialdatascience.org;
Telematics;Synthetic Dataset Generation of Driver Telematics;Yes;No;http://www2.math.uconn.edu/%7Evaldez/telematics_syn-032021.csv;https://insurancedatascience.org/downloads/London2021/Session_1a/Emiliano_Valdez.pdf;Free;100000;52;35;Telematics;4;Synthetic telematics dataset;philipp.probst@actuarialdatascience.org;
L&H Health Data;US Health Insurance Dataset;No;No;https://www.kaggle.com/datasets/teertha/ushealthinsurancedataset;https://www.kaggle.com/datasets/teertha/ushealthinsurancedataset;Free Registration;1338;7;1;Medical costs;4;Simple dataset that can be used for experimenting;philipp.probst@actuarialdatascience.org;