class: center, middle, inverse, title-slide

# Deep learning applications: policyholder behavior modeling and beyond

## kevinykuo.com/talk/2018/10/soa-annual
\#SOAannual

### Bob Crompton, Frankie Logan, Kevin Kuo

### October 2018

---

# Introduction

Your hosts and their drinks of choice (at the moment).

.center[]

.center[.content-box-gray[Can you spot the actuary?]]

---
class: center, middle, inverse

# The actual content...

---

# Deep Learning - What is it?

.Large[.center[A statistical technique for classifying patterns, based on sample data, using neural networks with multiple layers]]

---
class: top, left

# Obligatory Deep Learning Illustration

<!-- -->

---
class: top, left

# Historical Perspectives

.Large[
* Neural networks have been around for decades (the Perceptron algorithm dates to 1957)
* Increases in computing power & improvements in Big Data algorithms have made multiple-layer neural networks feasible
* 2012 was a breakthrough year for deep learning due to success on the ImageNet Challenge
]

---
class: top, left

# Deep Learning - Strengths

.Large[
* Can represent any finite mapping between input & output provided there is <span class = "red bold"> infinite data </span>
* In practice, large data sets often provide good results
* How large is "large"?
* Krizhevsky et al.'s ImageNet model had the following attributes
  * 1 million distinct examples
  * 9 layers
  * 650,000 nodes
  * 60 million parameters
]

---
class: top, left

# Deep Learning - Strengths (Continued...)

.Large[
* Deep learning does well at these tasks:
  * Object recognition
  * Speech recognition
  * Other non-linear "natural" data
  * Game play
* Clever mapping between inputs and outputs increases the applicability of this approach
]

---
class: top, left

# Deep Learning - Limitations

.Large[
* Data hungry
* Infinite data not available
* Limited capacity for analogies
* Generalization possible only when differences between training data and test data are small
* Lack of transparency
* Lack of hierarchical recognition
]

---
class: top, left

# Deep Learning - Current Insurance Applications

.Large[
* Literature on AI & machine learning in insurance generally is extensive
* Literature on deep learning in particular is not widespread, but there do seem to be a few uses:
  * Property claims
  * Claims fraud
  * Investment analysis
]

---
class: top, left

# Deep Learning - Potential Applications

.Large[
* M&A analysis
* Natural language processing & report writing
]

---
class: center, middle, inverse

# The real reason we care about deep learning...

---
class: center, middle, inverse

<!-- -->

---
class: center, middle, inverse

# Slightly more technical...

---

# Learning

<img src="img/deep-learning-in-3-figures-1.png" width="80%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-2.png" width="60%" />

---

# Learning

<img src="img/deep-learning-in-3-figures-3.png" width="60%" />

---
class: center, middle, inverse

# Case study time!

---
class: top, left

# Term Life Insurance

.large[
- Insurance companies provide coverage for a set number of years, or "term" (e.g., 15 years), at a fixed premium rate. After the term is over, the insured has a choice of lapsing or continuing the coverage at a different (usually higher) premium rate.
- Shock lapse - the phenomenon of very high lapse rates at the end of the level premium term
  * Mortality implication - the policyholders who continue the coverage usually "need" it.
  * Profitability - in addition to higher mortality rates, insurers receive less premium.
]

---
class: inverse, middle, center

# To improve risk management & better retain policyholders, we need to understand which groups of policyholders are most likely to experience shock lapse.
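---
class: top, left

# Shock Lapse - A Quick Illustration

To make the premium jump concrete, here is a minimal R sketch with **hypothetical** numbers (illustrative only, not taken from the study data): a 10-year level term policy whose premium jumps at the start of the post-level period.

```r
# All figures below are made up for illustration only
level_premium      <- 500   # annual premium during the 10-year level period
post_level_premium <- 2500  # first annual renewable term (ART) premium afterwards

# Premium jump ratio: the post-level premium relative to the final level premium
premium_jump_ratio <- post_level_premium / level_premium
premium_jump_ratio
#> [1] 5
```

Facing a 5x premium jump, many policyholders lapse at the end of duration 10 - the shock lapse we want to predict.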
---
class: top, left

# Current Study

.Large[
- The SOA has conducted studies to predict lapse rates using regression models (e.g., the generalized linear model, aka "GLM")
- A few takeaways:
  - Higher face amount -> higher lapse rate
  - Higher premium jump % -> higher lapse rate
  - Older issue age -> lapse rate increases at a decreasing rate

</br></br>
SOA 2014 Post Level Term Lapse & Mortality Report: https://www.soa.org/experience-studies/2014/research-2014-post-level-shock/
]

---
class: top, left

# Data

.Large[
* Companies that participated in the SOA study provided lists of in-force and terminated level term policies with exact issue dates and termination dates, respectively.
* Each cell, or row, represents a policy block.
* A grace period adjustment was made to move shock lapses that were reported 30 to 100 days into the first duration of the post-level period back to the final duration of the level period.
]

---
class: top, left

# Data (Cont.)

Let's take a look at our data

```
## Observations: 345,627
## Variables: 15
## $ lapse_study_year             <chr> "2000-2001", "2008-2009", "2000-2...
## $ duration                     <chr> "6-9", "10", "6-9", "6-9", "6-9",...
## $ gender                       <chr> "M", "F", "F", "M", "F", "F", "M"...
## $ issue_year                   <chr> "1994-1996", "1997-1999", "1994-1...
## $ issue_age                    <chr> "50-59", "40-49", "40-49", "40-49...
## $ face_amount                  <chr> "A. < 100k", "B. 100k-249k", "C...
## $ post_level_premium_structure <chr> "1. Premium Jump to ART", "1. Pre...
## $ premium_jump_ratio           <chr> "Y. Unknown", "Y. Unknown", "Y. U...
## $ risk_class                   <chr> "Agg/Unknown", "Super-Pref NS", "...
## $ premium_mode                 <chr> "2. Semiannual", "4. Monthly", "3...
## $ exposure_amount              <dbl> 325000, 1232000, 2600000, 1550000...
## $ exposure_count               <dbl> 7.0000, 11.0000, 7.0000, 4.0000, ...
## $ lapse_amount                 <dbl> 25000, 500000, 0, 1250000, 0, 377...
## $ lapse_count                  <dbl> 1, 5, 0, 3, 0, 11, 6, 3, 0, 0, 0,...
## $ policy_year                  <int> 2000, 2008, 2000, 2008, 2003, 201...
```

Source: [SOA 2014 Post Level Term Lapse & Mortality Report](https://www.soa.org/experience-studies/2014/research-2014-post-level-shock/)

---
class: top, left

# Predictors and Response

Predictors:

```
## [1] "Gender"                       "Issue Age"
## [3] "Face Amount"                  "Post Level Premium Structure"
## [5] "Premium Jump Ratio"           "Risk Class"
## [7] "Premium Mode"
```

Response:

```
## [1] "Lapse Count Rate"
```

---
class: top, left

# Data Prep

1. Filter records for the 10<sup>th</sup> policy duration
2. Filter records for positive exposure count and amount
3. Create lapse count rate and lapse amount rate variables

--

Training set: policy year < 2010

Validation set: policy year = 2010

Testing set: policy year > 2010

--

</br>
- In an **ideal world**, the lag between training and validation should be much **longer**. Our goal is to predict lapse rate at policy inception.

--

</br></br></br>
No more feature engineering! Let's get to modeling. (A dplyr sketch of these steps follows on the next slide.)
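---
class: top, left

# Data Prep - In Code

A minimal dplyr sketch of the three prep steps and the time-based split, assuming the data lives in a data frame called `lapse_data` with the columns shown earlier (the name is illustrative, not from the actual study code):

```r
library(dplyr)

prepped <- lapse_data %>%
  # 1. keep only the 10th policy duration
  filter(duration == "10") %>%
  # 2. keep records with positive exposure count and amount
  filter(exposure_count > 0, exposure_amount > 0) %>%
  # 3. create the lapse rate response variables
  mutate(
    lapse_count_rate  = lapse_count / exposure_count,
    lapse_amount_rate = lapse_amount / exposure_amount
  )

# Time-based split on policy year
training   <- prepped %>% filter(policy_year < 2010)
validation <- prepped %>% filter(policy_year == 2010)
testing    <- prepped %>% filter(policy_year > 2010)
```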
---
class: top, left

# Models

.Large[
- Historical Average - baseline
- Generalized Linear Model (GLM) - technique from the SOA study
- Gradient Boosting Machine (GBM)
- Neural Net
]

--

.Large[
Evaluation Metrics:
- RMSE (Root Mean Squared Error)
- R<sup>2</sup> (R-Squared)
- MAE (Mean Absolute Error)
]

---
class: top, left

# Baseline, Regression & Machine Learning Models

</br></br>

|Metric        | Historical Average |   GLM |   GBM |
|--------------|:------------------:|------:|------:|
|RMSE          |0.298               | 0.273 | 0.272 |
|R<sup>2</sup> |0.283               | 0.333 | 0.339 |
|MAE           |0.184               | 0.197 | 0.191 |

</br></br><big>As you can see, the models tend to get better as we progress from simple averages to machine learning models!</big>

---
class: top, left

# Neural Net

.Large[
* Since a neural net is a mathematical model, all inputs must be numbers. Before we can build our model, we have to convert our categorical predictors to numerical predictors.
* Ways to convert a categorical variable to a numeric variable:
  * Assign nominal or ordinal numbers
  * <span class = "red">One-hot encoding</span>
  * <span class = "red">Embedding</span>
]

---
class: top, left

# One-Hot Encoding

One-Hot Encoding - Fancy phrase for _dummy variable_?

--

.pull-left[
```
##   gender
## 1   Male
## 2 Female
## 3 Female
## 4   Male
```

<img src="img/arrow.JPG" height="50" />

```
##   Male Female
## 1    1      0
## 2    0      1
## 3    0      1
## 4    1      0
```
]

--

.pull-right[
* More values in the predictor -> higher dimensions.
* One-hot encoding has a few disadvantages:
  * it doesn't capture relationships between the different values of the predictor
  * it is computationally expensive
]

---
class: top, left

# Embedding Layers

* In embedding, we map each categorical variable into an embedding space. Within this space, values with similar outputs are mapped close together.

--

Example: </br>

```
Risk Class
[1] "Super-Pref NS" "Agg/Unknown"   "Non-Pref NS"   "Pref SM"       "Pref Best NS"  "Undiff SM"
[7] "Pref Resid NS" "Undiff NS"     "Non-Pref SM"
```

</br>

* Instead of creating an additional variable for each of the levels in risk class, we can map it into a multi-dimensional embedding space.

--

* Embeddings allow us to:
  * learn about the intrinsic properties of a variable
  * reduce resource cost (see the code sketch on the next slide)
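---
class: top, left

# Embedding Layers - A Code Sketch

A minimal sketch of an entity embedding in R `keras` for the 9-level risk class variable. This is illustrative only - the layer sizes here are assumptions, not the architecture used in the case study:

```r
library(keras)

# Risk class enters the network as a single integer index (0-8)
risk_class_input <- layer_input(shape = 1, name = "risk_class")

risk_class_embedded <- risk_class_input %>%
  layer_embedding(
    input_dim  = 9,  # number of distinct risk classes
    output_dim = 2,  # dimension of the learned embedding space
    name = "risk_class_embedding"
  ) %>%
  layer_flatten()

# The flattened embedding can be concatenated with the other predictors
# and fed into dense layers
output <- risk_class_embedded %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")  # lapse rate in [0, 1]

model <- keras_model(inputs = risk_class_input, outputs = output)
```

After training, the weights of `risk_class_embedding` (a 9 x 2 matrix) are the learned coordinates behind comparisons like the one on the next slide.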
---
class: top, left

# Embedding Layers - Risk Class

.center[
<!-- -->

<small>("Super-Pref NS" is more closely related to "Pref Resid NS" than to "Pref SM")</small>
]

---
class: top, left

# Neural Net Structure

.center[
<img src="img/nn_structure.jpeg" width="1200" height="500" />
]

---
class: top, left

# Neural Net Result

.center[
<!-- -->
]

---
class: top, left

# Model Comparison

</br></br></br>

|Metric        | Historical Average |   GLM |   GBM | <span class = "red">Neural Net</span> |
|--------------|:------------------:|------:|------:|--------------------------------------:|
|RMSE          |0.298               | 0.273 | 0.272 | <span class = "red">0.266</span>      |
|R<sup>2</sup> |0.283               | 0.333 | 0.339 | <span class = "red">0.366</span>      |
|MAE           |0.184               | 0.197 | 0.191 | <span class = "red">0.182</span>      |

---
class: inverse, center, middle

# Interpreting the Models

---

# Visualizing the premium jump embedding

<img src="img/embedding_viz.png" width="90%" />

---

# Comparing model performance

<img src="img/distribution_residual.png" width="80%" />

---

# Comparing model performance

<img src="img/performance_boxplot.png" width="90%" />

---

# Variable importance

<img src="img/vi_nn.png" width="90%" />

---

# Variable importance - comparison

<img src="img/variable_importance.png" width="90%" />

---

# Prediction breakdown

<img src="img/pb_nn.png" width="90%" />

---

# Prediction breakdown - comparison

<img src="img/prediction_breakdown.png" width="80%" />

---
class: center, inverse, middle

# A couple quick words on software...

---
class: top, left

# Tools - [R](https://www.r-project.org/)

* Data Prep - [`tidyverse`](https://www.tidyverse.org/), [`recipes`](https://github.com/tidymodels/recipes)
* Modeling - [`keras`](https://keras.rstudio.com/), [`xgboost`](https://xgboost.readthedocs.io/en/latest/)
* Model Evaluation/Visualization - [`yardstick`](https://github.com/tidymodels/yardstick), [`DALEX`](https://github.com/pbiecek/DALEX/tree/master/R), [`deepviz`](https://github.com/andrie/deepviz)

---

# Deep learning libraries

.large[
A couple of ways to go (and really the only two ways):
- PyTorch ([https://pytorch.org/](https://pytorch.org/))
  - Supported by Facebook.
- TensorFlow ([https://www.tensorflow.org/](https://www.tensorflow.org/)) + Keras ([https://keras.io/](https://keras.io/))
  - Supported by Google. R interface supported by RStudio.
]

---

# Q & A

.large[
- Link to slides/code: [https://kevinykuo.com/talk/2018/10/soa-annual](https://kevinykuo.com/talk/2018/10/soa-annual)
- GitHub repo: [github.com/kevinykuo/shocklapsedemo](https://github.com/kevinykuo/shocklapsedemo)
- If you care about P&C loss reserving (GI Trac~~t~~k anyone?), check out [DeepTriangle: A Deep Learning Approach to Loss Reserving](https://arxiv.org/abs/1804.09253)
]
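---
class: top, left

# Appendix - Computing the Evaluation Metrics

For completeness, a minimal `yardstick` sketch of the three evaluation metrics used throughout. The data frame and column names here are hypothetical, not from the case study code:

```r
library(yardstick)

# Hypothetical holdout predictions
results <- tibble::tibble(
  actual    = c(0.45, 0.60, 0.30, 0.75),
  predicted = c(0.50, 0.55, 0.35, 0.70)
)

rmse(results, truth = actual, estimate = predicted)  # Root Mean Squared Error
rsq(results, truth = actual, estimate = predicted)   # R-Squared
mae(results, truth = actual, estimate = predicted)   # Mean Absolute Error
```

Each call returns a one-row tibble with the metric name and its estimate on the supplied data.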