premium_pattern.Rmd
Suppose we have some monthly exposures that we would like to add premium data to.
key | duration | policy_month | start_int | end_int | exposure |
---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 |
Simulated premium data “trans” comes with the package.
key | trans_date | amt |
---|---|---|
B10251C8 | 2012-12-04 | 199 |
B10251C8 | 2013-12-28 | 197 |
B10251C8 | 2015-12-30 | 177 |
B10251C8 | 2019-05-07 | 192 |
B10251C8 | 2012-04-15 | 206 |
B10251C8 | 2019-04-02 | 220 |
The addStart function adds the start date of the appropriate exposure interval to the transactions.
start_int | key | trans_date | amt |
---|---|---|---|
2010-05-10 | B10251C8 | 2010-05-28 | 190 |
2010-06-10 | B10251C8 | 2010-07-04 | 189 |
2010-11-10 | B10251C8 | 2010-11-21 | 179 |
2011-04-10 | B10251C8 | 2011-05-08 | 210 |
2011-07-10 | B10251C8 | 2011-07-12 | 198 |
2012-01-10 | B10251C8 | 2012-01-14 | 194 |
We can group and aggregate by key and start_int to get unique transaction rows corresponding to intervals in exposures_PM.
trans_to_join <- trans_with_interval %>% group_by(start_int, key) %>% summarise(premium = sum(amt))
head(trans_to_join)
start_int | key | premium |
---|---|---|
2005-06-01 | D68554D5 | 97 |
2005-10-01 | D68554D5 | 169 |
2005-12-01 | D68554D5 | 96 |
2006-01-01 | D68554D5 | 193 |
2006-02-01 | D68554D5 | 107 |
2006-03-01 | D68554D5 | 119 |
Then we can join this to the exposures using a left join without duplicating any exposures.
premium_study <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(premium_study, 10)
key | duration | policy_month | start_int | end_int | exposure | premium |
---|---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | NA |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | NA |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | NA |
B10251C8 | 1 | 7 | 2010-10-10 | 2010-11-09 | 0.08487 | NA |
B10251C8 | 1 | 8 | 2010-11-10 | 2010-12-09 | 0.08214 | 179 |
B10251C8 | 1 | 9 | 2010-12-10 | 2011-01-09 | 0.08487 | NA |
B10251C8 | 1 | 10 | 2011-01-10 | 2011-02-09 | 0.08487 | NA |
Change the NA values resulting from the join to zeros using an if_else.
premium_study <- premium_study %>% mutate(premium = if_else(is.na(premium), 0, premium))
head(premium_study, 10)
key | duration | policy_month | start_int | end_int | exposure | premium |
---|---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | 0 |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | 0 |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | 0 |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | 0 |
B10251C8 | 1 | 7 | 2010-10-10 | 2010-11-09 | 0.08487 | 0 |
B10251C8 | 1 | 8 | 2010-11-10 | 2010-12-09 | 0.08214 | 179 |
B10251C8 | 1 | 9 | 2010-12-10 | 2011-01-09 | 0.08487 | 0 |
B10251C8 | 1 | 10 | 2011-01-10 | 2011-02-09 | 0.08487 | 0 |
Now we are free to do any calculations we want. For a simple example we calculate the average premium in the first two policy months. Refer to the section on adding additional information for more creative policy splits.
premium_study %>% filter(policy_month %in% c(1,2)) %>% group_by(policy_month) %>% summarise(avg_premium = mean(premium))
policy_month | avg_premium |
---|---|
1 | 60.46 |
2 | 66.88 |
###Other Uses for addStart Suppose we were interested in what the last premium paid by a policy was for some predictive analytics project. Again we left join the premium to the exposure frame.
previous_premium_unfilled <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(previous_premium_unfilled)
key | duration | policy_month | start_int | end_int | exposure | premium |
---|---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | NA |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | NA |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | NA |
This time we fill in NA values with the previous paid premium instead of 0. The first interval is NA because there are no prior premiums.
key | duration | policy_month | start_int | end_int | exposure | premium |
---|---|---|---|---|---|---|
B10251C8 | 1 | 1 | 2010-04-10 | 2010-05-09 | 0.08214 | NA |
B10251C8 | 1 | 2 | 2010-05-10 | 2010-06-09 | 0.08487 | 190 |
B10251C8 | 1 | 3 | 2010-06-10 | 2010-07-09 | 0.08214 | 189 |
B10251C8 | 1 | 4 | 2010-07-10 | 2010-08-09 | 0.08487 | 189 |
B10251C8 | 1 | 5 | 2010-08-10 | 2010-09-09 | 0.08487 | 189 |
B10251C8 | 1 | 6 | 2010-09-10 | 2010-10-09 | 0.08214 | 189 |
Data manipulations similar to this can be used to engineer features for anything varying with time: account values, guarantees, planned premiums, etc…