Suppose we have monthly exposures to which we would like to add premium data.

exposures_PM <- addExposures(records, type = "PM")
head(exposures_PM)
key duration policy_month start_int end_int exposure
B10251C8 1 1 2010-04-10 2010-05-09 0.08214
B10251C8 1 2 2010-05-10 2010-06-09 0.08487
B10251C8 1 3 2010-06-10 2010-07-09 0.08214
B10251C8 1 4 2010-07-10 2010-08-09 0.08487
B10251C8 1 5 2010-08-10 2010-09-09 0.08487
B10251C8 1 6 2010-09-10 2010-10-09 0.08214

The package ships with a simulated premium data set, trans.

head(trans)
key trans_date amt
B10251C8 2012-12-04 199
B10251C8 2013-12-28 197
B10251C8 2015-12-30 177
B10251C8 2019-05-07 192
B10251C8 2012-04-15 206
B10251C8 2019-04-02 220

The addStart function attaches to each transaction the start date (start_int) of the exposure interval that contains its trans_date.

trans_with_interval <- addStart(trans, exposures_PM)
head(trans_with_interval)
start_int key trans_date amt
2010-05-10 B10251C8 2010-05-28 190
2010-06-10 B10251C8 2010-07-04 189
2010-11-10 B10251C8 2010-11-21 179
2011-04-10 B10251C8 2011-05-08 210
2011-07-10 B10251C8 2011-07-12 198
2012-01-10 B10251C8 2012-01-14 194
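To make the matching rule concrete, here is a minimal sketch of the same idea on toy data. It is not the package's implementation of addStart; the data values and the name trans_with_start are invented for illustration, and the approach (merge on key, then keep rows whose date falls inside the interval) is only one plausible way to get the same result.

```r
library(dplyr)

# Toy exposure intervals for one hypothetical policy
exposures <- tibble(
  key       = "A",
  start_int = as.Date(c("2010-04-10", "2010-05-10", "2010-06-10")),
  end_int   = as.Date(c("2010-05-09", "2010-06-09", "2010-07-09"))
)

# Toy transactions for the same policy
trans_toy <- tibble(
  key        = "A",
  trans_date = as.Date(c("2010-05-28", "2010-07-04")),
  amt        = c(190, 189)
)

# Pair every transaction with every interval of its policy, then keep
# only the interval that contains the transaction date
trans_with_start <- merge(trans_toy, exposures, by = "key") %>%
  filter(trans_date >= start_int, trans_date <= end_int) %>%
  select(start_int, key, trans_date, amt) %>%
  arrange(trans_date)
```

Each transaction ends up with exactly one start_int because the intervals of a policy do not overlap.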

We can group by key and start_int and aggregate the transaction amounts, so that there is at most one transaction row for each interval in exposures_PM.

start_int key premium
2005-06-01 D68554D5 97
2005-10-01 D68554D5 169
2005-12-01 D68554D5 96
2006-01-01 D68554D5 193
2006-02-01 D68554D5 107
2006-03-01 D68554D5 119
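The aggregation described above could be sketched as follows. The toy rows are invented for illustration, the name trans_to_join is hypothetical, and summing amt is an assumption about the aggregation wanted; another summary could be substituted.

```r
library(dplyr)

# Toy stand-in for trans_with_interval: two payments fall in the same interval
trans_with_interval <- tibble(
  start_int  = as.Date(c("2010-05-10", "2010-05-10", "2010-06-10")),
  key        = c("A", "A", "A"),
  trans_date = as.Date(c("2010-05-12", "2010-05-28", "2010-07-04")),
  amt        = c(100, 90, 189)
)

# Collapse to one row per (start_int, key), summing the amounts
trans_to_join <- trans_with_interval %>%
  group_by(start_int, key) %>%
  summarise(premium = sum(amt), .groups = "drop")
```

After this step each (key, start_int) pair appears only once, which is what makes the later join safe.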

Because each combination of key and start_int now appears at most once in the aggregated transactions, we can left join them to the exposures without duplicating any exposure rows.

key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 NA
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 NA
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 NA
B10251C8 1 7 2010-10-10 2010-11-09 0.08487 NA
B10251C8 1 8 2010-11-10 2010-12-09 0.08214 179
B10251C8 1 9 2010-12-10 2011-01-09 0.08487 NA
B10251C8 1 10 2011-01-10 2011-02-09 0.08487 NA
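A minimal sketch of such a join, using toy data and dplyr's left_join. The tables here are invented stand-ins, and the names trans_to_join and premium_study are assumptions for the aggregated transactions and the joined result.

```r
library(dplyr)

# Toy exposures: three monthly intervals for one hypothetical policy
exposures_PM <- tibble(
  key          = "A",
  policy_month = 1:3,
  start_int    = as.Date(c("2010-04-10", "2010-05-10", "2010-06-10"))
)

# Toy aggregated transactions: no premium was paid in the first interval
trans_to_join <- tibble(
  start_int = as.Date(c("2010-05-10", "2010-06-10")),
  key       = "A",
  premium   = c(190, 189)
)

# Keep every exposure row; intervals without a transaction get NA
premium_study <- exposures_PM %>%
  left_join(trans_to_join, by = c("key", "start_int"))
```

The row count of premium_study equals that of exposures_PM, which is a quick check that nothing was duplicated.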

Replace the NA values produced by the join with zeros using if_else.

key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 0
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 0
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 0
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 0
B10251C8 1 7 2010-10-10 2010-11-09 0.08487 0
B10251C8 1 8 2010-11-10 2010-12-09 0.08214 179
B10251C8 1 9 2010-12-10 2011-01-09 0.08487 0
B10251C8 1 10 2011-01-10 2011-02-09 0.08487 0
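The replacement step might look like the sketch below, shown on a small invented table rather than the real premium_study.

```r
library(dplyr)

# Toy joined table with a missing premium in the first policy month
premium_study <- tibble(
  policy_month = 1:3,
  premium      = c(NA, 190, 189)
)

# Treat intervals with no transaction as zero premium
premium_study <- premium_study %>%
  mutate(premium = if_else(is.na(premium), 0, premium))
```

dplyr's coalesce(premium, 0) would be an equivalent, slightly shorter way to write the same replacement.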

Now we are free to do any calculations we want. As a simple example, we calculate the average premium in each of the first two policy months. Refer to the section on adding additional information for more creative policy splits.

premium_study %>%
  filter(policy_month %in% c(1, 2)) %>%
  group_by(policy_month) %>%
  summarise(avg_premium = mean(premium))
policy_month avg_premium
1 60.46
2 66.88

### Other Uses for addStart

Suppose that, for a predictive analytics project, we want to know the last premium paid by each policy. Again we left join the premium to the exposure frame.

key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 NA
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 NA
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 NA

This time we fill in NA values with the most recently paid premium instead of 0. The first interval remains NA because no premium was paid before it.

key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 189
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 189
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 189
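One way to carry the last paid premium forward is tidyr's fill, applied within each policy. This is a sketch on invented data; the name last_premium is hypothetical, and using tidyr here is an assumption rather than something the text prescribes.

```r
library(dplyr)
library(tidyr)

# Toy joined table for two hypothetical policies
last_premium <- tibble(
  key          = c(rep("A", 4), rep("B", 2)),
  policy_month = c(1:4, 1:2),
  premium      = c(NA, 190, 189, NA, NA, 50)
)

# Fill downward within each policy so values never leak across keys;
# leading NAs stay NA because there is nothing earlier to carry forward
last_premium <- last_premium %>%
  group_by(key) %>%
  arrange(policy_month, .by_group = TRUE) %>%
  fill(premium, .direction = "down") %>%
  ungroup()
```

Grouping by key before filling matters: without it, the last premium of one policy would be carried into the first intervals of the next.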

Similar data manipulations can be used to engineer features from anything that varies with time: account values, guarantees, planned premiums, etc.