library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## ── Conflicts ────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(bayesm)
data(tuna)
#Train Test Split
set.seed(3)
Data.Indexes <-sample(1:nrow(tuna), size=.632*nrow(tuna))
Train<-tuna[Data.Indexes,]
Test<-tuna[-Data.Indexes,]

Business Case: I’m choosing to look at price elasticities of sales due to the limitations of the data only having 7 SKUs thus giving us limited information about market share

Star Kist 6 oz.

starkist<-lm(log(MOVE1)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(starkist)

## 
## Call:
## lm(formula = log(MOVE1) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9068 -0.2424 -0.0804  0.1732  3.2260 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.1965     1.5243   5.377 2.05e-07 ***
## LPRICE1      -4.4664     0.2448 -18.245  < 2e-16 ***
## LPRICE2       0.8476     0.3238   2.618  0.00951 ** 
## LPRICE3       0.9673     0.8367   1.156  0.24894    
## LPRICE4       1.2136     0.2246   5.403 1.81e-07 ***
## LPRICE5       0.6840     0.7101   0.963  0.33653    
## LPRICE6       0.1525     1.3044   0.117  0.90707    
## LPRICE7       0.6074     0.3141   1.934  0.05455 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4745 on 205 degrees of freedom
## Multiple R-squared:  0.6268, Adjusted R-squared:  0.614 
## F-statistic: 49.18 on 7 and 205 DF,  p-value: < 2.2e-16

#Validation r^2
starkist_test_results<-predict(starkist, Test)
cor(log(Test$MOVE1),starkist_test_results)^2

## [1] 0.470962

plot(starkist$residuals)

We see from the summary above Star Kist is highly sensitive to changes in its price which is not surprising for a market leader. We also recognize that Star Kist is sensitive to price changes in Bumble Bee Chunk and Chicken of the Sea. This model explains 61% of variance and validation r squared for this model was 47%

Chicken of the Sea 6 oz.

chicken<-lm(log(MOVE2)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(chicken)

## 
## Call:
## lm(formula = log(MOVE2) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.2162 -0.2720 -0.0655  0.2033  3.4282 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.43105    1.59622   5.908 1.42e-08 ***
## LPRICE1      1.16471    0.25634   4.544 9.44e-06 ***
## LPRICE2     -5.08640    0.33903 -15.003  < 2e-16 ***
## LPRICE3     -1.55188    0.87612  -1.771    0.078 .  
## LPRICE4      1.19010    0.23521   5.060 9.31e-07 ***
## LPRICE5      0.56985    0.74355   0.766    0.444    
## LPRICE6     -0.36339    1.36598  -0.266    0.790    
## LPRICE7      0.03257    0.32895   0.099    0.921    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4969 on 205 degrees of freedom
## Multiple R-squared:  0.5787, Adjusted R-squared:  0.5643 
## F-statistic: 40.23 on 7 and 205 DF,  p-value: < 2.2e-16

#Validation r^2
chicken_test_results<-predict(chicken, Test)
cor(log(Test$MOVE2),chicken_test_results)^2

## [1] 0.7133586

plot(chicken$residuals)

Chicken of the Sea is sensitive to price changes in the price of itself, Star Kist, and Bumble Bee Chunk. Since these three products were also highly connected in the model for StarKist, this leads me to have the hypothesis that the consumer finds these three brands substitutes for each other which leads to constant competition between them. This model explains 56% of the variance and the r squared for the validation set was 71%

Bumble Bee Solid 6.12 oz.

BB_solid<-lm(log(MOVE3)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(BB_solid)

## 
## Call:
## lm(formula = log(MOVE3) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6330 -0.1359  0.1247  0.4037  1.0901 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.0839     2.6543   3.422 0.000749 ***
## LPRICE1       0.7535     0.4263   1.768 0.078609 .  
## LPRICE2      -0.8424     0.5638  -1.494 0.136654    
## LPRICE3      -7.0190     1.4569  -4.818 2.82e-06 ***
## LPRICE4      -1.4034     0.3911  -3.588 0.000416 ***
## LPRICE5       2.8280     1.2364   2.287 0.023202 *  
## LPRICE6       0.3412     2.2714   0.150 0.880744    
## LPRICE7      -1.9426     0.5470  -3.551 0.000475 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8262 on 205 degrees of freedom
## Multiple R-squared:  0.2014, Adjusted R-squared:  0.1741 
## F-statistic: 7.384 on 7 and 205 DF,  p-value: 6.556e-08

#Validation r^2
BB_solid_test_results<-predict(BB_solid, Test)
cor(log(Test$MOVE3),BB_solid_test_results)^2

## [1] 0.2232198

#plot of residuals
plot(BB_solid$residuals)

Bumble Bee Solid is an interesting product. As expected we see this product highly sensitive to price changes in itself but also price changes in Bumble Bee Chunk, Geisha and HH Chunk Lite. This model explains 17% of the variance. Validation R squared is 22%

Bumble Bee Chunk 6.12 oz.

BB_chunk<-lm(log(MOVE4)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(BB_chunk)

## 
## Call:
## lm(formula = log(MOVE4) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29061 -0.31108 -0.07639  0.24882  1.91279 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.52140    1.51758   6.933 5.26e-11 ***
## LPRICE1      1.01332    0.24372   4.158 4.72e-05 ***
## LPRICE2      1.84884    0.32233   5.736 3.44e-08 ***
## LPRICE3     -0.72134    0.83295  -0.866    0.387    
## LPRICE4     -4.89120    0.22362 -21.873  < 2e-16 ***
## LPRICE5      0.10963    0.70692   0.155    0.877    
## LPRICE6     -1.41314    1.29868  -1.088    0.278    
## LPRICE7      0.09014    0.31275   0.288    0.773    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4724 on 205 degrees of freedom
## Multiple R-squared:  0.7123, Adjusted R-squared:  0.7025 
## F-statistic:  72.5 on 7 and 205 DF,  p-value: < 2.2e-16

#Validation r^2
BB_chunk_test_results<-predict(BB_chunk, Test)
cor(log(Test$MOVE4),BB_chunk_test_results)^2

## [1] 0.5460324

#plot of residuals
plot(BB_chunk$residuals)

Bumble Bee Chunk is highly depended on changes in its price and the prices of Star Kist and Chicken of the Sea. As we discussed in the models for Star Kist and chicken of the sea, it appears these products are highly substitutable in the minds of the consumer. 70% of the variance is explained by this model and validation r squared is 55%

Geisha 6 oz.

geisha<-lm(log(MOVE5)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(geisha)

## 
## Call:
## lm(formula = log(MOVE5) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.09061 -0.14578  0.02251  0.13405  0.90279 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.45719    0.80318  11.775   <2e-16 ***
## LPRICE1     -0.23723    0.12899  -1.839   0.0673 .  
## LPRICE2      0.01789    0.17059   0.105   0.9166    
## LPRICE3      0.12420    0.44084   0.282   0.7784    
## LPRICE4     -0.06514    0.11835  -0.550   0.5826    
## LPRICE5     -5.10405    0.37414 -13.642   <2e-16 ***
## LPRICE6      0.17119    0.68733   0.249   0.8036    
## LPRICE7     -0.15109    0.16552  -0.913   0.3624    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.25 on 205 degrees of freedom
## Multiple R-squared:  0.5091, Adjusted R-squared:  0.4923 
## F-statistic: 30.37 on 7 and 205 DF,  p-value: < 2.2e-16

#Validation r^2
geisha_test_results<-predict(geisha, Test)
cor(log(Test$MOVE5),geisha_test_results)^2

## [1] 0.5782765

#plot of residuals
plot(geisha$residuals)

Geisha is also an interesting brand. It seems only highly dependent on changes in its price; this leads me to believe that this is either a niche product or a product that is considered a value brand where consumers are choosing to purchase this product for reasons unrelated to prices of better known, more popular products. This model explains 49% of variance and validation r squared is 58%

BB Large Cans.

BB_large<-lm(log(MOVE6)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(BB_large)

## 
## Call:
## lm(formula = log(MOVE6) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6508 -0.1188  0.0899  0.3039  0.8864 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.91828    1.91930   5.168 5.61e-07 ***
## LPRICE1      0.63198    0.30823   2.050   0.0416 *  
## LPRICE2     -0.02904    0.40765  -0.071   0.9433    
## LPRICE3     -1.45282    1.05345  -1.379   0.1694    
## LPRICE4     -0.43409    0.28282  -1.535   0.1264    
## LPRICE5      1.14769    0.89405   1.284   0.2007    
## LPRICE6     -2.36234    1.64246  -1.438   0.1519    
## LPRICE7     -0.70803    0.39553  -1.790   0.0749 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5974 on 205 degrees of freedom
## Multiple R-squared:  0.06264,    Adjusted R-squared:  0.03063 
## F-statistic: 1.957 on 7 and 205 DF,  p-value: 0.06249

#Validation r^2
BB_large_test_results<-predict(BB_large, Test)
cor(log(Test$MOVE6),BB_large_test_results)^2

## [1] 0.0424402

Bumble Bee Large Cans appears to be in a category into itself. Its volumes are not sensitive to prices of itself and barely sensitive to the price of the market leader, star kist. I would hypothesize that due to the size being larger than the other products in the dataset, the consumers purchasing this need the size for convienence possibly for food service and savings due to small fluctuations in price of smaller formats are outweighed by the time it would take to open multiple cans of an equivalent quantity.

HH Chunk Lite 6.5oz.

HH_chunk<-lm(log(MOVE7)~LPRICE1+LPRICE2+LPRICE3+LPRICE4+LPRICE5+LPRICE6+LPRICE7, data = Train)
summary(HH_chunk)

## 
## Call:
## lm(formula = log(MOVE7) ~ LPRICE1 + LPRICE2 + LPRICE3 + LPRICE4 + 
##     LPRICE5 + LPRICE6 + LPRICE7, data = Train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.28924 -0.23696  0.00023  0.28628  2.68397 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.7659     1.9363   6.077 5.89e-09 ***
## LPRICE1       0.9566     0.3110   3.076  0.00238 ** 
## LPRICE2       0.3614     0.4113   0.879  0.38054    
## LPRICE3       1.8882     1.0628   1.777  0.07709 .  
## LPRICE4      -0.1416     0.2853  -0.496  0.62015    
## LPRICE5       0.9730     0.9019   1.079  0.28194    
## LPRICE6      -4.2434     1.6570  -2.561  0.01116 *  
## LPRICE7      -3.4142     0.3990  -8.556 2.73e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6027 on 205 degrees of freedom
## Multiple R-squared:  0.3158, Adjusted R-squared:  0.2924 
## F-statistic: 13.52 on 7 and 205 DF,  p-value: 2.524e-14

#Validation r^2
HH_chunk_test_results<-predict(HH_chunk, Test)
cor(log(Test$MOVE7),HH_chunk_test_results)^2

## [1] 0.2185917

#plot of residuals
plot(HH_chunk$residuals)

HH Chunk Lite seems highly dependent on the price of itself and the price of the Bumble Bee Large Cans and Star Kist. The model explains 29% of the variance and the validation r squared is 22%

Limitations of models: Due to the data provided to us we are only able to look at the effect on sales due to price, but in the real world there would be multiple factors that would go into the calculus regarding a change in sales amount. Some things would be advertising, promotions, availability on shelves, and numerous other factors.