# 19 Solutions to exercises

## 19.1 Chapter 3

For annual values, see Figure 19.1:

```
data_ml %>%
  group_by(date) %>%
  mutate(growth = Pb > median(Pb)) %>%         # Creates the sort
  ungroup() %>%                                # Ungroup
  mutate(year = lubridate::year(date)) %>%     # Creates a year variable
  group_by(year, growth) %>%                   # Analyze by year & sort
  summarize(ret = mean(R1M_Usd)) %>%           # Compute average return
  ggplot(aes(x = year, y = ret, fill = growth)) +
  geom_col(position = "dodge") +               # Plot!
  theme_bw() + theme(legend.position = c(0.7, 0.8)) # theme_bw() first, or it overrides the legend position
```

For monthly values, see Figure 19.2:

```
returns_m <- data_ml %>%
  group_by(date) %>%
  mutate(growth = Pb > median(Pb)) %>%         # Creates the sort
  group_by(date, growth) %>%                   # Analyze by date & sort
  summarize(ret = mean(R1M_Usd)) %>%           # Compute average return
  spread(key = growth, value = ret) %>%        # Pivot to wide matrix format
  ungroup()
colnames(returns_m)[2:3] <- c("value", "growth")    # Changing column names
returns_m %>%
  mutate(value = cumprod(1 + value),           # From returns to portf. values
         growth = cumprod(1 + growth)) %>%
  gather(key = portfolio, value = value, -date) %>% # Back to tidy format
  ggplot(aes(x = date, y = value, color = portfolio)) +
  geom_line() +                                # Plot!
  theme_bw() + theme(legend.position = c(0.7, 0.8)) # theme_bw() first, or it overrides the legend position
```

Next, we build portfolios based on quartiles, using the tidyverse only. We rely heavily on the fact that features are uniformized, i.e., that their distribution is uniform for each given date. Overall, small firms outperform heavily (see Figure 19.3).

```
data_ml %>%
  mutate(small  = Mkt_Cap_6M_Usd <= 0.25,                          # Small firms...
         medium = Mkt_Cap_6M_Usd > 0.25 & Mkt_Cap_6M_Usd <= 0.5,
         large  = Mkt_Cap_6M_Usd > 0.5  & Mkt_Cap_6M_Usd <= 0.75,
         xl     = Mkt_Cap_6M_Usd > 0.75,                           # ...XL firms
         year   = year(date)) %>%
  group_by(year) %>%
  summarize(small  = mean(small * R1M_Usd),                        # Compute avg returns
            medium = mean(medium * R1M_Usd),
            large  = mean(large * R1M_Usd),
            xl     = mean(xl * R1M_Usd)) %>%
  gather(key = size, value = return, -year) %>%
  ggplot(aes(x = year, y = return, fill = size)) +
  geom_col(position = "dodge") + theme_bw()
```

## 19.2 Chapter 4

Below, we import a credit spread supplied by Bank of America. Its symbol/ticker is “BAMLC0A0CM”. We apply the data expansion on the small number of predictors to save memory space. One important trick that should not be overlooked is the uniformization step after the product (4.3) is computed. Indeed, we want the new features to have the same properties as the old ones. If we skip this step, distributions will be altered, as we show in one example below.

We start with the data extraction and joining. It’s important to join early, so as to keep the highest data frequency (daily) when replacing missing points with the **closest preceding values**. Joining with monthly data before replacing creates unnecessary lags.

```
getSymbols.FRED("BAMLC0A0CM",          # Extract data
                env = ".GlobalEnv",
                return.class = "xts")
```

`## [1] "BAMLC0A0CM"`

```
cred_spread <- fortify(BAMLC0A0CM)                              # Transform to dataframe
colnames(cred_spread) <- c("date", "spread")                    # Change column names
cred_spread <- cred_spread %>%                                  # Take extraction and...
  full_join(data_ml %>% dplyr::select(date), by = "date") %>%   # Join!
  mutate(spread = na.locf(spread))                              # Replace NA by previous value
cred_spread <- cred_spread[!duplicated(cred_spread), ]          # Remove duplicates
```

The creation of the augmented dataset requires some manipulation. The new features are no longer uniform, as shown in Figure 19.4.

```
data_cond <- data_ml %>%                                    # Create new dataset
  dplyr::select(c("stock_id", "date", features_short))
names_cred_spread <- paste0(features_short, "_cred_spread") # New column names
feat_cred_spread <- data_cond %>%                           # Old values
  dplyr::select(features_short)
cred_spread <- data_ml %>%                                  # Create vector of spreads
  dplyr::select(date) %>%
  left_join(cred_spread, by = "date")
feat_cred_spread <- feat_cred_spread *                      # This product creates...
  matrix(cred_spread$spread,                                # ...the new values using...
         length(cred_spread$spread),                        # ...duplicated...
         length(features_short))                            # ...columns
colnames(feat_cred_spread) <- names_cred_spread             # New column names
data_cond <- bind_cols(data_cond, feat_cred_spread)         # Aggregate old & new
data_cond %>% ggplot(aes(x = Eps_cred_spread)) + geom_histogram() # Plot example
```

To prevent this issue, uniformization is required; the result is verified in Figure 19.5.

```
data_cond <- data_cond %>%                      # From new dataset
  group_by(date) %>%                            # Group by date and...
  mutate_at(names_cred_spread, norm_unif)       # ...uniformize the new features
data_cond %>% ggplot(aes(x = Eps_cred_spread)) + geom_histogram(bins = 100) # Verification
```
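The norm_unif function is recycled from Chapter 4. As a reminder, it maps a vector into its normalized ranks, which are uniformly distributed over (0,1] within each cross-section (a minimal sketch of the idea; the book's exact implementation may differ slightly):

```r
norm_unif <- function(v){      # Maps a vector into (0,1] via normalized ranks
  rank(v) / length(v)          # Uniform distribution for each date
}
```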

The second question naturally requires downloading the VIX series first and joining it with the original data.

```
getSymbols.FRED("VIXCLS",              # Extract data
                env = ".GlobalEnv",
                return.class = "xts")
```

`## [1] "VIXCLS"`

```
vix <- fortify(VIXCLS)                                          # Transform to dataframe
colnames(vix) <- c("date", "vix")                               # Change column names
vix <- vix %>%                                                  # Take extraction and...
  full_join(data_ml %>% dplyr::select(date), by = "date") %>%   # Join!
  mutate(vix = na.locf(vix))                                    # Replace NA by previous value
vix <- vix[!duplicated(vix), ]                                  # Remove duplicates
vix <- data_ml %>%                                              # Keep original data format...
  dplyr::select(date) %>%
  left_join(vix, by = "date")                                   # ...via left_join()
```

We can then proceed with the categorization. We create the label vector in a new (smaller) dataset that is not attached to the large data_ml variable. We also check the balance of labels and its evolution through time (see Figure 19.6).

```
delta <- 0.5                                                      # Magnitude of vix correction
vix_bar <- median(vix$vix)                                        # Median of vix
data_vix <- data_ml %>%                                           # Smaller dataset
  dplyr::select(stock_id, date, R1M_Usd) %>%
  mutate(r_minus = (-0.02) * exp(-delta * (vix$vix - vix_bar)),   # r_-
         r_plus  = 0.02 * exp(delta * (vix$vix - vix_bar)))       # r_+
data_vix <- data_vix %>%
  mutate(R1M_Usd_Cvix = if_else(R1M_Usd < r_minus, -1,            # New label!
                                if_else(R1M_Usd > r_plus, 1, 0)),
         R1M_Usd_Cvix = as.factor(R1M_Usd_Cvix))
data_vix %>%
  mutate(year = year(date)) %>%
  group_by(year, R1M_Usd_Cvix) %>%
  summarize(nb = n()) %>%
  ggplot(aes(x = year, y = nb, fill = R1M_Usd_Cvix)) +
  geom_col() + theme_bw()
```

Finally, we switch to the outliers (Figure 19.7).

```
data_ml %>%
  ggplot(aes(x = R12M_Usd)) + geom_histogram() + theme_bw()
```

Returns above 50 should indeed be rare.
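These observations can be isolated with a simple filter (a sketch on our part; the filtering code was omitted above):

```r
data_ml %>%
  filter(R12M_Usd > 50) %>%                 # Keep only extreme annual returns
  dplyr::select(stock_id, date, R12M_Usd)   # Identify the stock and the date
```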

```
## # A tibble: 8 × 3
## stock_id date R12M_Usd
## <int> <date> <dbl>
## 1 212 2000-12-31 53.0
## 2 221 2008-12-31 53.5
## 3 221 2009-01-31 55.2
## 4 221 2009-02-28 54.8
## 5 296 2002-06-30 72.2
## 6 683 2009-02-28 96.0
## 7 683 2009-03-31 64.8
## 8 862 2009-02-28 58.0
```

The largest return comes from stock #683. Let’s have a look at the stream of monthly returns in 2009.
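The table below can be obtained with a simple filter (again a sketch; the code was omitted in the original):

```r
data_ml %>%
  filter(stock_id == 683, year(date) == 2009) %>%  # Stock #683, year 2009
  dplyr::select(date, R1M_Usd)                     # Monthly returns only
```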

```
## # A tibble: 12 × 2
## date R1M_Usd
## <date> <dbl>
## 1 2009-01-31 -0.625
## 2 2009-02-28 0.472
## 3 2009-03-31 1.44
## 4 2009-04-30 0.139
## 5 2009-05-31 0.086
## 6 2009-06-30 0.185
## 7 2009-07-31 0.363
## 8 2009-08-31 0.103
## 9 2009-09-30 9.91
## 10 2009-10-31 0.101
## 11 2009-11-30 0.202
## 12 2009-12-31 -0.251
```

Most monthly returns are very high, and the outstanding value occurs in September (+991%). Given this spike, the annual value is plausible. In addition, a quick glance at the Vol1Y values shows that the stock is the most volatile of the dataset.
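The volatility claim can be checked directly (a sketch; we assume the Vol1Y_Usd column of data_ml, as in the rest of the book's dataset):

```r
data_ml %>%
  group_by(stock_id) %>%                    # One average volatility figure per stock
  summarize(avg_vol = mean(Vol1Y_Usd)) %>%
  arrange(desc(avg_vol)) %>%                # Most volatile stocks first
  head(3)
```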

## 19.3 Chapter 5

We recycle the training and testing data variables created in the chapter (notably in its coding section). In addition, we create a dedicated function and resort to the *map2*() function from the *purrr* package.

```
alpha_seq  <- (0:10) / 10                                  # Sequence of alpha values
lambda_seq <- 0.1^(0:5)                                    # Sequence of lambda values
pars <- expand.grid(alpha_seq, lambda_seq)                 # Exploring all combinations!
alpha_seq  <- pars[, 1]
lambda_seq <- pars[, 2]
lasso_sens <- function(alpha, lambda, x_train, y_train, x_test, y_test){ # Function
  fit_temp <- glmnet(x_train, y_train,                     # Model
                     alpha = alpha, lambda = lambda)
  return(sqrt(mean((predict(fit_temp, x_test) - y_test)^2))) # Output: RMSE
}
rmse_elas <- map2(alpha_seq, lambda_seq, lasso_sens,       # Automation
                  x_train = x_penalized_train, y_train = y_penalized_train,
                  x_test = x_penalized_test, y_test = testing_sample$R1M_Usd)
bind_cols(alpha = alpha_seq, lambda = as.factor(lambda_seq), rmse = unlist(rmse_elas)) %>%
  ggplot(aes(x = alpha, y = rmse, fill = lambda)) + geom_col() + facet_grid(lambda ~ .) +
  coord_cartesian(ylim = c(0.19, 0.193)) + theme_bw()
```

As outlined in Figure 19.8, the parameters have a very marginal impact. Maybe the model is not a good fit for the task.

## 19.4 Chapter 6

```
fit1 <- rpart(formula,
              data = training_sample, # Data source: training sample
              cp = 0.001)             # Precision: smaller = more leaves
mean((predict(fit1, testing_sample) - testing_sample$R1M_Usd)^2) # Test!
```

`## [1] 0.04018973`

```
fit2 <- rpart(formula,
              data = training_sample, # Data source: training sample
              cp = 0.01)              # Precision: smaller = more leaves
mean((predict(fit2, testing_sample) - testing_sample$R1M_Usd)^2) # Test!
```

`## [1] 0.03699696`

`rpart.plot(fit1) # Plot the first tree`

The first model (Figure 19.9) is **too** precise: going into the details of the training sample does not translate to good performance out-of-sample. The second, simpler model, yields better results.

```
n_trees <- c(10, 20, 40, 80, 160)
mse_RF <- numeric(length(n_trees))
for(j in 1:length(n_trees)){ # No need for functional programming here...
  fit_temp <- randomForest(
    as.formula(paste("R1M_Usd ~", paste(features_short, collapse = " + "))), # New formula!
    data = training_sample, # Data source: training sample
    sampsize = 30000,       # Size of (random) sample for each tree
    replace = TRUE,         # Is the sampling done with replacement?
    ntree = n_trees[j],     # Nb of random trees
    mtry = 5)               # Nb of predictors for each tree
  mse_RF[j] <- mean((predict(fit_temp, testing_sample) - testing_sample$R1M_Usd)^2)
}
mse_RF
```

`## [1] 0.03967754 0.03885924 0.03766900 0.03696370 0.03699772`

Random forests are by construction random, so results can vary from run to run. Overall, large numbers of trees are preferable: each new tree tells a new story and diversifies the risk of the whole forest. More technical reasons why this is the case are outlined in the original paper by Breiman (2001).
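The diversification argument can be made concrete with a back-of-the-envelope computation (a purely illustrative sketch on our part, not from the original text): for B trees with individual variance sigma² and pairwise correlation rho, the variance of the averaged prediction is rho·sigma² + (1-rho)·sigma²/B, which decreases toward the floor rho·sigma² as B grows.

```r
var_forest <- function(B, rho, sigma2 = 1){  # Variance of the average of B correlated trees
  rho * sigma2 + (1 - rho) * sigma2 / B
}
sapply(c(10, 40, 160), var_forest, rho = 0.3) # 0.370000 0.317500 0.304375: gains flatten
```

This is one reason why, beyond a few hundred trees, adding more brings little improvement.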

For the last exercises, we recycle the *formula* used in Chapter 6.

```
tree_2008 <- rpart(formula,
                   data = data_ml %>% filter(year(date) == 2008), # Data source: 2008
                   cp = 0.001,
                   maxdepth = 2)
rpart.plot(tree_2008)
```

The first splitting criterion in Figure 19.10 is enterprise value (EV). EV is an indicator that adjusts market capitalization by adding debt and subtracting cash. It is a more faithful account of the true value of a company. In 2008, the companies that fared the least poorly were those with the highest EV (i.e., large, robust firms).

```
tree_2009 <- rpart(formula,
                   data = data_ml %>% filter(year(date) == 2009), # Data source: 2009
                   cp = 0.001,
                   maxdepth = 2)
rpart.plot(tree_2009)
```

In 2009 (Figure 19.11), the firms that recovered the fastest were those that experienced high volatility in the past (likely, downwards volatility). Momentum is also very important: the firms with the lowest past returns are those that rebound the fastest. This is a typical example of the momentum crash phenomenon studied in Barroso and Santa-Clara (2015) and K. Daniel and Moskowitz (2016). The rationale is the following: after a market downturn, the stocks with the most potential for growth are those that have suffered the largest losses. Consequently, the negative (short) leg of the momentum factor performs very well, often better than the long leg. And indeed, being long in the momentum factor in 2009 would have generated negative profits.

## 19.5 Chapter 7: the autoencoder model & universal approximation

First, it is imperative to format the inputs properly. To avoid any issues, we work with perfectly rectangular data and hence restrict the investment set to the stocks with no missing points. Dimensions must also be in the correct order.

```
data_short <- data_ml %>%                          # Shorter dataset
  filter(stock_id %in% stock_ids_short) %>%
  dplyr::select(c("stock_id", "date", features_short, "R1M_Usd"))
dates <- unique(data_short$date)                   # Vector of dates
N  <- length(stock_ids_short)                      # Dimension for assets
Tt <- length(dates)                                # Dimension for dates
K  <- length(features_short)                       # Dimension for features
factor_data <- data_short %>%                      # Factor side data
  dplyr::select(date, stock_id, R1M_Usd) %>%
  spread(key = stock_id, value = R1M_Usd) %>%
  dplyr::select(-date) %>%
  as.matrix()
beta_data <- array(unlist(data_short %>%           # Beta side data: beware the permutation below!
                            dplyr::select(-stock_id, -date, -R1M_Usd)),
                   dim = c(N, Tt, K))
beta_data <- aperm(beta_data, c(2, 1, 3))          # Permutation
```

Next, we turn to the specification of the network, using a functional API form.

```
main_input <- layer_input(shape = c(N), name = "main_input")    # Main input: returns
factor_network <- main_input %>%                                # Def of factor side network
  layer_dense(units = 8, activation = "relu", name = "layer_1_r") %>%
  layer_dense(units = 4, activation = "tanh", name = "layer_2_r")
aux_input <- layer_input(shape = c(N, K), name = "aux_input")   # Aux input: characteristics
beta_network <- aux_input %>%                                   # Def of beta side network
  layer_dense(units = 8, activation = "relu", name = "layer_1_l") %>%
  layer_dense(units = 4, activation = "tanh", name = "layer_2_l") %>%
  layer_permute(dims = c(2, 1), name = "layer_3_l")             # Permutation!
main_output <- layer_dot(c(beta_network, factor_network),       # Product of the 2 networks
                         axes = 1, name = "main_output")
model_ae <- keras_model(                                        # AE model specs
  inputs = c(main_input, aux_input),
  outputs = c(main_output)
)
```

Finally, we ask for the structure of the model, and train it.

`summary(model_ae) # See model details / architecture`

```
## Model: "model_1"
## __________________________________________________________________________________________
## Layer (type) Output Shape Param # Connected to
## ==========================================================================================
## aux_input (InputLayer) [(None, 793, 7)] 0
## __________________________________________________________________________________________
## layer_1_l (Dense) (None, 793, 8) 64 aux_input[0][0]
## __________________________________________________________________________________________
## main_input (InputLayer) [(None, 793)] 0
## __________________________________________________________________________________________
## layer_2_l (Dense) (None, 793, 4) 36 layer_1_l[0][0]
## __________________________________________________________________________________________
## layer_1_r (Dense) (None, 8) 6352 main_input[0][0]
## __________________________________________________________________________________________
## layer_3_l (Permute) (None, 4, 793) 0 layer_2_l[0][0]
## __________________________________________________________________________________________
## layer_2_r (Dense) (None, 4) 36 layer_1_r[0][0]
## __________________________________________________________________________________________
## main_output (Dot) (None, 793) 0 layer_3_l[0][0]
## layer_2_r[0][0]
## ==========================================================================================
## Total params: 6,488
## Trainable params: 6,488
## Non-trainable params: 0
## __________________________________________________________________________________________
```

```
model_ae %>% keras::compile(       # Learning parameters
  optimizer = "rmsprop",
  loss = "mean_squared_error"
)
model_ae %>% fit(                  # Learning function
  x = list(main_input = factor_data, aux_input = beta_data),
  y = list(main_output = factor_data),
  epochs = 20,                     # Nb rounds
  batch_size = 49                  # Nb obs. per round
)
```

For the second exercise, we use a simple architecture. The activation function, number of epochs and batch size may matter…

```
model_ua <- keras_model_sequential()
model_ua %>%      # This defines the structure of the network, i.e., how layers are organized
  layer_dense(units = 16, activation = 'sigmoid', input_shape = 1) %>%
  layer_dense(units = 1)
model_ua %>% keras::compile(            # Model specification
  loss = 'mean_squared_error',          # Loss function
  optimizer = optimizer_rmsprop(),      # Optimisation method (weight updating)
  metrics = c('mean_absolute_error')    # Output metric
)
summary(model_ua)                       # A simple model!
```

```
## Model: "sequential_7"
## __________________________________________________________________________________________
## Layer (type) Output Shape Param #
## ==========================================================================================
## dense_22 (Dense) (None, 16) 32
## __________________________________________________________________________________________
## dense_21 (Dense) (None, 1) 17
## ==========================================================================================
## Total params: 49
## Trainable params: 49
## Non-trainable params: 0
## __________________________________________________________________________________________
```

```
fit_ua <- model_ua %>%
  fit(seq(0, 6, by = 0.001) %>% matrix(ncol = 1),      # Training data = x
      sin(seq(0, 6, by = 0.001)) %>% matrix(ncol = 1), # Training label = y
      epochs = 30, batch_size = 64                     # Training parameters
  )
```

In full disclosure, to improve the fit, we also increase the sample size. We show the improvement in the figure below.

```
library(patchwork)
model_ua2 <- keras_model_sequential()
model_ua2 %>%     # This defines the structure of the network, i.e., how layers are organized
  layer_dense(units = 128, activation = 'sigmoid', input_shape = 1) %>%
  layer_dense(units = 1)
model_ua2 %>% keras::compile(           # Model specification
  loss = 'mean_squared_error',          # Loss function
  optimizer = optimizer_rmsprop(),      # Optimisation method (weight updating)
  metrics = c('mean_absolute_error')    # Output metric
)
summary(model_ua2)                      # A larger model!
```

```
## Model: "sequential_8"
## __________________________________________________________________________________________
## Layer (type) Output Shape Param #
## ==========================================================================================
## dense_24 (Dense) (None, 128) 256
## __________________________________________________________________________________________
## dense_23 (Dense) (None, 1) 129
## ==========================================================================================
## Total params: 385
## Trainable params: 385
## Non-trainable params: 0
## __________________________________________________________________________________________
```

```
fit_ua2 <- model_ua2 %>%
  fit(seq(0, 6, by = 0.0002) %>% matrix(ncol = 1),      # Training data = x
      sin(seq(0, 6, by = 0.0002)) %>% matrix(ncol = 1), # Training label = y
      epochs = 60, batch_size = 64                      # Training parameters
  )
tibble(x = seq(0, 6, by = 0.001)) %>%
  ggplot() +
  geom_line(aes(x = x, y = predict(model_ua, x), color = "Small model")) +
  geom_line(aes(x = x, y = predict(model_ua2, x), color = "Large model")) +
  stat_function(fun = sin, aes(color = "sin(x) function")) +
  scale_color_manual(values = c("#EEAA33", "#3366CC", "#000000")) + theme_bw()
```

## 19.6 Chapter 8

Since we are going to reproduce a similar analysis several times, let’s simplify the task with three tricks: first, by using default parameter values that are passed as common arguments to the *svm* function; second, by creating a custom function that computes the MSE; and third, by resorting to functional programming via the *map* function from the *purrr* package. Below, we recycle datasets created in Chapter 6.

```
mse <- function(fit, features, label){                    # MSE function
  return(mean((predict(fit, features) - label)^2))
}
par_list <- list(y = train_label_xgb[1:10000],            # From the tree chapter
                 x = train_features_xgb[1:10000, ],
                 type = "eps-regression",
                 epsilon = 0.1,                           # Width of strip for errors
                 gamma = 0.5,                             # Constant in the radial kernel
                 cost = 0.1)
svm_par <- function(kernel, par_list){                    # Function for SVM fit automation
  require(e1071)
  return(do.call(svm, c(kernel = kernel, par_list)))
}
kernels <- c("linear", "radial", "polynomial", "sigmoid") # Kernels
fit_svm_par <- map(kernels, svm_par, par_list = par_list) # SVM models
map(fit_svm_par, mse,                                     # MSEs
    features = test_feat_short,                           # From the SVM chapter
    label = testing_sample$R1M_Usd)
```

```
## [[1]]
## [1] 0.03849786
##
## [[2]]
## [1] 0.03924576
##
## [[3]]
## [1] 0.03951328
##
## [[4]]
## [1] 334.8173
```

The first two kernels yield the best fit, while the last one should be avoided. Note that apart from the linear kernel, all other options require parameters. We have used the default ones, which may explain the poor performance of some nonlinear kernels.

Below, we train an SVM model on a training sample with all observations but that is limited to the 7 major predictors. Even with a smaller number of features, the training is time consuming.

```
svm_full <- svm(y = train_label_xgb,       # Training label
                x = train_features_xgb,    # Training features
                type = "eps-regression",   # SVM task type (see LIBSVM documentation)
                kernel = "linear",         # SVM kernel
                epsilon = 0.1,             # Width of strip for errors
                cost = 0.1)                # Slack variable penalisation
test_feat_short <- dplyr::select(testing_sample, features_short)      # Test set
mean(predict(svm_full, test_feat_short) * testing_sample$R1M_Usd > 0) # Hit ratio
```

`## [1] 0.490343`

This figure is very low. Below, we test a very simple form of boosted trees, for comparison purposes.

```
xgb_full <- xgb.train(data = train_matrix_xgb, # Data source
                      eta = 0.3,               # Learning rate
                      objective = "reg:linear",# Objective function
                      max_depth = 4,           # Maximum depth of trees
                      nrounds = 60             # Number of trees used (a bit low here)
)
```

`## [14:44:45] WARNING: amalgamation/../src/objective/regression_obj.cu:188: reg:linear is now deprecated in favor of reg:squarederror.`
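The hit ratio reported below can be computed as for the SVM (a sketch; we assume the test matrix xgb_test built in Chapter 6, since the prediction code was omitted here):

```r
mean(predict(xgb_full, xgb_test) * testing_sample$R1M_Usd > 0) # Hit ratio
```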

`## [1] 0.5017377`

The forecasts are slightly better, and the computation time is much lower. There are two reasons why the models perform poorly:

- there are not enough predictors;

- the models are static: they do not adjust dynamically to macro-conditions.

## 19.7 Chapter 11: ensemble neural network

First, we create the three feature sets. The first one gets all multiples of 3 between 3 and 93. The second one gets the same indices, minus one, and the third one, the initial indices minus two.

```
feat_train_1 <- training_sample %>% dplyr::select(features[3*(1:31)]) %>%     # First set of feats
  as.matrix()
feat_train_2 <- training_sample %>% dplyr::select(features[3*(1:31) - 1]) %>% # Second set of feats
  as.matrix()
feat_train_3 <- training_sample %>% dplyr::select(features[3*(1:31) - 2]) %>% # Third set of feats
  as.matrix()
feat_test_1 <- testing_sample %>% dplyr::select(features[3*(1:31)]) %>%       # Test features 1
  as.matrix()
feat_test_2 <- testing_sample %>% dplyr::select(features[3*(1:31) - 1]) %>%   # Test features 2
  as.matrix()
feat_test_3 <- testing_sample %>% dplyr::select(features[3*(1:31) - 2]) %>%   # Test features 3
  as.matrix()
```

Then, we specify the network structure. First, the 3 independent networks, then the aggregation.

```
first_input <- layer_input(shape = c(31), name = "first_input")     # First input
first_network <- first_input %>%                                    # Def of 1st network
  layer_dense(units = 8, activation = "relu", name = "layer_1") %>%
  layer_dense(units = 2, activation = 'softmax')                    # Softmax for categ. output
second_input <- layer_input(shape = c(31), name = "second_input")   # Second input
second_network <- second_input %>%                                  # Def of 2nd network
  layer_dense(units = 8, activation = "relu", name = "layer_2") %>%
  layer_dense(units = 2, activation = 'softmax')                    # Softmax for categ. output
third_input <- layer_input(shape = c(31), name = "third_input")     # Third input
third_network <- third_input %>%                                    # Def of 3rd network
  layer_dense(units = 8, activation = "relu", name = "layer_3") %>%
  layer_dense(units = 2, activation = 'softmax')                    # Softmax for categ. output
main_output <- layer_concatenate(c(first_network,
                                   second_network,
                                   third_network)) %>%              # Combination
  layer_dense(units = 2, activation = 'softmax', name = 'main_output')
model_ens <- keras_model(                                           # Agg. model specs
  inputs = c(first_input, second_input, third_input),
  outputs = c(main_output)
)
```

Lastly, we can train and evaluate (see Figure 19.13).

`summary(model_ens) # See model details / architecture`

```
## Model: "model_2"
## __________________________________________________________________________________________
## Layer (type) Output Shape Param # Connected to
## ==========================================================================================
## first_input (InputLayer) [(None, 31)] 0
## __________________________________________________________________________________________
## second_input (InputLayer) [(None, 31)] 0
## __________________________________________________________________________________________
## third_input (InputLayer) [(None, 31)] 0
## __________________________________________________________________________________________
## layer_1 (Dense) (None, 8) 256 first_input[0][0]
## __________________________________________________________________________________________
## layer_2 (Dense) (None, 8) 256 second_input[0][0]
## __________________________________________________________________________________________
## layer_3 (Dense) (None, 8) 256 third_input[0][0]
## __________________________________________________________________________________________
## dense_23 (Dense) (None, 2) 18 layer_1[0][0]
## __________________________________________________________________________________________
## dense_24 (Dense) (None, 2) 18 layer_2[0][0]
## __________________________________________________________________________________________
## dense_25 (Dense) (None, 2) 18 layer_3[0][0]
## __________________________________________________________________________________________
## concatenate (Concatenate) (None, 6) 0 dense_23[0][0]
## dense_24[0][0]
## dense_25[0][0]
## __________________________________________________________________________________________
## main_output (Dense) (None, 2) 14 concatenate[0][0]
## ==========================================================================================
## Total params: 836
## Trainable params: 836
## Non-trainable params: 0
## __________________________________________________________________________________________
```

```
model_ens %>% keras::compile(            # Learning parameters
  optimizer = optimizer_adam(),
  loss = "binary_crossentropy",
  metrics = "categorical_accuracy"
)
fit_NN_ens <- model_ens %>% fit(         # Learning function
  x = list(first_input = feat_train_1,
           second_input = feat_train_2,
           third_input = feat_train_3),
  y = list(main_output = NN_train_labels_C), # Recycled from the NN chapter
  epochs = 12,                           # Nb rounds
  batch_size = 512,                      # Nb obs. per round
  validation_data = list(list(feat_test_1, feat_test_2, feat_test_3),
                         NN_test_labels_C)
)
plot(fit_NN_ens)
```

## 19.8 Chapter 12

### 19.8.1 EW portfolios with the tidyverse

This one is incredibly easy: the code below is simpler and more compact than, though close in spirit to, the code that generates Figure 3.1. The returns are plotted in Figure 19.14.

```
data_ml %>%
  group_by(date) %>%                     # Group by date
  summarize(return = mean(R1M_Usd)) %>%  # Compute return
  ggplot(aes(x = date, y = return)) + geom_point() + geom_line() # Plot
```

### 19.8.2 Advanced weighting function

First, we code the function with all inputs.

```
weights <- function(Sigma, mu, Lambda, lambda, k_D, k_R, w_old){
  N <- nrow(Sigma)
  M <- solve(lambda*Sigma + 2*k_R*Lambda + 2*k_D*diag(N)) # Inverse matrix
  num <- 1 - sum(M %*% (mu + 2*k_R*Lambda %*% w_old))     # eta numerator
  den <- sum(M %*% rep(1, N))                             # eta denominator
  eta <- num / den                                        # eta
  vec <- mu + eta * rep(1, N) + 2*k_R*Lambda %*% w_old    # Vector in weight formula
  return(M %*% vec)
}
```
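As a quick sanity check (ours, not in the original), note that \(\eta\) is precisely calibrated so that the weights sum to one: summing the components of the output gives \(\sum_i w_i = s + \eta \, \textbf{1}'M\textbf{1} = s + (1-s) = 1\), where \(s\) is the sum of the \(\eta\)-free part. This can be verified numerically on random inputs:

```r
set.seed(1)
S <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)  # Random SPD covariance matrix
w <- weights(Sigma = S, mu = rnorm(3), Lambda = diag(3),
             lambda = 1, k_D = 1, k_R = 1, w_old = rep(1, 3) / 3)
sum(w)  # Budget constraint: equal to 1 (up to numerical error)
```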

Second, we test it on a sample dataset. We use the returns created at the end of Chapter 1 and used for the Lasso allocation in Section 5.2.2. For \(\boldsymbol{\mu}\), we use the sample average, which is rarely a good idea in practice. It serves for illustration only.

```
Sigma <- returns %>% dplyr::select(-date) %>% as.matrix() %>% cov() # Covariance matrix
mu <- returns %>% dplyr::select(-date) %>% apply(2, mean)           # Vector of exp. returns
Lambda <- diag(nrow(Sigma))                                         # Transaction cost matrix
lambda <- 1                                                         # Risk aversion
k_D <- 1
k_R <- 1
w_old <- rep(1, nrow(Sigma)) / nrow(Sigma)                          # Previous weights: EW
weights(Sigma, mu, Lambda, lambda, k_D, k_R, w_old) %>% head()      # First weights
```

```
## [,1]
## 1 0.0031339308
## 3 -0.0003243527
## 4 0.0011944677
## 7 0.0014194215
## 9 0.0015086240
## 11 -0.0005015207
```

Some weights can of course be negative. Finally, we use the map2() function to test some sensitivity. We examine 3 key indicators:

- **diversification**, which we measure via the inverse of the sum of squared weights (inverse Herfindahl-Hirschman index);

- **leverage**, which we assess via the absolute sum of negative weights;

- **in-sample volatility**, which we compute as \(\textbf{w}' \boldsymbol{\Sigma} \textbf{w}\).

To do so, we create a dedicated function below.

```
sensi <- function(lambda, k_D, Sigma, mu, Lambda, k_R, w_old){
  w <- weights(Sigma, mu, Lambda, lambda, k_D, k_R, w_old)
  out <- c()
  out$div <- 1 / sum(w^2)          # Diversification
  out$lev <- sum(abs(w[w < 0]))    # Leverage
  out$vol <- t(w) %*% Sigma %*% w  # In-sample volatility
  return(out)
}
```

Instead of using the baseline *map2* function, we rely on a version thereof that concatenates results into a dataframe directly.

```
lambda <- 10^(-3:2)                  # Parameter values
k_D <- 2 * 10^(-3:2)                 # Parameter values
pars <- expand_grid(lambda, k_D)     # Parameter grid
lambda <- pars$lambda
k_D <- pars$k_D
res <- map2_dfr(lambda, k_D, sensi,  # Map over the grid, bind rows into a dataframe
                Sigma = Sigma, mu = mu, Lambda = Lambda, k_R = k_R, w_old = w_old)
bind_cols(lambda = as.factor(lambda), k_D = as.factor(k_D), res) %>%
  gather(key = indicator, value = value, -lambda, -k_D) %>%
  ggplot(aes(x = lambda, y = value, fill = k_D)) + geom_col(position = "dodge") +
  facet_grid(indicator ~ ., scales = "free")
```

In Figure 19.15, each panel displays an indicator. In the first panel, we see that diversification increases with \(k_D\): indeed, as this number increases, the portfolio converges to uniform (EW) values. The parameter \(\lambda\) has a minor impact. The second panel naturally shows the inverse effect for leverage: as diversification increases with \(k_D\), leverage (i.e., the total of negative positions, or short sales) decreases. Finally, the last panel shows that in-sample volatility is largely driven by the risk aversion parameter: as \(\lambda\) increases, volatility logically decreases. For small values of \(\lambda\), \(k_D\) is negatively related to volatility, but the pattern reverses for large values of \(\lambda\). This is because the equally weighted portfolio is less risky than highly leveraged mean-variance policies, but riskier than the minimum variance portfolio.

### 19.8.3 Functional programming in the backtest

Often, programmers prefer to avoid loops. In order to avoid a loop in the backtest, we need to code what happens for one given date. This is encapsulated in the following function. For simplicity, we code it for only one strategy. Also, the function will assume the structure of the data is known, but the columns (features & labels) could also be passed as arguments. We recycle the function **weights_xgb** from Chapter 12.

```
portf_map <- function(t, data_ml, ticks, t_oos, m_offset, train_size, weight_func){
  train_data <- data_ml %>% filter(date < t_oos[t] - m_offset * 30, # Rolling window w. buffer
                                   date > t_oos[t] - m_offset * 30 - 365 * train_size)
  test_data <- data_ml %>% filter(date == t_oos[t])            # Test set
  realized_returns <- test_data %>%                            # Computing returns via...
    dplyr::select(R1M_Usd)                                     # ...the 1M holding period!
  temp_weights <- weight_func(train_data, test_data, features) # Weights => recycled!
  ind <- match(temp_weights$names, ticks) %>% na.omit()        # Index of test assets
  x <- c()
  x$weights <- rep(0, length(ticks))                           # Empty weights
  x$weights[ind] <- temp_weights$weights                       # Locate weights correctly
  x$returns <- sum(temp_weights$weights * realized_returns)    # Compute returns
  return(x)
}
```

Next, we combine this function with **map**(). We only test the first three out-of-sample dates to reduce computation time.

```
back_test <- 1:3 %>%                     # Test on the first 3 out-of-sample dates
    map(portf_map, data_ml = data_ml, ticks = ticks, t_oos = t_oos,
        m_offset = 1, train_size = 5, weight_func = weights_xgb)
head(back_test[[1]]$weights)             # Sample weights

`## [1] 0.001675042 0.000000000 0.000000000 0.001675042 0.000000000 0.001675042`

`back_test[[1]]$returns # Return of first period`

`## [1] 0.0189129`

Each element of **back_test** is a list with two components: the portfolio weights and the realized return. To access the data easily, functions like *melt* from the *reshape2* package are useful.
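Alternatively, the components can be gathered directly with base R or *purrr* (a sketch that assumes **back_test** keeps the structure created above):

```
perf <- map_dbl(back_test, "returns")             # Vector of realized portfolio returns
W <- t(sapply(back_test, function(x) x$weights))  # Matrix of weights: one row per date
```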

## 19.9 Chapter 15

We recycle the autoencoder (AE) model trained in Chapter 15. Somewhat surprisingly, building a smaller model (the encoder) from a larger one (the full AE) requires saving and then reloading the weights. This creates an external file, which we call “ae_weights.hdf5”. We can check that the output has 4 columns (the compressed representation) instead of 7 (the original data).

```
save_model_weights_hdf5(object = ae_model,                    # Save the AE weights
                        filepath = "ae_weights.hdf5",
                        overwrite = TRUE)
encoder_model <- keras_model(inputs = input_layer, outputs = encoder)
encoder_model %>%
    load_model_weights_hdf5(filepath = "ae_weights.hdf5",
                            skip_mismatch = TRUE, by_name = TRUE)
encoder_model %>% keras::compile(
    loss = 'mean_squared_error',
    optimizer = 'adam',
    metrics = c('mean_absolute_error')
)
encoder_model %>%
    keras::predict_on_batch(x = training_sample %>%
                                dplyr::select(features_short) %>%
                                as.matrix()) %>%
    head(5)
```

```
## [,1] [,2] [,3] [,4]
## [1,] -0.08539051 1.200299 -0.8444785 -1.174033
## [2,] -0.06630807 1.195867 -0.8294346 -1.147886
## [3,] -0.09406579 1.194935 -0.8672158 -1.165673
## [4,] -0.09864470 1.191096 -0.8709179 -1.166036
## [5,] -0.10819313 1.173754 -0.8648338 -1.172021
```

## 19.10 Chapter 16

All we need to do is change the rho coefficient in the code of Chapter 16.

```
set.seed(42)                                                  # Fixing the random seed
n_sample <- 10^5                                              # Number of samples generated
rho <- (-0.8)                                                 # Autoregressive parameter
sd <- 0.4                                                     # Std. dev. of noise
a <- 0.06 * rho                                               # Scaled mean of returns
data_RL3 <- tibble(returns = a/rho + arima.sim(n = n_sample,  # Returns via AR(1) simulation
                                               list(ar = rho),
                                               sd = sd),
                   action = round(runif(n_sample)*4)/4) %>%   # Random action (portfolio)
    mutate(new_state = if_else(returns < 0, "neg", "pos"),    # Coding of state
           reward = returns * action,                         # Reward = portfolio return
           state = lag(new_state),                            # Current state = lagged new state
           action = as.character(action)) %>%
    na.omit()                                                 # Remove one missing state
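Before learning, we can verify that the simulated series does alternate in sign most of the time: its lag-one autocorrelation should be close to \(\rho=-0.8\). A quick self-contained check with the same parameters:

```
set.seed(42)
x <- arima.sim(n = 10^5, list(ar = -0.8), sd = 0.4)  # Same AR(1) process as above
acf(x, lag.max = 1, plot = FALSE)$acf[2]             # Lag-1 autocorrelation, close to -0.8
```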

The learning can then proceed.

```
control <- list(alpha = 0.1,                     # Learning rate
                gamma = 0.7,                     # Discount factor for rewards
                epsilon = 0.1)                   # Exploration rate
fit_RL3 <- ReinforcementLearning(data_RL3,       # Main RL function
                                 s = "state",
                                 a = "action",
                                 r = "reward",
                                 s_new = "new_state",
                                 control = control)
print(fit_RL3)                                   # Show the output

```
## State-Action function Q
## 0.25 0 1 0.75 0.5
## neg 0.7107268 0.5971710 1.4662416 0.9535698 0.8069591
## pos 0.7730842 0.7869229 0.4734467 0.4258593 0.6257039
##
## Policy
## neg pos
## "1" "0"
##
## Reward (last iteration)
## [1] 3013.162
```

In this case, the sign-alternating nature of the return process (\(\rho<0\)) changes the outcome. The negative state is associated with large profits when the portfolio is fully invested, while in the positive state the best average reward is obtained when the agent refrains from investing.
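The policy printed above is simply the greedy action in each state. It can also be recovered manually from the Q-matrix (a sketch that assumes **fit_RL3$Q** is a state-by-action matrix, as returned by the *ReinforcementLearning* package):

```
best <- colnames(fit_RL3$Q)[apply(fit_RL3$Q, 1, which.max)]  # Action with highest Q-value...
names(best) <- rownames(fit_RL3$Q)                           # ...for each state
best                                                         # Should match the Policy block
```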

For the second exercise, the trick is to define all possible actions, that is, all combinations of positions (+1, 0, -1) for the two assets on all dates. We recycle the data from Chapter 16.

```
pos_3 <- c(-1,0,1)                                     # Possible alloc. to asset 3
pos_4 <- c(-1,0,1)                                     # Possible alloc. to asset 4
pos <- expand_grid(pos_3, pos_4)                       # All combinations
pos <- bind_cols(pos, id = 1:nrow(pos))                # Adding combination id
ret_pb_RL <- bind_cols(r3 = return_3, r4 = return_4,   # Returns & P/B dataframe
                       pb3 = pb_3, pb4 = pb_4)
data_RL4 <- sapply(ret_pb_RL,                          # Combining returns & positions
                   rep.int,
                   times = nrow(pos)) %>%
    data.frame() %>%
    bind_cols(id = rep(1:nrow(pos), 1, each = length(return_3))) %>%
    left_join(pos) %>% dplyr::select(-id) %>%
    mutate(action = paste(pos_3, pos_4),               # Uniting actions
           pb3 = round(5 * pb3),                       # Simplifying states
           pb4 = round(5 * pb4),                       # Simplifying states
           state = paste(pb3, pb4),                    # Uniting states
           reward = pos_3*r3 + pos_4*r4,               # Computing rewards
           new_state = lead(state)) %>%                # Infer new state
    dplyr::select(-pb3, -pb4, -pos_3,                  # Remove superfluous vars.
                  -pos_4, -r3, -r4)
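A quick sanity check on the augmented sample (assuming the objects created in the chunk above): there are \(3^2=9\) possible actions and each date is replicated once per action.

```
nrow(pos)                           # 9 action combinations
nrow(data_RL4) / length(return_3)   # Should equal 9: one copy of each date per action
```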

We can then plug this data into the RL function.

```
fit_RL4 <- ReinforcementLearning(data_RL4,       # Main RL function
                                 s = "state",
                                 a = "action",
                                 r = "reward",
                                 s_new = "new_state",
                                 control = control)
fit_RL4$Q <- round(fit_RL4$Q, 3)                 # Round the Q-matrix
print(fit_RL4)                                   # Show the output

```
## State-Action function Q
## 0 0 0 1 0 -1 -1 -1 -1 0 -1 1 1 -1 1 0 1 1
## 0 2 0.000 0.000 0.002 -0.017 -0.018 -0.020 0.023 0.025 0.024
## 0 3 0.001 -0.005 0.007 -0.013 -0.019 -0.026 0.031 0.027 0.021
## 3 1 0.003 0.003 0.003 0.002 0.002 0.003 0.002 0.002 0.003
## 2 1 0.027 0.038 0.020 0.004 0.015 0.039 0.013 0.021 0.041
## 2 2 0.021 0.014 0.027 0.038 0.047 0.045 -0.004 -0.011 -0.016
## 2 3 0.007 0.006 0.008 0.054 0.057 0.056 -0.041 -0.041 -0.041
## 1 1 0.027 0.054 0.005 -0.031 -0.005 0.041 0.025 0.046 0.072
## 1 2 0.019 0.020 0.020 0.015 0.023 0.029 0.012 0.014 0.023
## 1 3 0.008 0.019 0.000 -0.036 -0.027 -0.016 0.042 0.053 0.060
##
## Policy
## 0 2 0 3 3 1 2 1 2 2 2 3 1 1 1 2 1 3
## "1 0" "1 -1" "0 -1" "1 1" "-1 0" "-1 0" "1 1" "-1 1" "1 1"
##
## Reward (last iteration)
## [1] 0
```

The Q-matrix is much less sparse than that of Chapter 16: we have covered much more ground! Some policy recommendations are unchanged compared to the smaller sample, but others have flipped. The changes occur for the states for which only a few points were available in the first trial; with more data, the decision is altered.

*MIS Quarterly*, 1293–1327.

*Expert Systems with Applications*140: 112891.

*Journal of Portfolio Management*35 (1): 52–56.

*Proceedings of the 23rd International Conference on Machine Learning*, 9–16. ACM.

*Outlier Analysis*. Springer.

*SSRN Working Paper*3930228.

*Journal of Financial Data Science*1 (4): 39–62.

*SSRN Working Paper*3578830.

*Missing Data*. Vol. 136. Sage publications.

*Expert Systems with Applications*87: 267–79.

*Expert Systems with Applications*130: 145–56.

*Journal of Finance*74 (6): 3187–3216.

*Journal of Banking & Finance*70: 23–37.

*Nature*567: 305–7.

*Talking Nets: An Oral History of Neural Networks*. MIT Press.

*arXiv Preprint*, no. 2003.01977.

*SSRN Working Paper*3726714.

*Asset Management: A Systematic Approach to Factor Investing*. Oxford University Press.

*Journal of Finance*61 (1): 259–99.

*Journal of Financial Economics*106 (1): 132–56.

*SSRN Working Paper*1106463.

*arXiv Preprint*, no. 1908.07442.

*arXiv Preprint*, no. 1907.02893.

*SSRN Working Paper*3116974.

*Management Science*61 (11): 2569–79.

*Journal of Portfolio Management*45 (4): 18–36.

*Journal of Financial Data Science*1 (1): 64–74.

*Journal of the American Statistical Association*115 (529): 482–85.

*Journal of Financial Economics*135 (3): 629–52.

*Journal of Finance*68 (3): 929–85.

*Journal of Portfolio Management*43 (5): 72–87.

*Journal of Portfolio Management*39 (4): 49–68.

*Journal of Financial Economics*129 (3): 479–509.

*Journal of Investment Management*13 (1): 27–63.

*Journal of Economic Surveys*33 (5): 1463–92.

*Journal of Portfolio Management*46 (3): 26–35.

*arXiv Preprint*, no. 2009.03394.

*R Package Version*1 (1).

*Asset Pricing and Portfolio Choice Theory*. Oxford University Press.

*Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection*. John Wiley & Sons.

*Econometrica*70 (1): 191–221.

*Significance (Royal Statistical Society)*Forthcoming.

*Journal of Portfolio Management*40 (5): 39–59.

*IEEE Trans. On Systems, Man, Cybernetics*8 (4): 311–13.

*Journal of Financial Economics*106 (3): 473–91.

*Financial Analysts Journal*67 (1): 40–54.

*Journal of Financial and Quantitative Analysis*Forthcoming: 1–24.

*Empirical Asset Pricing: The Cross Section of Stock Returns*. John Wiley & Sons.

*SSRN Working Paper*3686164.

*Expert Systems with Applications*42 (20): 7046–56.

*Management Science*64 (3): 1136–54.

*Journal of Finance*48 (5): 1719–47.

*Journal of Finance*48 (4): 1231–62.

*Journal of Financial Economics*9 (1): 3–18.

*Handbook of Behavioral Economics-Foundations and Applications*.

*Journal of Financial Economics*115 (1): 1–24.

*SSRN Working Paper*3477463.

*Review of Financial Studies*29 (11): 3068–3107.

*Journal of Financial Economics*68 (2): 161–99.

*Journal of Finance*73 (2): 715–54.

*IEEE Transactions on Information Theory*39 (3): 930–45.

*Machine Learning*14 (1): 115–33.

*Journal of Financial Economics*116 (1): 111–20.

*Neural Computation*16 (9): 1959–81.

*Journal of the Operational Research Society*20 (4): 451–68.

*Journal of Multivariate Analysis*175: 104544.

*SSRN Working Paper*2695101.

*Proceedings of the European Conference on Computer Vision (ECCV)*, 456–73.

*arXiv Preprint*, no. 2009.11698.

*SSRN Working Paper*3732113.

*Regression Diagnostics: Identifying Influential Data and Sources of Collinearity*. Vol. 571. John Wiley & Sons.

*Electronic Journal of Statistics*15 (1): 427–505.

*Machine Learning*79 (1-2): 151–75.

*Neural Networks: Tricks of the Trade*, 437–78. Springer.

*SSRN Working Paper*3438533.

*Journal of Machine Learning Research*13 (Feb): 281–305.

*Journal of Finance*54 (5): 1553–1607.

*Journal of Financial Economics*134 (2): 253–72.

*Procedia Economics and Finance*3: 68–77.

*Dynamic Programming and Optimal Control - Volume II, Fourth Edition*. Athena Scientific.

*SSRN Working Paper*3440147.

*Journal of Finance*72 (1): 5–46.

*American Economic Review*109 (3): 1116–54.

*arXiv Preprint*, no. 1007.0085.

*Decision Support Systems*50 (3): 602–13.

*Journal of Machine Learning Research*13 (Apr): 1063–95.

*Journal of Machine Learning Research*9 (Sep): 2015–33.

*Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models*. CRC Press.

*SSRN Working Paper*3745078.

*Financial Analysts Journal*48 (5): 28–43.

*Journal of Investing*28 (4): 95–103.

*Journal of Portfolio Management*46 (3): 42–48.

*Machine Learning*35 (3): 193–205.

*European Journal of Operational Research*229 (3): 637–44.

*Hands-on Machine Learning with r*. Chapman & Hall / CRC.

*Review of Financial Studies*Forthcoming.

*Annals of Operations Research*, 1–39.

*Proceedings of the 2008 SIAM International Conference on Data Mining*, 243–54.

*arXiv Preprint*, no. 2110.01889.

*Proceedings of the Fifth Annual Workshop on Computational Learning Theory*, 144–52. ACM.

*Journal of Finance*74 (2): 639–74.

*Convex Optimization*. Cambridge University Press.

*Review of Financial Studies*22 (9): 3411–47.

*Decision Sciences*18 (3): 415–29.

*Machine Learning*24 (1): 49–64.

*Machine Learning*45 (1): 5–32.

*Annals of Statistics*32 (1): 1–11.

*Classification and Regression Trees*. Chapman & Hall.

*Journal of Financial Research*Forthcoming.

*Research Affiliates (November)*.

*Annals of Applied Statistics*9 (1): 247–74.

*Proceedings of the National Academy of Sciences*106 (30): 12267–72.

*Expert Systems with Applications*39 (3): 3446–53.

*SSRN Working Paper*3473874.

*SSRN Working Paper*3481736.

*SSRN Working Paper*3493458.

*Quantitative Finance*19 (8): 1271–91.

*SSRN Working Paper*3657366.

*Annals of Statistics*42 (6): 2526–56.

*Neural Computing & Applications*6 (4): 193–200.

*Expert Systems with Applications*Forthcoming.

*Journal of Investing*Forthcoming.

*Social Responsibility Journal*Forthcoming.

*Journal of Financial Economics*81 (1): 27–60.

*SSRN Working Paper*3706532.

*IEEE Transactions on Neural Networks*14 (6): 1506–18.

*Journal of Finance*52 (1): 57–82.

*Journal of Finance*59 (6): 2577–2603.

*SSRN Working Paper*3435141.

*SSRN Working Paper*2524547.

*Studies in Economics and Finance*.

*ACM Computing Surveys (CSUR)*41 (3): 15.

*ACM Transactions on Intelligent Systems and Technology (TIST)*2 (3): 27.

*arXiv Preprint*, no. 2003.06497.

*arXiv Preprint*, no. 2003.10014.

*Scientific Reports*8 (1): 6085.

*SSRN Working Paper*3448637.

*SSRN Working Paper*3272572.

*SSRN Working Paper*3254995.

*Journal of Finance*Forthcoming.

*SSRN Working Paper*3073681.

*Review of Asset Pricing Studies*Forthcoming.

*Critical Finance Review*Forthcoming.

*INFORMS Journal on Computing*13 (4): 312–31.

*arXiv Preprint*, no. 1808.02610.

*2016 7th International Conference on Cloud Computing and Big Data (CCBD)*, 87–92. IEEE.

*Management Science*58 (10): 1834–53.

*SSRN Working Paper*3350138.

*Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 785–94. ACM.

*Expert Systems with Applications*80: 340–55.

*Journal of Empirical Finance*Forthcoming.

*Journal of Empirical Finance*60: 56–73.

*SSRN Working Paper*3478223.

*Journal of Business & Economic Statistics*38 (4): 771–83.

*Journal of Finance*75 (1): 551–77.

*SSRN Working Paper*3478223.

*Journal of Finance*74 (1): 449–92.

*Journal of Financial Economics*Forthcoming.

*SSRN Working Paper*3487624.

*Annals of Applied Statistics*4 (1): 266–98.

*Journal of Financial and Quantitative Analysis*49 (1): 107–30.

*Deep Learning with Python*. Manning Publications Company.

*Review of Financial Studies*33 (5): 2134–79.

*SSRN Working Paper*2549578.

*Applied Economics*34 (13): 1671–77.

*International Conference on Machine Learning*, 2067–75.

*Model Selection and Model Averaging*. Cambridge University Press.

*International Economic Review*50 (2): 363–95.

*SSRN Working Paper*3362495.

*Asset Pricing: Revised Edition*. Princeton University Press.

*Journal of Finance*66 (4): 1047–1108.

*SSRN Working Paper*3449822.

*SSRN Working Paper*3307057.

*SSRN Working Paper*2800590.

*Journal of Financial Economics*21 (2): 255–89.

*The Journal of Finance*48 (4): 1263–91.

*Long Memory in Economics*, 289–309. Springer.

*Journal of Financial and Quantitative Analysis*54 (5): 1975–2016.

*Annals of Finance*11 (2): 221–41.

*Expert Systems with Applications*73: 69–81.

*Quantitative Finance*Forthcoming.

*Annals of Operations Research*288: 181–221.

*Journal of Portfolio Management*.

*SSRN Working Paper*3779481.

*Apprentissage Artificiel: Deep Learning, Concepts Et Algorithmes*. Eyrolles.

*Machine Learning*20 (3): 273–97.

*Journal of NeuroTechnology*1 (1).

*Mathematical Finance*1 (1): 1–29.

*IEEE Transactions on Information Theory*42 (2): 348–63.

*Journal of Machine Learning Research*7 (Mar): 551–85.

*Review of Financial Studies*29 (3): 739–86.

*Journal of Financial Economics*117 (2): 333–49.

*Theory of Probability & Its Applications*60 (4): 561–79.

*Mathematics of Control, Signals and Systems*2 (4): 303–14.

*arXiv Preprint*, no. 2111.05072.

*Quantitative Finance*11 (3): 351–64.

*Journal of Financial Economics*106 (1): 157–81.

*Journal of Financial and Quantitative Analysis*55 (4): 1163–98.

*Journal of Finance*56 (3): 921–65.

*Review of Financial Studies*33 (4): 1673–1736.

*Journal of Financial Economics*122 (2): 221–47.

*Review of Financial Studies*33 (5): 1927–79.

*Journal of Finance*52 (1): 1–33.

*Critical Finance Review*1 (1): 103–39.

*Journal of Finance*56 (2): 743–66.

*Journal of Banking & Finance*61: S235–40.

*SSRN Working Paper*3557957.

*Advances in Financial Machine Learning*. John Wiley & Sons.

*Mathematische Annalen*300 (1): 463–520.

*Journal of Econometrics*Forthcoming.

*Management Science*55 (5): 798–812.

*Review of Financial Studies*22 (5): 1915–53.

*SSRN Working Paper*3392875.

*Review of Financial Studies*33 (5): 2180–2222.

*Journal of Financial and Quantitative Analysis*50 (6): 1443–71.

*International Conference on Machine Learning*, 665–73.

*SSRN Working Paper*3823328.

*Financial Analysts Journal*75 (4): 84–102.

*Journal of Forecasting*Forthcoming.

*European Financial Management*Forthcoming.

*International Journal of Machine Learning and Computing*7 (5): 118–22.

*SSRN Working Paper*, no. 3572181.

*Machine Learning in Finance: From Theory to Practice*. Springer.

*Journal of Forecasting*15 (1): 49–61.

*International Conference on Machine Learning*, 97:107–15.

*Advances in Neural Information Processing Systems*, 155–61.

*Journal of Financial Data Science*Forthcoming.

*Neural Networks and Statistical Learning*. Springer Science & Business Media.

*arXiv Preprint*2109.14545.

*Journal of Machine Learning Research*12 (Jul): 2121–59.

*Journal of Asset Management*14 (1): 52–71.

*Quarterly Journal of Business and Economics*, 33–48.

*Proceedings of the Conference on Neural Information Processing Systems*.

*Journal of Finance*Forthcoming.

*arXiv Preprint*, no. 1906.06711.

*Cognitive Science*14 (2): 179–211.

*Structural Equation Modeling*8 (1): 128–41.

*Applied Missing Data Analysis*. Guilford Press.

*Journal of Finance*73 (5): 1971–2001.

*Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition*, 10792–801.

*Econometrica*, 987–1007.

*Expert Systems with Applications*29 (4): 927–40.

*Journal of Portfolio Management*46 (3): 1–4.

*Journal of Portfolio Management*45 (1): 141–47.

*SSRN Working Paper*3845928.

*Journal of Investment Strategies*4 (4).

*Journal of Finance*47 (2): 427–65.

*Journal of Financial Economics*33 (1): 3–56.

*Journal of Financial Economics*116 (1): 1–22.

*Journal of Financial Economics*128 (2): 234–52.

*Journal of Political Economy*81 (3): 607–36.

*SSRN Working Paper*3844484.

*SSRN Working Paper*3152386.

*Computational Management Science*12 (3): 417–34.

*Journal of Finance*75 (3): 1327–70.

*Journal of Econometrics*Forthcoming.

*SSRN Working Paper*3243683.

*arXiv Preprint*, no. 2011.04391.

*European Journal of Operational Research*270 (2): 654–69.

*Journal of Machine Learning Research*20 (177): 1–81.

*Applied Intelligence*, 1–19.

*arXiv Preprint*, no. 2002.07477.

*arXiv Preprint*, no. 1807.02811.

*Journal of Financial Economics*111 (1): 1–25.

*Journal of Accounting Research*, 185–209.

*Machine Learning: Proceedings of the Thirteenth International Conference*, 96:148–56.

*Journal of Computer and System Sciences*55 (1): 119–39.

*Review of Financial Studies*33 (5): 2326–77.

*Journal of Sustainable Finance & Investment*5 (4): 210–33.

*Annals of Statistics*, 1189–1232.

*Computational Statistics & Data Analysis*38 (4): 367–78.

*Annals of Applied Statistics*2 (3): 916–54.

*Annals of Statistics*28 (2): 337–407.

*Biostatistics*9 (3): 432–41.

*Machine Learning*29 (2-3): 131–63.

*Journal of Financial and Quantitative Analysis*21 (3): 293–305.

*arXiv Preprint*, no. 1806.01743.

*Decision Analysis*14 (1): 1–20.

*Econometrica*84 (3): 985–1046.

*SSRN Working Paper*3443426.

*Journal of Banking & Finance*32 (12): 2646–54.

*arXiv Preprint*, no. 1611.04561.

*European Financial Management Association Conference Working Paper*.

*Expert Systems with Applications*129: 27–36.

*Neurocomputing*72 (7-9): 1483–93.

*SSRN Working Paper*3774548.

*Bayesian Data Analysis, 3rd Edition*. Chapman & Hall / CRC.

*Neural Computation*4 (1): 1–58.

*International Journal of Forecasting*29 (1): 108–21.

*Journal of Economic Literature*57 (3): 535–74.

*Computational Statistics & Data Analysis*50 (11): 3113–23.

*arXiv Preprint*, no. 2107.06277.

*SSRN Working Paper*3525530.

*SSRN Working Paper*2865922.

*Journal of Political Economy*111 (4): 693–732.

*Journal of Banking & Finance*50: 169–82.

*Oxford Research Encyclopedia of Economics and Finance*.

*Deep Learning*. MIT Press Cambridge.

*Advances in Neural Information Processing Systems*, 2672–80.

*arXiv Preprint*, no. 2203.05556.

*Journal of Financial Economics*132 (2): 451–71.

*Journal of Econometrics*Forthcoming.

*Journal of Financial and Quantitative Analysis*50 (6): 1415–41.

*Financial Markets and Portfolio Management*Forthcoming.

*Biometrics*, 857–71.

*Financial Markets and Portfolio Management*26 (1): 3–38.

*Journal of Financial and Quantitative Analysis*50 (6): 1237–67.

*Econometrica*, 424–38.

*Review of Accounting Studies*18 (3): 692–730.

*Review of Financial Studies*30 (12): 4389–4436.

*Econometric Analysis, Eighth Edition*. Pearson Education.

*R Journal*9 (1): 421–36.

*R Journal*.

*Journal of Financial Economics*78 (2): 311–39.

*Management Science*63 (4): 1110–30.

*Journal of Econometrics*Forthcoming.

*Review of Financial Studies*33 (5): 2223–73.

*Big Data and Machine Learning in Quantitative Investment*, 129–48. Wiley.

*Wilmott*2018 (98): 24–33.

*Journal of Financial and Quantitative Analysis*51 (4): 1297–1323.

*Neural Networks*98: 296–304.

*SSRN Working Paper*3683288.

*IEEE Transactions on Knowledge and Data Engineering*26 (9): 2250–67.

*Journal of Portfolio Management*45 (3): 13–36.

*Expert Systems with Applications*38 (8): 10389–97.

*Journal of Machine Learning Research*3 (Mar): 1157–82.

*Review of Financial Studies*33 (5): 1980–2018.

*arXiv Preprint*, no. 1706.09523.

*An Introduction to Machine Learning Interpretability - Second Edition*. O’Reilly.

*Annals of Statistics*36 (5): 2135–52.

*arXiv Preprint*, no. 1805.06126.

*SSRN Working Paper*3185335.

*Advances in Neural Information Processing Systems*, 571–81.

*Econometrica*, 1029–54.

*IEEE Transactions on Evolutionary Computation*1 (1): 40–52.

*Journal of Finance*74 (5): 2153–99.

*Journal of Finance*72 (4): 1399–1440.

*Critical Finance Review*, 1–9.

*Quantitative Finance*10 (5): 469–85.

*Journal of Portfolio Management*42 (1): 13–28.

*SSRN Working Paper*3341728.

*Journal of Finance*Forthcoming.

*SSRN Working Paper*3865813.

*Review of Asset Pricing Studies*10 (2): 199–248.

*Review of Financial Studies*29 (1): 5–68.

*SSRN Working Paper*2528780.

*Journal of Financial Economics*132 (3): 182–204.

*Expert Systems with Applications*33 (1): 171–80.

*arXiv Preprint*, no. 2006.00371.

*The Elements of Statistical Learning*. Springer.

*Journal of Financial Economics*41 (3): 401–39.

*Neural Networks and Learning Machines*. Prentice Hall.

*Foundations and Trends in Optimization*2 (3-4): 157–325.

*Machine Learning*69 (2-3): 169–92.

*SSRN Working Paper*3143752.

*SSRN Working Paper*3949463.

*PLoS Biology*13 (3): e1002106.

*Journal of Causal Inference*6 (2).

*Journal of Financial Economics*99 (3): 560–80.

*Expert Systems with Applications*124: 226–51.

*Journal of Finance*49 (5): 1639–64.

*Journal of Business Ethics*70 (2): 165–74.

*Journal of Financial and Quantitative Analysis*46 (3): 815–39.

*Journal of Banking & Finance*36 (5): 1392–1401.

*Proceedings of 3rd International Conference on Document Analysis and Recognition*, 1:278–82. IEEE.

*Journal of Optimization Theory and Applications*115 (3): 549–70.

*Neural Computation*9 (8): 1735–80.

*Artificial Intelligence Review*22 (2): 85–126.

*Journal of Portfolio Management*44 (1): 30–43.

*SSRN Working Paper*3190310.

*arXiv Preprint*, no. 1802.02871.

*American Journal of Political Science*54 (2): 561–81.

*Review of Financial Studies*33 (3): 1011–23.

*Journal of Econometrics*208 (1): 265–81.

*arXiv Preprint*, no. 1902.06021.

*Management Science*.

*Expert Systems with Applications*129: 273–85.

*Review of Financial Studies*28 (3): 650–705.

*Review of Financial Studies*33 (5): 2019–2133.

*Journal of Banking & Finance*97: 257–69.

*SSRN Working Paper*3678363.

*SSRN Working Paper*3622753.

*Computers & Operations Research*32 (10): 2513–22.

*Review of Finance*9 (3): 415–35.

*European Journal of Operational Research*278 (1): 330–42.

*arXiv Preprint*, no. 1912.09104.

*Journal of Financial Econometrics*Forthcoming.

*Journal of Machine Learning Research*11 (5).

*Expected Returns: An Investor’s Guide to Harvesting Market Rewards*. John Wiley & Sons.

*SSRN Working Paper*3400998.

*Journal of Portfolio Management*47 (2): 38–62.

*Journal of Financial Economics*135 (1): 213–30.

*Neural Computation*3 (1): 79–87.

*Journal of Finance*58 (4): 1651–83.

*Journal of Finance*53 (4): 1285–1309.

*An Introduction to Statistical Learning*. Vol. 112. Springer.

*Journal of Financial Economics*133 (2): 273–98.

*Journal of Finance*48 (1): 65–91.

*Journal of Finance*23 (2): 389–416.

*Big Data and Machine Learning in Quantitative Investment*, 51–74. Wiley.

*SSRN Working Paper*3756587.

*arXiv Preprint*, no. 2003.01859.

*arXiv Preprint*, no. 1706.10059.

*SSRN Working Paper*, no. 3492142.

*SSRN Working Paper*3622743.

*Journal of Finance*57 (2): 585–608.

*Review of Asset Pricing Studies*9 (1): 1–46.

*Advances in Psychology*, 121:471–95.

*Journal of Business*, 259–78.

*Factor Investing: From Traditional to Alternative Risk Premia*. Elsevier.

*SSRN Working Paper*3955838.

*Journal of Statistical Software*47 (11): 1–26.

*Journal of Financial and Quantitative Analysis*42 (3): 621–56.

*SSRN Working Paper*3803954.

*arXiv Preprint*, no. 2006.05574.

*Advances in Neural Information Processing Systems*, 3146–54.

*SSRN Working Paper*3388293.

*High Frequency Trading: New Realities for Traders, Markets, and Regulators*.

*Journal of Financial Economics*134 (3): 501–24.

*European Financial Management*13 (5): 908–22.

*Expert Systems with Applications*Forthcoming: 113546.

*Neurocomputing*55 (1-2): 307–19.

*SSRN Working Paper*3263001.

*Journal of Banking & Finance*45: 1–8.

*1990 IJCNN International Joint Conference on Neural Networks*, 1–6. IEEE.

*arXiv Preprint*, no. 1412.6980.

*SSRN Working Paper*3520131.

*Journal of Political Economy*127 (4): 1475–515.

*SSRN Working Paper*3378340.

*Journal of Financial Data Science*1 (1): 159–71.

*Journal of Machine Learning in Finance*1 (1).

*Proceedings of the ICLR Conference*, 1–25.

*arXiv Preprint*, no. 2004.03445.

*Journal of Finance*73 (3): 1183–223.

*Journal of Financial Economics*135: 271–92.

*European Journal of Operational Research*259 (2): 689–702.

*Journal of Banking & Finance*, 105687.

*Journal of Risk and Financial Management*12 (1): 47.

*Doing Bayesian Data Analysis: A Tutorial with r, JAGS, and Stan (2nd Ed.)*. Academic Press.

*Feature Engineering and Selection: A Practical Approach for Predictive Models*. CRC Press.

*Journal of Investing*29 (2): 21–32.

*Annals of Applied Statistics*5 (2A): 798–823.

*Journal of Finance*49 (5): 1541–78.

*Review of Financial Studies*24 (10): 3197–3249.

*Journal of Empirical Finance*15 (5): 850–59.

*Journal of Multivariate Analysis*88 (2): 365–411.

*Review of Financial Studies*30 (12): 4349–88.

*Journal of Financial Econometrics*17 (4): 645–86.

*Journal of Econometrics*Forthcoming.

*arXiv Preprint*, no. 2001.10278.

*Nouvelles méthodes Pour La détermination Des Orbites Des Comètes*. F. Didot.

*SSRN Working Paper*3410972.

*SSRN Working Paper*.

*arXiv Preprint*, no. 1404.3274.

*Journal of Econometrics*Forthcoming.

*Review of Financial Studies*33 (5): 2274–2325.

*European Journal of Operational Research*134 (1): 84–102.

*ACM Computing Surveys (CSUR)*46 (3): 35.

*Online Portfolio Selection: Principles and Algorithms*. CRC Press.

*SSRN Working Paper*3536461.

*SSRN Working Paper*3688484.

*Philosophical Transactions of the Royal Society A*379 (2194): 20200209.

*Review of Financial Studies*31 (7): 2606–49.

*Review of Economics and Statistics*47 (1): 13–37.

*SSRN Working Paper*3272090.

*SSRN Working Paper*3531946.

*Statistical Analysis with Missing Data*. Vol. 333. John Wiley & Sons.

*Journal of Forecasting*Forthcoming.

*Review of Financial Studies*3 (2): 175–205.

*American Mathematical Monthly*Forthcoming.

*Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence*, 1280–86. AAAI Press.

*Journal of Accounting Research*54 (4): 1187–1230.

*Advances in Neural Information Processing Systems*, 4765–74.

*Review of Financial Studies*Forthcoming.

*Journal of Business & Economic Statistics*38 (1): 214–27.

*Expert Systems with Applications*Forthcoming: 113973.

*Handbook of Graphical Models*. CRC Press.

*International Conference on Machine Learning*, 2113–22.

*Journal of Portfolio Management*36 (4): 60–70.

*European Journal of Operational Research*244 (1): 289–99.

*R Data Science Quick Reference*, 71–81. Springer.

*Journal of Finance*7 (1): 77–91.

*arXiv Preprint*, no. 1910.09504.

*arXiv Preprint*, no. 1909.10678.

*SSRN Working Paper*3511296.

*International Conference on Machine Learning*, 1614–23.

*Journal of Forecasting*Forthcoming.

*Advances in Neural Information Processing Systems*, 512–18.

*Practical Neural Network Recipes in C++*. Morgan Kaufmann.

*Journal of Forecasting*31 (2): 172–88.

*SSRN Working Paper*3638177.

*Journal of Finance*71 (1): 5–32.

*Data*4 (3): 110.

*Journal of the American Statistical Association*44 (247): 335–41.

*Matrix Analysis and Applied Linear Algebra*. Vol. 71. SIAM.

*Finance Research Letters*Forthcoming.

*arXiv Preprint*, no. 2102.05799.

*Foundations of Machine Learning*. MIT Press.

*Interpretable Machine Learning: A Guide for Making Black Box Models Explainable*. LeanPub / Lulu.

*Journal of Open Source Software*3 (27): 786.

*Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr)*, 300–307. IEEE.

*Journal of Forecasting*17 (5-6): 441–70.

*SSRN Working Paper*2740751.

*arXiv Preprint*, no. 2004.01509.

*Journal of Finance*54 (4): 1249–90.

*Journal of Financial Economics*104 (2): 228–50.

*Econometrica: Journal of the Econometric Society*34 (4): 768–83.

*The Journal of Investing*25 (2): 113–24.

*Doklady AN USSR*, 269:543–47.

*Advances in Neural Information Processing Systems*, 952–58.

*Advances in Neural Information Processing Systems*, 936–42.

*Decision Support Systems*50 (3): 559–69.

*arXiv Preprint*, no. 2006.05421.

*Journal of Financial Economics*103 (3): 429–53.

*Review of Financial Studies*29 (1): 104–47.

*arXiv Preprint*, no. 1912.05901.

*Journal of Financial Data Science*Forthcoming.

*Contemporary Accounting Research*11 (2): 661–87.

*Ensembles in Machine Learning Applications*. Vol. 373. Springer Science & Business Media.

*Social Studies of Science*26 (3): 611–59.

*arXiv Preprint*, no. 1708.05070.

*Expert Systems with Applications*, 112828.

*arXiv Preprint*, no. 1810.12282.

*IEEE Transactions on Knowledge and Data Engineering*22 (10): 1345–59.

*Expert Systems with Applications*42 (1): 259–68.

*Expert Systems with Applications*42 (4): 2162–72.

*Journal of Financial Economics*98 (3): 605–25.

*Journal of Financial Economics*Forthcoming.

*Causality: Models, Reasoning and Inference. Second Edition*. Vol. 29. Cambridge University Press.

*SSRN Working Paper*3530390.

*Management Science*Forthcoming.

*Expert Systems with Applications*103: 1–13.

*Journal of Econometrics*Forthcoming.

*Journal of Econometrics*Forthcoming.

*SSRN Working Paper*3425827.

*Journal of Business & Economic Statistics*29 (2): 307–18.

*SSRN Working Paper*3807010.

*Elements of Causal Inference: Foundations and Learning Algorithms*. MIT Press.

*Review of Financial Studies*22 (1): 435–80.

*Journal of Banking & Finance*36 (2): 410–17.

*SSRN Working Paper*3691117.

*SSRN Working Paper*1787045.

*USSR Computational Mathematics and Mathematical Physics*4 (5): 1–17.

*arXiv Preprint*, no. 1909.06312.

*Journal of Control Theory and Applications*9 (3): 336–52.

*Journal of Financial Data Science*2 (1): 86–93.

*arXiv Preprint*, no. 1802.09596.

*Review of Financial Studies*32 (4): 1573–1607.

*Dataset Shift in Machine Learning*. MIT Press.

*Journal of Finance*68 (4): 1633–62.

*SSRN Working Paper*3428095.

*AISTATS*, 489–97.

*Decision Support Systems*50 (2): 491–500.

*Theoretical Economics*Forthcoming.

*Computational Economics*40 (3): 245–64.

*Reproducible Finance with r: Code Flows and Shiny Apps for Portfolio Analysis*. Chapman & Hall / CRC.

*Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 1135–44. ACM.

*AISTATS*.

*Pattern Recognition and Neural Networks*. Cambridge University Press.

*Stochastic Processes and Their Applications* 49 (2): 207–16.

*Journal of New Finance* 2 (2): 2.

*Econometrica* 73 (4): 1237–82.

*Journal of Empirical Finance* 23: 93–116.

*Psychological Review* 65 (6): 386.

*Journal of Economic Theory* 13 (3): 341–60.

*Robust Regression and Outlier Detection*. Vol. 589. Wiley.

*arXiv Preprint*, no. 1911.05620.

*arXiv Preprint*, no. 2008.12152.

*SSRN Working Paper* 2986059.

*arXiv Preprint*, no. 1904.04973.

*Statistical Methods in Medical Research* 8 (1): 3–15.

*Machine Learning* 5 (2): 197–227.

*Nonlinear Estimation and Classification*, 149–71. Springer.

*Boosting: Foundations and Algorithms*. MIT Press.

*Annals of Statistics* 43 (4): 1716–41.

*Synthesis Lectures on Data Mining and Knowledge Discovery* 2 (1): 1–126.

*Synthesis Lectures on Artificial Intelligence and Machine Learning* 6 (1): 1–114.

*arXiv Preprint*, no. 1911.13288.

*American Journal of Epidemiology* 179 (6): 764–74.

*Journal of Asset Management* 21 (6): 506–12.

*European Journal of Finance* Forthcoming: 1–27.

*Review of Financial Studies* 5 (1): 1–33.

*Contributions to the Theory of Games* 2 (28): 307–17.

*Journal of Finance* 19 (3): 425–42.

*Journal of Business* 39 (1): 119–38.

*Risk & Reward*, 14–19.

*SSRN Working Paper* 3898282.

*arXiv Preprint*, no. 2106.03253.

*Nature* 529: 484–89.

*Journal of Financial Data Science* 1 (1): 32–44.

*Journal of Experimental Psychology: General* 143 (2): 534.

*Quantitative Finance* 19 (9): 1449–59.

*arXiv Preprint*, no. 1803.09820.

*SSRN Working Paper* 3728192.

*Advances in Neural Information Processing Systems*, 2951–59.

*Journal of Financial Data Science* Forthcoming.

*Expert Systems with Applications* Forthcoming: 113456.

*Causation, Prediction, and Search*. MIT Press.

*Journal of Machine Learning Research* 15 (1): 1929–58.

*Amundi Working Paper*.

*Journal of Financial Economics* 54 (3): 375–421.

*arXiv Preprint*, no. 1804.01955.

*Bioinformatics* 28 (1): 112–18.

*Journal of Finance* 53 (5): 1821–27.

*Journal of Portfolio Management* 43 (2): 90–104.

*Journal of Econometrics* Forthcoming.

*Reinforcement Learning: An Introduction (2nd Edition)*. MIT Press.

*arXiv Preprint*, no. 2010.14194.

*International Journal of Epidemiology* 45 (6): 1887–94.

*arXiv Preprint*, no. 2004.06627.

*Journal of the Royal Statistical Society. Series B (Methodological)*, 267–88.

*Annals of Statistics*, 1701–28.

*Journal of Statistical Software* 76 (1): 1–30.

*Annual Review of Financial Economics* 10: 449–79.

*IEEE Transactions on Knowledge & Data Engineering*, no. 3: 659–65.

*Journal of Financial Markets* Forthcoming: 100588.

*Harvard Business Review* 43 (1): 63–75.

*2017 IEEE 19th Conference on Business Informatics (CBI)*, 1:7–12.

*Journal of Empirical Finance* Forthcoming.

*Journal of Financial and Quantitative Analysis* 45 (4): 959–86.

*Econometrics Journal* 22 (1): 34–56.

*Flexible Imputation of Missing Data*. Chapman & Hall / CRC.

*Journal of Banking & Finance* 35 (12): 3263–74.

*Automation and Remote Control* 24: 774–80.

*Review of Financial Studies* 26 (5): 1087–1145.

*arXiv Preprint*, no. 2003.11132.

*Journal of Financial Markets* Forthcoming: 100598.

*SSRN Working Paper* 3924346.

*Omega* 15 (2): 145–55.

*arXiv Preprint*, no. 2001.04185.

*Organizational Behavior and Human Performance* 8 (1): 139–58.

*arXiv Preprint*, no. 2003.00130.

*Expert Systems with Applications* 38 (1): 223–30.

*SSRN Working Paper* 3382932.

*Omega* 40 (6): 758–66.

*Expert Systems with Applications* 143: 113042.

*arXiv Preprint*, no. 2205.04216.

*Machine Learning* 8 (3-4): 279–92.

*arXiv Preprint*, no. 1910.03743.

*arXiv Preprint*, no. 2101.10942.

*Journal of Big Data* 3 (1): 9.

*Econometrica* 68 (5): 1097–1126.

*Journal of Open Source Software* 4 (43): 1686.

*IRE WESCON Convention Record*, 4:96–104.

*Quantitative Finance* Forthcoming.

*Complex Systems* 6 (1): 47.

*Neural Networks* 5 (2): 241–59.

*IEEE Transactions on Evolutionary Computation* 1 (1): 67–82.

*arXiv Preprint*, no. 2003.02515.

*Management Science* Forthcoming.

*arXiv Preprint*, no. 1811.07522.

*Review of Financial Studies* Forthcoming.

*SSRN Working Paper* 3443998.

*SSRN Working Paper* 3517888.

*Expert Systems with Applications* 114: 388–401.

*North American Journal of Economics and Finance*, 101274.

*arXiv Preprint*, no. 1901.08740.

*Journal of Banking & Finance* Forthcoming: 105966.

*arXiv Preprint*, no. 1212.5701.

*Ensemble Machine Learning: Methods and Applications*. Springer.

*Expert Systems with Applications* 36 (5): 8849–54.

*Journal of Financial Data Science* 2 (2): 25–40.

*SSRN Working Paper* 3991393.

*Journal of Business & Economic Statistics* Forthcoming.

*Ensemble Methods: Foundations and Algorithms*. Chapman & Hall / CRC.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 67 (2): 301–20.

*The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution*. Penguin Random House.

For a list of online resources, we recommend the curated page https://github.com/josephmisiti/awesome-machine-learning/blob/master/books.md.↩︎

One example: https://www.bioconductor.org/packages/release/bioc/html/Rgraphviz.html↩︎

By copy-pasting the content of the package in the library folder. To get the address of the folder, execute the command `.libPaths()` in the R console.↩︎

We refer to for a list of alternative data providers. Moreover, we recall that Quandl, an alt-data hub, was acquired by Nasdaq in December 2018. As large players acquire newcomers, the field may consolidate.↩︎
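As an aside to the installation footnote above, the manual route can be sketched in a couple of lines of R. This is a minimal sketch: the archive file name below is purely illustrative, not a real download.

```r
# Print the folder(s) where R stores installed packages
.libPaths()
# Install a package from a local source archive instead of CRAN
# (the file name is hypothetical)
install.packages("CAM_1.0.tar.gz", repos = NULL, type = "source")
```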

Other approaches are nonetheless possible, as is advocated in Prado and Fabozzi (2020).↩︎

This has been a puzzle for the value factor during the 2010s, a decade during which the factor performed poorly (see Bellone et al. (2020), Cornell and Damodaran (2021), and Stagnol et al. (2021)). Shea and Radatz (2020) argue that this is because some fundamentals of value firms (like ROE) have not improved at the rate of those of growth firms. This underlines that it is hard to pick which fundamental metrics matter and that their importance varies with time. Binz, Schipper, and Standridge (2020) even find that resorting to AI to make sense of (and mine) the fundamentals' zoo only helps marginally.↩︎

Originally, Fama and MacBeth (1973) work with the market beta only: \(r_{t,n}=\alpha_n+\beta_nr_{t,M}+\epsilon_{t,n}\), and the second pass includes nonlinear terms: \(r_{t,n}=\gamma_{t,0}+\gamma_{t,1}\hat{\beta}_{n}+\gamma_{t,2}\hat{\beta}^2_n+\gamma_{t,3}\hat{s}_n+\eta_{t,n}\), where the \(\hat{s}_n\) are risk estimates for the assets that are not related to the betas. It is then possible to perform asset pricing tests to infer some properties: for instance, test whether betas have a linear influence on returns (\(\mathbb{E}[\gamma_{t,2}]=0\)), or test the validity of the CAPM (which implies \(\mathbb{E}[\gamma_{t,0}]=0\)).↩︎
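The two-pass procedure above can be sketched in R on simulated data. This is a minimal sketch under an assumed one-factor data-generating process, restricted to the linear term; all parameter values are illustrative.

```r
set.seed(42)
nb_t <- 120                                  # number of dates
nb_n <- 50                                   # number of assets
r_m  <- rnorm(nb_t, 0.005, 0.04)             # simulated market returns
beta <- runif(nb_n, 0.5, 1.5)                # true betas
r <- sapply(beta, function(b) b * r_m + rnorm(nb_t, 0, 0.05)) # T x N returns
# Pass 1: one time-series regression per asset yields the estimated betas
beta_hat <- apply(r, 2, function(x) coef(lm(x ~ r_m))[2])
# Pass 2: one cross-sectional regression per date yields the gammas
gamma <- t(sapply(1:nb_t, function(t) coef(lm(r[t, ] ~ beta_hat))))
colMeans(gamma) # average intercept (gamma_0) and risk premium (gamma_1)
```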

Older tests for the number of factors in linear models include Connor and Korajczyk (1993) and Bai and Ng (2002).↩︎

Autocorrelation in aggregate/portfolio returns is a widely documented effect since the seminal paper of Lo and MacKinlay (1990) (see also Moskowitz, Ooi, and Pedersen (2012)).↩︎

In the same spirit, see also Lettau and Pelger (2020a) and Lettau and Pelger (2020b).↩︎

Some methodologies do map firm attributes into final weights, e.g., Brandt, Santa-Clara, and Valkanov (2009) and Ammann, Coqueret, and Schade (2016), but these are outside the scope of the book.↩︎

This is of course not the case for inference relying on linear models. Memory generates many problems and complicates the study of estimators. We refer to Hjalmarsson (2011) and Xu (2020) for theoretical and empirical results on this matter.↩︎

For a more thorough technical discussion on the impact of feature engineering, we refer to Galili and Meilijson (2016).↩︎

See www.kaggle.com. ↩︎

The strength is measured as the average margin, i.e. the average of \(mg\) when there is only one tree.↩︎

See, e.g., http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.randomForest.html↩︎

The Real Adaboost of J. Friedman et al. (2000) has a different output: the probability of belonging to a particular class.↩︎

In case of package conflicts, use `keras::get_weights(model)`. Indeed, another package in the machine learning landscape, *yardstick*, uses the function name "get_weights".↩︎

This assumption can be relaxed, but the algorithms then become more complex and are out of the scope of the current book. One such example that generalizes the naive Bayes approach is N. Friedman, Geiger, and Goldszmidt (1997).↩︎

In the case of BARTs, the training consists exactly in the drawing of posterior samples.↩︎

There are some exceptions, like attempts to optimize more exotic criteria, such as the Spearman rho, which is based on rankings and is close in spirit to maximizing the correlation between the output and the prediction. Because this rho cannot be differentiated, it raises numerical issues. These problems can be partially alleviated by resorting to complex architectures, as in Engilberge et al. (2019). ↩︎

Another angle, critical of neural networks is provided in Geman, Bienenstock, and Doursat (1992).↩︎

Constraints often have beneficial effects on portfolio composition, see Jagannathan and Ma (2003) and DeMiguel et al. (2009).↩︎

A long position in an asset with positive return or a short position in an asset with negative return.↩︎

We invite the reader to have a look at the thoughtful albeit theoretical paper by Arjovsky et al. (2019).↩︎

In the thread https://twitter.com/fchollet/status/1177633367472259072, François Chollet, the creator of Keras, argues that ML predictions based on price data cannot be profitable in the long term. Given the wide access to financial data, it is likely that the statement holds for predictions stemming from factor-related data as well.↩︎

For instance, we do not mention the work of Horel and Giesecke (2019) but the interested reader can have a look at their work on neural networks (and also at the references cited in the paper).↩︎

The CAM package was removed from CRAN in November 2019 but can still be installed manually. First, download the content of the package: https://cran.r-project.org/web/packages/CAM/index.html. Second, copy it in the directory obtained by typing `.libPaths()` in the console.↩︎

Another possible choice is the *baycn* package documented in E. A. Martin and Fu (2019).↩︎

See for instance the papers on herding in factor investing: Krkoska and Schenk-Hoppé (2019) and Santi and Zwinkels (2018).↩︎

This book is probably the most complete reference for theoretical results in machine learning, but it is in French.↩︎

In practice, this is not a major problem: since we work with features that are uniformly distributed, de-meaning amounts to subtracting 0.5 from all feature values.↩︎

Like neural networks, reinforcement learning methods have also been recently developed for derivatives pricing and hedging, see for instance Kolm and Ritter (2019a) and J. Du et al. (2020).↩︎

e.g., the Sharpe ratio, which is used for instance in Moody et al. (1998), Bertoluzzo and Corazza (2012), and Aboussalah and Lee (2020), or drawdown-based ratios, as in Almahdi and Yang (2017).↩︎

Some recent papers consider arbitrary weights (e.g., Z. Jiang, Xu, and Liang (2017) and Yu et al. (2019)) for a limited number of assets.↩︎