Runs the EM algorithm over all possible group aggregations, returning the one with the highest likelihood while constraining the standard deviation of the estimated probabilities.

This function estimates the voting probabilities (computed using run_em) by trying all aggregations of adjacent groups and choosing the one that achieves the highest likelihood, as long as the standard deviation (computed using bootstrap) of the estimated probabilities is below a given threshold. See Details for more information on adjacent groups.

Usage

get_agg_opt(
  object = NULL,
  X = NULL,
  W = NULL,
  json_path = NULL,
  sd_statistic = "maximum",
  sd_threshold = 0.05,
  method = "mult",
  nboot = 100,
  allow_mismatch = TRUE,
  seed = NULL,
  ...
)

Arguments

object

An object of class eim, which can be created using the eim function. This parameter should not be used if either (i) X and W matrices or (ii) json_path is supplied. See Note in run_em.

X

A (b x c) matrix representing candidate votes per ballot box.

W

A (b x g) matrix representing group votes per ballot box.

json_path

A path to a JSON file containing X and W fields, stored as nested arrays. It may contain additional fields with other attributes, which will be added to the returned object.

sd_statistic

String indicating the statistic of the standard deviation (g x c) matrix used in the stopping condition, i.e., the algorithm stops when the statistic is below the threshold. It can take the value "maximum", which computes the maximum over the standard deviation matrix, or "average", which computes the average.

sd_threshold

Numeric value used as the threshold for the statistic (sd_statistic) of the standard deviation of the estimated probabilities. Defaults to 0.05.

method

An optional string specifying the method used for estimating the E-step. Valid options are:

mult: The default method, using a single sum of Multinomial distributions.
mvn_cdf: Uses the Multivariate Normal CDF to approximate the conditional probability.
mvn_pdf: Uses the Multivariate Normal PDF to approximate the conditional probability.
mcmc: Uses MCMC to sample vote outcomes. This is used to estimate the conditional probability of the E-step.
exact: Solves the E-step using the Total Probability Law.

nboot

Integer specifying the number of bootstrap samples, i.e., how many times to run the EM algorithm.

allow_mismatch

Boolean. If TRUE, allows a mismatch between the voters and votes for each ballot box; this only works if method is "mvn_cdf", "mvn_pdf", "mult", or "mcmc". If FALSE, throws an error if there is a mismatch. Defaults to TRUE.

seed

An optional integer indicating the random seed for the randomized algorithms. This argument is only applicable if initial_prob = "random" or method is either "mcmc" or "mvn_cdf". Additionally, it sets the random draws of the ballot boxes.

...

Additional arguments passed to the run_em function that will execute the EM algorithm.
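The interaction between sd_statistic and sd_threshold can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name sd_is_acceptable is hypothetical, and the strict "below" comparison is an assumption based on the argument descriptions above.

```r
# Hypothetical helper: checks whether a standard deviation matrix
# satisfies the threshold under the chosen statistic.
sd_is_acceptable <- function(sd_matrix,
                             sd_statistic = "maximum",
                             sd_threshold = 0.05) {
  stat <- switch(
    sd_statistic,
    maximum = max(sd_matrix),   # largest entry of the matrix
    average = mean(sd_matrix),  # mean over all entries
    stop("sd_statistic must be \"maximum\" or \"average\"")
  )
  # Assumption: "below the threshold" is a strict inequality.
  stat < sd_threshold
}

sd_mat <- matrix(c(0.01, 0.02, 0.03, 0.06), nrow = 2)
sd_is_acceptable(sd_mat, "maximum", 0.05)  # FALSE: the largest entry is 0.06
sd_is_acceptable(sd_mat, "average", 0.05)  # TRUE: the mean is 0.03
```

With the same matrix, "maximum" is the stricter choice, since a single imprecise estimate is enough to reject an aggregation, whereas "average" tolerates a few large entries.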

Value

It returns an eim object with the same attributes as the output of run_em, plus the attributes:

sd: An (a x c) matrix with the standard deviation of the estimated probabilities, computed with bootstrapping. Here a denotes the number of macro-groups in the resulting group aggregation; it should be between 1 and g.
nboot: Number of samples used for the bootstrap method.
seed: Random seed used (if specified).
sd_statistic: The statistic used as input.
sd_threshold: The threshold used as input.
group_agg: Vector with the resulting group aggregation. See Examples for more details.

Additionally, it will create the W_agg attribute with the aggregated groups, along with the attributes corresponding to running run_em with the aggregated groups.
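To illustrate how group_agg relates to W_agg, the sketch below builds an aggregated matrix by summing the columns of each macro-group. It assumes that aggregating groups means adding their voter counts per ballot box, and that group_agg stores the last column index of each macro-group, as in the Examples. The helper name aggregate_groups is hypothetical and is not part of the package.

```r
# Hypothetical helper: sums the columns of W within each macro-group.
# group_agg holds the last original column index of every macro-group.
aggregate_groups <- function(W, group_agg) {
  n_macro <- length(group_agg)
  stopifnot(is.matrix(W), group_agg[n_macro] == ncol(W))
  # First column of each macro-group: 1, then one past the previous end.
  starts <- c(1, group_agg[-n_macro] + 1)
  W_agg <- vapply(
    seq_len(n_macro),
    function(k) rowSums(W[, starts[k]:group_agg[k], drop = FALSE]),
    numeric(nrow(W))
  )
  # Keep a matrix shape even when W has a single row.
  matrix(W_agg, nrow = nrow(W))
}

# First ballot box of the W matrix printed in Example 2.
W <- matrix(c(54, 6, 4, 6, 5, 8, 13, 4), nrow = 1)
aggregate_groups(W, c(3, 8))  # one row with entries 64 and 36
```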

Details

Groups of consecutive column indices in the matrix W are considered adjacent. For example, consider the following seven groups defined by voters' age ranges: 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80+. One possible group aggregation consists of the following three macro-groups: 20-39, 40-59, and 60+. Since there are multiple group aggregations, the method evaluates all possible group aggregations (merging only adjacent groups).
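The size of this search space can be made concrete with a short sketch. Each of the g - 1 boundaries between neighbouring groups is either a cut or not, so there are 2^(g - 1) aggregations of adjacent groups. The function below is a hypothetical illustration (enumerate_adjacent_aggs is not part of the package) and encodes each aggregation in the same way as group_agg in the Examples, by the last column index of each macro-group.

```r
# Hypothetical helper: lists every aggregation of g adjacent groups.
enumerate_adjacent_aggs <- function(g) {
  stopifnot(g >= 1)
  if (g == 1) return(list(1))
  n_masks <- 2^(g - 1)
  aggs <- vector("list", n_masks)
  for (mask in 0:(n_masks - 1)) {
    # Bit j of mask tells whether there is a cut after group j.
    cut_here <- (mask %/% 2^(0:(g - 2))) %% 2 == 1
    # The last group always closes the final macro-group.
    aggs[[mask + 1]] <- c(which(cut_here), g)
  }
  aggs
}

aggs <- enumerate_adjacent_aggs(7)
length(aggs)  # 64 aggregations for the seven age ranges
```

In this encoding, the aggregation 20-39, 40-59, 60+ from the example above corresponds to c(2, 4, 7). Because the count doubles with every additional group, the running time of an exhaustive search grows quickly with g.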

Examples

# Example 1: Using a simulated instance
simulations <- simulate_election(
    num_ballots = 20,
    num_candidates = 3,
    num_groups = 8,
    seed = 42
)

result <- get_agg_opt(
    X = simulations$X,
    W = simulations$W,
    sd_threshold = 0.05,
    seed = 42
)

result$group_agg # c(3,8)
#> [1] 3 8
# This means that the resulting group aggregation consists of
# two macro-groups: one that includes the original groups 1, 2, and 3;
# the remaining one with groups 4, 5, 6, 7 and 8.

# \donttest{
# Example 2: Getting an infeasible result
result2 <- get_agg_opt(
    X = simulations$X,
    W = simulations$W,
    sd_threshold = 0.001
)

result2$group_agg # NULL: no group aggregation satisfies the threshold
#> NULL
result2$X # Input candidates' vote matrix
#>       [,1] [,2] [,3]
#>  [1,]   24   41   35
#>  [2,]   30   37   33
#>  [3,]   28   41   31
#>  [4,]   21   46   33
#>  [5,]   29   42   29
#>  [6,]   19   51   30
#>  [7,]   21   52   27
#>  [8,]   29   46   25
#>  [9,]   26   47   27
#> [10,]   21   51   28
#> [11,]   55   25   20
#> [12,]   52   26   22
#> [13,]   46   24   30
#> [14,]   23   33   44
#> [15,]   22   34   44
#> [16,]   11   70   19
#> [17,]   16   62   22
#> [18,]   34   52   14
#> [19,]   30   49   21
#> [20,]   42   42   16
result2$W # Input group-level voter matrix
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#>  [1,]   54    6    4    6    5    8   13    4
#>  [2,]   60    4   11    7    8    4    4    2
#>  [3,]   35   27    8    6    9    6    3    6
#>  [4,]    7   64    3    5    6    8    3    4
#>  [5,]    4   56    6    4    7    8    8    7
#>  [6,]    3    8   54    4   11    5    9    6
#>  [7,]    7    8   51    9    7    7    5    6
#>  [8,]    5   11   33   28    8    4    5    6
#>  [9,]    5    4    4   62    5   11    3    6
#> [10,]   13    4   13   47    6    3    6    8
#> [11,]    6    4    4    9   59    6    6    6
#> [12,]    4    7    7    9   56    5    8    4
#> [13,]    5    8    8    7   29   29    5    9
#> [14,]   14    1    7   10    3   61    4    0
#> [15,]    1    6    8    8    6   56    5   10
#> [16,]    6    9    7    3    4    3   62    6
#> [17,]    8    5    3    6    5    9   61    3
#> [18,]    2    5    8    4    7    7   28   39
#> [19,]    4    7    4    6    4    9    6   60
#> [20,]    7    6    7   10    5    1    6   58
# }