Runs the EM algorithm over all possible group aggregations, returning the one with the highest likelihood while constraining the standard deviation of the probabilities.
Source: R/eim-class.R
get_agg_opt.Rd
This function estimates the voting probabilities (computed using run_em) by trying all aggregations of adjacent groups and choosing the one that achieves the highest likelihood, provided that the standard deviation of the estimated probabilities (computed using bootstrap) is below a given threshold. See Details for more information on adjacent groups.
Usage
get_agg_opt(
object = NULL,
X = NULL,
W = NULL,
json_path = NULL,
sd_statistic = "maximum",
sd_threshold = 0.05,
method = "mult",
nboot = 100,
allow_mismatch = TRUE,
seed = NULL,
...
)
Arguments
- object
An object of class eim, which can be created using the eim function. This parameter should not be used if either (i) X and W matrices or (ii) json_path is supplied. See Note in run_em.
- X
A (b x c) matrix representing candidate votes per ballot box.
- W
A (b x g) matrix representing group votes per ballot box.
- json_path
A path to a JSON file containing X and W fields, stored as nested arrays. It may contain additional fields with other attributes, which will be added to the returned object.
- sd_statistic
String indicating the statistic of the (g x c) standard deviation matrix used in the stopping condition, i.e., the algorithm stops when the statistic is below the threshold. It can take the value maximum, in which case the maximum over the standard deviation matrix is computed, or average, in which case the average is computed.
- sd_threshold
Numeric value used as the threshold for the statistic (sd_statistic) of the standard deviation of the estimated probabilities. Defaults to 0.05.
- method
An optional string specifying the method used for estimating the E-step. Valid options are:
mult: The default method, using a single sum of Multinomial distributions.
mvn_cdf: Uses a Multivariate Normal CDF to approximate the conditional probability.
mvn_pdf: Uses a Multivariate Normal PDF to approximate the conditional probability.
mcmc: Uses MCMC to sample vote outcomes. This is used to estimate the conditional probability of the E-step.
exact: Solves the E-step using the Total Probability Law.
- nboot
Integer specifying the number of bootstrap samples, i.e., how many times to run the EM algorithm.
- allow_mismatch
Boolean. If TRUE, allows a mismatch between the voters and votes for each ballot box; this only works if method is "mvn_cdf", "mvn_pdf", "mult", or "mcmc". If FALSE, throws an error if there is a mismatch. Defaults to TRUE.
- seed
An optional integer indicating the random seed for the randomized algorithms. This argument is only applicable if initial_prob = "random" or method is either "mcmc" or "mvn_cdf". Additionally, it sets the random draws of the ballot boxes.
- ...
Additional arguments passed to the run_em function that will execute the EM algorithm.
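The interplay between nboot, sd_statistic, and sd_threshold can be sketched in base R as follows. This is an illustration only, not the package's implementation: it assumes a nonparametric bootstrap that resamples ballot boxes with replacement and a strict "below" comparison. The names bootstrap_sd, sd_is_feasible, and pooled_shares are hypothetical helpers introduced here, and pooled_shares is a toy estimator standing in for the EM fit.

```r
# Hypothetical helper (not part of the package): bootstrap standard deviation
# of a (g x c) probability estimate. `estimate_fun` stands in for the real
# estimator, e.g. a wrapper that runs the EM algorithm.
bootstrap_sd <- function(X, W, estimate_fun, nboot = 100, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  b <- nrow(X)
  draws <- replicate(nboot, {
    idx <- sample.int(b, size = b, replace = TRUE)  # resample ballot boxes
    estimate_fun(X[idx, , drop = FALSE], W[idx, , drop = FALSE])
  }, simplify = "array")                            # (g x c x nboot) array
  apply(draws, c(1, 2), sd)                         # (g x c) matrix
}

# Hypothetical helper: collapse the sd matrix into a single statistic and
# compare it with the threshold (assumed here to be a strict inequality).
sd_is_feasible <- function(sd_matrix, sd_statistic = "maximum",
                           sd_threshold = 0.05) {
  stat <- switch(sd_statistic,
    maximum = max(sd_matrix),
    average = mean(sd_matrix),
    stop("sd_statistic must be 'maximum' or 'average'")
  )
  stat < sd_threshold
}

# Toy estimator used only to make the sketch runnable: every group receives
# the pooled candidate shares of the resampled ballot boxes.
pooled_shares <- function(X, W) {
  matrix(colSums(X) / sum(X), nrow = ncol(W), ncol = ncol(X), byrow = TRUE)
}

# Small illustrative matrices: 4 ballot boxes, 3 candidates, 2 groups.
X <- matrix(c(24, 41, 35,
              30, 37, 33,
              28, 41, 31,
              21, 46, 33), ncol = 3, byrow = TRUE)
W <- matrix(c(54, 46,
              60, 40,
              35, 65,
               7, 93), ncol = 2, byrow = TRUE)

sd_mat <- bootstrap_sd(X, W, pooled_shares, nboot = 50, seed = 42)
sd_is_feasible(sd_mat, sd_statistic = "maximum", sd_threshold = 0.05)
```

With "maximum", a single noisy entry of the standard deviation matrix is enough to reject an aggregation, whereas "average" tolerates a few noisy entries as long as the overall level is low.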
Value
It returns an eim object with the same attributes as the output of run_em, plus the attributes:
sd: An (a x c) matrix with the standard deviation of the estimated probabilities computed with bootstrapping. Note that a denotes the number of macro-groups of the resulting group aggregation; it should be between 1 and g.
nboot: Number of samples used for the bootstrap method.
seed: Random seed used (if specified).
sd_statistic: The statistic used as input.
sd_threshold: The threshold used as input.
group_agg: Vector with the resulting group aggregation. See Examples for more details.
Additionally, it will create the W_agg attribute with the aggregated groups, along with the attributes corresponding to running run_em with the aggregated groups.
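As a rough illustration of how W_agg relates to W and group_agg, the sketch below sums the columns of W within each macro-group. It assumes, following the Examples, that each entry of group_agg is the index of the last original group in a macro-group. The function aggregate_groups is a hypothetical helper, not something exported by the package.

```r
# Hypothetical helper: build an aggregated (b x a) group matrix from W,
# assuming group_agg holds the last column index of each macro-group.
aggregate_groups <- function(W, group_agg) {
  stopifnot(group_agg[length(group_agg)] == ncol(W))
  starts <- c(1, head(group_agg, -1) + 1)  # first column of each macro-group
  sums <- vapply(
    seq_along(group_agg),
    function(k) rowSums(W[, starts[k]:group_agg[k], drop = FALSE]),
    numeric(nrow(W))
  )
  matrix(sums, nrow = nrow(W))             # keep matrix shape even if b = 1
}

# Two ballot boxes, eight groups; merge groups 1-3 and groups 4-8.
W <- matrix(1:16, nrow = 2)
W_agg <- aggregate_groups(W, c(3, 8))
W_agg
```

Each row of the aggregated matrix keeps the same total as the corresponding row of W, since aggregation only merges columns.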
Details
Groups of consecutive column indices in the matrix W are considered adjacent. For example, consider the following seven groups defined by voters' age ranges: 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80+. A possible group aggregation consists of three macro-groups with the age ranges 20-39, 40-59, and 60+. Since there are multiple possible group aggregations, the method evaluates all of them (merging only adjacent groups).
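To make the search space concrete: merging only adjacent groups amounts to choosing which of the g - 1 boundaries between consecutive groups to keep, so g groups admit 2^(g - 1) aggregations. The sketch below enumerates them in base R. It encodes each aggregation as the vector of last indices of its macro-groups, an encoding assumed from the Examples. The function enumerate_aggregations is a hypothetical helper and does not necessarily reflect how the package traverses the candidates.

```r
# Hypothetical helper: list every aggregation of g adjacent groups, each
# encoded as the last original index of every macro-group.
enumerate_aggregations <- function(g) {
  stopifnot(g >= 1)
  if (g == 1) return(list(1))
  boundaries <- seq_len(g - 1)        # candidate cut points between groups
  n <- 2^(g - 1)
  out <- vector("list", n)
  for (mask in 0:(n - 1)) {
    # Bit k of `mask` decides whether the boundary after group k is kept.
    keep <- (mask %/% 2^(boundaries - 1)) %% 2 == 1
    out[[mask + 1]] <- c(boundaries[keep], g)
  }
  out
}

aggs <- enumerate_aggregations(8)
length(aggs)  # 128, i.e. 2^(8 - 1)
```

For the seven age ranges above, the aggregation 20-39, 40-59, and 60+ would be encoded as c(2, 4, 7) under this assumed convention.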
Examples
# Example 1: Using a simulated instance
simulations <- simulate_election(
num_ballots = 20,
num_candidates = 3,
num_groups = 8,
seed = 42
)
result <- get_agg_opt(
X = simulations$X,
W = simulations$W,
sd_threshold = 0.05,
seed = 42
)
result$group_agg # c(3,8)
#> [1] 3 8
# This means that the resulting group aggregation consists of
# two macro-groups: one that includes the original groups 1, 2, and 3;
# the remaining one with groups 4, 5, 6, 7 and 8.
# \donttest{
# Example 2: Getting an unfeasible result
result2 <- get_agg_opt(
X = simulations$X,
W = simulations$W,
sd_threshold = 0.001
)
result2$group_agg # NULL: no group aggregation is returned
#> NULL
result2$X # Input candidates' vote matrix
#> [,1] [,2] [,3]
#> [1,] 24 41 35
#> [2,] 30 37 33
#> [3,] 28 41 31
#> [4,] 21 46 33
#> [5,] 29 42 29
#> [6,] 19 51 30
#> [7,] 21 52 27
#> [8,] 29 46 25
#> [9,] 26 47 27
#> [10,] 21 51 28
#> [11,] 55 25 20
#> [12,] 52 26 22
#> [13,] 46 24 30
#> [14,] 23 33 44
#> [15,] 22 34 44
#> [16,] 11 70 19
#> [17,] 16 62 22
#> [18,] 34 52 14
#> [19,] 30 49 21
#> [20,] 42 42 16
result2$W # Input group-level voter matrix
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 54 6 4 6 5 8 13 4
#> [2,] 60 4 11 7 8 4 4 2
#> [3,] 35 27 8 6 9 6 3 6
#> [4,] 7 64 3 5 6 8 3 4
#> [5,] 4 56 6 4 7 8 8 7
#> [6,] 3 8 54 4 11 5 9 6
#> [7,] 7 8 51 9 7 7 5 6
#> [8,] 5 11 33 28 8 4 5 6
#> [9,] 5 4 4 62 5 11 3 6
#> [10,] 13 4 13 47 6 3 6 8
#> [11,] 6 4 4 9 59 6 6 6
#> [12,] 4 7 7 9 56 5 8 4
#> [13,] 5 8 8 7 29 29 5 9
#> [14,] 14 1 7 10 3 61 4 0
#> [15,] 1 6 8 8 6 56 5 10
#> [16,] 6 9 7 3 4 3 62 6
#> [17,] 8 5 3 6 5 9 61 3
#> [18,] 2 5 8 4 7 7 28 39
#> [19,] 4 7 4 6 4 9 6 60
#> [20,] 7 6 7 10 5 1 6 58
# }