Runs the Expectation-Maximization (EM) algorithm, using the specified approximation method for the E-step.
Certain methods may require additional arguments, which can be passed through ... (see fastei-package for more details).
Usage
run_em(
  object = NULL,
  X = NULL,
  W = NULL,
  json_path = NULL,
  method = "mult",
  initial_prob = "group_proportional",
  allow_mismatch = TRUE,
  maxiter = 1000,
  maxtime = 3600,
  param_threshold = 0.001,
  ll_threshold = as.double(-Inf),
  seed = NULL,
  verbose = FALSE,
  group_agg = NULL,
  mcmc_samples = 1000,
  mcmc_stepsize = 3000,
  mvncdf_method = "genz",
  mvncdf_error = 0.00001,
  mvncdf_samples = 5000,
  ...
)
Arguments
- object
An object of class eim, which can be created using the eim function. This parameter should not be used if either (i) the X and W matrices or (ii) json_path is supplied. See Note.
- X
A (b x c) matrix representing candidate votes per ballot box.
- W
A (b x g) matrix representing group votes per ballot box.
- json_path
A path to a JSON file containing X and W fields, stored as nested arrays. It may contain additional fields with other attributes, which will be added to the returned object.
- method
An optional string specifying the method used for estimating the E-step. Valid options are:
mult: The default method, using a single sum of Multinomial distributions.
mvn_cdf: Uses the Multivariate Normal CDF to approximate the conditional probability.
mvn_pdf: Uses the Multivariate Normal PDF to approximate the conditional probability.
mcmc: Uses MCMC to sample vote outcomes, which are then used to estimate the conditional probability of the E-step.
exact: Solves the E-step using the Total Probability Law.
For a detailed description of each method, see fastei-package and References.
- initial_prob
An optional string specifying the method used to obtain the initial probability. Accepted values are:
uniform: Assigns equal probability to every candidate within each group.
proportional: Assigns probabilities to each group based on the proportion of candidates' votes.
group_proportional: Computes the probability matrix by taking into account both group and candidate proportions. This is the default method.
random: Uses randomized values to fill the probability matrix.
- allow_mismatch
Boolean. If TRUE, allows a mismatch between the number of voters and the number of votes in each ballot box; this only works if method is "mvn_cdf", "mvn_pdf", "mult" or "mcmc". If FALSE, throws an error if there is a mismatch. The default value is TRUE.
- maxiter
An optional integer indicating the maximum number of EM iterations. The default value is 1000.
- maxtime
An optional numeric specifying the maximum running time (in seconds) for the algorithm. This is checked at every iteration of the EM algorithm. The default value is 3600, which corresponds to one hour.
- param_threshold
An optional numeric value indicating the threshold on the difference between consecutive probability values; iteration stops once the difference falls below it. The default value is 0.001. Note that the algorithm will stop if either ll_threshold or param_threshold is met.
- ll_threshold
An optional numeric value indicating the threshold on the difference between consecutive log-likelihood values; iteration stops once the difference falls below it. The default value is -Inf, which effectively deactivates the threshold. Note that the algorithm will stop if either ll_threshold or param_threshold is met.
- seed
An optional integer indicating the random seed for the randomized algorithms. This argument is only applicable if initial_prob = "random" or method is either "mcmc" or "mvn_cdf".
- verbose
An optional boolean indicating whether to print informational messages during the EM iterations. The default value is FALSE.
- group_agg
An optional vector that defines the group aggregation. It should contain the index of the last group in each aggregated block. For example, c(2, 4) indicates that groups 1 and 2 should be aggregated into a single group and groups 3 and 4 into another. Defaults to NULL.
- mcmc_samples
.- mcmc_samples
An optional integer indicating the number of samples to generate for the MCMC method. This parameter is only relevant when
method = "mcmc"
. The default value is1000
.- mcmc_stepsize
An optional integer specifying the step size for the mcmc algorithm. This parameter is only applicable when method = "mcmc" and will be ignored otherwise. The default value is 3000.
- mvncdf_method
An optional string specifying the Monte Carlo method used to estimate the Multivariate Normal CDF in the mvn_cdf method. Accepted values are genz and genz2, with genz set as the default. This parameter is only applicable when method = "mvn_cdf". See References for more details.
- mvncdf_error
An optional numeric value defining the error threshold for the Monte Carlo simulation when estimating the mvn_cdf method. The default value is 1e-5. This parameter is only relevant when method = "mvn_cdf".
- mvncdf_samples
An optional integer specifying the number of Monte Carlo samples for the mvn_cdf method. The default value is 5000. This argument is only applicable when method = "mvn_cdf".
- ...
Added for compatibility.
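The group_agg description can be illustrated with a small base R sketch. This is not the package's implementation: aggregate_groups is a hypothetical helper, and it assumes that each entry of group_agg marks the last column of a block and that the final entry equals the number of columns of W.

```r
# Hypothetical helper (not part of fastei): sum the columns of W in blocks
# whose last column indices are given by group_agg, e.g. c(2, 4) merges
# columns 1:2 into one group and columns 3:4 into another.
aggregate_groups <- function(W, group_agg) {
  stopifnot(
    is.matrix(W),
    length(group_agg) > 0,
    group_agg[1] >= 1,
    !is.unsorted(group_agg, strictly = TRUE),
    group_agg[length(group_agg)] == ncol(W)  # assumption: blocks cover all columns
  )
  # First column of each block: 1, then one past the previous block's end
  starts <- c(1, group_agg[-length(group_agg)] + 1)
  sums <- vapply(
    seq_along(group_agg),
    function(k) rowSums(W[, starts[k]:group_agg[k], drop = FALSE]),
    numeric(nrow(W))
  )
  # Force a (b x number-of-blocks) matrix even when W has a single row
  matrix(sums, nrow = nrow(W))
}

W <- matrix(1:8, nrow = 2, ncol = 4)   # 2 ballot boxes, 4 groups
W_agg <- aggregate_groups(W, c(2, 4))  # 2 ballot boxes, 2 aggregated groups
```

Aggregation leaves the number of voters per ballot box unchanged, so rowSums(W_agg) equals rowSums(W).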
Value
The function returns an eim object with the function arguments and the following attributes:
- prob
The estimated (g x c) probability matrix.
- cond_prob
A (b x g x c) 3D array with the probability that, at each ballot box, a voter of each group voted for each candidate, given the observed outcome at that ballot box.
- logLik
The log-likelihood value from the last iteration.
- iterations
The total number of iterations performed by the EM algorithm.
- time
The total execution time of the algorithm in seconds.
- status
The final status ID of the algorithm upon completion:
0: Converged.
1: Maximum time reached.
2: Maximum iterations reached.
- message
The finishing status displayed as a message, matching the status ID value.
- method
The method for estimating the conditional probability in the E-step.
Additionally, it will create the mcmc_samples and mcmc_stepsize parameters if method = "mcmc", or mvncdf_method, mvncdf_error and mvncdf_samples if method = "mvn_cdf".
If the supplied eim object was created with the function simulate_election, the result also includes the real probability under the name real_prob. See simulate_election.
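As a rough illustration of how the status ID relates to the other returned fields, the sketch below uses a plain list as a stand-in for a fitted object. The values are made up for illustration, describe_status is a hypothetical helper, and access through `$` is an assumption based on how simulations$X is used in the Examples.

```r
# Stand-in for a fitted result (illustrative values only, not real output).
fit <- list(status = 2L, iterations = 1000L, time = 12.5)

# Status IDs and their meanings, as listed in the Value section
status_labels <- c(
  "0" = "Converged",
  "1" = "Maximum time reached",
  "2" = "Maximum iterations reached"
)

# Hypothetical helper: turn the status ID into a one-line summary
describe_status <- function(fit) {
  label <- status_labels[[as.character(fit$status)]]
  sprintf("%s after %d iterations (%.1f s)", label, fit$iterations, fit$time)
}

converged <- fit$status == 0L
msg <- describe_status(fit)
```

A check like converged is a convenient guard before using the estimates, since statuses 1 and 2 mean the run stopped on a time or iteration limit rather than on a threshold.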
Note
This function can be executed using one of three mutually exclusive approaches:
- By providing an existing eim object.
- By supplying both input matrices (X and W) directly.
- By specifying a JSON file (json_path) containing the matrices.
These input methods are mutually exclusive, meaning that you must provide exactly one of these options. Attempting to provide more than one or none of these inputs will result in an error.
When called with an eim object, the function updates the object with the computed results.
If an eim object is not provided, the function will create one internally using either the supplied matrices or the data from the JSON file before executing the algorithm.
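The "exactly one input" rule can be sketched in base R as follows. input_mode is a hypothetical helper written only for illustration; fastei performs its own validation, which may differ in detail and in its error messages.

```r
# Hypothetical helper (not part of fastei): report which of the three input
# approaches was used, and fail unless exactly one was supplied.
input_mode <- function(object = NULL, X = NULL, W = NULL, json_path = NULL) {
  # X and W only make sense as a pair
  if (xor(is.null(X), is.null(W))) {
    stop("X and W must be supplied together")
  }
  supplied <- c(
    object   = !is.null(object),
    matrices = !is.null(X),
    json     = !is.null(json_path)
  )
  if (sum(supplied) != 1) {
    stop("supply exactly one of: object, X and W, or json_path")
  }
  names(supplied)[supplied]
}

X <- matrix(c(10, 5, 7, 8), nrow = 2)  # candidate votes, 2 ballot boxes
W <- matrix(c(9, 6, 8, 7), nrow = 2)   # group votes, same 2 ballot boxes
mode_used <- input_mode(X = X, W = W)  # "matrices"
```

Calling input_mode() with no arguments, or with both an object and matrices, raises an error, mirroring the behavior described in this Note.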
References
Thraves, C., Ubilla, P. and Hermosilla, D.: "Fast Ecological Inference Algorithm for the RxC Case". Additionally, the MVN CDF is computed by the methods introduced in Genz, A. (2000). Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics.
See also
The eim object implementation.
Examples
# \donttest{
# Example 1: Run the Expectation-Maximization algorithm with default settings
simulations <- simulate_election(
  num_ballots = 300,
  num_candidates = 5,
  num_groups = 3
)
model <- eim(simulations$X, simulations$W)
model <- run_em(model) # Returns the object with updated attributes
# Example 2: Run the Expectation-Maximization algorithm using the mvn_pdf method
model <- run_em(
  object = model,
  method = "mvn_pdf"
)
# Example 3: Run the mvn_cdf method with default settings
model <- run_em(object = model, method = "mvn_cdf")
# }
if (FALSE) { # \dontrun{
# Example 4: Perform an Exact estimation using user-defined parameters
run_em(
  json_path = "a/json/file.json",
  method = "exact",
  initial_prob = "uniform",
  maxiter = 10,
  maxtime = 600,
  param_threshold = 1e-3,
  ll_threshold = 1e-5,
  verbose = TRUE
)
} # }