This function simulates an election by creating matrices representing candidate votes (X) and voters' demographic groups (W) across a specified number of ballot-boxes. It either (i) receives as input or (ii) generates a probability matrix (prob), indicating how likely each demographic group is to vote for each candidate.
By default, the number of voters per ballot box (ballot_voters) is a vector of length num_ballots with every entry equal to 100. You can optionally override this by providing a custom vector.
Optional parameters are available to control the distribution of votes:
group_proportions: A vector of length num_groups specifying the overall proportion of each demographic group. Entries must be non-negative and sum to one.
prob: A user-supplied probability matrix of dimension (num_groups x num_candidates). If provided, this matrix is used directly. Otherwise, voting probabilities for each group are drawn from a Dirichlet distribution.
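The documentation does not say how group_proportions is turned into whole numbers of voters. As a hypothetical sketch, the largest-remainder rule below produces non-negative integer group sizes that add up exactly to the electorate size (the sum of ballot_voters). The helper name group_counts and the rounding rule are assumptions, not the package's code.

```r
# Hypothetical helper (not part of the package): convert overall group
# proportions into integer group sizes using the largest-remainder rule.
group_counts <- function(group_proportions, total_voters) {
  stopifnot(all(group_proportions >= 0),
            isTRUE(all.equal(sum(group_proportions), 1)))
  raw <- group_proportions * total_voters
  counts <- floor(raw)
  # Give the voters lost to rounding down to the groups with the
  # largest fractional parts
  leftover <- total_voters - sum(counts)
  if (leftover > 0) {
    top <- order(raw - counts, decreasing = TRUE)[seq_len(leftover)]
    counts[top] <- counts[top] + 1
  }
  counts
}

# Example 3 setting: 93 ballot boxes of 100 voters each
sizes <- group_counts(c(0.3, 0.5, 0.1, 0.1), total_voters = 93 * 100)
sum(sizes)  # 9300
```

Rounding down first and then distributing the remainder guarantees that the group sizes match the total number of voters, which plain round() does not.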
num_ballots: Number of ballot boxes (b).
num_candidates: Number of candidates (c).
num_groups: Number of demographic groups (g).
ballot_voters: A vector of length num_ballots representing the number of voters per ballot box. Defaults to rep(100, num_ballots).
lambda: A numeric value between 0 and 1 that represents the fraction of voters that are randomly assigned to ballot-boxes. The remaining voters are assigned sequentially according to their demographic group.
lambda = 0: The assignment of voters to ballot-boxes is fully sequential in terms of their demographic group. This leads to a high heterogeneity of the voters' groups across ballot-boxes.
lambda = 1: The assignment of voters to ballot-boxes is fully random. This leads to a low heterogeneity of the voters' groups across ballot-boxes.
Defaults to 0.5. See Shuffling Mechanism for more details.
If provided, overrides the current global seed. Defaults to NULL.
group_proportions: Optional. A vector of length num_groups that indicates the fraction of voters that belong to each group. Defaults to groups of equal size.
prob: Optional. A user-supplied probability matrix of dimension (g x c). If provided, this matrix is used as the underlying voting probability distribution. If not supplied, each row is sampled from a Dirichlet distribution with all parameters set to one.
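A Dirichlet distribution with all parameters equal to one is the uniform distribution over probability vectors, so each group's row is a random point on the simplex. A minimal base R sketch of that sampling step, assuming the standard Gamma-normalisation construction (the package may sample differently, and rdirichlet_rows is a hypothetical name):

```r
# Sketch only: draw a g x c matrix whose rows are independent
# Dirichlet(alpha, ..., alpha) vectors; alpha = 1 gives uniform rows.
rdirichlet_rows <- function(num_groups, num_candidates, alpha = 1) {
  # A Dirichlet draw is a vector of independent Gamma(alpha, 1) draws
  # divided by their sum, so generate Gammas and normalise each row.
  draws <- matrix(rgamma(num_groups * num_candidates, shape = alpha, rate = 1),
                  nrow = num_groups, ncol = num_candidates)
  draws / rowSums(draws)  # rowSums is recycled down the columns, row by row
}

set.seed(1)
prob <- rdirichlet_rows(num_groups = 5, num_candidates = 3)
rowSums(prob)  # every row sums to 1
```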
An eim object with the following attributes:
X: A (b x c) matrix with candidates' votes for each ballot box.
W: A (b x g) matrix with voters' groups for each ballot box.
real_prob: A (g x c) matrix with the probability that a voter from each group votes for each candidate. If prob is provided, real_prob equals it.
outcome: A (b x g x c) array with the number of votes for each candidate in each ballot box, broken down by group.
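The outcome array links the other outputs: summing it over groups gives X, and summing it over candidates gives W. The toy sketch below shows this relationship in base R. It assumes, which the documentation does not state, that the votes of each group in each ballot box are a multinomial draw using that group's row of the probability matrix; the small W and prob values are made up for illustration.

```r
# Toy illustration of how outcome, X and W fit together (assumed
# multinomial vote model; not the package's implementation).
set.seed(2)
n_b <- 4; n_g <- 2; n_c <- 3
W <- matrix(c(60, 40, 10, 0,      # group 1 voters in boxes 1..4
              40, 60, 90, 100),   # group 2 voters in boxes 1..4
            nrow = n_b, ncol = n_g)
prob <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.3, 0.6), nrow = n_g, byrow = TRUE)

outcome <- array(0L, dim = c(n_b, n_g, n_c))
for (i in seq_len(n_b)) {
  for (j in seq_len(n_g)) {
    # rmultinom returns an n_c x 1 matrix of counts summing to W[i, j]
    outcome[i, j, ] <- rmultinom(1, size = W[i, j], prob = prob[j, ])[, 1]
  }
}

# Summing out groups gives X (b x c); summing out candidates recovers W
X <- apply(outcome, c(1, 3), sum)
all(apply(outcome, c(1, 2), sum) == W)  # TRUE
all(rowSums(X) == rowSums(W))           # TRUE
```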
Without loss of generality, consider an ordering of groups and ballot-boxes. The shuffling step is controlled by the lambda parameter and operates as follows:
Initial Assignment: Voters are assigned to ballot-boxes sequentially according to their demographic group. More specifically, the first ballot-boxes receive voters from the first group, the next ballot-boxes receive voters from the second group, and so on until all voters have been assigned. Note that most ballot-boxes will contain voters from a single group (as long as the number of ballot-boxes exceeds the number of groups).
Shuffling: A fraction lambda of the voters who have already been assigned is selected at random. Then, the ballot-box assignment of this sample is shuffled. Hence, different lambda values are interpreted as follows:
lambda = 0 means no one is shuffled (the initial assignment remains).
lambda = 1 means all individuals are shuffled.
Intermediate values like lambda = 0.5 shuffle half the voters.
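The two steps above can be sketched in a few lines of base R. This is an illustration under assumptions, not the package's implementation: the helper name shuffle_assignment is hypothetical, round(lambda * n) is assumed as the sample size, and the selected voters' ballot-box labels are permuted among themselves, which keeps every ballot-box's size unchanged.

```r
# Hypothetical sketch of the shuffling mechanism; returns the (b x g)
# matrix W of voter counts per ballot box and group.
shuffle_assignment <- function(group_sizes, ballot_voters, lambda) {
  stopifnot(sum(group_sizes) == sum(ballot_voters), lambda >= 0, lambda <= 1)
  n <- sum(ballot_voters)
  # Initial assignment: voters sorted by group fill the boxes in order
  group <- rep(seq_along(group_sizes), times = group_sizes)
  box <- rep(seq_along(ballot_voters), times = ballot_voters)

  # Shuffling: pick round(lambda * n) voters at random and permute
  # their ballot-box labels among themselves
  picked <- sample.int(n, size = round(lambda * n))
  box[picked] <- box[picked][sample.int(length(picked))]

  counts <- table(factor(box, levels = seq_along(ballot_voters)),
                  factor(group, levels = seq_along(group_sizes)))
  matrix(counts, nrow = length(ballot_voters))
}

set.seed(3)
W0 <- shuffle_assignment(c(300, 300, 400), rep(100, 10), lambda = 0)
W1 <- shuffle_assignment(c(300, 300, 400), rep(100, 10), lambda = 1)
apply(W0, 1, max)  # all 100: every box holds a single group
apply(W1, 1, max)  # smaller: boxes now mix all three groups
```

Comparing W0 and W1 makes the trade-off concrete: with lambda = 1 the ballot-boxes have similar group compositions, which is what makes the voting probabilities harder to identify.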
Using a high value of lambda (greater than 0.7) is not recommended, as it can make the voting probabilities difficult to identify: higher values of lambda make ballot-boxes more similar to one another in terms of their voters' groups.
The algorithm is fully explained in Thraves, C., Ubilla, P. and Hermosilla, D.: "A Fast Ecological Inference Algorithm for the R×C Case".
# Example 1: Default usage with 200 ballot boxes, each having 100 voters
result1 <- simulate_election(
num_ballots = 200,
num_candidates = 3,
num_groups = 5
)
# Example 2: Using a custom ballot_voters vector
result2 <- simulate_election(
num_ballots = 340,
num_candidates = 3,
num_groups = 7,
ballot_voters = rep(200, 340)
)
# Example 3: Supplying group_proportions
result3 <- simulate_election(
num_ballots = 93,
num_candidates = 3,
num_groups = 4,
group_proportions = c(0.3, 0.5, 0.1, 0.1)
)
# Example 4: Providing a user-defined prob matrix
custom_prob <- matrix(c(
0.9, 0.1,
0.4, 0.6,
0.25, 0.75,
0.32, 0.68,
0.2, 0.8
), nrow = 5, byrow = TRUE)
result4 <- simulate_election(
num_ballots = 200,
num_candidates = 2,
num_groups = 5,
lambda = 0.3,
prob = custom_prob
)
result4$real_prob == custom_prob
#> [,1] [,2]
#> [1,] TRUE TRUE
#> [2,] TRUE TRUE
#> [3,] TRUE TRUE
#> [4,] TRUE TRUE
#> [5,] TRUE TRUE
# The attribute of the output real_prob matches the input custom_prob.