Transforming ACLED data
The ACLED dataset is designed with user readability in mind, which can
complicate data manipulation in some use cases. To make the data easier
to use in a variety of settings, acledR provides a suite of functions
for transformations often applied to ACLED data. Currently, acledR has
three functions in the ‘transform’ family: acled_transform_longer(),
acled_transform_wider(), and acled_transform_interaction().
1. From wide to long format - acled_transform_longer()
acled_transform_longer() converts wide-format data to long format
without the need to make a new API call. Typical ACLED data is in a wide
format, with multiple actors represented in each row (see the ACLED API
guide for a more detailed explanation). This format generally works well
for event-based analyses. For actor-based analyses, however, you may
need a long data format, where each actor has a separate row and a
single event may therefore be represented in multiple rows.
Note that wide and long formats are generic terms that are more specifically referred to as dyadic and monadic data types in other ACLED documentation (see the ACLED endpoint guide).
acled_transform_longer(data, type = "full_actors")
acled_transform_longer() takes two arguments:
data: A wide-format ACLED dataset.
type: A character string indicating which columns to transpose (i.e., the columns that go from wide to long format).
The available column options upon which ACLED data can be transposed are:
full_actors: Transposes all the actor columns in the dataset (actor1, actor2, assoc_actor_1, assoc_actor_2). There will be a separate row for each actor or associate actor involved in each event. This generates four new columns: type_of_actor and actor, and inter_type and inter. type_of_actor denotes the original column in which the actor was found (i.e., actor1, actor2, assoc_actor_1, or assoc_actor_2), and actor is simply the actor’s name. Similarly, inter is the actor’s inter code, and inter_type denotes whether the code came from the inter1 or inter2 column.
main_actors: Transposes only actor1 and actor2. There will be separate rows for main actors only. This generates two new columns: type_of_actor and actor. type_of_actor denotes the column in which the actor was originally found, while actor is simply the name of the actor.
assoc_actors: Transposes only the assoc_actor_1 and assoc_actor_2 columns. There will be separate rows for associate actors only. This generates two new columns: type_of_actor and actor. type_of_actor denotes whether the actor was originally found in the assoc_actor_1 or assoc_actor_2 column, while actor is simply the name of the associate actor. Note: The data will still include the actor1 and actor2 columns.
source: Transposes only the source column. There will be a separate row for each source in the source column.
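To make the reshaping concrete, here is a minimal sketch of what the main_actors option describes, written with tidyr::pivot_longer() on a made-up two-event data frame. It is not acledR's implementation: the toy data, the actor names, and the object names wide_toy and long_toy are assumptions for illustration only, and dropping empty actor slots is a choice made in this sketch.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide (dyadic) data: one row per event, two actor columns.
wide_toy <- tibble(
  event_id_cnty = c("AAA1", "AAA2"),
  actor1 = c("Group A", "Group B"),
  actor2 = c("Group B", NA)
)

# One row per actor per event: the source column name goes to type_of_actor
# and the actor's name goes to actor. Empty actor slots are dropped here.
long_toy <- wide_toy %>%
  pivot_longer(
    cols = c(actor1, actor2),
    names_to = "type_of_actor",
    values_to = "actor",
    values_drop_na = TRUE
  )

long_toy
```

Event AAA1 now occupies two rows (one per actor), while AAA2 keeps a single row because its second actor slot was empty.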
Keep in mind that you can receive some data in monadic/longer form directly from ACLED’s API, but using this function instead can provide some added benefits. Specifically:
You can use this function to transform a dyadic/wide dataset to a monadic/long dataset, thus receiving the latter without executing an additional API call.
You have more control over the columns used when transforming your dataset from wide to long format. This function lets you transpose on all actor columns, on actor1 and actor2 only, on assoc_actor_1 and assoc_actor_2 only, or on source. The API only returns long-format data based on actor1, actor2, assoc_actor_1, and assoc_actor_2 together, with no option to select a subset of these columns.
2. From long to wide format - acled_transform_wider()
acledR also offers the inverse of acled_transform_longer(), allowing you
to pivot your dataframe back to a wider format (dyadic form). The
function is meant to aid users who have used acled_transform_longer()
and would like to return the dataframe to its original state.
The function is similar to its counterpart:
acled_transform_wider(data, type = "full_actors")
As you can see, the arguments mirror those of acled_transform_longer():
data: A long-format ACLED dataset, such as the output of acled_transform_longer().
type: A character string indicating which columns to transpose (the columns that go from long to wide format).
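As a rough illustration of the long-to-wide direction, the sketch below uses tidyr::pivot_wider() on a made-up long data frame. It shows the general idea only and is not how acled_transform_wider() is implemented; the data and the object names long_toy and wide_toy are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical long (monadic) data: one row per actor per event.
long_toy <- tibble(
  event_id_cnty = c("AAA1", "AAA1", "AAA2"),
  type_of_actor = c("actor1", "actor2", "actor1"),
  actor = c("Group A", "Group B", "Group B")
)

# Spread the actor names back into one column per original actor slot.
# Slots with no row in the long data are filled with NA.
wide_toy <- long_toy %>%
  pivot_wider(names_from = type_of_actor, values_from = actor)

wide_toy
```

Each event is back on a single row, with actor1 and actor2 restored as separate columns.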
3. Switch between numeric and string interaction codes - acled_transform_interaction()
The final function in this suite, acled_transform_interaction(), allows
you to easily convert numeric interaction codes to a text description
of the interaction code. You can find more information, including which
actor categories correspond to which numeric codes, in ACLED’s codebook.
While the acled_api() function returns interaction codes as text strings
by default, this function is useful when the original API call included
the parameter value inter_numeric = TRUE.
acled_transform_interaction(df, only_inters = FALSE)
The function takes two arguments:
df: An ACLED dataset that includes the inter1 and inter2 variables (and the interaction variable when only_inters = FALSE).
only_inters: Boolean option on whether to transform only inter1 and inter2, leaving out interaction. This option defaults to FALSE, thus including the interaction column.
The function simply returns a modified dataframe with the swapped inter & interaction formats. In the interaction column, you will find the actor types separated by “-”, for example:
acledR::acled_old_dummy[39:40, ] %>%
  # Displaying only relevant columns
  select(event_id_cnty, inter1, inter2, interaction)
## # A tibble: 2 × 4
##   event_id_cnty inter1 inter2 interaction
##   <chr>          <dbl>  <dbl>       <dbl>
## 1 ARG10606           6      0          60
## 2 ARG10605           5      1          15
… will change to …
acledR::acled_old_dummy[39:40, ] %>%
  acled_transform_interaction() %>%
  select(event_id_cnty, inter1, inter2, interaction) %>%
  head(2)
## # A tibble: 2 × 4
##   event_id_cnty inter1     inter2       interaction
##   <chr>         <chr>      <chr>        <chr>
## 1 ARG10606      Protesters NA           Sole Protesters
## 2 ARG10605      Rioters    State Forces State Forces-Rioters
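The recoding shown above is essentially a lookup from numeric code to label. The sketch below illustrates that idea in base R using only the three codes visible in the example output (1, 5, and 6). The names inter_labels and label_inter are hypothetical, and this is not the package's implementation; consult ACLED's codebook for the full list of codes.

```r
# Labels for the three codes that appear in the example above.
# Other codes are deliberately left out of this sketch.
inter_labels <- c("1" = "State Forces", "5" = "Rioters", "6" = "Protesters")

# Translate numeric codes to labels by name lookup.
# Codes without a label (such as 0) come back as NA.
label_inter <- function(code) {
  unname(inter_labels[as.character(code)])
}

label_inter(c(6, 0, 5, 1))
# "Protesters" NA "Rioters" "State Forces"
```

A named vector keeps the mapping in one place, so extending the sketch to further codes only means adding entries to inter_labels.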
Example
This section walks through a potential use case for the transformation functions.
Assume that you are interested in data from “South America” between
January 1 and June 1, 2023. The email and key values below are only
placeholders. To replicate the output, provide your own credentials as
detailed in the acled_api() vignette.
library(acledR)
library(dplyr) # provides %>% and filter(), used in the steps below
acled_access(email = "your_email", key = "your_key")
df_sa <- acled_api(regions = "South America",
                   start_date = "2023-01-01",
                   end_date = "2023-06-01",
                   monadic = FALSE,
                   acled_access = TRUE,
                   inter_numeric = TRUE,
                   prompt = FALSE)
## Success! Credentials authorized
## Requesting data for 13 country. Accounting for the requested time period and ACLED coverage dates, this request includes approximately 14,693 events.
## Processing API request
## Extracting content from API request
Let’s keep only the events that involve “ELN: National Liberation Army” in any of the primary or associate actor columns:
df_eln <-
  df_sa %>%
  filter(
    # Check main actors
    actor1 == "ELN: National Liberation Army" |
      actor2 == "ELN: National Liberation Army" |
      # Check associate actors
      assoc_actor_1 == "ELN: National Liberation Army" |
      assoc_actor_2 == "ELN: National Liberation Army"
  )
The filtered data has 174 rows, meaning there were 174 events in which “ELN: National Liberation Army” was involved as an actor or associate actor.
Instead of filtering to the events involving a particular actor, you
may wish to calculate the number of events in which each actor in the
dataset participates. This can be difficult with wide (i.e., dyadic)
data, as an actor may be represented in any of the four actor columns.
As such, you cannot simply sum the number of rows in which an actor
appears in one particular column. A simple solution is to transform the
dataset into long form and then calculate event counts for each actor.
You can begin by using the acled_transform_longer()
function:
df_sa_long <- acled_transform_longer(df_sa, type = "full_actors")
The dataset is now in long form, with each row representing a single actor in a single event. With this data, you can count events for each actor by grouping by actor and counting distinct event_id_cnty values. Counting unique identifiers rather than rows matters because, in long format, a single event appears in as many rows as it has actors.
library(tidyr)
library(dplyr)
actors_df_sa <-
  df_sa_long %>%
  group_by(actor) %>%
  # n_distinct() already counts unique values, so unique() is not needed
  summarise(n_events = n_distinct(event_id_cnty))
To verify your results, you can filter the actor counts in actors_df_sa to only “ELN: National Liberation Army” and inspect its n_events value:
## [1] 174
The number of events matches the number of rows you got when first filtering by actor.
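The same check can be reproduced without an API call. The sketch below is a self-contained version of the workflow on made-up data, using tidyr::pivot_longer() as a stand-in for acled_transform_longer(). The toy events, the actor names, and the object names are all assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: "Group A" takes part in events AAA1 and AAA3.
wide_toy <- tibble(
  event_id_cnty = c("AAA1", "AAA2", "AAA3"),
  actor1 = c("Group A", "Group B", "Group C"),
  actor2 = c("Group B", "Group C", "Group A")
)

# Count in wide form: every actor column has to be checked.
n_wide <- wide_toy %>%
  filter(actor1 == "Group A" | actor2 == "Group A") %>%
  nrow()

# Count in long form: one row per actor per event,
# then the number of distinct events per actor.
n_long <- wide_toy %>%
  pivot_longer(c(actor1, actor2),
               names_to = "type_of_actor",
               values_to = "actor") %>%
  group_by(actor) %>%
  summarise(n_events = n_distinct(event_id_cnty)) %>%
  filter(actor == "Group A") %>%
  pull(n_events)

# Both approaches should agree.
n_wide == n_long
```

The long-form version scales to every actor at once: dropping the final filter() and pull() leaves a table of event counts per actor.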