Transforming ACLED data

The ACLED dataset is designed with user readability in mind. This focus might complicate data manipulation in some use cases. To make the data easier to use in a variety of settings, acledR provides a suite of functions for data transformations often applied to ACLED data. Currently, acledR has three functions in the ‘transform’ family: acled_transform_longer, acled_transform_wider, and acled_transform_interactions.

1. From wide to long format - `acled_transform_longer()`

acled_transform_longer() converts a dataset from wide to long format without the need to make a new API call. Typical ACLED data is in a wide format, with multiple actors represented in each row (see the ACLED API guide for a more detailed explanation). This format generally works well if you are interested in conducting event-based analyses. Still, there are times when you may wish to conduct actor-based analyses that require a long data format, where each actor has a separate row and a single event may therefore be represented in multiple rows.
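To make the difference between the two shapes concrete, here is a minimal sketch on invented toy data. It uses tidyr::pivot_longer() rather than acledR itself, so it only illustrates the general idea of going from one row per event to one row per actor; it is not the package's implementation, and dropping missing actors is a choice made for this sketch.

```r
library(dplyr)
library(tidyr)

# Toy wide (dyadic) data: one row per event. All values are invented.
events_wide <- tibble(
  event_id_cnty = c("AAA1", "AAA2"),
  actor1 = c("Group A", "Group B"),
  actor2 = c("Group B", NA)
)

# Long (monadic) form: one row per actor per event.
# The new column names mirror those described for acled_transform_longer().
events_long <- events_wide %>%
  pivot_longer(
    cols = c(actor1, actor2),
    names_to = "type_of_actor",
    values_to = "actor",
    values_drop_na = TRUE  # assumption for this sketch: skip empty actor slots
  )

print(events_long)
```

Event AAA1 now occupies two rows, one for each of its actors, while AAA2 keeps a single row.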

Note that wide and long formats are generic terms that are more specifically referred to as dyadic and monadic data types in other ACLED documentation (see the ACLED endpoint guide).

acled_transform_longer(data, type = "full_actors")

acled_transform_longer() requires two arguments:

data: A wide format ACLED dataset.
type: A character vector indicating which columns to transpose (i.e. the columns that go from wide to long format).

The available column options upon which ACLED data can be transposed are:

full_actors: Transposes all the actor columns in the dataset (actor1, actor2, assoc_actor_1, assoc_actor_2). There will be a separate row for each actor or associate actor involved in each event. This generates four new columns: type_of_actor, actor, inter_type, and inter. type_of_actor denotes the original column in which the actor was found (i.e. actor1, actor2, assoc_actor_1, assoc_actor_2), while actor is simply the actor’s name. Similarly, inter is the actor’s inter code, with inter_type denoting whether the code came from the inter1 or inter2 column.
main_actors: Transposes only actor1 and actor2. There will be separate rows for main actors only. This generates two new columns: type_of_actor and actor. type_of_actor denotes the column in which the actor was originally found, while actor is simply the name of the actor.
assoc_actors: Transposes only assoc_actor_1 and assoc_actor_2 columns. There will be separate rows for associate actors only. This generates two new columns: type_of_actor and actor. type_of_actor denotes whether the actor was originally found in the assoc_actor_1 or assoc_actor_2 column, while actor is simply the name of the associate actor. Note: The data will still include actor1 and actor2 columns.
source: Transposes only the source column. There will be a separate row for each source in the source column.
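As a rough illustration of the source option, the sketch below splits a multi-valued source column into one row per source on invented toy data. It assumes that multiple sources are stored in a single string separated by "; ", and it uses tidyr::separate_rows() rather than acledR, so treat it as a conceptual sketch and not as the package's own logic.

```r
library(dplyr)
library(tidyr)

# Toy data: the source names and the "; " delimiter are assumptions.
toy_events <- tibble(
  event_id_cnty = c("AAA1", "AAA2"),
  source = c("Outlet One; Outlet Two", "Outlet Three")
)

# One row per source; the event identifier is repeated for each source.
toy_sources <- toy_events %>%
  separate_rows(source, sep = ";\\s*")

print(toy_sources)
```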

Keep in mind that you can receive some data in monadic/longer form directly from ACLED’s API, but using this function instead can provide some added benefits. Specifically:

You can use this function to transform a dyadic/wide dataset to a monadic/long dataset, thus receiving the latter without executing an additional API call.
You have more control over which columns are transposed when transforming your dataset from wide to long format. This function can transpose all actor columns, only actor1 and actor2, only assoc_actor_1 and assoc_actor_2, or only source. The API returns long-format data based on actor1, actor2, assoc_actor_1, and assoc_actor_2 together, with no option to select a subset of these columns.

2. From long to wide format - `acled_transform_wider()`

acledR also offers the inverse of acled_transform_longer(), allowing you to pivot your dataframe back to a wider format (dyadic form). The function is meant to aid users who have used acled_transform_longer() and would like to return the dataframe to its original state.

The function is similar to its counterpart:

acled_transform_wider(data, type = "full_actors")

As you can see, the arguments are the same as those for acled_transform_longer():

data: A long format ACLED dataset.
type: A character vector indicating which columns to transpose (the columns that go from long to wide format).
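The reverse direction can be sketched the same way on invented toy data. This uses tidyr::pivot_wider() and is only meant to show what returning to one row per event looks like; it makes no claim about how acled_transform_wider() is implemented.

```r
library(dplyr)
library(tidyr)

# Toy long (monadic) data: one row per actor per event. Values are invented.
events_long <- tibble(
  event_id_cnty = c("AAA1", "AAA1", "AAA2"),
  type_of_actor = c("actor1", "actor2", "actor1"),
  actor = c("Group A", "Group B", "Group B")
)

# Back to wide (dyadic) form: one row per event, one column per actor slot.
events_wide <- events_long %>%
  pivot_wider(names_from = type_of_actor, values_from = actor)

print(events_wide)
```

Event AAA2 has no second actor in the toy data, so its actor2 cell is filled with NA.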

3. Switch between numeric and string interaction codes - `acled_transform_interactions()`

The final function in this suite, acled_transform_interactions(), allows you to easily transition from numeric interaction codes to a text description of the interaction code. You can find more information, including which actor categories correspond to which numeric codes, in ACLED’s codebook. While the acled_api() function returns interaction codes as text strings by default, this function is useful when the original API call included the parameter value inter_numeric = TRUE.

acled_transform_interactions(data, only_inters = FALSE)

The function requires two arguments:

data: An ACLED dataset which includes the inter1 and inter2 variables (when only_inters = FALSE).
only_inters: Boolean option on whether to include only inter1 and inter2, without including interaction. This option defaults to FALSE, thus including the interaction column.

The function simply returns a modified dataframe with the swapped inter & interaction formats. In the interaction column, you will find the actor types separated by “-”, for example:

acledR::acled_old_dummy[39:40,] %>%
  # Displaying only relevant columns
  select(event_id_cnty, inter1, inter2, interaction)

## # A tibble: 2 × 4
##   event_id_cnty inter1 inter2 interaction
##   <chr>          <dbl>  <dbl>       <dbl>
## 1 ARG10606           6      0          60
## 2 ARG10605           5      1          15

… will change to …

acledR::acled_old_dummy[39:40, ] %>%
  acled_transform_interaction() %>%
  select(event_id_cnty, inter1, inter2, interaction) %>%
  head(2)

## # A tibble: 2 × 4
##   event_id_cnty inter1     inter2       interaction         
##   <chr>         <chr>      <chr>        <chr>               
## 1 ARG10606      Protesters NA           Sole Protesters     
## 2 ARG10605      Rioters    State Forces State Forces-Rioters
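The recoding shown above can be thought of as a lookup from numeric codes to labels. The sketch below is a simplified base R illustration, not the package's implementation. It covers only the three codes visible in the example output (1, 5, and 6) and treats 0 as missing, as in that output; the full mapping is in ACLED's codebook.

```r
# Partial lookup table, limited to the codes shown in the example output.
inter_labels <- c(
  "1" = "State Forces",
  "5" = "Rioters",
  "6" = "Protesters"
)

# Hypothetical numeric codes; 0 has no entry in the lookup and becomes NA.
codes <- c(6, 0, 5, 1)

# Index the named vector by the codes converted to character.
labels <- unname(inter_labels[as.character(codes)])

print(labels)
```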

Example

This section walks through a potential use case for the transformation functions.

Assume that you are interested in data from “South America” between 1 January and 1 June 2023. The email and key values below are only examples. To replicate the output, provide your own credentials as detailed in the acled_api() vignette.

library(acledR)
library(dplyr) # provides %>% and filter(), used in the steps below

acled_access(email = "your_email", key = "your_key") 

df_sa <- acled_api(regions = "South America",
                   start_date = "2023-01-01",
                   end_date = "2023-06-01",
                   monadic = FALSE,
                   acled_access = TRUE,
                   inter_numeric = TRUE,
                   prompt = FALSE)

## Success! Credentials authorized

## Requesting data for 13 countries from 2023-01-01 to 2023-06-01

## Processing API request

## Extracting content from API request

Let’s keep only the events that involve the “ELN: National Liberation Army” in any of the primary or associate actor columns:

df_eln <-
  df_sa %>% 
  filter(
    # Check main actors
    actor1 == "ELN: National Liberation Army" |
    actor2 == "ELN: National Liberation Army" |
    # Check associate actors
    assoc_actor_1 == "ELN: National Liberation Army" |
    assoc_actor_2 == "ELN: National Liberation Army"
    )
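The same kind of filter can be written more compactly with dplyr::if_any(), which applies one condition across several columns and keeps a row when any of them matches. The sketch below runs on invented toy data with a hypothetical target actor, so it is an alternative phrasing of the check above and not part of acledR.

```r
library(dplyr)

# Hypothetical actor name and toy events, invented for illustration.
target <- "Group A"

toy_events <- tibble(
  event_id_cnty = c("AAA1", "AAA2", "AAA3"),
  actor1        = c("Group A", "Group B", "Group C"),
  actor2        = c("Group B", "",        "Group B"),
  assoc_actor_1 = c("",        "",        ""),
  assoc_actor_2 = c("",        "Group A", "")
)

# Keep rows where the target appears in any of the four actor columns.
toy_target <- toy_events %>%
  filter(
    if_any(c(actor1, actor2, assoc_actor_1, assoc_actor_2), ~ .x == target)
  )

print(toy_target)
```

Events AAA1 and AAA2 are kept because the target appears as a main actor in the first and as an associate actor in the second.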

The filtered data has 175 rows, meaning there were 175 events in which the “ELN: National Liberation Army” was involved as an actor or associate actor.

Instead of filtering to the events involving a particular actor, you may wish to calculate the number of events in which each actor in the dataset participates. This can be difficult with wide (i.e., dyadic) data, as an actor may be represented in any of the four actor columns. As such, you cannot simply sum the number of rows in which an actor appears in one particular column. A simple solution is to transform the dataset into long form and then calculate event counts for each actor. You can begin by using the acled_transform_longer() function:

df_sa_long <- acled_transform_longer(df_sa, type = "full_actors")

The dataset is now in long form, with each row representing a single actor in a single event. With this data, you can count the number of events for each actor by grouping by actor and counting the distinct event_id_cnty values in each group. Counting distinct identifiers rather than rows is important because, in long format, a single event appears in as many rows as it has actors.

library(tidyr)
library(dplyr)

actors_df_sa <-
  df_sa_long %>% 
  group_by(actor) %>%
  summarise(n_events = n_distinct(event_id_cnty))

To verify your results, you can filter actor counts to only “ELN: National Liberation Army”.

actors_df_sa %>% 
  filter(actor == "ELN: National Liberation Army") %>%
  pull(n_events)

## [1] 175

The number of events matches the number of rows you got when first filtering by actor.
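To see why distinct event identifiers are counted instead of rows, consider the toy sketch below. The data are invented, and the case of one actor appearing in two actor columns of the same event is hypothetical, included only to show how the two counts can diverge.

```r
library(dplyr)

# Toy long data: "Group A" appears twice in event AAA2 (hypothetical case).
events_long <- tibble(
  event_id_cnty = c("AAA1", "AAA1", "AAA2", "AAA2"),
  type_of_actor = c("actor1", "actor2", "actor1", "assoc_actor_1"),
  actor         = c("Group A", "Group B", "Group A", "Group A")
)

# Compare a plain row count with a count of distinct events per actor.
actor_counts <- events_long %>%
  group_by(actor) %>%
  summarise(
    n_rows   = n(),
    n_events = n_distinct(event_id_cnty)
  )

print(actor_counts)
```

In this toy data, "Group A" has three rows but takes part in only two events, so a plain row count would overstate its involvement.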
