Agency proliferation: processing the dataset of institutional characteristics of regulatory agencies - RegGov 2018

Introduction

This tutorial shows how to process the dataset of institutional characteristics of regulatory agencies in R. The dataset was introduced in the journal article “Agency proliferation and the globalization of the regulatory state: Introducing a data set on the institutional features of regulatory agencies”, by Jacint Jordana, Xavier Fernández-i-Marín and Andrea C. Bianculli, published in Regulation & Governance (December 2018).

The following R packages are needed to follow the instructions:

library(dplyr)    # For the functions involving pipes ("%>%")
library(ggplot2)  # For the figures
library(tidyr)    # For arranging variables

The code shown below makes extensive use of pipes (%>%) for more flexible, simple and readable coding.

Load the dataset

The dataset can be found in long and wide formats. Recall that the data contains scores that are normalized (mean zero and standard deviation one) and correspond to the position of the institution in 2010.
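As a reminder of what this normalization implies, the following minimal sketch standardizes a made-up vector so that it has mean zero and standard deviation one. The values and the names raw and z are hypothetical and are not taken from the dataset, and this is not necessarily the exact procedure used by the authors.

```r
# Hypothetical raw values; not taken from the dataset
raw <- c(2, 4, 4, 6, 9)

# Subtract the mean and divide by the (sample) standard deviation
z <- (raw - mean(raw)) / sd(raw)

mean(z)  # zero, up to floating point error
sd(z)    # one, up to floating point error
```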

The long format has a row for every institution / dimension, with a column containing the value of the score. The first four columns (Institution, Coverage, Cluster and Dimension) are identifiers of the data, whereas the data is stored only in the last column (score).

url.l <- "http://xavier-fim.net/publication/reggov-2018/ra-reggov_2018-institutions-long.csv"
# ra.l: regulatory agencies in the long format
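ra.l <- read.table(url.l, header = TRUE, sep = ";", check.names = FALSE,
                   # R >= 4.0 no longer creates factors by default
                   stringsAsFactors = TRUE) %>%
  as_tibble() # tbl_df() is deprecated in recent dplyr versions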
str(ra.l)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3196 obs. of  5 variables:
##  $ Institution: Factor w/ 750 levels "Administration of Occupational Safety and Health ",..: 488 205 98 422 489 664 468 433 668 99 ...
##  $ Coverage   : Factor w/ 793 levels "Afghanistan :: Central Bank",..: 16 17 14 15 18 19 20 22 24 64 ...
##  $ Cluster    : Factor w/ 6 levels "#1 Ideal","#2 Constrained",..: 6 6 1 1 1 5 2 1 5 1 ...
##  $ Dimension  : Factor w/ 4 levels "Managerial autonomy",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ score      : num  -0.26 -0.48 0.46 0.77 1.12 0.84 0.86 0.05 0.25 0.7 ...
head(ra.l)
## # A tibble: 6 x 5
##   Institution             Coverage             Cluster   Dimension    score
##   <fct>                   <fct>                <fct>     <fct>        <dbl>
## 1 National Regulatory Au… Argentina :: Electr… #6 Respo… Managerial … -0.26
## 2 Drug, Food and Medical… Argentina :: Food S… #6 Respo… Managerial … -0.48
## 3 Central Bank of Argent… Argentina :: Centra… #1 Ideal  Managerial …  0.46
## 4 National Commission fo… Argentina :: Compet… #1 Ideal  Managerial …  0.77
## 5 National Regulatory Au… Argentina :: Gas     #1 Ideal  Managerial …  1.12
## 6 Superintendence of Hea… Argentina :: Health… #5 Auton… Managerial …  0.84

The wide format has a row for every institution, and the scores of the four dimensions are stored in different columns. The first three columns identify the observation (Institution, Coverage and Cluster), and the remaining four columns store the data (Managerial autonomy, Political independence, Public accountability, and Regulatory capabilities).

url.w <- "http://xavier-fim.net/publication/reggov-2018/ra-reggov_2018-institutions-wide.csv"
# ra.w: regulatory agencies in the wide format
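ra.w <- read.table(url.w, header = TRUE, sep = ";", check.names = FALSE,
                   # R >= 4.0 no longer creates factors by default
                   stringsAsFactors = TRUE) %>%
  as_tibble() # tbl_df() is deprecated in recent dplyr versions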
str(ra.w)
## Classes 'tbl_df', 'tbl' and 'data.frame':    799 obs. of  7 variables:
##  $ Institution            : Factor w/ 750 levels "Administration of Occupational Safety and Health ",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Coverage               : Factor w/ 793 levels "Afghanistan :: Central Bank",..: 313 75 3 4 697 245 42 627 369 376 ...
##  $ Cluster                : Factor w/ 6 levels "#1 Ideal","#2 Constrained",..: 1 2 4 4 3 2 6 5 6 4 ...
##  $ Managerial autonomy    : num  0.18 0.6 -0.63 0.38 -0.16 -0.49 -0.74 0.44 -0.34 -2 ...
##  $ Political independence : num  0.38 0.89 -2.01 -1.82 0.57 0.77 0.31 -0.47 -0.48 -1.33 ...
##  $ Public accountability  : num  0.52 0.38 -1.6 -1.16 -0.7 0.39 -0.54 0.16 0.15 -0.4 ...
##  $ Regulatory capabilities: num  0.03 -0.57 -0.78 0.75 -1.29 -0.93 0.01 0.42 0.03 0.05 ...
head(ra.w)
## # A tibble: 6 x 7
##   Institution Coverage Cluster `Managerial aut… `Political inde…
##   <fct>       <fct>    <fct>              <dbl>            <dbl>
## 1 "Administr… Iceland… #1 Ide…             0.18            0.38 
## 2 Administra… Brazil … #2 Con…             0.6             0.89 
## 3 Afghan Ato… Afghani… #4 Dep…            -0.63           -2.01 
## 4 Afghanista… Afghani… #4 Dep…             0.38           -1.82 
## 5 Agence Nat… Tunisia… #3 Mim…            -0.16            0.570
## 6 Agency for… France … #2 Con…            -0.49            0.77 
## # ... with 2 more variables: `Public accountability` <dbl>, `Regulatory
## #   capabilities` <dbl>
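
The two formats carry the same information and can be converted into each other. The following is a minimal sketch with made-up values, assuming tidyr 1.0.0 or later for pivot_wider() and pivot_longer(). The object names toy.l, toy.w and toy.l2 and the agencies in them are hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up scores for two hypothetical institutions
toy.l <- data.frame(
  Institution = rep(c("Agency A", "Agency B"), each = 2),
  Dimension   = rep(c("Managerial autonomy", "Political independence"),
                    times = 2),
  score       = c(0.5, -0.2, 1.1, 0.3),
  stringsAsFactors = FALSE
)

# Long to wide: one row per institution, one column per dimension
toy.w <- toy.l %>%
  pivot_wider(names_from = Dimension, values_from = score)

# Wide back to long: one row per institution / dimension
toy.l2 <- toy.w %>%
  pivot_longer(-Institution, names_to = "Dimension", values_to = "score")
```

The same calls should work on ra.l and ra.w, provided the identifier columns (Institution, Coverage and Cluster) are kept out of the pivoted columns.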

Cluster characteristics

Several characteristics of the clusters can be calculated by aggregating the values of the institutions in each of them, such as the mean (mean()) and the standard deviation (sd()):

cl.ch <- ra.l %>% # cl.ch: cluster characteristics
  group_by(Cluster, Dimension) %>%
  summarize(Mean = mean(score), SD = sd(score))

head(cl.ch)
## # A tibble: 6 x 4
## # Groups:   Cluster [2]
##   Cluster        Dimension                Mean    SD
##   <fct>          <fct>                   <dbl> <dbl>
## 1 #1 Ideal       Managerial autonomy     0.614 0.277
## 2 #1 Ideal       Political independence  0.719 0.259
## 3 #1 Ideal       Public accountability   0.581 0.396
## 4 #1 Ideal       Regulatory capabilities 0.343 0.251
## 5 #2 Constrained Managerial autonomy     0.123 0.496
## 6 #2 Constrained Political independence  0.560 0.153

These cluster summaries can then be plotted, with one panel per dimension:

ggplot(cl.ch, aes(x = Mean, y = SD, color = Cluster)) +
  geom_point() +
  facet_wrap(~ Dimension)

Figure 1: Cluster mean against cluster standard deviation. By Dimension.

The figure shows that clusters with higher means in a given dimension are also generally more homogeneous (as shown by their lower standard deviations), especially in the case of Managerial autonomy and Regulatory capabilities. The pattern is less clear in Political independence and almost nonexistent in Public accountability.
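
The visual impression can be roughly quantified with the correlation between the cluster means and the cluster standard deviations within each dimension. The sketch below uses made-up cluster summaries (toy.ch, with hypothetical clusters and dimensions) that mimic the structure of cl.ch. With only a handful of clusters per dimension, such a coefficient is descriptive at best.

```r
library(dplyr)

# Made-up cluster summaries for two hypothetical dimensions
toy.ch <- data.frame(
  Cluster   = rep(c("c1", "c2", "c3"), times = 2),
  Dimension = rep(c("Dim A", "Dim B"), each = 3),
  Mean      = c(0.6, 0.1, -0.8, 0.5, 0.0, -0.5),
  SD        = c(0.2, 0.5, 0.9, 0.4, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Correlation between mean and dispersion, by dimension
toy.cor <- toy.ch %>%
  group_by(Dimension) %>%
  summarize(Correlation = cor(Mean, SD))
```

In the toy data, "Dim A" shows a strongly negative correlation (higher means go with lower dispersion), whereas "Dim B" shows none. Replacing toy.ch with cl.ch applies the same summary to the real clusters.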

Getting the coverage by country and sector

If the interest lies in aggregated measures of country and sector characteristics, the variable Coverage provides the country or countries and the sector or sectors covered by each regulatory agency.

Observations (institutions) must therefore be expanded to more cases in order to cover all the possible spaces (country/sector).

First we must separate the coverage into countries and sectors. Then we replicate each institution for each of the countries and sectors it covers.

ra.cv <- ra.l %>% # ra.cv: regulatory agencies coverage 
  # Separate the content of coverage in two different new variables
  separate(Coverage, c("countries", "sectors"), " :: ") %>%
  # create new entries for each observation based on the variable
  # sectors, that are generated using the multiple sectors present
  # in the original plural variable (sectors).
  # Rename the resulting variable into singular (Sector)
  separate_rows(sectors, sep = " - ") %>%
  rename(Sector = sectors) %>%
  separate_rows(countries, sep = " - ") %>%
  rename(Country = countries)
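
To see what these steps do, consider a single made-up row. The agency, the countries and the object names toy and toy.cv are hypothetical; only the " :: " and " - " separators are taken from the code above.

```r
library(dplyr)
library(tidyr)

# One hypothetical agency covering two made-up countries and two sectors
toy <- data.frame(
  Institution = "Agency A",
  Coverage    = "Country X - Country Y :: Gas - Electricity",
  score       = 0.5,
  stringsAsFactors = FALSE
)

toy.cv <- toy %>%
  # Split Coverage into its country part and its sector part
  separate(Coverage, c("countries", "sectors"), sep = " :: ") %>%
  # One row per sector, then one row per country
  separate_rows(sectors, sep = " - ") %>%
  separate_rows(countries, sep = " - ") %>%
  rename(Country = countries, Sector = sectors)

toy.cv
```

The single row becomes four rows, one for each country / sector combination, and each of them carries the same score.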

From the original object ra.l with the following observations and variables:

dim(ra.l)
## [1] 3196    5

We obtain a new object ra.cv that contains more observations (the institutions copied over and over to fit the number of sectors and countries being covered) and variables (the Country and Sector identifiers).

dim(ra.cv)
## [1] 4732    6
head(ra.cv)
## # A tibble: 6 x 6
##   Institution              Country  Sector     Cluster   Dimension    score
##   <fct>                    <chr>    <chr>      <fct>     <fct>        <dbl>
## 1 National Regulatory Aut… Argenti… Electrici… #6 Respo… Managerial … -0.26
## 2 Drug, Food and Medical … Argenti… Food Safe… #6 Respo… Managerial … -0.48
## 3 Drug, Food and Medical … Argenti… Pharmaceu… #6 Respo… Managerial … -0.48
## 4 Central Bank of Argenti… Argenti… Central B… #1 Ideal  Managerial …  0.46
## 5 Central Bank of Argenti… Argenti… Financial… #1 Ideal  Managerial …  0.46
## 6 National Commission for… Argenti… Competiti… #1 Ideal  Managerial …  0.77

This way, we have moved the dataset from being based on institutions to being based on spaces covered by regulation, which makes it suitable for any kind of aggregation (country or sector based) that the researcher has in mind. Be aware, however, that the original dataset was conceived and designed for institutions, not spaces, and therefore any transformation must be adapted to the researcher’s theoretical arguments and empirical needs.
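
One practical consequence is that an institution covering several spaces weighs more in any aggregation. A minimal sketch for checking how many spaces each institution has been replicated into, using made-up data (toy.cv and toy.spaces are hypothetical names):

```r
library(dplyr)

# Made-up expanded data: Agency A covers two sectors, Agency B one
toy.cv <- data.frame(
  Institution = c("Agency A", "Agency A", "Agency B"),
  Country     = c("Country X", "Country X", "Country X"),
  Sector      = c("Gas", "Electricity", "Water"),
  stringsAsFactors = FALSE
)

# Number of spaces (country / sector pairs) per institution
toy.spaces <- toy.cv %>%
  distinct(Institution, Country, Sector) %>%
  count(Institution, name = "Spaces")
```

With ra.cv, the distinct() step is needed because every institution also appears once per dimension.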

Sector aggregations

We can now ask questions such as: which sectors have the highest medians in Regulatory capabilities?

ra.cv %>%
  filter(Dimension == "Regulatory capabilities") %>%
  group_by(Sector) %>%
  summarize(`Sector median` = median(score)) %>%
  arrange(desc(`Sector median`))
Sector                                      Sector median
Electricity                                         0.410
Gas                                                 0.410
Water                                               0.395
Telecommunications                                  0.390
Postal Services                                     0.375
Central Bank                                        0.345
Financial Services                                  0.220
Securities and Exchange                             0.085
Pensions                                            0.060
Insurance                                           0.050
Competition                                         0.040
Pharmaceuticals                                     0.020
Health Services                                    -0.115
Nuclear Safety and Radiological Protection         -0.360
Food Safety                                        -0.370
Work Safety                                        -0.480
Environment                                        -0.700
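
Since sectors differ in the number of agencies behind them, it may help to report that number next to each median. A minimal sketch on made-up data (toy.cv, toy.sm and the column name Agencies are hypothetical):

```r
library(dplyr)

# Made-up scores: three agencies in one sector, a single agency in another
toy.cv <- data.frame(
  Institution = c("Agency A", "Agency B", "Agency C", "Agency D"),
  Sector      = c("Gas", "Gas", "Gas", "Water"),
  Dimension   = "Regulatory capabilities",
  score       = c(0.1, 0.4, 0.9, -0.3),
  stringsAsFactors = FALSE
)

toy.sm <- toy.cv %>%
  filter(Dimension == "Regulatory capabilities") %>%
  group_by(Sector) %>%
  summarize(`Sector median` = median(score),
            Agencies = n_distinct(Institution)) %>%
  arrange(desc(`Sector median`))
```

A median based on a single agency, as for "Water" in the toy data, deserves more caution than one based on many.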

Or we can generalize the study of medians to include all dimensions:

sm <- ra.cv %>% # sm: sector medians
  group_by(Dimension, Sector) %>%
  summarize(`Sector median` = median(score))

ggplot(sm, aes(x = `Sector median`, 
               y = reorder(Sector, `Sector median`),
               color = Dimension)) +
  ylab("Sector") +
  geom_point()

Figure 2: Sector median of the scores of regulatory agencies. By dimension.
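
A note on the vertical ordering in this plot: when reorder() receives several values per sector (here, one median per dimension), it orders the sectors by the mean of those values, which is its default summary function. A small sketch with made-up numbers (the vectors sector and value are hypothetical):

```r
# Two hypothetical sectors with two made-up values each
sector <- c("Gas", "Gas", "Water", "Water")
value  <- c(0.4, 0.6, -0.1, 0.3)

# Levels are sorted by the mean value per sector:
# Water (mean 0.1) comes before Gas (mean 0.5)
ordered.sector <- reorder(sector, value)
levels(ordered.sector)

# A different summary can be requested, e.g. the maximum
levels(reorder(sector, value, FUN = max))
```

To order the sectors by a single dimension instead, the ordering variable would have to be computed from that dimension only before plotting.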

Country aggregations

Country aggregations are a bit more problematic than sector ones, given that the number of institutions in each country is obviously much more limited than the number of institutions in each sector. Therefore, we must proceed with care.

For instance, to get the number of institutions in the 15 countries with the fewest institutions:

ra.cv %>%
  # Get the unique pairs of country * institution
  # so that we obtain institutions, not spaces
  select(Country, Institution) %>%
  unique() %>%
  # by country, simply count (using the n() function) the number
  # of agencies
  group_by(Country) %>%
  summarize(`Number of regulatory agencies` = n()) %>%
  # order by ascending number of agencies
  # and report back only the first 15 cases
  arrange(`Number of regulatory agencies`) %>%
  slice(1:15)
Country                                 Number of regulatory agencies
Cambodia                                1
Korea, Democratic People’s Republic of  1
Lao People’s Democratic Republic        1
Madagascar                              1
Myanmar                                 1
Brunei Darussalam                       2
Côte d’Ivoire                           2
Haiti                                   2
Kuwait                                  2
Syrian Arab Republic                    2
Burkina Faso                            3
Chad                                    3
Congo, the Democratic Republic of the   3
Nepal                                   3
Niger                                   3

Remember that the number of agencies does not necessarily match the number of spaces (sectors) covered. So in this case we do almost exactly the same as before, but instead of counting agencies, we count the spaces covered by such agencies.

ra.cv %>%
  # Get the unique pairs of country * sector
  # so that we obtain spaces, not institutions
  select(Country, Sector) %>%
  unique() %>%
  group_by(Country) %>%
  summarize(`Number of sectors covered` = n()) %>%
  arrange(`Number of sectors covered`) %>%
  slice(1:15)
Country                                 Number of sectors covered
Côte d’Ivoire                           2
Madagascar                              2
Myanmar                                 2
Brunei Darussalam                       3
Burkina Faso                            3
Cambodia                                3
Chad                                    3
Congo, the Democratic Republic of the   3
Haiti                                   3
Korea, Democratic People’s Republic of  3
Kuwait                                  3
Lao People’s Democratic Republic        3
Afghanistan                             4
Cameroon                                4
Syrian Arab Republic                    4
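
Both counts can also be obtained side by side in a single step with n_distinct(), which makes the difference between agencies and spaces easier to compare. A minimal sketch on made-up data (toy.cv, toy.both and the column names Agencies and Sectors are hypothetical):

```r
library(dplyr)

# Made-up expanded data: in Country X, Agency A covers two sectors
toy.cv <- data.frame(
  Country     = c("Country X", "Country X", "Country X", "Country Y"),
  Institution = c("Agency A", "Agency A", "Agency B", "Agency C"),
  Sector      = c("Gas", "Electricity", "Water", "Gas"),
  stringsAsFactors = FALSE
)

# Distinct agencies and distinct sectors covered, by country
toy.both <- toy.cv %>%
  group_by(Country) %>%
  summarize(Agencies = n_distinct(Institution),
            Sectors  = n_distinct(Sector))
```

In the toy data, Country X has two agencies covering three sectors, while Country Y has one agency covering one sector.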

We can also look at the aggregated scores by country in each of the dimensions, in terms of their central tendency (mean) and dispersion (standard deviation). Again, the choice of aggregation function is entirely the researcher’s responsibility.

c.ch <- ra.cv %>% # c.ch: country characteristics
  group_by(Country, Dimension) %>%
  summarize(Mean = mean(score), SD = sd(score))

head(c.ch)
## # A tibble: 6 x 4
## # Groups:   Country [2]
##   Country     Dimension                  Mean    SD
##   <chr>       <fct>                     <dbl> <dbl>
## 1 Afghanistan Managerial autonomy     -0.05   0.694
## 2 Afghanistan Political independence  -0.888  1.30 
## 3 Afghanistan Public accountability   -1.27   0.389
## 4 Afghanistan Regulatory capabilities -0.192  0.796
## 5 Algeria     Managerial autonomy      0.0387 0.579
## 6 Algeria     Political independence  -0.256  0.921
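
One caveat when aggregating by country: sd() returns NA when a group contains a single observation, so countries with very few spaces yield missing or unstable dispersion values. The sketch below, on made-up data, keeps track of the group size and filters on it. The names toy.cv, toy.ch, toy.ok and N are hypothetical, the threshold of 3 is an arbitrary choice for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Made-up scores: three spaces in Country X, a single one in Country Y
toy.cv <- data.frame(
  Country   = c("Country X", "Country X", "Country X", "Country Y"),
  Dimension = "Political independence",
  score     = c(-0.5, 0.1, 0.7, 0.3),
  stringsAsFactors = FALSE
)

toy.ch <- toy.cv %>%
  group_by(Country, Dimension) %>%
  summarize(Mean = mean(score), SD = sd(score), N = n(),
            .groups = "drop")

# Keep only country / dimension pairs with at least 3 spaces
toy.ok <- toy.ch %>%
  filter(N >= 3)
```

Here Country Y has a mean but a missing standard deviation, and it is dropped by the filter.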

These country summaries can then be plotted, with one panel per dimension:

ggplot(c.ch, aes(x = Mean, y = SD, color = Dimension, label = Country)) +
  geom_point() +
  geom_text(hjust = 0, nudge_x = 0.02) +
  facet_wrap(~ Dimension)

Figure 3: Association between country means and country standard deviations of the aggregated scores over each regulatory sector covered in the country. By dimension.

There does not seem to be any association between the mean scores of a country’s agencies in each dimension and their dispersion.

Other stories can also be read from this figure. For instance, for Political independence Yemen has an intermediate mean value across all its spaces, but those spaces seem to be very heterogeneous. Therefore, one may conclude that the regulatory spaces covered by the agencies in Yemen probably have either a very high or a low value in Political independence.
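
An intermediate mean combined with a large standard deviation does not by itself show whether the scores are polarized or simply widely spread. Listing the individual scores of a country settles the question. A minimal sketch on made-up data for one hypothetical country (toy.cv, toy.scores and all values are invented for illustration):

```r
library(dplyr)

# Made-up scores for one hypothetical country: two high, two low
toy.cv <- data.frame(
  Country   = "Country X",
  Sector    = c("Gas", "Water", "Insurance", "Pensions"),
  Dimension = "Political independence",
  score     = c(1.2, 1.0, -1.1, -0.9),
  stringsAsFactors = FALSE
)

# Individual scores of the country in one dimension, highest first
toy.scores <- toy.cv %>%
  filter(Country == "Country X",
         Dimension == "Political independence") %>%
  arrange(desc(score)) %>%
  select(Sector, score)
```

The same filter applied to ra.cv, with the country of interest in place of "Country X", lists the actual scores behind a point in the figure.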

Sys.time()
## [1] "2018-12-17 16:54:36 CET"
