Initial commit
memoryfull committed Sep 16, 2021
0 parents commit 3b419ba
Showing 68 changed files with 10,809 additions and 0 deletions.
8 changes: 8 additions & 0 deletions AUTHORS
@@ -0,0 +1,8 @@
Copyright © 2021 European University at St. Petersburg and Skolkovo Institute of Science and Technology

Moral rights:
Kirill Polovnikov
Nikita Pospelov
Dmitriy Skougarevskiy

The version control system provides attribution for specific lines of code.
29 changes: 29 additions & 0 deletions CITATION.cff
@@ -0,0 +1,29 @@
cff-version: 1.2.0
message: "If you use this algorithm, please cite it as below."
authors:
  - family-names: "Polovnikov"
    given-names: "Kirill"
    orcid: "https://orcid.org/0000-0001-9903-9623"
  - family-names: "Pospelov"
    given-names: "Nikita"
  - family-names: "Skougarevskiy"
    given-names: "Dmitriy"
    orcid: "https://orcid.org/0000-0002-4022-6210"
title: "α-Indirect Control in Onion-like Networks"
date-released: 2021-09-16
year: 2021
url: "https://github.com/eusporg/alphaicon"
preferred-citation:
  type: unpublished
  authors:
    - family-names: "Polovnikov"
      given-names: "Kirill"
      orcid: "https://orcid.org/0000-0001-9903-9623"
    - family-names: "Pospelov"
      given-names: "Nikita"
    - family-names: "Skougarevskiy"
      given-names: "Dmitriy"
      orcid: "https://orcid.org/0000-0002-4022-6210"
  url: "https://arxiv.org/abs/2109.07181"
  title: "α-Indirect Control in Onion-like Networks"
  year: 2021
18 changes: 18 additions & 0 deletions DEPENDENCIES
@@ -0,0 +1,18 @@
Imports:
data.table (>= 1.13.2),
stringi (>= 1.4.4),
stringr (>= 1.3.1),
lubridate (>= 1.7.10),
remotes (>= 2.3.0),
usethis (>= 2.0.1),
ndjson (>= 0.8.0),
igraph (>= 1.2.6),
Matrix (>= 1.3-3),
matrixStats (>= 0.59.0),
stargazer (>= 5.2.1),
fastDummies (>= 1.6.3),
ggplot2 (>= 3.3.3),
ggthemes (>= 4.2.4),
ggrepel (>= 0.9.1),
ggnetwork (>= 0.5.9),
showtext (>= 0.9-2)
395 changes: 395 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions Makefile
@@ -0,0 +1,102 @@
# List of R dependencies for the project
DEPENDENCIES:
# Do nothing, the file is created outside the repo
noop

# Helper function to install the dependencies
code/helper_functions/install_dependencies.r:
# Do nothing, the file is created outside the repo
noop

# People with Significant Control snapshot from Companies House
data/uk/persons-with-significant-control-snapshot-2021-08-02.txt:
# Do nothing, the file is created outside the repo
noop

# Company Data Product snapshot from Companies House
data/uk/BasicCompanyDataAsOneFile-2021-08-01.csv:
# Do nothing, the file is created outside the repo
noop

# Industry sector names
data/uk/sic_2007_code_list.csv:
# Do nothing, the file is created outside the repo
noop

# CorpWatch SEC 10-K filings data: company name-id mapping
data/corpwatch_api_tables_csv_14aug21/cik_name_lookup.csv:
# Do nothing, the file is created outside the repo
noop

# CorpWatch SEC 10-K filings data: basic company information
data/corpwatch_api_tables_csv_14aug21/company_info.csv:
# Do nothing, the file is created outside the repo
noop

# CorpWatch SEC 10-K filings data: company locations
data/corpwatch_api_tables_csv_14aug21/company_locations.csv:
# Do nothing, the file is created outside the repo
noop

# Process the PSC snapshot
data/uk/psc_snapshot_2021-08-02.rdata: data/uk/persons-with-significant-control-snapshot-2021-08-02.txt
Rscript code/data_preparation/uk/1a_process_psc_snapshot.r

# Process the live snapshot of companies data
data/uk/uk_basic_companies_data_2021-08-01.rdata: data/uk/BasicCompanyDataAsOneFile-2021-08-01.csv data/uk/sic_2007_code_list.csv
Rscript code/data_preparation/uk/1b_process_companies_data.r

# Convert the PSC snapshot to clean company-participant data
output/uk/uk_organisations_participants_2021_long_2aug21.csv: data/uk/psc_snapshot_2021-08-02.rdata data/uk/uk_basic_companies_data_2021-08-01.rdata
Rscript code/data_preparation/uk/2_psc_snapshot_to_participants_panel.r

# Create SEC 10-K Exhibit 21 company-participant evaluation set matched to PSC and live companies
data/uk/uk_parent_subsidiary_mapping_2020_2021_sec_filers_exhibit21.csv: data/corpwatch_api_tables_csv_14aug21/company_info.csv data/corpwatch_api_tables_csv_14aug21/cik_name_lookup.csv data/corpwatch_api_tables_csv_14aug21/company_locations.csv data/uk/psc_snapshot_2021-08-02.rdata data/uk/uk_basic_companies_data_2021-08-01.rdata
Rscript code/data_preparation/uk/3_prepare_affiliated_entities_evaluation_data.r

# Classify the network into SH, ST, C, and I entities
output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv: output/uk/uk_organisations_participants_2021_long_2aug21.csv
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --execute code/alphaicon_paper/1_compute_alphaicon.ipynb

# Compute the shares by transitivity
transitiveshares := $(wildcard output/uk/transitive/uk_organisations_transitive_ownership_alpha*.csv)
$(transitiveshares): output/uk/uk_organisations_participants_2021_long_2aug21.csv output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --execute code/alphaicon_paper/1_compute_alphaicon.ipynb

# Helper function implementing NPI/DPI computation
code/helper_functions/compute_power_index.r:
# Do nothing, the file is created outside the repo
noop

# Compute the DPI shares
output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv: code/helper_functions/compute_power_index.r
Rscript code/alphaicon_paper/2_compute_npi_dpi.r

# Compute the NPI shares
output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_npi_10000iter.csv: code/helper_functions/compute_power_index.r
Rscript code/alphaicon_paper/2_compute_npi_dpi.r

# Perform the evaluation of algorithms at different k
output/alphaicon_paper/uk_orgs_algorithm_evaluation_recall.csv: output/uk/uk_organisations_participants_2021_long_2aug21.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_npi_10000iter.csv $(transitiveshares) output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv data/uk/uk_parent_subsidiary_mapping_2020_2021_sec_filers_exhibit21.csv
Rscript code/alphaicon_paper/5_algorithm_evaluation.r

# Perform the evaluation of algorithms at different path length
output/alphaicon_paper/uk_orgs_algorithm_evaluation_recall_by_pathlength.csv: output/uk/uk_organisations_participants_2021_long_2aug21.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_npi_10000iter.csv $(transitiveshares) output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv data/uk/uk_parent_subsidiary_mapping_2020_2021_sec_filers_exhibit21.csv
Rscript code/alphaicon_paper/5_algorithm_evaluation.r

# Create the ranking of top-100 holders by each method
output/alphaicon_paper/uk_organisations_top100_holders_2021_long_2aug21.csv: output/uk/uk_organisations_participants_2021_long_2aug21.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_npi_10000iter.csv $(transitiveshares) output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv data/uk/uk_parent_subsidiary_mapping_2020_2021_sec_filers_exhibit21.csv
Rscript code/alphaicon_paper/6_rank_top_holders.r

# Compute Kendall's tau-b rank correlation of per-company participants for different methods
output/alphaicon_paper/kendall_taus_participant_ranks_dpi_npi_transitive_uk_organisations_participants_2021_7sep21.csv: output/uk/uk_organisations_participants_2021_long_2aug21.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv output/uk/npi_dpi/10000iter/uk_organisations_participants_2021_long_7sep21_npi_10000iter.csv $(transitiveshares) output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv data/uk/uk_parent_subsidiary_mapping_2020_2021_sec_filers_exhibit21.csv
Rscript code/alphaicon_paper/6_rank_top_holders.r

# α-ICON paper
alphaicon_paper: output/alphaicon_paper/uk_organisations_top100_holders_2021_long_2aug21.csv output/alphaicon_paper/uk_orgs_algorithm_evaluation_recall.csv output/alphaicon_paper/uk_orgs_algorithm_evaluation_recall_by_pathlength.csv DEPENDENCIES
Rscript code/helper_functions/install_dependencies.r
Rscript code/helper_functions/compute_power_index.r
Rscript code/alphaicon_paper/3_summary_stat_by_node_type.r
Rscript code/alphaicon_paper/4_illustrate_algorithm.r
Rscript code/alphaicon_paper/5_algorithm_evaluation.r
Rscript code/alphaicon_paper/6_rank_top_holders.r
113 changes: 113 additions & 0 deletions README.md
@@ -0,0 +1,113 @@

<table>
<tbody>
<tr>
<td valign="top" width=200><img src="https://user-images.githubusercontent.com/3776887/133301237-145e43f0-d4b3-4ae5-bf15-113efc2ad189.png"></td>
<td valign="top"><h1>α-Indirect Control in Onion-like Networks</h1>
We propose a fast, accurate, and scalable algorithm to detect ultimate controlling entities in global corporate networks. α-ICON uses company-participant links to identify super-holders who exert control in networks with millions of nodes.<br><br>
By exploiting onion-like properties of such networks we iteratively peel off the hanging vertices until a dense core remains. This procedure allows for a dramatic speed-up, uncovers meaningful structures, and handles circular ownership by design.<br><br>
Read our <a href="https://arxiv.org/abs/2109.07181" target="_blank">paper</a> for the applications. As a toy example, consider the corporate network below, where α-ICON designates Mr Philip Mactaggart (in green) as the super-holder exerting control over all other entities, whether held directly or indirectly:
</td>
</tr>
</tbody>
</table>

<img src="https://user-images.githubusercontent.com/3776887/133299028-152f030a-e1c7-428b-83ef-e5f4e92414bc.png">
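
The peeling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (that lives in `code/alphaicon_paper/1_compute_alphaicon.ipynb`). It assumes that a "hanging vertex" is a node with at most one neighbour once edge direction is ignored; the function name `peel_hanging_vertices` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def peel_hanging_vertices(edges):
    """Repeatedly strip vertices with at most one neighbour.

    `edges` is an iterable of (holder, company) pairs. Direction is ignored
    here, which is an assumption made for illustration only.
    Returns (layers, core): the sets removed at each round and what remains.
    """
    neighbours = defaultdict(set)
    for holder, company in edges:
        neighbours[holder].add(company)
        neighbours[company].add(holder)

    layers = []
    while True:
        # Vertices that currently hang off the rest of the network
        hanging = {v for v, nb in neighbours.items() if len(nb) <= 1}
        if not hanging:
            break
        layers.append(hanging)
        for v in hanging:
            for u in neighbours.pop(v):
                if u in neighbours:
                    neighbours[u].discard(v)
    return layers, set(neighbours)


# Hypothetical toy network: a chain e - a - b attached to a cycle b - c - d
toy_edges = [("e", "a"), ("a", "b"), ("b", "c"), ("c", "d"), ("d", "b")]
layers, core = peel_hanging_vertices(toy_edges)
```

In this toy network the chain is removed one layer at a time and the cycle survives as the core, which is how circular ownership ends up isolated for separate treatment.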

## Installation

To replicate the analysis, clone this repository to your local machine and install the required versions of the R dependencies listed in `DEPENDENCIES`. `code/helper_functions/install_dependencies.r` automates this step, but you may still need to install the underlying libraries manually with [Homebrew](https://brew.sh) or `apt-get`, depending on your platform. Finally, declare the environment variable `ALPHAICON_PATH` pointing to the repository, either in bash or, better yet, in your `.Renviron`:
```console
user:~$ echo 'ALPHAICON_PATH="path_to_cloned_repository"' >> ~/.Renviron
```
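
Note that `.Renviron` is read by R at startup, so a Python process (such as the Jupyter kernel) only sees `ALPHAICON_PATH` if it is also exported in the shell. The sketch below shows one way to resolve the path from either source. The helper names `read_renviron` and `alphaicon_path` are hypothetical and are not part of this repository.

```python
import os
from pathlib import Path


def read_renviron(text):
    """Parse NAME=value lines (optionally quoted) from .Renviron-style text."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("\"'")
    return values


def alphaicon_path(environ=None, renviron_text=""):
    """Return the repository path from the environment, else from .Renviron text."""
    environ = os.environ if environ is None else environ
    path = environ.get("ALPHAICON_PATH") or read_renviron(renviron_text).get("ALPHAICON_PATH")
    if not path:
        raise RuntimeError("ALPHAICON_PATH is not set")
    return Path(path)
```

A caller would typically pass the contents of `~/.Renviron` as `renviron_text` and fail early if neither source defines the variable.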

The repository does not contain any data due to its size (10+ GB unpacked); most files in `data/` and `output/` folders are zero-byte placeholders. We provide a <a href="https://drive.google.com/drive/folders/10Tq-b4BVsG3gmq2JVa026Nilzj8eojNB" target="_blank">public Google Drive folder</a> with the populated `data/` and `output/` directories. You may still need to unzip them manually.
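
Because the placeholders are zero-byte files, a quick check can show which files still need to be replaced with the downloaded data. The following is a minimal sketch; the function name `zero_byte_files` is hypothetical.

```python
from pathlib import Path


def zero_byte_files(root):
    """List files under `root` that are still empty placeholders."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

Running it on the `data/` and `output/` directories after unpacking the Google Drive archives should return an empty list for every file the pipeline needs.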

A self-contained example of α-ICON is also available in <a href="https://colab.research.google.com/drive/1AvO8hJzwj2LoKsyxk5LfSWK7LW1U02Mc" target="_blank">Google Colaboratory</a>.

## Repository structure

```
data/
├─uk/ # Data on UK companies and participants
| ├ persons-with-significant-control-snapshot-2021-08-02.txt # Source PSC data
| ├ BasicCompanyDataAsOneFile-2021-08-01.csv # Source data on live companies in UK
| ├ sic_2007_code_list.csv # Standard Industrial Classification codes
| ├ psc_snapshot_2021-08-02.rdata # Processed People with Significant Control data
| └ uk_basic_companies_data_2021-08-01.rdata # Processed Basic Company data
|
├─corpwatch_api_tables_csv_14aug21/ # Data from CorpWatch Dump
| ├ company_info.csv # Source companies data from SEC filings
| ├ cik_name_lookup.csv # Company name variants in SEC filings
| └ company_locations.csv # Company locations in SEC filings
|
code/
├─helper_functions/
| ├ install_dependencies.r # Installs R dependencies used in the project
| └ compute_power_index.r # Computes Mizuno et al. (2020) DPI and NPI
|
├─data_preparation/
| └─uk/
| ├ 1a_process_psc_snapshot.r # Prepare source PSC data
| ├ 1b_process_companies_data.r # Prepare source data on live companies
| ├ 2_psc_snapshot_to_participants_panel.r # PSC data to entity-participant info
| └ 3_prepare_affiliated_entities_evaluation_data.r # Process CorpWatch data
|
├─alphaicon_paper/
| ├ 1_compute_alphaicon.ipynb # Jupyter Notebook w. α-ICON (also on Google Colab)
| ├ 2_compute_npi_dpi.r # Computation of Direct and Network Power Indices
| ├ 3_summary_stat_by_node_type.r # UK PSC network statistics by core/SH/ST/I
| ├ 4_illustrate_algorithm.r # Visualise selected networks
| ├ 5_algorithm_evaluation.r # Compute recall @ k and l for various algorithms
| └ 6_rank_top_holders.r # Examine the rankings of super-holders & Kendall's tau
|
output/
├─uk/
| ├ uk_organisations_participants_2021_long_2aug21.csv # Primary ownership data
| ├ uk_organisations_participation_graph_core_periphery_membership_6aug21.csv
| ├─npi_dpi/ # Mizuno et al. (2020) computation results on UK PSC data
| | └─10000iter/
| | ├ uk_organisations_participants_2021_long_7sep21_dpi_10000iter.csv # DPI
| | └ uk_organisations_participants_2021_long_7sep21_npi_10000iter.csv # NPI
| |
| ├─transitive/ # Computed α-ICON shares on equity shares or DPI weights
| | ├ uk_organisations_transitive_ownership_alpha*_2021_long_2aug21.csv # α = *
| | └ uk_organisations_transitive_ownership_alpha*_2021_long_7sep21_dpi_....csv
| |
└─alphaicon_paper/
├ uk_orgs_algorithm_evaluation_recall.csv # Algorithm recall by k
├ uk_orgs_algorithm_evaluation_recall_by_pathlength.csv # Algorithm recall by l
├ uk_organisations_top100_holders_2021_long_2aug21.csv # Top SH in PSC network
├ uk_organisations_top100_holders_diff_npi_dpi_2021_long_2aug21.csv # Top-100 SH
| # with the largest difference betw. total DPI and NPI
├ uk_organisations_top100_holders_diff_transitive_dpi_2021_long_2aug21.csv
| # Top-100 SH with the largest difference betw. total DPI and α-ICON (α=0.999)
├ uk_organisations_top100_holders_diff_transitive_npi_2021_long_2aug21.csv
| # Top-100 SH with the largest difference betw. total NPI and α-ICON (α=0.999)
└ network_examples/ # Visualisations of selected networks
```
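
The evaluation outputs listed above report recall at different k. As a rough illustration of that metric, the sketch below counts how often a known parent appears among the top-k holders ranked for a company. This is an assumed, simplified definition; the exact procedure is the one in `code/alphaicon_paper/5_algorithm_evaluation.r`, and the function name and input layout here are hypothetical.

```python
def recall_at_k(ranked_holders, true_parents, k):
    """Share of evaluation companies whose known parent is in the top-k holders.

    ranked_holders: dict mapping company -> list of holders, best first
    true_parents:   dict mapping company -> known parent (evaluation set)
    """
    if not true_parents:
        return 0.0
    hits = 0
    for company, parent in true_parents.items():
        top_k = ranked_holders.get(company, [])[:k]
        hits += parent in top_k
    return hits / len(true_parents)


# Hypothetical rankings and ground truth
ranked = {"sub1": ["x", "p1"], "sub2": ["p2"], "sub3": ["y", "z", "p3"]}
truth = {"sub1": "p1", "sub2": "p2", "sub3": "p3"}
```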

We provide an annotated `Makefile` that documents the data analysis in our papers.

To build the ‘<a href="https://arxiv.org/abs/2109.07181" target="_blank">α-Indirect Control in Onion-like Networks</a>’ paper run `make alphaicon_paper` when in the repository folder.

Please note that this command will not produce any publication-ready output files (e.g. tables or figures): the export statements are commented out in the code. Our intention is to make the analysis pipeline transparent to readers with the aid of `make`:

![alphaicon_dependencies](https://user-images.githubusercontent.com/3776887/133301812-87f25078-de5a-4bea-b9b0-0e6addb51b2b.png)
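
To make the dependency idea concrete, the sketch below encodes a few of the target-prerequisite pairs from the `Makefile` as a Python dictionary and derives an order in which the files could be built. It only mirrors the ordering that `make` resolves and does not run anything. The names `RULES` and `build_order` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Target -> prerequisites, copied from a subset of the Makefile rules
RULES = {
    "data/uk/psc_snapshot_2021-08-02.rdata": [
        "data/uk/persons-with-significant-control-snapshot-2021-08-02.txt",
    ],
    "data/uk/uk_basic_companies_data_2021-08-01.rdata": [
        "data/uk/BasicCompanyDataAsOneFile-2021-08-01.csv",
        "data/uk/sic_2007_code_list.csv",
    ],
    "output/uk/uk_organisations_participants_2021_long_2aug21.csv": [
        "data/uk/psc_snapshot_2021-08-02.rdata",
        "data/uk/uk_basic_companies_data_2021-08-01.rdata",
    ],
    "output/uk/uk_organisations_participation_graph_core_periphery_membership_6aug21.csv": [
        "output/uk/uk_organisations_participants_2021_long_2aug21.csv",
    ],
}


def build_order(rules):
    """Return files ordered so that every prerequisite precedes its target."""
    return list(TopologicalSorter(rules).static_order())


order = build_order(RULES)
```

The source snapshots come first, followed by the processed `.rdata` files, the participants table, and finally the core-periphery membership file.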


## Licence
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />
Creative Commons Attribution 4.0 International License (CC BY 4.0).

Copyright © the respective contributors, as shown by the `AUTHORS` file.

People with Significant Control data is <a href="http://download.companieshouse.gov.uk/en_pscdata.html">distributed</a> by Companies House under <a href="https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/">Open Government Licence v3.0</a>.

Free Company Data Product is <a href="http://download.companieshouse.gov.uk/en_output.html">distributed</a> by Companies House under <a href="https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/">Open Government Licence v3.0</a>.


## Contacts
Dmitriy Skougarevskiy, Ph.D.

[email protected]