📡 Innovation Sweet Spots

Open-source code for data-driven horizon scanning

👋 Welcome!

Innovation Sweet Spots is an experimental, data-driven horizon scanning project, led by Nesta's Discovery Hub. Read more about our motivation on Medium, and check out our first report on green technologies. The project code has also been used to analyse parenting technologies, and we're soon publishing a new report on food tech and innovation.

We are building upon Nesta's Data Analytics Practice expertise and previous work on innovation mapping, leveraging data science and machine-learning methods to track the trajectory of innovations and technologies for social good.

By combining insights across several large datasets that are commonly only analysed in isolation, we paint a multi-dimensional picture of the innovations, indicating the resources they are attracting and how they are perceived.

NB: This codebase and its documentation are still under development. Some of the code still lives in Jupyter notebooks, whereas some utilities have already been factored out into modules. Please contact us if you're interested in re-using parts of the codebase, and we'll be happy to help.

🛠️ Installation

Set up a conda environment

conda create --name tutorial python=3.9
conda activate tutorial

Install the required packages

pip install -r requirements.txt
pip install -e .
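
As a quick sanity check after the steps above, the following minimal sketch reports whether the interpreter and the editable install look as expected. It assumes that the package is importable as innovation_sweet_spots (inferred from the paths used in this README) and that Python 3.9 or newer is acceptable; the check_environment helper is hypothetical and not part of this codebase.

```python
import importlib.util
import sys


def check_environment(package="innovation_sweet_spots", min_python=(3, 9)):
    """Report whether the interpreter and package look usable.

    Both defaults are assumptions: the package name is inferred from
    the repo paths, and the minimum version from the conda command.
    """
    return {
        # Compare only (major, minor) against the requested minimum
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # find_spec returns None when a top-level package cannot be found
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for check, passed in check_environment().items():
        print(f"{check}: {'ok' if passed else 'FAILED'}")
```

If package_installed fails, re-running pip install -e . inside the activated conda environment is the first thing to try.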

If you're here for the public discourse tutorial, open the notebook innovation_sweet_spots/analysis/examples/tutorials/Data_driven_discourse_analysis.ipynb in your favourite development environment.

💾 Datasets

To uncover research, investment and public discourse trends, we are presently using the following data:

All of these datasets except Crunchbase and Dealroom are freely available. Note, however, that this project accesses some of the large datasets (namely GtR and Crunchbase) via our internal Nesta database, so those parts of the code are intended for internal use.

In the future, we might add other datasets to our approach.

Data access guidelines

NB: This information is slightly out of date and will be updated soon

Research project and company data

To download the GtR and Crunchbase datasets from the Nesta database, you will first need to decrypt the config files (if you don't have the key, reach out to Karlis).

$ git stash
$ git-crypt unlock /path/to/key

The most recent version of the Gateway to Research (GtR) and Crunchbase datasets can then be fetched by running the command below. Note that you need to be connected via Nesta's VPN when accessing the database.

$ python innovation_sweet_spots/pipeline/fetch_daps1_data/flow.py --no-pylint --environment=conda run

The Guardian news

We use the Guardian API to search for articles containing specific key terms. To access the API, proceed as follows:

  • Request an API key from the Guardian website (see here)
  • Store it somewhere safe on your local machine (outside the repo) in a .txt file
  • Specify the path to this file in the .env file by adding a new line with export GUARDIAN_API_KEY=path/to/file
  • Use the functions in innovation_sweet_spots.getters.guardian
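
To illustrate how the key file and the GUARDIAN_API_KEY variable fit together, here is a minimal sketch that reads the key and builds a search URL for one key term. It is not the code in innovation_sweet_spots.getters.guardian: read_api_key and build_search_url are hypothetical helpers, the endpoint and query parameters are those of the public Guardian Open Platform rather than anything taken from this repo, and the sketch assumes the .env file has already been loaded into the environment.

```python
import os
from pathlib import Path
from urllib.parse import urlencode

# Public Guardian Open Platform search endpoint (an assumption of this
# sketch, not read from the repo's own getters)
GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search"


def read_api_key(env_var="GUARDIAN_API_KEY"):
    """Return the key stored in the .txt file that `env_var` points to."""
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    # The variable holds a path to the key file, not the key itself
    return Path(key_path).expanduser().read_text().strip()


def build_search_url(search_term, api_key, page_size=10):
    """Build a search URL for one key term (no request is sent here)."""
    params = {"q": search_term, "page-size": page_size, "api-key": api_key}
    return f"{GUARDIAN_SEARCH_URL}?{urlencode(params)}"
```

Keeping only the path in .env means the key itself never needs to live inside the repository. For actual analyses, prefer the project's own getter functions over these illustrative helpers.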

To see examples of using our public discourse analysis tools, check innovation_sweet_spots/analysis/examples/public_discourse_analysis.

Hansard

Please ask Karlis for access to the Hansard dataset. More details coming soon...

🤝 Contributor guidelines

Technical and working style guidelines


Project based on Nesta's data science project template (Read the docs here).