Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

12 natalie analysis #13

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions notebooks/2024_12_MS_cybersecurity/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
---
search_recipe:
scope_statements:
- "Child online safety and privacy"
- "Protecting children in the digital age"
- "Protecing children from online threats"
- "Ensuring children's safety online"
- "Ensuring children's privacy online"
- "Helping parents protect their children online"
- "Helping parents inform their children about online safety"
- "Helping parents monitor their children's online activity"

keyword_sets:
- set_name: "Domain keywords"
keywords:
- "child"
- "children"
- "infant"
- "baby"
- "babies"
- "toddler"
- "toddlers"
- "pregnancy"
- "parent"
- "mother"
- "father"
- "family"
- "pupil"
- "teenager"
- "teen"

- set_name: "Tech keywords"
keywords:
- "cybersecurity"
- "online safety"
- "data protection"
- "encryption"
- "malware"
- "phishing"
- "firewall"
- "identity theft"
- "cyber threats"
- "cyberattack"
- "cyber attack"
- "data breach"
- "social engineering"
- "threat intelligence"
- "penetration testing"
- "cyber defence"
- "cyber defense"
- "cyber forensics"
- "ransomware"
- "zero day exploit"
- "cloud security"
253 changes: 253 additions & 0 deletions notebooks/2024_12_MS_cybersecurity/natalie_cybersec_analysis.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,253 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examples of using analysis functionalities\n",
"\n",
"Using discovery_utils analyses functionalities for investments data\n",
"\n",
"Here, we'll find companies using their categories, but you can also use search results from the process shown in cybersec_search.ipynb"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from discovery_utils.utils import (\n",
" analysis_crunchbase,\n",
" analysis,\n",
" charts\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from discovery_utils.getters import crunchbase\n",
"CB = crunchbase.CrunchbaseGetter()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's find categories to use for selecting companies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Categories for cybersec\n",
"CB.find_similar_categories(\"cyber security\", category_type=\"broad\", n_results=10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Categories for family and kids\n",
"CB.find_similar_categories(\"family\", category_type=\"narrow\", n_results=15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now get both types of companies and take the intersection to find those that match both categories\n",
"\n",
"This might take a minute for the first time, as it downloads the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cybersec_df = CB.get_companies_in_categories([\"Privacy and Security\"], category_type=\"broad\")\n",
"family_df = CB.get_companies_in_categories([\"Family\", \"Parenting\", \"Children\", \"Teenagers\"], category_type=\"narrow\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"matching_ids = set(cybersec_df.id) & set(family_df.id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"len(matching_ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check these companies by querying the ids"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"matchings_orgs_df = CB.organisations_enriched.query(\"id in @matching_ids\")\n",
"matchings_orgs_df[['name', 'homepage_url', 'short_description']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now get the funding rounds for the matching companies - you can specify what type of funding rounds you need"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check what type of funding rounds there are\n",
"CB.unique_funding_round_types"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"funding_rounds_df = (\n",
" CB.select_funding_rounds(org_ids=matching_ids, funding_round_types=[\"angel\", \"pre_seed\", \"seed\", \"series_a\"])\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's generate some basic time series"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ts_df = analysis_crunchbase.get_timeseries(matchings_orgs_df, funding_rounds_df, period='year', min_year=2014, max_year=2024)\n",
"ts_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig = charts.ts_bar(\n",
" ts_df,\n",
" variable='raised_amount_gbp_total',\n",
" variable_title=\"Raised amount, £ millions\",\n",
" category_column=\"_category\",\n",
")\n",
"charts.configure_plots(fig, chart_title=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look into breakdown of deal types"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"deals_df, deal_counts_df = analysis_crunchbase.get_funding_by_year_and_range(funding_rounds_df, 2014, 2024)\n",
"aggregated_funding_types_df = analysis_crunchbase.aggregate_by_funding_round_types(funding_rounds_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"deals_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"analysis_crunchbase.chart_investment_types(aggregated_funding_types_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"analysis_crunchbase.chart_deal_sizes(deals_df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"analysis_crunchbase.chart_deal_sizes_counts(deal_counts_df)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "discovery-mission-radar-prototyping-ejbE0IFh-py3.11",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}