Introduce cache flush rules [RHELDST-26144]
This change extends the cache flush config to support separate rules
with different templates, enabled via patterns matched against candidate
paths for flush.

The goal here is to reduce the amount of unnecessarily flushed
URLs/ARLs. In typical scenarios, we have three different CDN hosts in
front of a single exodus-gw environment, with certain subtrees being
only available from certain hosts. With the flat config structure
existing before this change, we had no choice but to flush cache keys
for all three of them for every path, even though each path is only
relevant to one of the hosts. As a result, we were flushing three times as
many ARLs as necessary.

With this change, we can update the configuration to flush cache only for
the CDN host that serves each subtree, significantly cutting down the
number of cache keys flushed.

This commit is backwards-compatible with the old config style, so it can
be safely deployed before updating exodus-gw.ini.
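
To make the reduction concrete, here is a rough, self-contained sketch (not code from this commit) that counts cache keys under the old flat config and under per-host rules. The three hostnames, the subtree patterns and the sample paths are all hypothetical, and each "rule" is reduced to a single regular expression per host.

```python
import re

# Hypothetical layout: three CDN hosts, each serving exactly one subtree.
RULES = {
    "https://cdn1.example.com": re.compile(r"^/content/"),
    "https://cdn2.example.com": re.compile(r"^/files/"),
    "https://cdn3.example.com": re.compile(r"^/images/"),
}

# Hypothetical paths selected for flush.
PATHS = [
    "/content/dist/repodata/repomd.xml",
    "/files/listing",
    "/images/PULP_MANIFEST",
]


def keys_flat(paths):
    """Flat config: every path is flushed on every host."""
    return [host + path for path in paths for host in RULES]


def keys_with_rules(paths):
    """Rule-based config: a path is flushed only on hosts whose pattern matches."""
    return [
        host + path
        for path in paths
        for host, pattern in RULES.items()
        if pattern.search(path)
    ]


flat = keys_flat(PATHS)
filtered = keys_with_rules(PATHS)
print(len(flat), len(filtered))
```

With three hosts and three paths, the flat approach produces nine keys while the rule-based approach produces three, which is the 3x saving described above.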
rohanpm committed Aug 14, 2024
1 parent e418e6e commit 13b3b47
Showing 4 changed files with 369 additions and 73 deletions.
53 changes: 40 additions & 13 deletions docs/deployment.rst
@@ -124,28 +124,55 @@ exodus-gw will continue to function but will skip cache flush operations.
 
 Enabling the feature requires the deployment of two sets of configuration.
 
-Firstly, in the ``exodus-gw.ini`` section for the relevant environment,
-set ``cache_flush_urls`` to enable cache flush by URL and/or
-``cache_flush_arl_templates`` to enable cache flushing by ARL. Both options
-can be used together as needed.
+Firstly, in ``exodus-gw.ini``, define some cache flush rules under
+sections named ``[cache_flush.{rule_name}]``.
+
+Each rule must define a list of URL/ARL ``templates`` for calculating
+the cache keys to flush. Rules may optionally define ``includes`` and
+``excludes`` to select specific paths where the rule should be applied.
+
+Once rules are defined, enable them for a specific environment by listing
+them in ``cache_flush_rules`` under that environment's configuration.
+See the following example:
 
 .. code-block:: ini
 
   [env.live]
-  # Root URL(s) of CDN properties for which to flush cache.
-  # Several can be provided.
-  cache_flush_urls =
-    https://cdn1.example.com
-    https://cdn2.example.com
-
-  # Templates of ARL(s) for which to flush cache.
+  # Rule(s) to activate for this environment.
+  #
+  # This example supposes that there are two CDN hostnames in use,
+  # one of which exposes all content *except* a certain subtree
+  # and one which exposes *only* that subtree.
+  cache_flush_rules =
+    cdn1
+    cdn2
+
+  [cache_flush.cdn1]
+  # URL or ARL template(s) for which to flush cache.
   #
   # Templates can use placeholders:
   # - path: path of a file under CDN root
-  # - ttl (optional): a TTL value will be substituted
-  cache_flush_arl_templates =
+  # - ttl: a TTL value will be substituted
+  templates =
+    https://cdn1.example.com
     S/=/123/22334455/{ttl}/cdn1.example.com/{path}
+
+  # Suppose that "/files" is restricted to cdn2, then the
+  # exclusion pattern here will avoid unnecessarily flushing
+  # cdn1 cache for paths underneath that subtree.
+  excludes =
+    ^/files/
+
+  [cache_flush.cdn2]
+  templates =
+    https://cdn2.example.com
     S/=/123/22334455/{ttl}/cdn2.example.com/{path}
+
+  # This rule only applies to this subtree, which was excluded
+  # from the other rule.
+  includes =
+    ^/files/
 
 Secondly, use environment variables to deploy credentials for the
 Fast Purge API, according to the below table. The fields here correspond
 to those used by the `.edgerc file <https://techdocs.akamai.com/developer/docs/set-up-authentication-credentials>`_
157 changes: 144 additions & 13 deletions exodus_gw/settings.py
@@ -1,5 +1,8 @@
 import configparser
 import os
+import re
+from collections.abc import Iterable
+from dataclasses import dataclass
 from enum import Enum
 from typing import Any

@@ -26,6 +29,136 @@ def split_ini_list(raw: str | None) -> list[str]:
     return [elem.strip() for elem in raw.split("\n") if elem.strip()]
 
 
+@dataclass
+class CacheFlushRule:
+    name: str
+    """Name of this rule (from the config file)."""
+
+    templates: list[str]
+    """List of URL/ARL templates.
+    Each template may be either:
+    - a base URL, e.g. "https://cdn.example.com/cdn-root"
+    - an ARL template, e.g. "S/=/123/22334455/{ttl}/cdn1.example.com/{path}"
+    Templates may contain 'ttl' and 'path' placeholders to be substituted
+    when calculating cache keys for flush.
+    When there is no 'path' in a template, the path will instead be
+    appended.
+    """
+
+    includes: list[re.Pattern[str]]
+    """List of patterns applied to decide whether this rule is
+    applicable to any given path.
+    Patterns are non-anchored regular expressions.
+    A path must match at least one pattern in order for cache flush
+    to occur for that path.
+    There is a default pattern of ".*", meaning that all paths will
+    be included by default.
+    Note that these includes are evaluated *after* the set of paths
+    for flush have already been filtered to include only entry points
+    (e.g. repomd.xml and other mutable paths). It is not possible to
+    use this mechanism to enable cache flushing of non-entry-point
+    paths.
+    """
+
+    excludes: list[re.Pattern[str]]
+    """List of patterns applied to decide whether this rule should
+    be skipped for any given path.
+    Patterns are non-anchored regular expressions.
+    If a path matches any pattern, cache flush won't occur.
+    excludes are applied after includes.
+    """
+
+    def matches(self, path: str) -> bool:
+        """True if this rule matches the given path."""
+
+        # We always match against absolute paths with a leading /,
+        # regardless of how the input was formatted.
+        path = "/" + path.removeprefix("/")
+
+        # Must match at least one 'includes'.
+        for pattern in self.includes:
+            if pattern.search(path):
+                break
+        else:
+            return False
+
+        # Must not match any 'excludes'.
+        for pattern in self.excludes:
+            if pattern.search(path):
+                return False
+
+        return True
+
+    @classmethod
+    def load_all(
+        cls: type["CacheFlushRule"],
+        config: configparser.ConfigParser,
+        env_section: str,
+        names: Iterable[str],
+    ) -> list["CacheFlushRule"]:
+
+        out: list[CacheFlushRule] = []
+        for rule_name in names:
+            section_name = f"cache_flush.{rule_name}"
+            templates = split_ini_list(config.get(section_name, "templates"))
+            includes = [
+                re.compile(s)
+                for s in split_ini_list(
+                    config.get(section_name, "includes", fallback=".*")
+                )
+            ]
+            excludes = [
+                re.compile(s)
+                for s in split_ini_list(
+                    config.get(section_name, "excludes", fallback=None)
+                )
+            ]
+            out.append(
+                cls(
+                    name=rule_name,
+                    templates=templates,
+                    includes=includes,
+                    excludes=excludes,
+                )
+            )
+
+        # backwards-compatibility: if no rules were defined, but old-style
+        # cache flush config was specified, read it into a rule with default
+        # 'includes' and 'excludes'.
+        if not names and (
+            config.has_option(env_section, "cache_flush_urls")
+            or config.has_option(env_section, "cache_flush_arl_templates")
+        ):
+            out.append(
+                cls(
+                    name=f"{env_section}-legacy",
+                    templates=split_ini_list(
+                        config.get(
+                            env_section, "cache_flush_urls", fallback=None
+                        )
+                    )
+                    + split_ini_list(
+                        config.get(
+                            env_section,
+                            "cache_flush_arl_templates",
+                            fallback=None,
+                        )
+                    ),
+                    includes=[re.compile(r".*")],
+                    excludes=[],
+                )
+            )
+
+        return out
+
+
 class Environment(object):
     def __init__(
         self,
@@ -36,8 +169,7 @@ def __init__(
         config_table,
         cdn_url,
         cdn_key_id,
-        cache_flush_urls=None,
-        cache_flush_arl_templates=None,
+        cache_flush_rules=None,
     ):
         self.name = name
         self.aws_profile = aws_profile
@@ -46,10 +178,7 @@ def __init__(
         self.config_table = config_table
         self.cdn_url = cdn_url
         self.cdn_key_id = cdn_key_id
-        self.cache_flush_urls = split_ini_list(cache_flush_urls)
-        self.cache_flush_arl_templates = split_ini_list(
-            cache_flush_arl_templates
-        )
+        self.cache_flush_rules: list[CacheFlushRule] = cache_flush_rules or []
 
     @property
     def cdn_private_key(self):
@@ -63,8 +192,8 @@ def fastpurge_enabled(self) -> bool:
         are available for this environment.
         """
         return (
-            # *at least one* URL or ARL template must be set...
-            (self.cache_flush_urls or self.cache_flush_arl_templates)
+            # There must be at least one cache flush rule in config...
+            bool(self.cache_flush_rules)
             # ... and *all* fastpurge credentials must be set
             and self.fastpurge_access_token
             and self.fastpurge_client_secret
@@ -373,9 +502,12 @@ def load_settings() -> Settings:
         config_table = config.get(env, "config_table", fallback=None)
         cdn_url = config.get(env, "cdn_url", fallback=None)
         cdn_key_id = config.get(env, "cdn_key_id", fallback=None)
-        cache_flush_urls = config.get(env, "cache_flush_urls", fallback=None)
-        cache_flush_arl_templates = config.get(
-            env, "cache_flush_arl_templates", fallback=None
+
+        cache_flush_rule_names = split_ini_list(
+            config.get(env, "cache_flush_rules", fallback=None)
+        )
+        cache_flush_rules = CacheFlushRule.load_all(
+            config, env, cache_flush_rule_names
         )
 
         settings.environments.append(
@@ -387,8 +519,7 @@ def load_settings() -> Settings:
                 config_table=config_table,
                 cdn_url=cdn_url,
                 cdn_key_id=cdn_key_id,
-                cache_flush_urls=cache_flush_urls,
-                cache_flush_arl_templates=cache_flush_arl_templates,
+                cache_flush_rules=cache_flush_rules,
             )
         )
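
As an illustration of how the rule config in this file is meant to behave end to end, the following is a simplified, standalone sketch. It is not the exodus-gw code itself: `Rule`, `split_list` and `load_rules` are hypothetical stand-ins that mirror `CacheFlushRule`, `split_ini_list` and `load_all` from the diff, and the legacy fallback is left out. The INI content is adapted from the documentation example.

```python
import configparser
import re
from dataclasses import dataclass

# Adapted from the docs example; hostnames are placeholders.
EXAMPLE_INI = """
[env.live]
cache_flush_rules =
    cdn1
    cdn2

[cache_flush.cdn1]
templates =
    https://cdn1.example.com
excludes =
    ^/files/

[cache_flush.cdn2]
templates =
    https://cdn2.example.com
includes =
    ^/files/
"""


@dataclass
class Rule:  # hypothetical stand-in for CacheFlushRule
    name: str
    templates: list[str]
    includes: list[re.Pattern[str]]
    excludes: list[re.Pattern[str]]

    def matches(self, path: str) -> bool:
        # Always compare against an absolute path with a leading slash.
        path = "/" + path.removeprefix("/")
        if not any(p.search(path) for p in self.includes):
            return False
        return not any(p.search(path) for p in self.excludes)


def split_list(raw):
    """Split a multi-line INI value into non-empty, stripped items."""
    return [line.strip() for line in (raw or "").splitlines() if line.strip()]


def load_rules(config, env_section):
    """Load the rules named in the environment's cache_flush_rules option."""
    names = split_list(config.get(env_section, "cache_flush_rules", fallback=None))
    rules = []
    for name in names:
        section = f"cache_flush.{name}"
        includes = split_list(config.get(section, "includes", fallback=".*"))
        excludes = split_list(config.get(section, "excludes", fallback=None))
        rules.append(
            Rule(
                name=name,
                templates=split_list(config.get(section, "templates")),
                includes=[re.compile(s) for s in includes],
                excludes=[re.compile(s) for s in excludes],
            )
        )
    return rules


config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)
rules = load_rules(config, "env.live")

matched = {
    path: [rule.name for rule in rules if rule.matches(path)]
    for path in ("/files/a.iso", "content/repomd.xml")
}
print(matched)
```

Under this config, a path below `/files/` should select only the `cdn2` rule, and any other path should select only `cdn1`, whether or not the input path has a leading slash.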

29 changes: 18 additions & 11 deletions exodus_gw/worker/cache.py
@@ -81,18 +81,25 @@ def urls_for_flush(self):
             for p in uris_with_aliases(self.paths, self.aliases)
         ]
 
-        for cdn_base_url in self.env.cache_flush_urls:
-            for path in path_list:
-                out.append(os.path.join(cdn_base_url, path))
-
-        for arl_template in self.env.cache_flush_arl_templates:
-            for path in path_list:
-                out.append(
-                    arl_template.format(
-                        path=path,
-                        ttl=self.arl_ttl(path),
+        for path in path_list:
+            # Figure out the templates applicable to this path
+            templates: list[str] = []
+            for rule in self.env.cache_flush_rules:
+                if rule.matches(path):
+                    templates.extend(rule.templates)
+
+            for template in templates:
+                if "{path}" in template:
+                    # interpret as a template with placeholders
+                    out.append(
+                        template.format(
+                            path=path.removeprefix("/"),
+                            ttl=self.arl_ttl(path),
+                        )
                     )
-                )
+                else:
+                    # no {path} placeholder, interpret as a root URL
+                    out.append(os.path.join(template, path))
 
         return out
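
The template handling in this hunk can be tried in isolation with a small sketch along the following lines. The `expand` helper is hypothetical, the fixed `ttl` value stands in for whatever `self.arl_ttl(path)` returns, and root URLs are joined by plain string concatenation rather than `os.path.join`, so that the result does not depend on the platform or on whether the path has a leading slash.

```python
def expand(templates, path, ttl="1d"):
    """Turn URL/ARL templates into cache keys for a single path.

    Hypothetical helper: 'ttl' is a placeholder for a computed TTL value.
    """
    relative = path.removeprefix("/")
    keys = []
    for template in templates:
        if "{path}" in template:
            # Template with placeholders, e.g. an ARL.
            keys.append(template.format(path=relative, ttl=ttl))
        else:
            # No {path} placeholder: treat the template as a root URL.
            keys.append(template.rstrip("/") + "/" + relative)
    return keys


keys = expand(
    [
        "https://cdn1.example.com",
        "S/=/123/22334455/{ttl}/cdn1.example.com/{path}",
    ],
    "/content/repomd.xml",
)
for key in keys:
    print(key)
```

One point worth keeping in mind when comparing this with the diff: on POSIX, `os.path.join(base, path)` discards `base` when `path` starts with `/`, so the root-URL branch in the commit relies on the paths in `path_list` being relative.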
