sanecleanup

Sensible defaults for data retention and cleanup for SAP Commerce, based on my CX Works article Data Maintenance and Cleanup

How-To

Download the latest release
Unpack to hybris/bin/custom
If possible, disable saved values / change history (ref. help.sap.com, further recommendations in my article)
Add extension to your localextensions.xml
```
<extension name="sanecleanup" />
```
🔴 Adapt the retention rules to your project requirements 🔴
(check the available properties in project.properties)
Build and deploy.
(The rules will be automatically imported during system update)
If you get a build error regarding missing types like the example below:
1. Open sanecleanup-items.xml
2. Search for the type
3. Comment-out the whole <itemtype> tag
```
invalid index sanecleanup for type _TYPE_ on [...] declared at ((sanecleanup))::YIndex[sanecleanup-items.xml:...] due to missing enclosing type '_TYPE_'
```

WARNING
The very first execution of the retention cron jobs will take a while, depending on how long your poject is already live and if you have cleaned up anything in the past.

Consider performing a one-time cleanup before adding the extension / enabling the retention rules.

Especially the first run of cronJobLogCleanupCronJob will take a very long time, if you have never removed any cronjob log files (type LogFile).
Please consider importing and executing the script job defined in bulkdelete-cronjoblogs.impex before you set up the automated cleanup!
The job will remove all log files except the five most recent logs per CronJob. (Disclaimer: the script was tested on MS SQL / Azure SQL and SAP HANA. It is not guaranteed to work for other databases)

Do I have to clean up?

If have never even thought about that topic - yes!

You can run the following scripts in the administration console to get a quick overview:

excessive-platform-types.groovy - Generates a report about "known troublemakers"
check-audit.groovy - Check if you have too many audit logs

Here are some additional queries and "rules of thumb" that help you investigate further:

Type(s)	Query	Notes
AbstractRule	SELECT COUNT({ar:pk}), MIN({ar:modifiedtime}) AS "oldest", MAX({ar:modifiedtime}) AS "newest" FROM {AbstractRule AS ar}, {RuleStatus AS rs} WHERE {ar:status} = {rs:pk} AND {rs:code} = 'PUBLISHED' AND {ar:enddate} IS NOT NULL AND {ar:enddate} < getutcdate()	Are there any outdated rules? i.e rules that aren't valid anymore because their enddate is in the past. Warning: change `getutcdate()` to your DBMS (for HANA/MySQL: `now()` )
BusinessProcess	SELECT {p:processDefinitionName}, {s:code} AS "status", COUNT({p:pk}) AS "total", MIN({p:modifiedTime}) AS "oldest", MAX({p:modifiedTime}) AS "newest" FROM {BusinessProcess AS p LEFT JOIN ProcessState AS s ON {p:state} = {s:pk} } GROUP BY {p:processDefinitionName}, {s:code} ORDER BY "total" DESC	Are there too many (let's say > 1000) or very old BusinessProcess in your system? Also, if a lot of processes are stuck in "RUNNING" / "WAITING", you have to investigate what's wrong. (What is causing your processes to be stuck?)
Cart	SELECT {b:uid} AS "BaseSite", {u:uid} AS "USER", CASE WHEN {c:saveTime} IS NULL THEN 'regular' ELSE 'saved' END AS "cart type", COUNT({c:pk}) AS "total", MIN({c:modifiedtime}) AS "oldest", MAX({c:modifiedtime}) AS "newest" FROM { Cart AS c LEFT JOIN USER AS u ON {c:user} = {u:pk} LEFT JOIN BaseSite AS b ON {c:site} = {b:pk} } GROUP BY {b:uid}, {u:uid}, CASE WHEN {c:saveTime} IS NULL THEN 'regular' ELSE 'saved' END ORDER BY "total" DESC	Are there excessive amount of carts per site or per user? Too many saved carts? Stale (= old) carts?
CronJob (auto-generated)	SELECT {t:code} AS "CronJob Type", COUNT({c:pk}) AS "total", MIN({c:modifiedtime}) AS "oldest", MAX({c:modifiedtime}) AS "newest" FROM {CronJob AS c JOIN ComposedType AS t ON {c:itemtype} = {t:pk} LEFT JOIN TRIGGER AS trg ON {trg:cronjob} = {c:pk} } WHERE {trg:pk} IS NULL AND {c:code} LIKE '00%' AND {t:code} IN ( 'ImpExImportCronJob', 'CatalogVersionSyncCronJob', 'SolrIndexerCronJob' ) GROUP BY {t:code} ORDER BY "total" DESC	Are there too many (>10) outdated, auto-geneated jobs in your system?
CronJobHistory	SELECT {cj:code}, COUNT({h:pk}) AS "total", MIN({h:modifiedtime}) AS "oldest", MAX({h:modifiedtime}) AS "newest" FROM {cronjobhistory AS h JOIN cronjob AS cj ON {h:cronjob} = {cj:pk} } GROUP BY {cj:code} ORDER BY "total" DESC	Is there any job with > 50 histories and/or histories older than an hour? This cleanup is enabled by default in recent SAP Commerce patch releases, so this query shouldn't find anything.
EmailMessage	SELECT {bp:processDefinitionName} AS "source", {m:sent}, COUNT({m:pk}) AS "total", MIN({m:modifiedtime}) AS "oldest", MAX({m:modifiedtime}) AS "newest" FROM {EmailMessage AS m LEFT JOIN BusinessProcess AS bp ON {m:process} = {bp:pk} } GROUP BY {bp:processDefinitionName}, {m:sent} ORDER BY "total" DESC	Are there more than a handful sent/unsent messages? Are there messages that do not belong to any process?
ImpExImportCronJob (distributed impex)	SELECT {s:code} AS "status", COUNT({i:pk}) AS "total", MIN({i:modifiedtime}) AS "oldest", MAX({i:modifiedtime}) AS "newest" FROM {ImpExImportCronJob AS i LEFT JOIN CronJobStatus AS s ON {i:status} = {s:pk} } WHERE {i:code} LIKE 'distributed-impex-%' GROUP BY {s:code}	More than ~10 `FINISHED` distributed impex jobs? More than a few `PAUSED` jobs? You may have a faulty distributed impex script.
ImpexMedia	SELECT COUNT(*) FROM {ImpexMedia AS i} WHERE ( {i:code} LIKE '0_______' OR {i:code} LIKE 'generated impex media - %' )	Are there more than a handful (>100) of generated impex medias?
ImportBatchContent	SELECT COUNT({c:pk}) AS "total", MIN({c:modifiedTime}) AS "oldest", MAX({c:modifiedTime}) AS "newest" FROM {ImportBatchContent AS c LEFT JOIN ImportBatch AS b ON {b:importContentCode} = {c:code} } WHERE {b:pk} IS NULL	Are there any left-over distributed import batches?
LogFile	SELECT COALESCE({cj:code}, '<null>'), COUNT({l:pk}) AS "total", MIN({l:modifiedtime}) AS "oldest", MAX({l:modifiedtime}) AS "newest" FROM {LogFile AS l LEFT JOIN CronJob AS cj ON {l:owner} = {cj:pk} } GROUP BY {cj:code} ORDER BY "total" DESC	Are there are cronjob with more than ~10 logs and/or logs older than 14 days? (those are default values for log file retention)
ProcessTaskLog	-- Query tested with MS SQL -- Adjust the date calculation for -- other databases SELECT COUNT({l:pk}) AS "total", MIN({l:modifiedtime}) AS "oldest", MAX({l:modifiedtime}) AS "newest" FROM {ProcessTaskLog AS l} WHERE {l:creationTime} < DATEADD( MONTH, -2, GETUTCDATE() )	We recommend customer to BusinessProcess cleanup, which will eventually take care of TaskLogs cleanup. There might be the few scenarios for ProcessTaskLog cleanup: The customer wants to keep the BusinessProcess for reporting, although we don't recommend it. The customer might be using the custom task without any business process.
SavedValues,SavedValueEntry	-- total SavedValue / SavedValueEntry SELECT * FROM ( {{ SELECT 'SavedValues' AS "type", COUNT({s:pk}) AS "total" FROM {savedvalues AS s} }} UNION ALL {{ SELECT 'SavedValueEntry' AS "type", COUNT({e:pk}) AS "total" FROM {savedvalueentry AS e} }} ) summary -- SavedValues per item SELECT {s:modifiedItem} AS "item", COUNT({s:pk}) AS "total", MIN({s:modifiedtime}) AS "oldest", MAX({s:modifiedtime}) AS "newest" FROM {SavedValues AS s } GROUP BY {s:modifiedItem} ORDER BY "total" DESC -- orphaned SavedValueEntry -- (there shouldn't be any) SELECT COUNT({e:pk}) AS "total", MIN({e:modifiedtime}) AS "oldest", MAX({e:modifiedtime}) AS "newest" FROM {SavedValueEntry AS e LEFT JOIN SavedValues AS s ON {e:parent} = {s:pk} } WHERE {s:pk} IS NULL	A lot of those items accumulated over the project lifetime. If possible, disable storing saved values. (`hmc.storing.modifiedvalues.size=0`)
SolrIndexOperation	SELECT {i:qualifier}, COUNT({o:pk}) AS "total", MIN({o:modifiedTime}) AS "oldest", MAX({o:modifiedTime}) AS "newest" FROM {SolrIndexOperation AS o LEFT JOIN SolrIndex AS i ON {o:index} = {i:pk} } GROUP BY {i:qualifier} ORDER BY "total" DESC	Too many solr operations (more than ~100 per index)?
StoredHttpSession	SELECT COUNT({s:pk}) AS "total", MIN({s:modifiedtime}) AS "oldest", MAX({s:modifiedtime}) AS "newest" FROM {StoredHttpSession AS s}	Excessive amount of session? This is hard to generalize as it highly depends on your site's traffic, but if you are near or over 5 digits, it's probably too much. Simarly, stale sessions (e.g older than a day) don't need to be retained.
TaskCondition	SELECT COUNT({tc:pk}), MIN({tc:modifiedtime}) AS "oldest", MAX({tc:modifiedtime}) AS "newest" FROM {TaskCondition AS tc } WHERE {tc:task} IS NULL	Is there an excessive amount of "premature events"? Or very old (older than a a few weeks) events?

Support

Please open an issue describing your problem or your feature request.

Contributing

Any and all pull requests are welcome.
Please describe your change and the motiviation behind it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

sanecleanup

How-To

Do I have to clean up?

Support

Contributing

Files

README.md

Latest commit

History

README.md

File metadata and controls

sanecleanup

How-To

Do I have to clean up?

Support

Contributing