ActiveReporting implements various terminology used in Relational Online Analytical Processing (Commonly referred to as ROLAP) with ActiveRecord. It provides a DSL to describe reports and analytics on your data.
ActiveReporting officially supports MySQL, PostgreSQL, and SQLite.
Add this line to your application's Gemfile:
gem 'active_reporting'
And then execute:
$ bundle
Or install it yourself as:
$ gem install active_reporting
Reporting is the collection and presentation data so that it can be analyzed. Our databases only store one thing: data. Data is great for computers but mostly worthless to humans. What ActiveReoprting does is turn that data into information to help humans make decisions.
ROLAP uses a set of terms to describe how a report is generated. ActiveReporting implements them in the closest way possible in Ruby-land.
A fact table is the primary table where information is derived from in a report. It commonly contains fact columns (usually numeric values) and dimension columns (foreign keys to other tables or values that can be grouped together).
SQL Equivalent: FROM
Rails: ActiveRecord model
A dimension is a point of data used to "slice and dice" data from a fact model. It's either a column that lives on the fact table or a foreign key to another table.
Examples:
- A sales rep on a fact table of orders
- A state of an order on a state machine
- The manufacture on a fact table of widgets
SQL Equivalent: JOIN, GROUP BY
Rails: ActiveRecord relation or attribute
A hierarchy for a dimension is related attributes that live on a dimension table used to drill down and drill up through a dimension.
Examples:
- Dates: Date, Month, Year, Quarter
- Mobile Phone: Model, Manufacture, OS, Wireless Technology
This is information related to a dimension. When the dimension lives on the fact table, the label is the column used. When the dimension is a related table, the label is a column representing the hierarchy level.
Examples:
- When dimensioning blog posts by category, the dimension is the category_id which leads to the categories table. The label would be the category name.
This isn't really an official term, but I like using it to describe further filtering of dimensionable data.
SQL Equivalent: WHERE
Rails: where()
, scopes, etc.
A measure is a column in a fact table (usually a numeric value) used in aggregations such as sum, maximum, average, etc.
Examples:
- Total amount in a sale
- Number of units used in a transaction
SQL Equivalent: Column in the fact table used in an aggregation function
Rails: ActiveRecord attribute
A metric is a measured value and the subject of the report. It is the result of the question you want answered.
SQL Equivalent: A query result
Rails: The result of an ActiveRecord query
Star schema is a way of structuring your relational data. It is one of the most common forms of organization for relational data warehousing. The layout of a star schema consists of a fact table referencing one or more dimension tables. When laid out in an entity relationship diagram, it resembles a star.
[TODO: ADD PICTURE HERE]
More information: https://en.wikipedia.org/wiki/Star_schema
Snowflake schema is a super class of star schema. A fact table still resides in the middle of the diagram, but dimension tables are normalized out into multiple tables resulting in the resemblance of a snowflake.
[TODO: ADD PICTURE HERE]
More information: https://en.wikipedia.org/wiki/Snowflake_schema
ActiveReporting is built with star schema in mind, but will work with snowflake.
Configure ActiveReporting via block configuration or by setting individual settings:
ActiveReporting::Configuration.config do |c|
c.setting = value
end
ActiveReporting::Configuration.setting = value
default_dimension_label
- If a fact model does not have a default label set for when it's used as a dimension, this value will be used. (Default: :name
)
default_measure
- If a fact model does not specify a measure to use for aggregates, this value will be used. (Default: :value
)
ransack_fallback
- If the ransack gem is loaded, allow all unknown dimension filters to be delegated to ransack. (Default: false
)
metric_lookup_class
- The name of a constant used to lookup prebuilt Reporting::Metric
objects by name. The constant should define a class method called #lookup
which can take a string or symbol of the metric name. (Default: ::Metric
)
In ActiveReporting, a fact model stores configuration information on how it can be used in reports. We use the term fact model instead of fact table because this class "models how the fact table interacts with dimensions and other reporting features".
You can put these classes anywhere you want in your app, though I recommend putting them in app/fact_models
Every fact model links to an ActiveRecord model. This is done either by naming convention or by explicitly declaring the model.
This naming convention is [ModelName]FactModel
. Meaning if you have an ActiveRecord model named Ticket
, you'll then have a TicketFactModel
to link them together.
class TicketFactModel < ActiveReporting::FactModel
end
Alternatively, you may manually specify the model manually with self.model=
class TicketFactModel < ActiveReporting::FactModel
self.model= SomeOtherModel
# OR you may pass in a string or symbol
# self.model= :some_other_model
# self.model= 'some_other_model'
# self.model= 'SomeOtherModel'
end
ActiveReporting assumes the column of a fact model used for summing, averaging, etc. is called value
. This may be changed on a fact model using measure=
. You may pass in a string or symbol of the column you wish to use for aggregations.
class OrderFactModel < ActiveReporting::FactModel
self.measure = :total
end
You must declare what a fact model is dimensional by. A valid dimension is a column on the fact model's ActiveRecord model or a belongs_to
/has_one through
relationship. has_many
relationships do not work (well) at all.
class TicketFactModel < ActiveReporting::FactModel
dimension :creator # belongs_to relationship
dimension :assignee # belongs_to relationship
dimension :category # Column on the tickets table
end
When another fact model uses a relationship as a dimension, that ActiveRecord model's fact model class can hold configuration information for how to act when used as a dimension.
By default, it is assumed a dimension's label is a column called name
. This can be changed on the fact model.
class UserFactModel < ActiveReporting::FactModel
default_dimension_label :username
end
For dimensions that can have a hierarchy (such as a mobile phone), you can declare the what columns make it up. This will allow reports to dimension against a fact model and be able to use different labels to group by.
class PhoneFactModel < ActiveReporting::FactModel
dimension_hierarchy [:model_name, :manufacturer, :os, :wireless_technology]
end
The fastest approach to group by certain date metrics is to create so-called "date dimensions". For
those Postgres users that are restricted from organizing their data in this way, Postgres provides
a way to group by datetime
column data on the fly using the date_trunc
function.
To use, declare a datetime dimension on a fact model as normal:
class UserFactModel < ActiveReporting::FactModel
dimension :created_at
end
When creating a metric, ActiveReporting will recognize implicit hierarchies for this dimension. The hierarchies correspond to the values supported by PostgreSQL. (See example under the metric section, below.)
NOTE: PRs welcomed to support this functionality in other databases.
A dimension filter provides filtering for a report. In SQL-land, this is the WHERE
clause.
Available dimension filters are defined on a FactModel
. They can be implemented via a similar syntax to a Rails scope, link to the fact model's ActiveRecord model's scope, or delegate to ransack.
class TicketFactModel < ActiveReporting::FactModel
dimension_filter :open
dimension_filter :for_category_name, ->(x) { joins(:category).where(categories: {name: x}) }
dimension_filter :subject_cont, :ransack
end
The first example exposes the Ticket.open
scope to the fact model allowing it to be used as a dimension filter.
The second example defines a lambda to be invoked like a Rails scope. It joins against the category
relationship on Ticket
and filters by the category's name.
The third example defines a filter called "subject_cont" and will delegate it to ransack when called.
Only dimension filters defined in the fact model may be used. Whitelisting available filters allows for more control over what the user may filter by. Giving the user full control to call any scope or method from the ActiveRecord model could lead to unexpected results, poor performing queries, or possible security concerns.
If ransack is available, you may flag a fact model to delegate all unknown dimension filters to ransack.
class TicketFactModel < ActiveReporting::FactModel
use_ransack_for_unknown_dimension_filters
end
If you need more granular control over the COUNT
aggregate function, you can declare an aggregate expression in the fact model. This use case can arise if need multiple aggregates on identical data sets.
The declaration takes the raw SQL expression that
you want to replace '*'
in the COUNT
function:
class TicketFactModel < ActiveReporting::FactModel
aggregate_expression :my_custom_expression,
"CASE WHEN name = 'foo' THEN 1 END"
end
Whatever expression you choose, it is important to remember that COUNT
will aggregate all non-null
results -- so anything you want to exclude should compute to null
.
A Metric
is the basic building block used to describe a question you want to answer. At minimum, a metric needs a name, a fact table and an aggregate. You can expand a metric further by including dimensions and dimension filters.
my_metric = ActiveReporting::Metric.new(
:order_total,
fact_model: OrderFactModel,
aggregate: :sum
)
name
- This is the identifying name of the metric.
fact_model
- An ActiveReporting::FactModel
class
aggregate
- The SQL aggregate used to calculate the metric. Supported aggregates include count, max, min, avg, and sum. (Default: :count
.) For count
aggregates, you can also specify an aggregate expression if you defined one in your fact model: aggregate: { count: :my_custom_expression }
-- and the expression identified will replace the '*'
when a query is made using COUNT
.
dimensions
- An array of dimensions used for the metric. When given just a symbol, the default dimension label will be used for the dimension. You may specify a hierarchy level by using a hash. (Examples: [:sales_rep, {order_date: :month}]
)
dimension_filter
- A hash were the keys are dimension filter names and the values are the values passed into the filter.
metric_filter
- An additional HAVING clause to be tacked on to the end of the query. This allows for the further filtering of the end results based on the value of the aggregate. (Examples: {gt: 3}
, {eq: 5}
, {lte: 7}
)
order_by_dimension
- Allows you to set the ordering of the results based on a dimension label. (Examples: {author: :desc}
, {sales_ref: :asc}
)
For those using Postgres, you can take advantage of implicit hierarchies in datetime
columns, as mentioned above:
class UserFactModel < ActiveReporting::FactModel
dimension :created_at
end
my_metric = ActiveReporting::Metric.new(
:my_total,
fact_model: UserFactModel,
dimensions: [{ created_at: :quarter } ]
)
A Report
takes an ActiveReporting::Metric
and ties everything together. It is responsible for building and executing the query to generate a result. The result is an simple array of hashing.
metric = ActiveReporting::Metric.new(
:order_count,
fact_model: OrderFactModel,
dimensions: [:sales_rep],
dimension_filter: {months_ago: 1}
)
report = ActiveReporting::Report.new(metric)
report.run
=> [{order_count: 12, sales_rep: 'Fred Jones', sales_rep_identifier: 123},{order_count: 17, sales_rep: 'Mary Sue', sales_rep_identifier: 123}]
A Report
may also take additional arguments to merge with the Metric
's information. This can be user input for additional filters, or to expand on a base Metric
.
dimension_identifiers
- When true, the result will include the database identifier columns of the dimensions. For example, when running a report for the total number of orders dimensioned by sales rep, the rep's IDs from the sales_reps
table will be included. (Default true
)
dimension_filter
- A hash that will be merged with the Metric
's dimension filters.
dimensions
- An array of additional dimensions which are merged with the Metric
's dimensions.
metric_filter
- Sets the HAVING clause of the final query and is merged with the Metric
's metric filter.
metric = ActiveReporting::Metric.new(
:order_count,
fact_model: OrderFactModel,
dimensions: [:sales_rep],
dimension_filter: {months_ago: 1}
)
report = ActiveReporting.new(metric, dimension_filter: {from_region: 'North'}, dimension_identifiers: false)
report.run
=> [{order_count: 17, sales_rep: 'Mary Sue'}]
It may be more DRY to store ready-made metrics in a database table or stored in memory to use as the bases for various reports. You can pass a string or symbol into a Report
instead of a Metric
to look up an pre-made metric. This is done by passing the symbol or string into the lookup
class method on the constant defined in ActiveReporting::Configuration.metric_lookup_class
.
class StoredMetrics
def lookup(metric_name)
# Code to construct and return an `ActiveReporting::Metric` object
end
end
ActiveReporting::Configuration.metric_lookup_class = StoredMetrics
report = ActiveReporting::Report.new(:a_stored_metric, ...)
After checking out the repo, run bin/setup
to install dependencies. Then, run rake test
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in version.rb
, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
You can run the test suite using rake test
. To test against a particular database, you'll need to set the
appropriate DB
environment variable, e.g. DB=pg rake test
.
Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/active_reporting. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
The gem is available as open source under the terms of the MIT License.