A simple Ruby library that reads and parses web server log files and aggregates pageview data.
Minimal instructions
Install the log-analyser gem.
After instantiating log-analyser's PageviewsLogAggregator class with the path to the log file:
- the method all returns the pageview count;
- the method unique returns the unique pageview count.
To use log-analyser in your application, add this line to your Gemfile:
gem 'log-analyser'
Or install it yourself as:
$ gem install log-analyser
#!/usr/bin/env ruby
require 'pageviews_log_aggregator'
file_path = 'resources/webserver.log' # replace with the path to your own log file
log_aggregator = LogAnalyser::PageviewsLogAggregator.new(file_path)
puts "\nAll pageviews"
log_aggregator.all.each do |key, value|
puts "#{key&.to_s&.ljust(28, '.')} | #{value}"
end
puts "\nUnique pageviews"
log_aggregator.unique.each do |key, value|
puts "#{key&.to_s&.ljust(28, '.')} | #{value}"
end
Install the Ruby version specified in .ruby-version
Clone the project and install Bundler
git clone git@github.com:DMazzei/log-analyser.git
cd log-analyser
gem install bundler
Run the initial setup
$ bin/setup
If you need to reinstall the dependencies later:
$ bundle install
Call ./bin/parse_pageview_file.rb, passing a log file path as an argument. It prints the pageview counts ordered from most to least viewed.
Check --help for more options.
An example log can be found in the 📁resources folder:
$ ./bin/parse_pageview_file.rb --file 'resources/webserver.log'
|--------------------------------------------------|
| All pageviews |
|--------------------------------------------------|
| /about/2.................... | 90 |
| /contact.................... | 89 |
| /index...................... | 82 |
| /about...................... | 81 |
| /help_page/1................ | 80 |
| /home....................... | 78 |
|--------------------------------------------------|
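The ordering from most to least viewed can be sketched in plain Ruby. This is only an illustration, assuming the counts are held in a Hash keyed by page; the `counts` data and variable names are hypothetical and not taken from the gem's source.

```ruby
# Hypothetical counts, using a few values from the example output above.
counts = { '/home' => 78, '/about/2' => 90, '/index' => 82 }

# Sort by view count, highest first (negating the count reverses the order).
sorted = counts.sort_by { |_page, views| -views }

# Print each row, padding the page name with dots to a fixed width.
sorted.each do |page, views|
  puts "#{page.ljust(28, '.')} | #{views}"
end
```

Negating the count keeps the sort in a single pass; `sort_by { ... }.reverse` would work equally well.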
The -u
or --unique
option will also display the unique pageview count:
$ ./bin/parse_pageview_file.rb --file 'resources/webserver.log' -u
Any specific page can be filtered with -p or --page:
$ ./bin/parse_pageview_file.rb --file 'resources/webserver.log' -p '/index'
|--------------------------------------------------|
| View count for page: /index |
|--------------------------------------------------|
| All pageviews |
|--------------------------------------------------|
| /index...................... | 82 |
|--------------------------------------------------|
📄 A pageview is defined as a view of a page on your site that is being tracked by the Analytics tracking code. If a user clicks reload after reaching the page, this is counted as an additional pageview. If a user navigates to a different page and then returns to the original page, a second pageview is recorded as well.
📃 A unique pageview, as seen in the Content Overview report, aggregates pageviews that are generated by the same user during the same session. A unique pageview represents the number of sessions during which that page was viewed one or more times.
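The difference between the two counts can be shown with a small sketch. It assumes that the visitor identifier in the log stands in for a session, which is a simplification of the definition above; the sample entries and variable names are hypothetical and this is not the gem's actual implementation.

```ruby
require 'set'

# Hypothetical sample entries: [page, visitor identifier] pairs.
entries = [
  ['/home', '184.123.665.067'],
  ['/home', '184.123.665.067'], # the same visitor reloads the page
  ['/home', '126.318.035.038'],
  ['/contact', '184.123.665.067']
]

# All pageviews: every entry counts, including repeats by the same visitor.
all_views = Hash.new(0)
entries.each { |page, _visitor| all_views[page] += 1 }

# Unique pageviews: each visitor counts at most once per page.
visitors_by_page = Hash.new { |hash, page| hash[page] = Set.new }
entries.each { |page, visitor| visitors_by_page[page] << visitor }
unique_views = visitors_by_page.transform_values(&:size)

puts "All:    #{all_views}"
puts "Unique: #{unique_views}"
```

Here /home has three pageviews but only two unique pageviews, because one visitor viewed it twice.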
The library parses text files containing one entry per line, in the format: /page_name identifier.
A space must separate the page name (first column) from the user identifier (e.g. IP address):
/help_page/1 126.318.035.038
/contact 184.123.665.067
/home 184.123.665.067
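As a rough sketch of how one such line could be split into its two columns, the snippet below uses a hypothetical helper named `parse_entry`. It is not part of the gem's API, and the choice to return nil for malformed lines is an assumption made for illustration.

```ruby
# Hypothetical helper: splits a log line into [page, identifier].
# Returns nil when the line does not contain both columns.
def parse_entry(line)
  page, identifier = line.strip.split(' ', 2)
  return nil if page.nil? || identifier.nil?

  [page, identifier]
end

p parse_entry("/help_page/1 126.318.035.038\n")
p parse_entry('/home') # missing identifier
```

Returning nil lets a caller skip incomplete lines with `filter_map` or `compact` rather than raising on the first bad entry.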
$ git clone git@github.com:DMazzei/log-analyser.git
$ cd log-analyser
$ gem install bundler
$ bundle install
And the world is your oyster...
You can also run $ bundle exec console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run $ bundle exec rake install
.
To release a new version, update the version number in version.rb
, and then run $ bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
RuboCop is used as the code analyser and to enforce code formatting (as well as some best practices).
Use $ bundle exec rake rubocop
to run the checks.
Use $ bundle exec rspec
or $ bundle exec rake spec:all
to run all the tests.
✅ To run only unit-tests
$ bundle exec rake spec:unit
✅ To run only integration tests
$ bundle exec rake spec:integration
The test coverage is handled by rspec
, simplecov
and coveralls
.
Status and coverage history can be checked here.
After a pull request is created, a CI workflow is triggered in CircleCI, which can be checked here.
This workflow consists of building the library, running RuboCop and RSpec to validate integrity and code quality, and lastly generating and pushing a feature gem that can be used for development and tests.
After passing all checks and requirements on GitHub, a PR can be merged as soon as it is reviewed and approved. Merging into the master branch triggers the deployment process on CircleCI, and this workflow ends with the generation of a tagged gem.
The whole deployment process will finish by building and tagging a new gem version and pushing it to rubygems.org.
⚠️ To merge changes into master, the version must be bumped up, otherwise the deployment will fail!
The version must be updated in version.rb.
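To illustrate what "bumped up" means, a version comparison can be sketched with Ruby's built-in Gem::Version class. The helper name `version_bumped?` is hypothetical, and this is not a description of how the CircleCI workflow actually performs its check.

```ruby
require 'rubygems'

# Hypothetical helper: true when the candidate version is strictly
# greater than the version that has already been released.
def version_bumped?(released, candidate)
  Gem::Version.new(candidate) > Gem::Version.new(released)
end

puts version_bumped?('1.0.0', '1.0.1')
puts version_bumped?('1.0.0', '1.0.0')
puts version_bumped?('1.9.0', '1.10.0')
```

Gem::Version compares segments numerically, so 1.10.0 is correctly treated as newer than 1.9.0, which a plain string comparison would get wrong.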
Bug reports and pull requests are welcome on GitHub at https://github.com/DMazzei/log-analyser.
- One design decision that could be revisited is the choice between:
  - reading the file whilst aggregating data, preserving memory (e.g. using Set);
  - loading the data into memory and leaving aggregation and counting for later, gaining flexibility and performance;
- Extend the accepted logfile format;
- Add more options for sorting and filtering;
- Automate library version bump up;
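The trade-off in the first item can be made concrete with a small sketch of both approaches. It is a minimal comparison under stated assumptions: the method names and sample log are hypothetical, StringIO stands in for a real file, and `tally` requires Ruby 2.7 or later. It does not reflect which approach the gem currently uses.

```ruby
require 'set'
require 'stringio'

# Hypothetical log content in the "page identifier" format.
log = "/home 184.123.665.067\n/home 184.123.665.067\n/contact 126.318.035.038\n"

# Option 1: aggregate while reading, keeping only a Set of visitors per page.
def aggregate_streaming(io)
  visitors = Hash.new { |hash, page| hash[page] = Set.new }
  io.each_line do |line|
    page, visitor = line.split
    visitors[page] << visitor if page && visitor
  end
  visitors.transform_values(&:size)
end

# Option 2: load every entry first and decide how to aggregate later.
def load_entries(io)
  io.each_line.map(&:split).select { |fields| fields.size == 2 }
end

streamed_unique = aggregate_streaming(StringIO.new(log))

entries = load_entries(StringIO.new(log))
all_views = entries.map(&:first).tally
unique_views = entries.uniq.map(&:first).tally

puts "Streaming unique: #{streamed_unique}"
puts "In-memory all:    #{all_views}"
puts "In-memory unique: #{unique_views}"
```

The streaming version only stores distinct visitors per page, whereas the in-memory version keeps every entry and can answer new questions (other sort orders, filters) without reading the file again.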
The gem is available as open source under the terms of the MIT License.