This repository has been archived by the owner on Nov 19, 2021. It is now read-only.

DuckPAN Overview

John Barrett edited this page Feb 21, 2014 · 4 revisions

See also https://metacpan.org/pod/pinto and https://stratopan.com/ for the idea behind hosting "your own CPAN".

Components:

Dist::Zilla::Plugin::UploadToDuckPAN

https://github.com/duckduckgo/p5-dist-zilla-plugin-uploadtoduckpan

Dist::Zilla::Plugin::UploadToDuckPAN extends Dist::Zilla::Plugin::UploadToCPAN.

What's involved in uploading a distribution to CPAN?

PAUSE vs. CPAN vs. search engines

A rough flow is:

Tarball -> PAUSE (HTTP POST) -> indexing/validation -> CPAN

https://duck.co/duckpan/upload accepts the same POST parameters as PAUSE. Dist::Zilla::Plugin::UploadToDuckPAN exists to wrap the metadata/tarball generation of Dist::Zilla::Plugin::UploadToCPAN with DuckPAN upload defaults.

Dist::Zilla::Plugin::DuckpanMeta

https://github.com/duckduckgo/p5-dist-zilla-plugin-duckpanmeta/

Metadata extraction; it appears to be unused by other modules (unconfirmed).

DDGC::Web::Controller::Duckpan

https://github.com/duckduckgo/community-platform

https://duck.co/duckpan/, https://duck.co/duckpan/upload

Looking at the upload method of the Community Platform's Duckpan controller, we see that posted files are stashed in $c->d->config->cachedir.'/'.$upload->filename. The cachedir is set in DDGC::Config as $ENV{'DDGC_CACHEDIR'} ? $ENV{'DDGC_CACHEDIR'} : $self->rootdir().'/cache/';

So if the environment variable DDGC_CACHEDIR is not set, we would expect to see distributions in ~/ddgc/cache.
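The fallback described above can be sketched in isolation. This is a minimal illustration rather than the DDGC::Config code itself: the resolve_cachedir helper and the /home/ddgc/ddgc root directory are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the ternary quoted from DDGC::Config:
# use DDGC_CACHEDIR when it holds a true value, otherwise fall back to
# "<rootdir>/cache/".
sub resolve_cachedir {
    my ($rootdir) = @_;
    return $ENV{DDGC_CACHEDIR} ? $ENV{DDGC_CACHEDIR} : $rootdir . '/cache/';
}

{
    # Variable removed for the duration of this block: the default is used.
    delete local $ENV{DDGC_CACHEDIR};
    print resolve_cachedir('/home/ddgc/ddgc'), "\n";
}
{
    # Variable set: it takes precedence over the root directory.
    local $ENV{DDGC_CACHEDIR} = '/tmp/ddgc-cache';
    print resolve_cachedir('/home/ddgc/ddgc'), "\n";
}
```

Because the default already ends in a slash and the controller appends another '/' before the filename, the stored path will contain a doubled slash in the fallback case, which is harmless on POSIX filesystems.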

So, in our CPAN model above we have gone as far as the HTTP POST element. We also have the Frontend which queries the DDGC::DB::Result::DuckPAN::Release model.

DDGC::DB::Result::DuckPAN::Release

https://github.com/duckduckgo/community-platform

ddgc=# \d duckpan_release
                                     Table "public.duckpan_release"
    Column    |           Type           |                          Modifiers                           
--------------+--------------------------+--------------------------------------------------------------
 id           | integer                  | not null default nextval('duckpan_release_id_seq'::regclass)
 name         | text                     | not null
 version      | text                     | not null
 users_id     | bigint                   | not null
 filename     | text                     | not null
 current      | integer                  | not null default 1
 duckpan_meta | text                     | not null default '{}'::text
 created      | timestamp with time zone | not null
 updated      | timestamp with time zone | not null

DDGC::DB::Result::User has many DDGC::DB::Result::DuckPAN::Release

DDGC::DB::Result::DuckPAN::Goodie

https://github.com/duckduckgo/community-platform

Declares table('duckpan_goodie'), though it appears to be an interface to duckpan_module.

DDGC::DB::Result::DuckPAN::Module

https://github.com/duckduckgo/community-platform

ddgc=# \d duckpan_module
                                        Table "public.duckpan_module"
       Column       |           Type           |                          Modifiers                          
--------------------+--------------------------+-------------------------------------------------------------
 id                 | integer                  | not null default nextval('duckpan_module_id_seq'::regclass)
 name               | text                     | not null
 duckpan_release_id | bigint                   | not null
 filename           | text                     | 
 filename_pod       | text                     | 
 duckpan_meta       | text                     | not null default '{}'::text
 created            | timestamp with time zone | not null
 updated            | timestamp with time zone | not null

DDGC::DB::Result::DuckPAN::Release has many DDGC::DB::Result::DuckPAN::Module. To illustrate the difference between a release and a module, see the module list:

http://duckpan.org/modules/02packages.details.txt

Individual modules are listed on the left, and the release tarballs they come from are on the right.
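To make the release/module relationship concrete, here is a small sketch that groups modules by the tarball they ship in. It assumes the usual 02packages.details.txt layout (header fields, a blank line, then one "module version path" row per module). The sample rows and the modules_by_release helper are made up for illustration and are not taken from the live DuckPAN index.

```perl
use strict;
use warnings;

# Illustrative index text; the module and author names are invented.
my $index = <<'END';
File:         02packages.details.txt
Description:  Package names found in directory $CPAN/authors/id/
Line-Count:   3

DDG::Goodie::Example1  0.001  E/EX/EXAMPLE/DDG-ExampleBundle-0.001.tar.gz
DDG::Goodie::Example2  0.001  E/EX/EXAMPLE/DDG-ExampleBundle-0.001.tar.gz
Other::Module          1.20   E/EX/EXAMPLE/Other-Module-1.20.tar.gz
END

# Return a hash reference mapping each tarball path to the modules it contains.
sub modules_by_release {
    my ($text) = @_;

    # Everything after the first blank line is the body of the index.
    my ( $header, $body ) = split /\n\s*\n/, $text, 2;

    my %modules_for;
    for my $line ( split /\n/, $body // '' ) {
        my ( $module, $version, $path ) = split ' ', $line;
        next unless defined $path;    # skip malformed rows
        push @{ $modules_for{$path} }, $module;
    }
    return \%modules_for;
}

my $by_release = modules_by_release($index);
for my $path ( sort keys %$by_release ) {
    print "$path\n";
    print "    $_\n" for @{ $by_release->{$path} };
}
```

One release (tarball) can therefore own many module rows, which is the same one-to-many relationship the duckpan_release and duckpan_module tables express.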

DDGC::DuckPAN

https://github.com/duckduckgo/community-platform

Uses CPAN::Repository to maintain metadata and validate releases uploaded through DDGC::Web::Controller::Duckpan.

The community platform (DDGC) has an instance of DDGC::DuckPAN.

duckpan-installer

https://github.com/duckduckgo/p5-duckpan-installer

Does what it says on the tin: installation and verification of App::DuckPAN and DuckPAN modules.

App::DuckPAN

https://github.com/duckduckgo/p5-app-duckpan

Installation, upload and test tool for DDG/DuckPAN projects and web sites. It is based on Dist::Zilla and uses Dist::Zilla::Plugin::UploadToDuckPAN for releases.

Work flows

Uploading a Release

Since DuckPAN tools are based on Dist::Zilla (and UploadToCPAN), it might be useful to discuss the differences and similarities with uploading a dist to CPAN and to DuckPAN.

The components involved, with their CPAN analogue, are:

 DuckPAN                | CPAN
------------------------+---------------
 duckpan                | dzil
 install.pl/duckpan     | cpan/cpanm
 duck.co/duckpan/upload | PAUSE
 duckpan.org            | CPAN mirror
 duck.co/duckpan/       | CPAN frontend

The major differences are that CPAN has owned namespaces (Module/dist owners) and CPAN frontends are usually built on the metadata maintained in CPAN rather than a separately maintained metadata repository, as with duck.co/duckpan.

A rough overview of the data flow (App::Asciio save file here: http://jbrt.org/ddg/duckpan.asciio):

                                     .---------------------------.
                                      | Project                   |
                                      | (e.g. community-platform) |
                                      |                           |
                                      '---------------------------'
                                                    |
                                                dist.ini
                                                    |
                                                    v
                                          .------------------.
                                          | duckpan client   |
                                          | (dzil --release) |------------------------------.
                                          '------------------'                              |
                                                    |                                       |
                                  Dist::Zilla::Plugin::UploadToDuckPAN     Dist::Zilla::Plugin::Git::NextVersion
                                      (tar/metadata/auth HTTP POST)           Dist::Zilla::PluginBundle::Git    
                                                    |                                       |
                                                    v                                       v
                                  .----------------------------------.             .----------------.
                                  | https://duck.co/duckpan/upload   |             | Increment Tag  |
                                  | (DDGC::Web::Controller::Duckpan) |             | Push to Github |
                                  '----------------------------------'             '----------------'
                                                    |        .-------------------------------.
                                            DDGC::DuckPAN    | Metadata sources              |
                                            CPAN::Repository |.-----------------------------.|
                                                    |        || DuckPAN (CPAN Style)        ||
                     .----------------------.       |-------->| modules metadata            ||
                     | http://duckpan.org/  |       |        || (02packages.details.txt...) ||
                     | (Repository)         |<------'        |'-----------------------------'|
                     '----------------------'       |        |                               |
                                 |                  |        |.-----------------------------.|
                                 |                  |        || community-platform          ||
                                 v                  '-------->| (DDGC::DB::Result::DuckPAN) ||
                          .------------.                     |'-----------------------------'|
                          | duckpan    |                     '-------------------------------'
                          | install.pl |                                     |
                          | cpanm      |                                     |
                          '------------'                                     v
                                                           .----------------------------------.
                                                           | https://duck.co/duckpan/         |
                                                           | (DDGC::Web::Controller::Duckpan) |
                                                           '----------------------------------'

Code flow

Let's upload a dist and see which elements of the above we go through when a release needs to be uploaded. Say we have a new release of App::DuckPAN:

p5-app-duckpan(master)$ duckpan installdeps
p5-app-duckpan(master)$ duckpan release

This dispatches to App::DuckPAN::Cmd::Release, which does:

my $ret = system('dzil release');

dzil release reads the dist.ini, sees Dist::Zilla::Plugin::UploadToDuckPAN, and so knows where to release to. If credentials are not configured, you will be prompted for your duck.co credentials.

So, can any user with a duck.co login upload to DuckPAN? Let's take a look at Dist::Zilla::Plugin::UploadToDuckPAN's uploader.

has '+uploader' => (
  default => sub {
    my ($self) = @_;

    require CPAN::Uploader;
    CPAN::Uploader->VERSION('0.103004');  # require HTTPS

    my $uploader = Dist::Zilla::Plugin::UploadToCPAN::_Uploader->new({
      user     => $self->username,
      password => $self->password,
      ($self->has_subdir
           ? (subdir => $self->subdir) : ()),
      ($self->has_upload_uri
           ? (upload_uri => $self->upload_uri) : ()),
      target => URI->new($self->upload_uri)->host,
    });
 
    $uploader->{'Dist::Zilla'}{plugin} = $self;
    weaken $uploader->{'Dist::Zilla'}{plugin};
 
    return $uploader;
  }
);

Dist::Zilla::Plugin::UploadToCPAN's uploader uses CPAN::Uploader which usually uses HTTPS POST to pause.perl.org to post a dist alongside user credentials.

Now we have a fair idea that duck.co/duckpan/upload has an identical (or similar enough) HTTP interface to PAUSE.

Let's take a look at the code which handles uploads, in DDGC::Web::Controller::Duckpan:

sub logged_in :Chained('base') :PathPart('') :CaptureArgs(0) {
	my ( $self, $c ) = @_;
	if (!$c->user) {
		$c->response->redirect($c->chained_uri('My','login'));
		return $c->detach;
	}
}

sub upload :Chained('logged_in') :Args(0) {
	my ( $self, $c ) = @_;
	eval {
		$c->add_bc('Upload');
		if (!$c->user) {
			$c->res->code(403);
			$c->d->errorlog($c->req);
			$c->d->errorlog("No user");
			$c->stash->{no_user} = 1;
			return $c->detach;
		}
		my $uploader = $c->user->username;
		my $upload = $c->req->upload('pause99_add_uri_httpupload');
		my $filename = $c->d->config->cachedir.'/'.$upload->filename;
		$upload->copy_to($filename);
		my $return = $c->d->duckpan->add_user_distribution($c->user,$filename);
		my $return_ref = ref $return;
		if ($return_ref eq 'DDGC::DB::Result::DuckPAN::Release') {
			$c->stash->{duckpan_release} = $return;
		} else {
			$c->stash->{duckpan_error} = $return;
			$c->res->code(403);
			$c->d->errorlog($c->req);
			$c->d->errorlog($c->stash->{duckpan_error});
		}
	};
	if ($@) {
		$c->res->code(403);
		$c->d->errorlog($c->req);
		$c->d->errorlog($@);
		$c->stash->{duckpan_error} = $@;
	}
	if ($c->stash->{duckpan_error}) {
		$c->stash->{subject} = "Error on release!"
	} else {
		$c->stash->{subject} = "Successful uploaded ".
			$c->stash->{duckpan_release}->name." ".
			$c->stash->{duckpan_release}->version;
	}
	$c->d->postman->template_mail(
		$c->user->data->{email},
		'"DuckPAN Indexer" <[email protected]>',
		'[DuckPAN] '.$c->stash->{subject},
		'duckpan',
		$c->stash,
		Cc => '"Torsten Raudssus" <[email protected]>',
	);
}

A login is definitely enforced, so let's go ahead and see if we can upload to DuckPAN!

p5-app-duckpan(master)$ duckpan release
...
Do you want to continue the release process? [y/N]: y
[@Git/Check] branch master is in a clean state
[UploadToDuckPAN] registering upload with duck.co web server
[UploadToDuckPAN] POSTing upload for App-DuckPAN-0.136.tar.gz to https://duck.co/duckpan/upload
request failed with error code 403
  Message: Forbidden

No, we cannot. So who can? Traditionally, PAUSE uses namespace ownership to decide whether or not a user has the right to upload a distribution. If somebody uploads Acme::Slingshot, then the permission of the original uploader is required to overwrite or update Acme::Slingshot. What's DuckPAN's approach?

$c->d->duckpan referred to in the code above is an instance of DDGC::DuckPAN. So what controls are in place in add_user_distribution? Among them are:

unless ($user->admin)
unless ($user->data && $user->data->{email})
if ($_->version eq $dist_data->version) # (this version of this dist already uploaded)
if (version->parse($_->version) > $dist_data->version) # (higher version of this dist already uploaded)

So you need to be a Community Platform admin. I don't see anything enforcing namespace ownership, which is not to say that it doesn't exist, or that it's required in an environment of trust.
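The checks listed above can be sketched as a standalone function. This is a simplified stand-in, not the body of add_user_distribution: the check_upload name, the plain-hash user records and the error strings are all hypothetical, and the real method works on DDGC result objects.

```perl
use strict;
use warnings;
use version;

# Hypothetical stand-in for the checks quoted above.
# $user is a plain hash; @existing holds the version strings already
# released for this dist. Returns an error string when the upload is
# rejected, or undef when it may proceed.
sub check_upload {
    my ( $user, $new_version, @existing ) = @_;

    return 'not an admin'     unless $user->{admin};
    return 'no email address' unless $user->{data} && $user->{data}{email};

    my $new = version->parse($new_version);
    for my $old (@existing) {
        return "version $new_version already uploaded"
            if $old eq $new_version;
        return "higher version $old already uploaded"
            if version->parse($old) > $new;
    }
    return;
}

my $admin = { admin => 1, data => { email => 'admin@example.com' } };
my $guest = { admin => 0, data => { email => 'guest@example.com' } };

for my $case (
    [ $guest, '0.137', '0.136' ],
    [ $admin, '0.136', '0.136' ],
    [ $admin, '0.135', '0.136' ],
    [ $admin, '0.137', '0.136' ],
) {
    my ( $user, $version, @existing ) = @$case;
    my $error = check_upload( $user, $version, @existing );
    print "$version: ", ( defined $error ? "rejected ($error)" : 'accepted' ), "\n";
}
```

Note that nothing in this sketch, as in the quoted checks, ties a dist name to a particular uploader; any admin passing the version checks is accepted.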

If this process is successful, a dist for the given author is created within the ddgc site's directory tree. Release info is added to the DuckPAN::Release model for serving by duck.co/duckpan and to the CPAN directory tree in ddgc/. In DDGC::DuckPAN ($cpan_repository is an instance of CPAN::Repository which is set to point at the static DuckPAN files within the ddgc directory):

my $distribution_filename_duckpan = $self->cpan_repository->add_author_distribution(uc($user->username),$distribution_filename);
return $self->add_release( $user, $dist_data->name, $dist_data->version, $distribution_filename_duckpan);

sub add_release {
	my ( $self, $user, $release_name, $release_version, $filename ) = @_;
	$self->ddgc->db->resultset('DuckPAN::Release')->search({
		name => $release_name,
	})->update({ current => 0 });
	return $self->ddgc->db->resultset('DuckPAN::Release')->create({
		name => $release_name,
		version => $release_version,
		users_id => $user->id,
		filename => $filename,
	});
}
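The effect of add_release on the "current" flag can be illustrated with an in-memory stand-in for the duckpan_release table. This is only a sketch: the @releases array replaces the database, the argument list is simplified, and the version numbers other than 0.136 are invented for the example.

```perl
use strict;
use warnings;

# In-memory stand-in for the duckpan_release table; each row is a hash.
my @releases;

# Mirrors the two steps of the quoted add_release: clear "current" on every
# existing row with the same name, then create the new row. The new row gets
# current => 1 explicitly, standing in for the column default in the schema.
sub add_release {
    my ( $name, $version, $users_id, $filename ) = @_;
    $_->{current} = 0 for grep { $_->{name} eq $name } @releases;
    my $row = {
        name     => $name,
        version  => $version,
        users_id => $users_id,
        filename => $filename,
        current  => 1,
    };
    push @releases, $row;
    return $row;
}

add_release( 'App-DuckPAN', '0.135', 1, 'App-DuckPAN-0.135.tar.gz' );
add_release( 'App-DuckPAN', '0.136', 1, 'App-DuckPAN-0.136.tar.gz' );
add_release( 'DDG',         '0.100', 2, 'DDG-0.100.tar.gz' );

printf "%-12s %-6s current=%d\n", @{$_}{qw(name version current)} for @releases;
```

After these calls only the newest row per release name is flagged as current, while older rows stay in the table as history.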

All you then need for a working CPAN-compatible repository is to point a web server at the directory maintained by CPAN::Repository, which is precisely what the duckpan.org config does (with an HTML index/error page to redirect the lost). Any cpan/cpanm type client can install from and query this repository.
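The directory layout such a repository serves follows the CPAN convention of nesting each author under the first one and two letters of their ID. A minimal sketch of that path construction follows; author_dist_path is a hypothetical helper, not a CPAN::Repository method, and the upper-casing mirrors the uc($user->username) call quoted earlier.

```perl
use strict;
use warnings;

# Build the CPAN-style relative path for an author's tarball:
# authors/id/<first letter>/<first two letters>/<AUTHOR>/<file>.
sub author_dist_path {
    my ( $author, $filename ) = @_;
    $author = uc $author;
    return join '/', 'authors', 'id',
        substr( $author, 0, 1 ), substr( $author, 0, 2 ), $author, $filename;
}

print author_dist_path( 'getty', 'DDGC-Static-0.027.tar.gz' ), "\n";
```

The printed path has the same shape as the tarball URLs that cpanm fetches from duckpan.org.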

Git tagging

Since tarring/testing/uploading of distributions is done from within the projects' Git repositories, the opportunity is also taken to increment the git tag and push that to "origin" (git push origin master --tags). This is achieved by including Dist::Zilla::Plugin::Git::NextVersion, Dist::Zilla::PluginBundle::Git and Dist::Zilla::Plugin::Git::ExcludeUntracked in the dist.ini for each project:

[Git::NextVersion]
[Git::ExcludeUntracked]
[@Git]

[@Git] expands to:

[Git::Check]
[Git::Commit]
[Git::Tag]
[Git::Push]

Git::NextVersion sets the dist version by bumping the version found in the most recent git release tag. Git::ExcludeUntracked excludes files not tracked by git from the release tarball. Git::Check checks that the git repository has no files in staging and no untracked files (i.e. it's a clean repo). Git::Commit generates a new commit for the new release, with the message taken from the Changelog. Git::Tag creates a new tag with the new dist version. Git::Push pushes new work to "origin" (i.e. GitHub), including new tags.

Module installation

DuckPAN installer

Modules for DuckPAN support are initially installed by the DuckPAN Installer. First App::DuckPAN is installed from CPAN:

cpanminus_install_error() if (system('cpanm App::DuckPAN'));

Then duckpan is used to install the remaining modules. The DDG distribution consists of a staging and test framework for instant answers.

cpanminus_install_error() if (system('duckpan DDG'));

Then we check that DDG installed correctly and is the latest version, and that git and ssh are installed.

if (system("duckpan check"))...

Other modules

The duckpan client wraps cpanm (see App::DuckPAN::Perl) and passes it tarball URLs straight from duckpan.org:

sub duckpan_install {
	my ( $self, @modules ) = @_;
	my $mirror = $self->app->duckpan;
	my $modules_string = join(' ',@modules);
	my $tempfile = tmpnam;
	if (is_success(getstore($self->app->duckpan_packages,$tempfile))) {
		my $packages = Parse::CPAN::Packages::Fast->new($tempfile);
		my @to_install;
		my $error = 0;
		for (@modules) {
			my $module = $packages->package($_);
			if ($module) {
				local $@;
				my $localver = $self->get_local_version($_);
				if ($localver && $localver == version->parse($module->version)) {
					$self->app->print_text("You already have latest version of ".$_." with ".$localver."\n");
				} elsif ($localver && $localver > version->parse($module->version)) {
					$self->app->print_text("You have a newer version of ".$_." with ".$localver." (duckpan has ".version->parse($module->version).")\n");
				} else {
					my $latest = $self->app->duckpan.'authors/id/'.$module->distribution->pathname;
					push @to_install, $latest unless grep { $_ eq $latest } @to_install;
				}
			} else {
				$self->app->print_text("[ERROR] Can't find package ".$_." on ".$self->app->duckpan."\n");
				$error = 1;
			}
		}
		return 1 if $error;
		return 0 unless @to_install;
		return system("cpanm ".join(" ",@to_install));
	} else {
		$self->app->print_text("[ERROR] Can't reach duckpan at ".$self->app->duckpan."!\n");
		return 1;
	}
}
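The per-module decision inside duckpan_install boils down to a three-way version comparison. The sketch below isolates it. The install_action helper and its return labels are hypothetical, and it takes plain version strings where the original compares the result of get_local_version with the index entry.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper reproducing the three-way decision in duckpan_install.
# $local is the installed version string (undef when the module is not
# installed); $remote is the version listed in the DuckPAN package index.
sub install_action {
    my ( $local, $remote ) = @_;
    return 'install' unless $local;

    my ( $have, $want ) = ( version->parse($local), version->parse($remote) );
    return 'up-to-date'  if $have == $want;
    return 'local-newer' if $have > $want;
    return 'install';
}

print install_action( undef,   '0.112' ), "\n";    # module not installed
print install_action( '0.112', '0.112' ), "\n";    # same version
print install_action( '0.113', '0.112' ), "\n";    # local copy is ahead
print install_action( '0.111', '0.112' ), "\n";    # index has a newer release
```

Only the 'install' outcome leads to a tarball URL being queued for cpanm; the other two outcomes just print a message in the original function.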

To install a module we don't yet have:

$ duckpan DDGC::Static
--> Working on http://duckpan.org/authors/id/G/GE/GETTY/DDGC-Static-0.027.tar.gz
Fetching http://duckpan.org/authors/id/G/GE/GETTY/DDGC-Static-0.027.tar.gz ... OK
Configuring DDGC-Static-0.027 ... OK
Building and testing DDGC-Static-0.027 ... OK
Successfully installed DDGC-Static-0.027
1 distribution installed

The module -> tarball mapping is done by the following lines in the duckpan_install function:

	my $tempfile = tmpnam;
	if (is_success(getstore($self->app->duckpan_packages,$tempfile))) {
		my $packages = Parse::CPAN::Packages::Fast->new($tempfile);

$self->app->duckpan_packages is set in App::DuckPAN to http://duckpan.org/modules/02packages.details.txt.gz, and Parse::CPAN::Packages::Fast parses package indexes, so mapping a module name to a tarball path is simple. Here it is done for DDG::Goodie::UnicodeFuzzySearch, which is in J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz:

$ use  Parse::CPAN::Packages::Fast                                      

$ my $p = Parse::CPAN::Packages::Fast->new("02packages.details.txt.gz");
$Parse_CPAN_Packages_Fast1 = Parse::CPAN::Packages::Fast=HASH(0x9a35f68);

$ my $m = $p->package("DDG::Goodie::UnicodeFuzzySearch");               
$Parse_CPAN_Packages_Fast_Package1 = Parse::CPAN::Packages::Fast::Package=HASH(0x99f0c90);

$ use DDP; p $m;                                                        
Parse::CPAN::Packages::Fast::Package  {
    public methods (4) : DESTROY, distribution, new, prefix
    private methods (1) : __ANON__
    internals: {
        package   "DDG::Goodie::UnicodeFuzzySearch",
        version   0.112
    }
}

$ p $m->distribution;         
Parse::CPAN::Packages::Fast::Distribution  {
    Parents       CPAN::DistnameInfo
    public methods (6) : add_package, contains, DESTROY, is_latest_distribution, new, prefix
    private methods (0)
    internals: {
        cpanid      "JAG",
        dist        "DDG-GoodieBundle-OpenSourceDuckDuckGo",
        distvname   "DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112",
        extension   "tar.gz",
        filename    "DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz",
        maturity    "released",
        pathname    "J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz",
        version     0.112
    }
}

$ $m->distribution->pathname;
J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz

Tarball paths are turned into URLs and pushed onto an array called @to_install in duckpan_install like so:

    my $latest = $self->app->duckpan.'authors/id/'.$module->distribution->pathname;
    push @to_install, $latest unless grep { $_ eq $latest } @to_install;

These are then installed by wrapping cpanm:

    return system("cpanm ".join(" ",@to_install));
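The URL building and de-duplication steps can be tried without running cpanm. In the sketch below, tarball_urls is a hypothetical helper, the mirror URL is assumed from the install log shown earlier, and the command is built in list form (a variation on the string form used by duckpan_install) so that each URL is passed as a separate argument without going through a shell.

```perl
use strict;
use warnings;

# Assumed mirror base URL, taken from the install log shown earlier.
my $mirror = 'http://duckpan.org/';

# Turn index pathnames into tarball URLs, keeping first-seen order and
# dropping duplicates (several modules often share one tarball).
sub tarball_urls {
    my ( $base, @pathnames ) = @_;
    my ( %seen, @urls );
    for my $pathname (@pathnames) {
        my $url = $base . 'authors/id/' . $pathname;
        push @urls, $url unless $seen{$url}++;
    }
    return @urls;
}

my @to_install = tarball_urls(
    $mirror,
    'J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz',
    'J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz',
    'G/GE/GETTY/DDGC-Static-0.027.tar.gz',
);

# List form of the command. It is printed here rather than executed;
# passing @command to system() would run cpanm directly.
my @command = ( 'cpanm', @to_install );
print join( ' ', @command ), "\n";
```

The %seen hash does the same job as the grep in the original, with one lookup per URL instead of a scan of the whole list.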

Other DuckPAN features

DuckPAN README#Using DuckPAN

Dependency installation

$ duckpan installdeps

Run from your project directory (e.g. zeroclickinfo-goodies) to install dists needed to work.

Instant Answer Testing

Query

$ duckpan query

This enumerates all Zeroclick modules in the current directory (using App::DuckPAN::DDG::get_blocks_from_current_dir) and lets you submit queries interactively on the command line.

Debug info tells you which module matched your query as well as the returned answer. For example, from the zeroclickinfo-goodies project directory you can simply run duckpan query:

Query: Morse Hello World!
  You entered: Morse Hello World!

DDG::ZeroClickInfo  {
    Parents       WWW::DuckDuckGo::ZeroClickInfo
    public methods (1) : new
    private methods (0)
    internals: {
        answer        "Morse code: .... . .-.. .-.. ---  .-- --- .-. .-.. -.. -.-.--",
        answer_type   "chars",
        is_cached     1,
        is_unsafe     0
    }
}

Server

$ duckpan server

Like duckpan query, this will enumerate all Zeroclick modules in the current directory and run a server to allow you to check out look and feel in your browser, as well as producing the same debug output as duckpan query.

Perldoc

The docs at http://duckpan.org/perldoc/ are generated by CPAN::Documentation::HTML. Is this done manually? The index may be outdated:

http://duckpan.org/perldoc/

Does not link to modules like:

http://duckpan.org/perldoc/DDGC/

...or individual instant answers, for example. Where are these generated?

DuckPAN Sites

Duckpan.org

Repository - static files, docs

Duck.co/duckpan

Dynamic metadata/docs(?) with version history(?).