DuckPAN Overview
See also: https://metacpan.org/pod/pinto and https://stratopan.com/ for the idea behind hosting "your own CPAN".
https://github.com/duckduckgo/p5-dist-zilla-plugin-uploadtoduckpan
Dist::Zilla::Plugin::UploadToDuckPAN extends Dist::Zilla::Plugin::UploadToCPAN.
What's involved in uploading a distribution to CPAN?
PAUSE vs. CPAN vs. search engines
A rough flow is:
Tarball -> PAUSE (HTTP POST) -> indexing/validation -> CPAN
https://duck.co/duckpan/upload accepts the same POST parameters as PAUSE. Dist::Zilla::Plugin::UploadToDuckPAN exists to wrap the metadata/tarball generation of Dist::Zilla::Plugin::UploadToCPAN with DuckPAN upload defaults.
https://github.com/duckduckgo/p5-dist-zilla-plugin-duckpanmeta/
Metadata extraction, appears to be unused by other modules?
https://github.com/duckduckgo/community-platform
https://duck.co/duckpan/, https://duck.co/duckpan/upload
Looking at the upload method of the Community Platform's Duckpan controller, we see that posted files are stashed in $c->d->config->cachedir.'/'.$upload->filename;
The cachedir is set in DDGC::Config as $ENV{'DDGC_CACHEDIR'} ? $ENV{'DDGC_CACHEDIR'} : $self->rootdir().'/cache/';
So if the environment variable DDGC_CACHEDIR is not set, we would expect to see distributions in ~/ddgc/cache.
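As a rough illustration of that fallback, here is a minimal sketch. resolve_cachedir is a hypothetical helper and the paths are made up; only the DDGC_CACHEDIR variable and the '/cache/' suffix come from the expression quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the fallback quoted from DDGC::Config:
# prefer DDGC_CACHEDIR when it is set to a true value, otherwise use
# <rootdir>/cache/.
sub resolve_cachedir {
    my ($rootdir) = @_;
    return $ENV{DDGC_CACHEDIR} ? $ENV{DDGC_CACHEDIR} : $rootdir . '/cache/';
}

{
    # Variable unset: fall back to the cache directory under rootdir.
    delete local $ENV{DDGC_CACHEDIR};
    print resolve_cachedir('/home/ddgc/ddgc'), "\n";
}
{
    # Variable set: it wins over the rootdir default.
    local $ENV{DDGC_CACHEDIR} = '/var/cache/ddgc';
    print resolve_cachedir('/home/ddgc/ddgc'), "\n";
}
```

Note that the test is on truthiness, as in the quoted expression, so an empty DDGC_CACHEDIR also falls back to the default.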
So, in our CPAN model above we have gone as far as the HTTP POST element. We also have the frontend, which queries the DDGC::DB::Result::DuckPAN::Release model.
https://github.com/duckduckgo/community-platform
ddgc=# \d duckpan_release
Table "public.duckpan_release"
Column | Type | Modifiers
--------------+--------------------------+--------------------------------------------------------------
id | integer | not null default nextval('duckpan_release_id_seq'::regclass)
name | text | not null
version | text | not null
users_id | bigint | not null
filename | text | not null
current | integer | not null default 1
duckpan_meta | text | not null default '{}'::text
created | timestamp with time zone | not null
updated | timestamp with time zone | not null
DDGC::DB::Result::User has many DDGC::DB::Result::DuckPAN::Release
https://github.com/duckduckgo/community-platform
Declares table('duckpan_goodie'), though it appears to be an interface to duckpan_module
https://github.com/duckduckgo/community-platform
ddgc=# \d duckpan_module
Table "public.duckpan_module"
Column | Type | Modifiers
--------------------+--------------------------+-------------------------------------------------------------
id | integer | not null default nextval('duckpan_module_id_seq'::regclass)
name | text | not null
duckpan_release_id | bigint | not null
filename | text |
filename_pod | text |
duckpan_meta | text | not null default '{}'::text
created | timestamp with time zone | not null
updated | timestamp with time zone | not null
DDGC::DB::Result::DuckPAN::Release has many DDGC::DB::Result::DuckPAN::Module. To illustrate the difference between a release and a module, see the module list:
http://duckpan.org/modules/02packages.details.txt
Individual modules on the left side and the release tarballs they are from are on the right.
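To make the release/module relationship concrete, here is a small sketch that groups the rows of a 02packages-style index by tarball, using core Perl only. The sample text is an assumption: two rows reuse names that appear elsewhere on this page, DDG::Goodie::Example is a made-up row, and modules_by_release is a hypothetical helper, not part of DuckPAN.

```perl
use strict;
use warnings;

# Sample index in the 02packages.details.txt layout: header lines,
# a blank line, then "package  version  path" rows.
# DDG::Goodie::Example is a made-up row for illustration.
my $index = <<'END';
File:         02packages.details.txt
Description:  Package names found in directory $CPAN/authors/id/

DDG::Goodie::Example             0.112  J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz
DDG::Goodie::UnicodeFuzzySearch  0.112  J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz
DDGC::Static                     0.027  G/GE/GETTY/DDGC-Static-0.027.tar.gz
END

# Map each release tarball path to the list of modules it provides.
sub modules_by_release {
    my ($text) = @_;
    my ($header, $body) = split /\n\n/, $text, 2;
    my %by_release;
    for my $line (split /\n/, defined $body ? $body : '') {
        my ($package, $version, $path) = split ' ', $line;
        next unless defined $path;    # skip malformed or empty rows
        push @{ $by_release{$path} }, $package;
    }
    return \%by_release;
}

my $releases = modules_by_release($index);
for my $path (sort keys %$releases) {
    print "$path\n";
    print "    $_\n" for @{ $releases->{$path} };
}
```

One release (the right-hand column) can therefore own many modules (the left-hand column), which is the same one-to-many shape as the duckpan_release and duckpan_module tables.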
https://github.com/duckduckgo/community-platform
Uses CPAN::Repository to maintain metadata and validate releases uploaded through DDGC::Web::Controller::Duckpan
The community platform DDGC has an instance of DDGC::DuckPAN
https://github.com/duckduckgo/p5-duckpan-installer
Does what it says on the tin - installation and verification of App::Duckpan and DuckPAN modules.
https://github.com/duckduckgo/p5-app-duckpan
Installation, upload and test tool for DDG/DuckPAN projects and web sites. It is Dist::Zilla based and uses Dist::Zilla::Plugin::UploadToDuckPAN for release.
Since DuckPAN tools are based on Dist::Zilla (and UploadToCPAN), it might be useful to discuss the differences and similarities with uploading a dist to CPAN and to DuckPAN.
The components involved, with their CPAN analogue, are:
DuckPAN | CPAN |
---|---|
duckpan | dzil |
install.pl/duckpan | cpan/cpanm |
duck.co/duckpan/upload | PAUSE |
duckpan.org | CPAN mirror |
duck.co/duckpan/ | CPAN frontend |
The major differences are that CPAN has owned namespaces (Module/dist owners) and CPAN frontends are usually built on the metadata maintained in CPAN rather than a separately maintained metadata repository, as with duck.co/duckpan.
A rough overview of the data flow (App::Asciio save file here: http://jbrt.org/ddg/duckpan.asciio):
                 .---------------------------.
                 |          Project          |
                 | (e.g. community-platform) |
                 |                           |
                 '---------------------------'
                               |
                           dist.ini
                               |
                               v
                     .------------------.
                     |  duckpan client  |
                     |  (dzil release)  |------------------------------.
                     '------------------'                              |
                               |                                       |
             Dist::Zilla::Plugin::UploadToDuckPAN    Dist::Zilla::Plugin::Git::NextVersion
                 (tar/metadata/auth HTTP POST)          Dist::Zilla::PluginBundle::Git
                               |                                       |
                               v                                       v
             .----------------------------------.             .----------------.
             |  https://duck.co/duckpan/upload  |             | Increment Tag  |
             | (DDGC::Web::Controller::Duckpan) |             | Push to Github |
             '----------------------------------'             '----------------'
                               |        .-------------------------------.
                         DDGC::DuckPAN  |       Metadata sources        |
                       CPAN::Repository |.-----------------------------.|
                               |        ||    DuckPAN (CPAN Style)     ||
.----------------------.       |-------->|      modules metadata       ||
| http://duckpan.org/  |       |        || (02packages.details.txt...) ||
|     (Repository)     |<------'        |'-----------------------------'|
'----------------------'       |        |                               |
           |                   |        |.-----------------------------.|
           |                   |        ||     community-platform      ||
           v                   '-------->| (DDGC::DB::Result::DuckPAN) ||
     .------------.                     |'-----------------------------'|
     |  duckpan   |                     '-------------------------------'
     | install.pl |                                     |
     |   cpanm    |                                     |
     '------------'                                     v
                                      .----------------------------------.
                                      |     https://duck.co/duckpan/     |
                                      | (DDGC::Web::Controller::Duckpan) |
                                      '----------------------------------'
Let's upload a dist and see which elements of the above we go through when a release needs to be uploaded. Say we have a new release of App::Duckpan:
p5-app-duckpan(master)$ duckpan installdeps
p5-app-duckpan(master)$ duckpan release
This dispatches to App::DuckPAN::Cmd::Release, which does:
my $ret = system('dzil release');
dzil release reads the dist.ini, sees Dist::Zilla::Plugin::UploadToDuckPAN and so knows where to release to. If credentials are not configured, you will be prompted for your duck.co credentials.
So, can any user with a duck.co login upload to DuckPAN? Let's take a look at Dist::Zilla::Plugin::UploadToDuckPAN's uploader.
has '+uploader' => (
    default => sub {
        my ($self) = @_;
        require CPAN::Uploader;
        CPAN::Uploader->VERSION('0.103004'); # require HTTPS
        my $uploader = Dist::Zilla::Plugin::UploadToCPAN::_Uploader->new({
            user => $self->username,
            password => $self->password,
            ($self->has_subdir
                ? (subdir => $self->subdir) : ()),
            ($self->has_upload_uri
                ? (upload_uri => $self->upload_uri) : ()),
            target => URI->new($self->upload_uri)->host,
        });
        $uploader->{'Dist::Zilla'}{plugin} = $self;
        weaken $uploader->{'Dist::Zilla'}{plugin};
        return $uploader;
    }
);
Dist::Zilla::Plugin::UploadToCPAN's uploader uses CPAN::Uploader, which normally sends an HTTPS POST to pause.perl.org containing the dist alongside the user's credentials.
Now we have a fair idea that duck.co/duckpan/upload has an identical (or similar enough) HTTP interface to PAUSE.
Let's take a look at the code which handles uploads, in DDGC::Web::Controller::Duckpan:
sub logged_in :Chained('base') :PathPart('') :CaptureArgs(0) {
    my ( $self, $c ) = @_;
    if (!$c->user) {
        $c->response->redirect($c->chained_uri('My','login'));
        return $c->detach;
    }
}
sub upload :Chained('logged_in') :Args(0) {
    my ( $self, $c ) = @_;
    eval {
        $c->add_bc('Upload');
        if (!$c->user) {
            $c->res->code(403);
            $c->d->errorlog($c->req);
            $c->d->errorlog("No user");
            $c->stash->{no_user} = 1;
            return $c->detach;
        }
        my $uploader = $c->user->username;
        my $upload = $c->req->upload('pause99_add_uri_httpupload');
        my $filename = $c->d->config->cachedir.'/'.$upload->filename;
        $upload->copy_to($filename);
        my $return = $c->d->duckpan->add_user_distribution($c->user,$filename);
        my $return_ref = ref $return;
        if ($return_ref eq 'DDGC::DB::Result::DuckPAN::Release') {
            $c->stash->{duckpan_release} = $return;
        } else {
            $c->stash->{duckpan_error} = $return;
            $c->res->code(403);
            $c->d->errorlog($c->req);
            $c->d->errorlog($c->stash->{duckpan_error});
        }
    };
    if ($@) {
        $c->res->code(403);
        $c->d->errorlog($c->req);
        $c->d->errorlog($@);
        $c->stash->{duckpan_error} = $@;
    }
    if ($c->stash->{duckpan_error}) {
        $c->stash->{subject} = "Error on release!"
    } else {
        $c->stash->{subject} = "Successful uploaded ".
            $c->stash->{duckpan_release}->name." ".
            $c->stash->{duckpan_release}->version;
    }
    $c->d->postman->template_mail(
        $c->user->data->{email},
        '"DuckPAN Indexer" <[email protected]>',
        '[DuckPAN] '.$c->stash->{subject},
        'duckpan',
        $c->stash,
        Cc => '"Torsten Raudssus" <[email protected]>',
    );
}
A login is definitely enforced, so let's go ahead and see if we can upload to DuckPAN!
p5-app-duckpan(master)$ duckpan release
...
Do you want to continue the release process? [y/N]: y
[@Git/Check] branch master is in a clean state
[UploadToDuckPAN] registering upload with duck.co web server
[UploadToDuckPAN] POSTing upload for App-DuckPAN-0.136.tar.gz to https://duck.co/duckpan/upload
request failed with error code 403
Message: Forbidden
No, we cannot. So who can? Traditionally, PAUSE uses namespace ownership to decide whether a user has the right to upload a distribution. If somebody uploads Acme::Slingshot, then the permission of the original uploader is required to overwrite or update Acme::Slingshot. What's DuckPAN's approach?
The $c->d->duckpan referred to in the code above is an instance of DDGC::DuckPAN. So what controls are in place in add_user_distribution? Among them are:
unless ($user->admin)
unless ($user->data && $user->data->{email})
if ($_->version eq $dist_data->version) # (this version of this dist already uploaded)
if (version->parse($_->version) > $dist_data->version) # (higher version of this dist already uploaded)
So you need to be a Community Platform admin. I don't see anything enforcing namespace ownership, though that is not to say it doesn't exist, or that it's required in an environment of trust.
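The four checks listed above can be sketched as one standalone function. This is only an illustration under assumptions: check_upload, the plain-hash user and the error strings are all hypothetical, and the real add_user_distribution works on DDGC user objects and stored releases rather than on a list of version strings.

```perl
use strict;
use warnings;
use version;

# Returns an error string, or undef when the upload may proceed.
# $user is a plain hash here; @existing_versions are the versions of
# this dist that were already uploaded.
sub check_upload {
    my ($user, $new_version, @existing_versions) = @_;
    return 'not an admin'        unless $user->{admin};
    return 'no email on account' unless $user->{data} && $user->{data}{email};
    my $new = version->parse($new_version);
    for my $old (@existing_versions) {
        # Same version string: this release was already uploaded.
        return "version $new_version already uploaded" if $old eq $new_version;
        # A higher version exists: refuse to go backwards.
        return "higher version $old already uploaded"
            if version->parse($old) > $new;
    }
    return;
}

my $admin = { admin => 1, data => { email => 'admin@example.org' } };
for my $try ('0.137', '0.136', '0.135') {
    my $error = check_upload($admin, $try, '0.135', '0.136');
    print "$try: ", (defined $error ? $error : 'ok'), "\n";
}
```

Nothing in this sketch looks at who uploaded the earlier versions, which matches the observation that no namespace ownership check is visible.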
If this process is successful, a dist for the given author is created within the ddgc site's directory tree. Release info is added to the DuckPAN::Release model for serving by duck.co/duckpan and to the CPAN directory tree in ddgc/. In DDGC::DuckPAN ($cpan_repository is an instance of CPAN::Repository, which is set to point at the static DuckPAN files within the ddgc directory):
my $distribution_filename_duckpan = $self->cpan_repository->add_author_distribution(uc($user->username),$distribution_filename);
return $self->add_release( $user, $dist_data->name, $dist_data->version, $distribution_filename_duckpan);
sub add_release {
    my ( $self, $user, $release_name, $release_version, $filename ) = @_;
    $self->ddgc->db->resultset('DuckPAN::Release')->search({
        name => $release_name,
    })->update({ current => 0 });
    return $self->ddgc->db->resultset('DuckPAN::Release')->create({
        name => $release_name,
        version => $release_version,
        users_id => $user->id,
        filename => $filename,
    });
}
All you then need for a working CPAN-compatible repository is to point a web server at the directory maintained by CPAN::Repository, which is precisely what the duckpan.org config does (with an HTML index/error page to redirect the lost). Any cpan/cpanm type client can install from and query this repository.
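The directory tree behind that web server follows the usual CPAN author layout, as the tarball paths elsewhere on this page show (for example G/GE/GETTY/...). A minimal sketch of how an uppercased username maps to its directory; author_dir is a hypothetical helper for illustration, not a CPAN::Repository method:

```perl
use strict;
use warnings;

# CPAN-style author directory: first letter, first two letters, full ID.
# The ID is uppercased, as in the add_author_distribution call above.
sub author_dir {
    my ($username) = @_;
    my $id = uc $username;
    return join '/', substr($id, 0, 1), substr($id, 0, 2), $id;
}

print 'authors/id/', author_dir($_), "\n" for qw(getty jag);
```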
Since tarring/testing/uploading of distributions is done from within the projects' Git repositories, the opportunity is also taken to increment the git tag and push it to "origin" (git push origin master --tags). This is achieved by including Dist::Zilla::Plugin::Git::NextVersion, Dist::Zilla::PluginBundle::Git and Dist::Zilla::Plugin::Git::ExcludeUntracked in the dist.ini for each project:
[Git::NextVersion]
[Git::ExcludeUntracked]
[@Git]
[@Git] expands to:
[Git::Check]
[Git::Commit]
[Git::Tag]
[Git::Push]
Git::NextVersion increments the dist version.
Git::ExcludeUntracked excludes files not tracked by git from the release tarball.
Git::Check checks that the git repository has no files in staging and no untracked files (i.e. it's a clean repo).
Git::Commit generates a new commit for the new release, with the message taken from the Changelog.
Git::Tag creates a new tag with the new dist version.
Git::Push pushes new work to "origin" (i.e. GitHub), including new tags.
Modules for DuckPAN support are initially installed by the DuckPAN Installer. First App::DuckPAN is installed from CPAN:
cpanminus_install_error() if (system('cpanm App::DuckPAN'));
Then duckpan is used to install the remaining modules. The DDG distribution consists of the staging and test framework for instant answers.
cpanminus_install_error() if (system('duckpan DDG'));
Then we check that DDG installed correctly and is the latest version, and that we have git and ssh installed.
if (system("duckpan check"))...
The duckpan client wraps cpanm (see App::DuckPAN::Perl) and passes it tarball URLs straight from duckpan.org:
sub duckpan_install {
    my ( $self, @modules ) = @_;
    my $mirror = $self->app->duckpan;
    my $modules_string = join(' ',@modules);
    my $tempfile = tmpnam;
    if (is_success(getstore($self->app->duckpan_packages,$tempfile))) {
        my $packages = Parse::CPAN::Packages::Fast->new($tempfile);
        my @to_install;
        my $error = 0;
        for (@modules) {
            my $module = $packages->package($_);
            if ($module) {
                local $@;
                my $localver = $self->get_local_version($_);
                if ($localver && $localver == version->parse($module->version)) {
                    $self->app->print_text("You already have latest version of ".$_." with ".$localver."\n");
                } elsif ($localver && $localver > version->parse($module->version)) {
                    $self->app->print_text("You have a newer version of ".$_." with ".$localver." (duckpan has ".version->parse($module->version).")\n");
                } else {
                    my $latest = $self->app->duckpan.'authors/id/'.$module->distribution->pathname;
                    push @to_install, $latest unless grep { $_ eq $latest } @to_install;
                }
            } else {
                $self->app->print_text("[ERROR] Can't find package ".$_." on ".$self->app->duckpan."\n");
                $error = 1;
            }
        }
        return 1 if $error;
        return 0 unless @to_install;
        return system("cpanm ".join(" ",@to_install));
    } else {
        $self->app->print_text("[ERROR] Can't reach duckpan at ".$self->app->duckpan."!\n");
        return 1;
    }
}
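The per-module decision inside that loop boils down to a three-way version comparison. A minimal sketch of just that decision, using the core version module; install_action and its return labels are hypothetical names chosen for illustration, not part of App::DuckPAN:

```perl
use strict;
use warnings;
use version;

# Decide what to do with one module, given the locally installed version
# (undef if not installed) and the version listed in the package index.
sub install_action {
    my ($local, $indexed) = @_;
    return 'install' unless $local;
    my $l = version->parse($local);
    my $i = version->parse($indexed);
    return 'up-to-date'  if $l == $i;   # already have the indexed version
    return 'local-newer' if $l >  $i;   # local copy is ahead of the index
    return 'install';                   # index is ahead: fetch the tarball
}

for my $case ([undef, '0.112'], ['0.112', '0.112'], ['0.120', '0.112'], ['0.027', '0.112']) {
    my ($local, $indexed) = @$case;
    printf "%-8s vs %-6s => %s\n",
        defined $local ? $local : '(none)', $indexed,
        install_action($local, $indexed);
}
```

Only the 'install' outcome contributes a URL to the list handed to cpanm; the other two just print a message.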
To install a module we don't yet have:
$ duckpan DDGC::Static
--> Working on http://duckpan.org/authors/id/G/GE/GETTY/DDGC-Static-0.027.tar.gz
Fetching http://duckpan.org/authors/id/G/GE/GETTY/DDGC-Static-0.027.tar.gz ... OK
Configuring DDGC-Static-0.027 ... OK
Building and testing DDGC-Static-0.027 ... OK
Successfully installed DDGC-Static-0.027
1 distribution installed
The module -> tarball mapping is done by the following lines in the duckpan_install function:
my $tempfile = tmpnam;
if (is_success(getstore($self->app->duckpan_packages,$tempfile))) {
my $packages = Parse::CPAN::Packages::Fast->new($tempfile);
$self->app->duckpan_packages is set in App::DuckPAN to http://duckpan.org/modules/02packages.details.txt.gz. Parse::CPAN::Packages::Fast parses package indexes, so it's now simple to map a module name to a tarball path. I do it here for DDG::Goodie::UnicodeFuzzySearch, which is in J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz:
$ use Parse::CPAN::Packages::Fast
$ my $p = Parse::CPAN::Packages::Fast->new("02packages.details.txt.gz");
$Parse_CPAN_Packages_Fast1 = Parse::CPAN::Packages::Fast=HASH(0x9a35f68);
$ my $m = $p->package("DDG::Goodie::UnicodeFuzzySearch");
$Parse_CPAN_Packages_Fast_Package1 = Parse::CPAN::Packages::Fast::Package=HASH(0x99f0c90);
$ use DDP; p $m;
Parse::CPAN::Packages::Fast::Package {
public methods (4) : DESTROY, distribution, new, prefix
private methods (1) : __ANON__
internals: {
package "DDG::Goodie::UnicodeFuzzySearch",
version 0.112
}
}
$ p $m->distribution;
Parse::CPAN::Packages::Fast::Distribution {
Parents CPAN::DistnameInfo
public methods (6) : add_package, contains, DESTROY, is_latest_distribution, new, prefix
private methods (0)
internals: {
cpanid "JAG",
dist "DDG-GoodieBundle-OpenSourceDuckDuckGo",
distvname "DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112",
extension "tar.gz",
filename "DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz",
maturity "released",
pathname "J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz",
version 0.112
}
}
$ $m->distribution->pathname;
J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz
Tarball paths are turned into URLs and pushed onto an array called @to_install in duckpan_install like so:
my $latest = $self->app->duckpan.'authors/id/'.$module->distribution->pathname;
push @to_install, $latest unless grep { $_ eq $latest } @to_install;
These are then installed by wrapping cpanm:
return system("cpanm ".join(" ",@to_install));
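The URL-building step can be sketched on its own as follows. tarball_urls is a hypothetical helper; it uses a %seen hash instead of the grep in the original to drop duplicates, which matters when several requested modules live in the same tarball.

```perl
use strict;
use warnings;

# Turn index pathnames into unique tarball URLs, preserving order.
# $mirror is expected to end in a slash, e.g. 'http://duckpan.org/'.
sub tarball_urls {
    my ($mirror, @pathnames) = @_;
    my (%seen, @urls);
    for my $pathname (@pathnames) {
        my $url = $mirror . 'authors/id/' . $pathname;
        push @urls, $url unless $seen{$url}++;
    }
    return @urls;
}

# Two modules from the same bundle yield a single URL.
my @to_install = tarball_urls(
    'http://duckpan.org/',
    'J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz',
    'J/JA/JAG/DDG-GoodieBundle-OpenSourceDuckDuckGo-0.112.tar.gz',
    'G/GE/GETTY/DDGC-Static-0.027.tar.gz',
);
print "$_\n" for @to_install;
```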
$ duckpan installdeps
Run from your project directory (e.g. zeroclickinfo-goodies) to install dists needed to work.
$ duckpan query
This will enumerate all Zeroclick modules in the current directory (using App::DuckPAN::DDG::get_blocks_from_current_dir) and allow you to submit queries interactively on the command line.
Debug info will tell you which module matched your query as well as the returned answer, so if you were in the zeroclickinfo-goodies project directory you could simply run duckpan query and enter:
Query: Morse Hello World!
You entered: Morse Hello World!
DDG::ZeroClickInfo {
Parents WWW::DuckDuckGo::ZeroClickInfo
public methods (1) : new
private methods (0)
internals: {
answer "Morse code: .... . .-.. .-.. --- .-- --- .-. .-.. -.. -.-.--",
answer_type "chars",
is_cached 1,
is_unsafe 0
}
}
$ duckpan server
Like duckpan query, this will enumerate all Zeroclick modules in the current directory and run a server so you can check the look and feel in your browser, while producing the same debug output as duckpan query.
Generated docs are at http://duckpan.org/perldoc/, generated by CPAN::Documentation::HTML. Is this done manually? The index appears possibly outdated. It does not link to modules like:
http://duckpan.org/perldoc/DDGC/
...or to individual instant answers, for example. Where are these generated?
Repository - static files, docs
Dynamic metadata/docs(?) with version history(?).