Part two of the Vienna WoC TODO Manager code is up, in which I design and implement a Catalyst+DBIx::Class+Reaction app from the ground up in public and document the process.
The latest article is Database and Domain Design, in which I don my gear and go exploring in the barren wastes of confusion and spite with only a one-page wiki spec as a map, seeking the one true design for my domain that will allow me to build an app atop it that meets the requirements in full.
The complete article is on the Shadowcat site, and the canonical location for comments is the use.perl.org journal entry.
I'm going to be building an application to help run the Vienna.pm Winter of Code, and since the resulting application is going to be open source and in public subversion as an example of Catalyst+DBIx::Class+Reaction, I've decided to write up the development process as I go along. Because the WoC stuff seems to be primarily announced on use.perl.org I'm going to post each article there for comments rather than here (the original of the articles will be on the Shadowcat site and once that has RSS and comments this blog may move there entirely).
The series's table of contents (which will be updated as I post new stuff) lives here, the use.perl.org journal is here and the RSS feed is here
Or so this blog post seems to indicate.
I have nothing to say except \o/
Ok, so this post is spectacularly late. Work's been busy, my business partner got married and I got to annoy all sorts of brilliant people at OSCON (my slides are up if you care). And I -did- actually write the code that this post depends on a while back, I just haven't written it up yet because I suck.
So, anyway. This is a continuation of the series of posts begun here, but this time rather than going on a mad dash to add more features we're going to step back and clean our usage up a bit so this feels like a real tool rather than a poor excuse for a series of blog-did I type that out loud? Shit. Er. SHINY OBJECT! DISTRACTION! DISTRACTION!
Ahem.
Now, so far I've been firing the repl up using
perl -Ilib -MDevel::REPL -e 'Devel::REPL->new->run;'
which works fine, but isn't exactly elegant for command line use. Worse still, if I want a plugin or two preloaded I have to do something like
perl -Ilib -MDevel::REPL -e 'my $repl = Devel::REPL->new; $repl->load_plugin("History"); $repl->run;'
which crosses the border out of 'not exactly elegant' and doesn't stop until it passes 'ugly'. Not good.
So, time to step back and figure out what I'm wanting to do. First thing, let's make the running syntax simpler -
perl -Ilib -MDevel::REPL::Script=run -e1;
That's easy enough to implement -
package Devel::REPL::Script;
use Moose;
use Devel::REPL;
use namespace::clean -except => [ qw(meta) ];has '_repl' => (
is => 'ro', isa => 'Devel::REPL', required => 1,
default => sub { Devel::REPL->new() }
);sub run {
my ($self) = @_;
$self->_repl->run;
}sub import {
my ($class, @opts) = @_;
return unless (@opts == 1 && $opts[0] eq 'run');
$class->new->run;
}
So, what's happening here is that
perl -MClassName=foo,bar,baz
is equivalent to
use ClassName qw(foo bar baz);
is equivalent to
require ClassName;
ClassName->import('foo', 'bar', 'baz');
and then the -e1 just gives perl a script fragment to run after the import returns, so your interpreter doesn't hang waiting for input. So, now I can create script/re.pl -
#!/usr/bin/env perl
use Devel::REPL::Script 'run';
The /usr/bin/env should ensure we pick up whatever perl's first in $PATH rather than it being set at install-time (since I often have several perls on a system and so do other developers and this is a developer tool). This, admittedly, doesn't work on win32. Patches welcome on that front.
But this still doesn't solve the second problem, loading plugins - it just makes getting as far as a prompt I can call $_REPL->load_plugin at quicker. So, to solve this I want two approaches - Profiles, which allow you to ship a canned collection of settings as a class, and a configurable rc file to add the personal touch to your repl environment.
First, the rcfile implementation. This is fairly simple -
sub load_rcfile {
my ($self, $rc_file) = @_;# plain name => ~/.re.pl/${rc_file}
if ($rc_file !~ m!/!) {
$rc_file = File::Spec->catfile(File::HomeDir->my_home, '.re.pl', $rc_file);
}
This way an rcfile of 'foo' becomes ~/.re.pl/foo but any other path like ./foo or /path/to/foo carries on untouched. File::Spec and File::HomeDir are used to try and maximise portability.
-r is the same as the shell scripting -r test, it checks that not only the file exists but we're allowed to read it
if (-r $rc_file) {
open RCFILE, '<', $rc_file || die "Couldn't open ${rc_file}: $!";
my $rc_data;
{ local $/; $rc_data = <RCFILE>; }
$/ is perl's input separator; the 'local' means our changes will expire when the call stack's un-wound at the end of the block (we can't use lexical my() on perl internal variables), and not supplying a value means it becomes undef so the single <RCFILE> read operation slurps the entire file.
close RCFILE; # Don't care if this fails
$self->eval_rcdata($rc_data);
warn "Error executing rc file ${rc_file}: $@\n" if $@;
}
}sub eval_rcdata {
my ($self, $data) = @_;
local $CURRENT_SCRIPT = $self;
This dynamically scopes $CURRENT_SCRIPT - here we use local rather than my because it's scoped to the call-stack, so any code we call from here on in will have this value unless there's an inner eval_rcdata call that makes another dynamic scoping level further down
$self->_repl->eval($data);
}sub current {
confess "->current should only be called as class method" if ref($_[0]);
confess "No current instance (valid only during rc parse)"
unless $CURRENT_SCRIPT;
return $CURRENT_SCRIPT;
}
and thanks to the local above, from within our rc file we can simply do
Devel::REPL::Script->current
to get at the currently executing script object.
So, for example, my RC file for a DBIx::Class project will usually live in /path/to/checkout/.re.pl/project.rc and contain -
Devel::REPL::Script->current->load_rcfile('repl.rc'); # load my global ~/.re.pl/repl.rc
use lib 'lib'; # to get at the lib/Project.pm, lib/Project/* perl modules
use Project::Schema; # load the DBIC schema
Project::Schema->connection('dbi:Pg:dbname=project_matthewt_test','matthewt',''); # connect to db
Project::Schema->stacktrace(1); # turn on stack traces for DBI errorssub schema { 'Project::Schema' } # shortcut so things like schema->sources works
sub rs { Project::Schema->resultset(shift); } # shortcut so rs('Foo')->find(1); works
sub cols { Project::Schema->source(shift)->columns; } # cols('Foo') returns a column list
Now, I can happily put my load_plugin calls into my ~/.re.pl/repl.rc (I'll get to where that default turns up from later - patience, patience :) but I think pretty much -every- Devel::REPL user wants a similar base set of plugins, and there are likely common configurations for particular tasks (I'll probably parametrise and ship my DBIx::Class setup at some point for a start), so we need a way to ship a "profile". And since we happen to have CPAN handy, we might as well implement that as a perl module -
sub load_profile {
my ($self, $profile) = @_;
$profile = "Devel::REPL::Profile::${profile}" unless $profile =~ /::/;
This means a profile argument of 'Foo' will become 'Devel::REPL::Profile::Foo', but for e.g. a profile of 'DBIx::Class::Devel::REPL::Profile' would be left untouched. Class names can only get so long before we want to cry.
Class::MOP::load_class($profile);
This will either load the class or throw an error - it's much like require but encapsulates the necessary faffing to go from class name to file to load safely and is already tested for us.
confess "Profile class ${profile} doesn't do 'Devel::REPL::Profile'"
unless $profile->does('Devel::REPL::Profile');
->does checks that the profile class has declared itself to consume the Devel::REPL::Profile role, which in this case is a pure interface role that just requires the class to have an apply_profile method
$profile->new->apply_profile($self->_repl);
which we then call, and let the profile do whatever it wants to configure the repl object.
}
So, the ::Profile role is as simple as -
package Devel::REPL::Profile;
use Moose::Role;
use namespace::clean -except => [ 'meta' ];requires 'apply_profile';
1;
and the 'Default' profile, aka "the stuff I'm fairly convinced everybody is going to want loaded", is just
package Devel::REPL::Profile::Default;
use Moose;
use namespace::clean -except => [ 'meta' ];with 'Devel::REPL::Profile';
sub plugins {
qw(History LexEnv DDS Packages Commands);
}sub apply_profile {
my ($self, $repl) = @_;
$repl->load_plugin($_) for $self->plugins;
}1;
There probably wasn't really much need to factor out the plugin list like that, but it'll make it easier for other people to subclass this one to have "these plugins plus a few more".
Now we've got the functionality together, the last thing is to make it so that we can specify the rc file and profile from the command line. So, a quick bit of code to make attributes and load them both on script init -
has 'rcfile' => (
is => 'ro', isa => 'Str', required => 1, default => sub { 'repl.rc' },
);has 'profile' => (
is => 'ro', isa => 'Str', required => 1, default => sub { 'Default' },
);sub BUILD {
my ($self) = @_;
$self->load_profile($self->profile);
$self->load_rcfile($self->rcfile);
}
(and yes, that's where your default ~/.re.pl/repl.rc comes from :), and then we need to extend the run() method to parse options out of @ARGV, check for flags we understand and handles the values appropriately.
Right?
Nah. Far too much effort. Fortunately, MooseX::Getopt can save us the trouble. Just add
with 'MooseX::Getopt';
and change import to be
sub import {
my ($class, @opts) = @_;
return unless (@opts == 1 && $opts[0] eq 'run');
$class->new_with_options->run;
}
and the new_with_options constructor introspects our attribute names, parses the command line args and does the right thing, so now
re.pl --rcfile ./.re.pl/projectname.rc
works as expected without any extra code required from us at all.
Now, this post is getting a bit long for its own good, so I'm going to punt the other things I was going to go over until I have time and brainpower to explore them properly, and instead I'm going to present you with a randomised perl hack to mull over in the meantime.
I keep find myself writing code that looks something like
$self->foo(
$self->bar(
$self->baz(
$val
)
)
);
and it gets really boring after a while. So, integral@freenode#perl was talking about writing a method composition operator in perl6 and how you could syntaxify all this away. Me being me, I wondered if it was possible to produce something suitable in perl5. The answer is, indeed, yes -
sub pipeline;
sub pipeline {
my @methods = @_;
my $last = pop(@methods);
if (@methods) {
\sub {
my ($obj, @args) = @_;
$obj->${pipeline @methods}(
$obj->$last(@args)
);
};
} else {
\sub {
shift->$last(@_);
};
}
}matthewt@cain ~/tmp $ re.pl --rcfile pipeline
$ { package Foo; # this was all one line but I'm being nice to your eyes :)
sub foo { warn "foo"; -$_[1]; }
sub bar { warn "bar"; $_[1]+2 }
sub baz { warn "baz"; $[[1]+3 }
}
$ my $foo = bless({}, 'Foo');
$Foo1 = Foo=HASH(0x8977a38);
$ $foo->${pipeline qw(foo bar baz)}(10);
baz at (eval 78) line 6.
bar at (eval 78) line 6.
foo at (eval 78) line 6.
-15
Understanding the implementation of this one is left as an exercise for the reader. Feel free to have at it in the comments should you so desire.
mst out.
Addendum: aristotle proposes this alternative implementation -
sub pipeline {
my $self;
my $code = sub { @_ };
for my $method ( reverse @_ ) {
my $prev = $code;
$code = sub { $self->$method( $prev->( @_ ) ) };
}
return \sub { $self = shift; $code->( @_ ) };
}
which he considers to be much clearer. I'm ambivalent, but the original appeals to the lisper in me; I guess his is maybe more idiomatic perl. Shrug.
For those of you who're interested in me demonstrating just how badly I interview, you can hear a perlcast podcast interview with me about DBIx::Class which was recorded last month.
Better still, after 7 dev releases, four of them RC grade, a lot of bug fixing, a lot of hating databases, and sterling work from many many people, I'm proud to announce that DBIx::Class 0.08000 is on its way to being available from the CPAN! ... and of course two minutes after I wrote this, somebody found a bug. So please, please make sure you get 0.08001, not 0.08000.
Given it's been not that far short of a year since the last major release, there are quite a lot of new features to show for the time we've spent, so I'm just going to mention a few highlights here.
First, a short list of feature/bug stuff:
- deep prefetch now works for a lot more cases (and is completely rewritten to be clearer and faster)
- inflate_column now obeys 'accessor' column_info key for accessor naming
- multiple ordering columns in DBIx::Class::Ordered
- support for BYTEA and similar columns in Pg (and general support so other drivers can be updated)
- ANSI join syntax now unrolled into WHERE clause to support Oracle 8 and below
- modified reference approaches so $schema objects don't go out of scope anymore
- more efficient database disconnect handling and cleaner error reporting
- significant speed improvements in common row operations
- experimental Replication storage class for "modify master, select from slave" setups
There are also a few bigger things that I'd like to highlight.
DBIx::Class::Schema->load_namespaces
This is an alternative to load_classes which assumes an alternative layout for automatic class loading. It assumes that all result classes are underneath a sub-namespace of the schema called Result, any
corresponding ResultSet classes are underneath a sub-namespace of the schema
called ResultSet.
For example:
# load My::Schema::Result::CD, My::Schema::Result::Artist,
# My::Schema::ResultSet::CD, etc...
My::Schema->load_namespaces;# Override everything to use ugly names.
# In this example, if there is a My::Schema::Res::Foo, but no matching
# My::Schema::RSets::Foo, then Foo will have its
# resultset_class set to My::Schema::RSetBase
My::Schema->load_namespaces(
result_namespace => 'Res',
resultset_namespace => 'RSets',
default_resultset_class => 'RSetBase',
);# Put things in other namespaces
My::Schema->load_namespaces(
result_namespace => '+Some::Place::Results',
resultset_namespace => '+Another::Place::RSets',
);
DBIx::Class::Row->new/insert/update
DBIx::Class will now correctly handle being passed a hashref (for a one-one or many-one relationship) or arrayref of hashrefs (for a one-many relationship) at new() or update() time and the correct data will get created - for example
DBIx::Class::Schema::Versioned
my $artist = $schema->resultset('Artist')
->create({ name => 'Fred 2',
cds => [
{ title => 'Music to code by',
year => 2007,
},
],
});# Add a new CD
$artist->update({cds => [ $artist->cds,
{ title => 'Yet another CD',
year => 2006,
},
],
});
DBIx::Class also now has experimental schema versioning support. Adding
__PACKAGE__->load_components(qw/+DBIx::Class::Schema::Versioned/);
__PACKAGE__->upgrade_directory('/path/to/upgrades/');
__PACKAGE__->backup_directory('/path/to/backups/');
to your schema class, and calling $schema->create_ddl_dir to generate a directory of SQL files for your current schema version (which is best practice since it then allows deploy to work on target systems without SQL::Translator present) will allow you to call $schema->upgrade to do upgrades. See the perldoc for more details.
Have fun!
So, there was some fascinating discussion on my first post that got onto the subject of some of the magic tricks that people play under the hood, notably touching on Smalltalk and Ruby. Piers Cawley summed up my real views on the subject for me, noting
- "With so many of these things it's not the capability itself that's bad, it's the of the bad uses people put it to."
and I use a hell of a lot of magic in perl sometimes, albeit carefully encapsulated magic - the principle reason I'm still using perl is that python and ruby just aren't as flexible as perl. Which means that this article is going to be perl all the way down, because of course more flexible means more ways to shoot yourself in the face - and I'm going to run through three that I've run into in production code and been driven insane by.
Headshot One: sub UNIVERSAL::foo { }
This is probably the least horrible of the things I'm going to discuss - as Piers points out, smalltalk does something fairly equivalent to this a lot to great effect. The idea of this is that you can provide a method on all objects in your program with a single definition, so for example
sub UNIVERSAL::debug_print { require Data::Dumper; warn Data::Dumper::Dumper($_[0]); }
would mean that for any object in your program, you're guaranteed to be able to call
$obj->debug_print
and get the contents of your object dumped to STDERR.
Or, at least, that's the theory. Where this can become a headshot is in its interactions with other things - for a start, now you're messing with a namespace that's shared interpreter-wide so you can get name clashes. But, of course, people are all careful about injecting things into UNIVERSAL:: because they know that, so you're pretty much safe, right?
Wrong.
Just because you don't get a clash at that level, you're effectively making the assumption that no class your program will load, ever, will define a method of that name with a different meaning! So if you have a pretty printer class -
package PrettyPrinter;
sub debug_print {
my ($self, @to_print) = @_;
die "no arguments passed to debug_print" unless @to_print;
...
then as soon as you try
$obj->debug_print
on your pretty printer ... exception time!
The place I encountered this one personally was with UNIVERSAL::moniker combined with Template::Plugin::Class - the latter returns a proxy object that catches all unknown method calls directed to it with an AUTOLOAD (which I considered mentioning for this article but is only really a footshot ... unless somebody else already put one in the global namespace). However, this proxy object only accounts for the standard UNIVERSAL methods, so moniker gets called on the proxy object and fails spectacularly to be any use. The solution, of course, was to inject an explicit moniker method into the proxy object's class, but that's an evil workaround in and of itself. However, provided you know what you're doing a sparse few extra global methods can be very powerful, so it's arguably the least scary of the techniques
Headshot rating: May take an ear off.
Headshot Two: Wrapping or replacing other people's methods
This technique, also known as monkey patching, is a common means of in-place modification of other people's code to allow tweaking without requiring everything deals with a factory. This -can- in theory be a very useful tool, although you should always try to subclass - but, sometimes (for e.g. with my Template::Plugin problem above) you either don't have time or don't have an option to integrate patches upstream or ...
So, anyway, a first attempt might be
sub MonkeyPatch::TargetClass::target_method { ... }
which is ok in theory except ... (1) your sub is declared in the current package so your class globals won't be right,and (2) if the other class ever declared a target_method method itself it'll all go boom. Now, neither matched for my problem above, and if they do perl will give you a handy 'method redefined' warning so you know you did something silly, so this is only really a shooting yourself in the foot moment (I don't think every language has this warning though, it was added because of perl programmers shooting themselves in the head due to its absence years back ... sometimes being old and unfashionable is an advantage :)
So. Let's assume for the moment there -is- a target method and you want to wrap it. Well, so, I guess the easiest way is just to rename the old one and replace it, right?
package MonkeyPatch::TargetClass;
*old_target_method = \&target_method; # set symbol to subref
{
no warnings 'redefine'; # only turn the warning off within the current block
*target_method = sub {
my $self = shift;
...
$self->old_target_method(@_);
}
}
Ok, that's fine. But what if somebody -else- tries to wrap this the same way? They'll overwrite old_target_method with your wrapped version and ... BOOM. Alright, so can we wrap it in place? Well, in perl, yes we can -
package MonkeyPatch::TargetClass;
my $old = \&target_method;
{
no warnings 'redefine';
*target_method = sub {
my $self = shift;
...
$self->$old(@_);
}
So now if somebody else does the same thing, they'll grab our redefined version and wrap that and there are no conflicts - well, so long as our wrappers aren't order-dependent, but at that point you're beyond what this approach can ever handle and it's time to break out multiple inheritance and C3 mro support.
But ... what if we don't know if the method's defined in TargetClass or one of its superclasses? Worse still, what if it -is- currently in TargetClass but an update to the library that comes from factors it out into a base class? You guessed it: BOOM.
So, one last tweak -
becomesmy $old = &target_method;
my $old = __PACKAGE__->can('target_method'); # look up along MRO
and now we are, so far as this sort of hack ever can be, safe and robust - to the point where I was happy to use this trick to tweak bits of CPAN for my Catalyst installer.
So, this one gets a vote of: bloody hard to get right, still dangerous even then, but sometimes useful.
Headshot rating: All fun and games until somebody loses an eye
Headshot Three: Source filtering
And finally, the most strange and fascinating of the bunch, and the one which a client of ours got shot in the face by in the wild inspiring me to write this article in the first place. Source filtering is ... well, for those of you unfamiliar with perl the implementation you're likely to have encountered is C preprocessor macros; I don't -think- any other dynamic languages have been foolish enough to copy this particular feature and if the author of your preferred one proposes doing so, please break his fingers and confiscate his keyboard until the madness passes.
So, right, source filters allow you to provide a textual preprocessor to attach to your source code which runs before the file is fed to the interpreter's parser. Which theoretically allows you to extend the language to do -anything-, for example add a completely pointless switch statement (you can produce something almost identical with other syntactic constructs - if anybody cares comment and I'll elaborate).
It also allows you to shoot yourself in the head. Spectacularly. Observe -
package ShootMeInTheHead;
use strict;
use warnings;
use Switch;=head1 ShootMeInTheHead
=cut
sub foo {
my ($variable) = @_;
return $variable + 1;
}sub bar {
my ($variable) = @_;
return $variab1e + 1;
}
A very simple piece of code, and one with a visible typo - the last return line has a 1 in 'variab1e' which should be an l. And lo, asking perl to compile it yields
Global symbol "$variab1e" requires explicit package name at line 18.
and thus informed of our silly error we may swiftly correct it. Except. Consider the following identically buggy but subtly different file -
package ShootMeInTheHead;
use strict;
use warnings;
use Switch;
# load switch statement=head1 ShootMeInTheHead
=cut
sub foo {
my ($variable) = @_;
return $variable + 1;
}sub bar {
my ($variable) = @_;
return $variab1e + 1;
}
which, while -functionally- identical (in theory), now contains the magic word 'switch'. Now, since we added a line, the compilation error should now be line nine, right?
Global symbol "$variab1e" requires explicit package name at line 14.
Erm. But that's the "return $variable + 1;" line further up. Which is -correct-.
Which is where we originally came in, with one of Shadowcat's clients contacting us in a not very happy state of mind because their core app wouldn't start and their devs swore blind the line the compiler error was coming from was correct. And, indeed, they were right - that line was. A quick comment-out of the 'use Switch' and dependent code revealed the correct line (some 200 lines off, their code was a tad more involved than this), I corrected a typo in a variable name, and lo, the app was working again.
I'll confess to not having tracked down -exactly- how Switch's tweaks to line numbering manage to mangle the error reporting, and I'll also confess to not particularly caring - my fundamental advice to anybody considering using a source filter that alters things beyond a single line is, and will ever be, don't.
Headshot rating: Right between the eyes.
So, having scared myself and perhaps a number of my readers too, I think I'm going to go for a beer and some cat macros while I recover. Adieu!
So, weeks and weeks on from when I planned, I'm finally getting down to sorting my e-mail out. First thing is to pick an IMAP client library. A quick poke for IMAP client libraries on CPAN reveals a few, notably IMAP::Client, Net::IMAP and Net::IMAP::Simple. Of the three, IMAP::Client looks like the most 'engineered' but I don't really feel the need for anything complex just yet, so I'm going to go with Net::IMAP::Simple.
Since I currently have a -lot- of crap in my inbox, I'm going to grab the unread mail and shuffle it into a mailbox called 'scratch' that I can back up, screw up and generally mess with to my heart's content. I'm eliding how I do that because (a) it's boring cp -a stuff at a shell prompt (b) it's specific to the slightly odd qmail setup I built several years ago, which means it's likely useless to anybody reading this.
INBOX/scratch created and populate, it's time to fire up ye olde repl and see what we get (NB: throughout this transcript I'm using ' 8 ' instead of '@' in order to throw off spambots.
$ use Net::IMAP::Simple;
$ my $imap = Net::IMAP::Simple->new('imap.scsys.co.uk');
Net::IMAP::Simple=HASH(0x8256c1c)
$ $imap->login('matt 8 trout.me.uk', 'notmyrealpw');
1;
$ $imap->select('INBOX/scratch');
16202
Riiight. Well, so far so good but that's a lot of e-mail to deal with. Looking at the Net::IMAP::Simple synopsis, my best route to poke at it is to make an Email::Simple object from the headers, so we'll load that and build a quick function that handles doing that for a given msgid.
$ use Email::Simple;
Net::IMAP::Simple=HASH(0x8256c1c)
$ my $mk_hdr = sub { Email::Simple->new(join '', @{$_[0]->top($_[1])}); };
CODE(0x85ac270)
$ my $msg = $mk_hdr->($imap, 1);
Email::Simple=HASH(0x86a1b0c)
$ $msg->header('Delivered-To');
matt 8 trout.me.uk infrastructures 8 trout.me.uk
$ join(',',$msg->header('Delivered-To'));
matt 8 trout.me.uk,infrastructures 8 trout.me.uk
Top. Not only does it work, but we get a list back. Now, my plan is to redirect based on the mail address delivered to for the moment - while yes, I could easily do this by rejigging my mailbox layout I can't be bothered pre-processing old mail and I know there's a mail server migration in my near future so I want my filtering approach to be independent of any of that. Hence why I'm doing things this way - what I'm trying to end up with is a portable version of the set of mail rules I had last time I attempted to use Thunderbird. Anyway. Delivered-To headers are the easiest way of getting hold of this info, but if it went through a remote network it'll already have one from there and I have no desire to classify based on other people's mail addresses, so we need the last one on my own domain. Time for another quick function, taking advantage of the wonderful perl built-in 'reverse' -
$ my $last_local = sub { foreach(reverse $_[0]->header('Delivered-To')) { return $_ if /\@trout.me.uk$/; } };
CODE(0x8548e90)
$ $last_local->($msg);
infrastructures 8 trout.me.uk
$ my @first_ten = map { $mk_hdr->($imap, $_) } 1..10;
Email::Simple=HASH(0x86a2730) Email::Simple=HASH(0x86a66d0) Email::Simple=HASH(0
x86aad00) Email::Simple=HASH(0x86ae078) Email::Simple=HASH(0x86ae72c) Email::Sim
ple=HASH(0x86b98dc) Email::Simple=HASH(0x86cb79c) Email::Simple=HASH(0x86d2660)
Email::Simple=HASH(0x86d2cb4) Email::Simple=HASH(0x86d3440)
$ join("\n", map { $last_local->($_) } @first_ten);
infrastructures 8 trout.me.uk
perl-stuff 8 trout.me.uk
dbix-class 8 trout.me.uk
dbix-class 8 trout.me.uk
utp 8 trout.me.uk
dbix-class 8 trout.me.uk
perl-stuff 8 trout.me.uk
dbix-class 8 trout.me.uk
wiqi 8 trout.me.uk
matt 8 trout.me.uk
Ok, that seems to be giving us the information I needed. So, now to build a simple classifier per-delivered-to. Declaring two hashes, one to map mail parts to targets and one to store unknown results, and build a classifier subroutine that pulls off the part before the @ and checks the target hash for an entry -
$ my %message_targets; my %unknown;
$ @message_targets{qw/infrastructures dbix-class perl-stuff matt utp wiqi/} = qw/lists lists default personal junk junk/;
$ my $classify = sub { $_[0] =~ /(.*?)\@/; return $_[1]->{$1}; };
CODE(0x86a1c5c)
$ $classify->('perl-stuff 8 trout.me.uk',\%message_targets);
default
$ join("\n", map { $classify->($last_local->($_), \%message_targets) } @first_ten);
lists
default
lists
lists
junk
lists
default
lists
junk
personal
Now we build a function that actually puts it all together (somewhat reformatted for clarity albeit I typed it straight in on a single line when I first did it) - this takes a range of message indexes, pulls the headers, finds the local address and does the classify run. If it gets a result it calls a $do_cl function to do something with the classification, if not increments the unknown count for the original address. The reason for keeping the $do_cl part separate is I can initially define it to be something for debugging only -
$ my $do_cl;
my $cl_range = sub {
foreach my $i ($_[0] .. $_[1]) {
my $m = $mk_hdr->($imap, $i);
my $l = $last_local->($m);
my $c = $classify->($l, \%message_targets);
if (defined $c) { $do_cl->($i, $c); } else { $unknown{$l}++; }
}
};
CODE(0x868a4a0)
$ $do_cl = sub { print join(' => ', @_)."\n"; };
CODE(0x86a6034)(check on 1-20 elided)
$ $cl_range->(21, 30);
21 => lists
23 => lists
24 => lists
25 => lists
26 => default
28 => lists
29 => lists
30 => lists
$ %unknown;
lxxyj 8 trout.me.uk 1 ide-hard-disk-compswap 8 trout.me.uk 1 aauzf 8 trout.me.uk 1
So, the classifier seems to work. So now all we need to do is create corresponding folders, modify $do_cl, and we're away -
$ $do_cl = sub {
print join(' => ', @_)."\n";
$imap->copy($_[0], "INBOX/".$_[1]) || warn $imap->errstr;
$imap->delete($_[0]) || warn $imap->errstr;
};
$ my $max_id = $imap->select('INBOX/scratch');
$ $cl_range->(1, $max_id); $imap->expunge_mailbox('INBOX/scratch');
Letting this process finish gets me about 12k messages classified and 500 or so addresses in %unknown, so at this point I resort to actually turning it into a real script - something I was able to do pretty much exactly by copy and pasting from the repl history. The only addition is a more compact form for specifying the targets -
my %target_conf = (
# MAILING LISTS
lists => [ qw(infrastructures dbix-class ...
) ],# PERSONAL ADDRESSES
personal => [ qw(matt matthew paypal ...
) ],# additional target lists elided
);
%message_targets = (
map { my $v = $_; (
map { ($_ => $v) } @{$target_conf{$_}}
) } keys %target_conf
);
where the outer map saves off the key as $v so the inner map can use the arrayref stashed in the conf hash to produce key: arrayref entry, value: $v pairs for the target list the code operates on. I considered externalising thse into a conf file but since the qw() format allows me to use arbitrary whitespace for formatting so the config's only about 60 lines, which for software that's basically single-user (and developed in a couple hours) it really isn't worth the hassle.
It processes about 2.5k messages per hour over my DSL connection and seems to keep both the client machine and the IMAP server box reasonably busy while it's doing it. I could probably improve this but again for my purposes, who cares. The big point for me was going from "mailbox is screwed" to "mailbox is usable and can be cleaed out easily" in the space of an afternoon :)
And with that, I'm off to bed. I hope this was of some vague interest to those of you who got all the way through it!
(this is a continuation of the series started in this post, so you may want to start there)
New tool for today: Rocco Caputo's amazingly handy Lexical::Persistence module. We're going to use this to persist lexical variables - i.e. those declared with 'my $var' - between invocations. But before we can do that, we need to break Devel::REPL out a bit and move it from doing a simple string eval on the supplied line to building a subroutine reference from it. So, -now- we change $self->execute($line) in run_once to call $self->eval($line) instead, and expand the eval code as follows -
sub eval {
my ($self, $line) = @_;
my ($to_exec, @rest) = $self->compile($line);
return @rest unless defined($to_exec);
my @ret = $self->execute($to_exec);
return @ret;
}sub compile {
my $_REPL = shift;
my $compiled = eval $_REPL->wrap_as_sub($_[0]);
return (undef, $_REPL->error_return("Compile error", $@)) if $@;
return $compiled;
}sub wrap_as_sub {
my ($self, $line) = @_;
return qq!sub {\n!.$self->mangle_line($line).qq!\n}\n!;
}sub mangle_line {
my ($self, $line) = @_;
return $line;
}sub execute {
my ($self, $to_exec, @args) = @_;
my @ret = eval { $to_exec->(@args) };
return $self->error_return("Runtime error", $@) if $@;
return @ret;
}sub error_return {
my ($self, $type, $error) = @_;
return "${type}: ${error}";
}
The end result here is that we end up with '1 + 1;' in the REPL becoming "sub {\n1 +1;\n}\n" before execution; since perl treats the final expression in a sub as an implicit return if no explicit one is present, everything continues to work as before - but we now have a bunch of extra hooks with which to modify the execution flow (we'll need the hook offered by mangle_line today, and the error_return one will come in handy for printing backtraces and similar should we desire those later).
The reason for the odd setup in 'sub compile' is to ensure only the $_REPL variable is in scope when the compilation takes place; this means that we have any other variable name to ourself in the code being executed (and accidental specification of any other variable without declaring it will cause a compile-time error).
So, main code refactored, on with Devel::REPL::Plugin::LexEnv -
package Devel::REPL::Plugin::LexEnv;
use Moose::Role;
use namespace::clean -except => [ 'meta' ];
use Lexical::Persistence;has 'lexical_environment' => (
isa => 'Lexical::Persistence',
is => 'rw',
required => 1,
lazy => 1,
default => sub { Lexical::Persistence->new }
);
The -except in namespace::clean is new, and actually indicates a mistake I made in the first article - fortunately, since Devel::REPL itself inherits a meta method from Moose::Object it didn't actually break anything, but in the case of a plugin we can't unimport it since a role isn't a class and can't have a superclass. I'm using a slightly more verbose formatting for the attribute for clarity; anybody who prefers one or t'other should leave a comment to that effect. Also, note that the isa type is actually referencing a type constraint (see Moose::Util::TypeConstraints for the list of standard ones and the functions to create custom ones), which since it doesn't exist Moose automatically creates for us as a subtype of Object which requires that the value passes ->isa('Lexical::Persistence'). Now for the meat -
around 'mangle_line' => sub {
my $orig = shift;
my ($self, @rest) = @_;
my $line = $self->$orig(@rest);
my $lp = $self->lexical_environment;
return join('', map { "my $_;\n" } keys %{$lp->get_context('_')}).$line;
};around 'execute' => sub {
my $orig = shift;
my ($self, $to_exec, @rest) = @_;
my $wrapped = $self->lexical_environment->wrap($to_exec);
return $self->$orig($wrapped, @rest);
};
Ok, this is slightly involved for not many lines of code. So in order to make it clearer and ease your mind that I'm still sane, we'll examine it in reverse order.
return $self->$orig($wrapped, @rest);
Ok, by this point we've got a subroutine reference that's been wrapped by the Lexical::Persistence context, so we pass that off to the base Devel::REPL execute to deal with normally (note @rest currently doesn't do anything - passing it around all over the place is basically a politeness in case another plugin author wants to use it).
my $wrapped = $self->lexical_environment->wrap($to_exec);
This is the bit where Lexical::Persistence does its magic. It wraps the subroutine reference built by the $repl->compile process in code that fills out the lexical environment of the subroutine from the data it has stored - so if the $lp has data for a $foo variable and it sees a 'my $foo' lexical in the subref, it sets the value of that lexical to its data for it before execution - and then afterwards, it goes through the lexicals' current values and saves them away. Which means -
$ my $foo = 3;
3
$ $foo + 1;
4
$
- that the variables we've declared are now persistent between lines executed within the repl, even though each line becomes a separate subref with its own independent lexical environment and namespace.
Which begs the question, why did the second line compile? Without the plugin we'd simply get an error saying $foo isn't declared, since the code is still compiled under 'use strict'. The magic for this is in the first around, in this line:
return join('', map { "my $_;\n" } keys %{$lp->get_context('_')}).$line;
which pulls out all the keys from the default context (which currently is the only one we're using, and called '_' since other contexts in Lexical::Persistence are keyed by variable name prefix) and constructs declarations from them which are prepended onto the line during mangling and before the subref is compiled. To see what I mean by this a bit more clearly, I'll declare a couple more variables and then replicate the behaviour (getting $lp directly to save space)
$ my $foo = 3;
3
$ my @bar = (1, 2, 3);
1 2 3
$ my $baz = 'spoon!';
spoon!
$ join('', map { "my $_;\n" } keys %{$_REPL->lexical_environment->get_context('_')});
my $foo;
my $baz;
my @bar;$
and also, here we see one other useful behaviour - variable names starting with '_' aren't persisted by default, thus avoiding us needing to worry about $_REPL being accidentally stomped on by Lexical::Persistence. You should probably have a read of the full documentation to get the big picture, though, along with PadWalker and Devel::LexAlias if you want to understand how all this is implemented under the hood. And, of course, the complete final code to the plugin.
Here endeth part 3, two days late due to a combination of extreme tiredness one night and having to wait for my laptop to dry out after being caught in heavy rain on the walk home the next. I'm not entirely sure what I'll be writing about next time round, but hopefully a few of you will turn up to find out anyway. Later ...
So, last time we got from scratch to a basically working simple perl REPL. So now it's time to start writing plugins, since currently it's disturbingly basic and only really useful for ... well, for making a blog post about how it works at all :)
First, we should probably provide some history so we take advantage of the readline goodness available. This is, fortunately, going to be trivially simple, but I'm going to digress slightly before that and explain how our plugin system works. When you call the ->load_plugin method on the repl object with a plugin name, it searches for an installed module called Devel::REPL::Plugin::NameOfPlugin, which it expects to be a Moose Role. The basic concept of a role is 'a mixin on steroids' - not only can you provide methods but method modifiers, which are AOP-style before/after/around advice functions. If none of that made any sense, you'll see it in action in a moment.
Once the plugin has been loaded, it's composed into the current -object-, not the current class. Devel::REPL itself is unchanged, what MooseX::Object::Pluggable does is automatically create an anonymous subclass and move the Devel::REPL object into that (similar to ruby's singleton objects, I believe), then adds the role to that. This means we can have more than one object of our class with different plugins attached - probably not that useful just yet for a REPL but a nice feature to have, especially since I plan to start doing Devel::REPL development from within Devel::REPL as soon as I can.
Now, Term::Readline gives you automagic history handling, but it'd be really nice to have a way to display history lines ourselves and to fire previous commands quickly without hammering the up key. So, without further ado, it's time for Devel::REPL::Plugin::History. Here's lib/Devel/REPL/Plugin/History.pm -
package Devel::REPL::Plugin::History;
use Moose::Role;
has 'history' => (
isa => 'ArrayRef', is => 'rw', required => 1, lazy => 1,
default => sub { [] }
);sub push_history {
my ($self, $line) = @_;
push(@{$self->history}, $line);
}
Again, 'use Moose::Role' saves us needing strict and warnings. Roles can add attributes too, so we define our history attribute here to hold the history line list.
This time we -do- set a type constraint because it's important that everything can assume the history is an array reference - although there's no need to specify what the array contains. The one interesting part is specifying lazy on the attribute - were this a normal class or role there'd be no reason for it since the default value isn't dependent on anything else, but since this is a plugin designed to be added to an object -after- new() (construction) time the default wouldn't fire, so instead we make it lazy so it gets called on first get, just like out_fh in Devel::REPL itself.
The push_history method is pretty trivial, the @{$self->history} deferences the arrayref so 'push' can shove the line onto the end. The principle reason for making this a standalone method is to provide a hook for other plugin authors to use to wrap the process - maybe they'll want to log the history to a file as well, or throw away lines once there's 500 in the history (which bash does in most default configs).
Now we need to hook this method into the processing, and add a little magic for some extra syntax -
around 'read' => sub {
my $orig = shift;
my ($self, @args) = @_;
my $line = $self->$orig(@args);
The 'around' type of method modifier wraps all calls to the method of the same name in the class itself (there's also an 'override' type which provides some extra sugar but is designed for classes, not roles). It's special in that it provides an extra argument on the front which is a reference to the original method, so we shift that off the argument list first (list operations in perl default to @_ to save us typing for this sort of thing). Then, we grab the object and any arguments, pulling them into an @args list since we neither know nor care what they are, and call the original method to get the line. The $self->$orig trick is one of the beauties of perl - you can provide a variable containing -either- a method name or a reference to the sub implementing the method and perl Does The Right Thing (tm).
if (defined $line) {
if ($line =~ m/^!(.*)$/) {
my $call = $1;
$line = $self->history_call($call);
if (defined $line) {
$self->print($line."\n");
} else {
return "'Unable to find ${call} in history'";
}
}
if ($line =~ m/\S/) {
$self->push_history($line);
}
}
if ($line =~ m/\S/) {
$self->push_history($line);
}
}
return $line;
};
If the line's undef we're about to leave the repl, so we skip our processing. Then we check for !-style history requests and if so we filter the line -without- the ! through the history_call method and print the line that's been substituted from the history for reference if present, or return an error string if not. Next, if the line contains anything but whitespace (\S inverts the \s whitespace regexp match group) we add it to the history before returning it.
Note the semicolon after the closing } - this is required with method modifier declarations to end the statement since they aren't normal perl sub definitions - in fact they're implemented in terms of a sub call, so theoretically could take extra arguments albeit those provided by Moose don't.
Finally, we need to implement history_call -
sub history_call {
my ($self, $call) = @_;
if ($call =~ m/^(-?\d+)$/) { # handle !1 or !-1
my $idx = $1;
$idx-- if ($idx > 0); # !1 gets history element 0
my $line = $self->history->[$idx];
return $line;
}
my $re = qr/^\Q${call}\E/;
foreach my $line (reverse @{$self->history}) {
return $line if ($line =~ $re);
}
return;
}
The only interesting bit in the first part is the $idx-- if ($idx > 0) statement, since perl arrays can already handle
$x[2]; # third element
$x[-2]; # last but one element
but of course the tradition in shells is !1 should be the first history line rather than !0. $self->history->[$idx] then uses this, taking advantage of arrayref derefence via -> along the way.
Then the second part handles the !search style (I could have put this in an else but I prefer to have something for a sub to fall through to in case of weirdness and at least one return to be at top level - stylistic only); qr is 'quote regex' which makes $re a re-usable compiled regex object, done here for clarity rather than the usual performance reasons, \Q and \E disable and then enable regex metacharacters so a . or * in $call isn't misinterpreted.
So, not bothering to document the final 1; for a second time, here's a session with the plugin -
cain$ perl -Ilib -MDevel::REPL -e 'my $repl = Devel::REPL->new; $repl->load_plugin("History"); $repl->run;'
$ 1;
1
$ int(1.5);
1
$ (1..3);
1 2 3
$ !-1
(1..3);
1 2 3
$ !1
1;
1
$ !-30
Unable to find -30 in history
$ !int
int(1.5);
1
$ !foo
Unable to find foo in history
... so, check in the completed code, and I think we're about done for this iteration. Next time we'll look at adding lexical persistence so 'my' variables persist between calls. And maybe I'll finally get as far as sorting out my e-mail :)
=== update, April 25
For the impatient, part 2 and part 3 are already out and I'm aiming to publish a part per week until I run out of ideas and change to a different topic.
=== end update, original post follows
So, I need to sort out my personal e-mail - I've left it alone for a few weeks and it's accumulated >10k messages. Forwarding it to gmail sort of works, but I'm a die-hard mutt user. I also prefer doing mail access over IMAP, so things like procmail aren't spectacularly useful. Plus I find procmail even uglier than the worst perl I've ever seen, so ... no.
Which means I need to script mail classification and filtering over IMAP. Which means I need an easy way to experiment with the various CPAN IMAP modules without repeatedly fetching the header list. Which means a repl - read-eval-print loop, basically an interactive shell for $language_of_choice, ideally, so I can prat around interactively and make my mistakes in an environment where it's not going to screw me entirely when I get it wrong.
Now, ok, there are two already on CPAN. Great. Except Shell::Perl uses package variables to persist data between lines (think namespaced globals) and App::REPL is somewhat baroque and really, really wants to use bright colours everywhere. And I'm an old fvwm2-loving curmudgeon who really hates colourisation. Plus I really want to make something that's nice and easy to extend, which means I want to use the meta-object-orientation goodness of Moose and the runtime plugin facilities of the MooseX::Object::Pluggable role. Soo, sod it. I'll write a new one, and explain what I'm doing and why as I go along as an attempt to justify the level to which this is now yak shaving
Name first. Easy. Devel::REPL, because (a) it isn't taken and (b) a REPL to me is very very much a development tool, so it's a reasonably sane namespace. The actual script is going to be called re.pl, mostly because I can and because it amuses me. Last tools to mention before I start - Term::Readline, which will do the heavy lifting of handling readline capabilities, and namespace::clean which will let me clear out any helper functions I import from my classes so I can inherit methods of the same name without breaking anything.
First, setup the dist directory and open the module file -
cain$ mkdir Devel-REPL
cain$ cd Devel-REPL
cain$ mkdir -p lib/Devel
cain$ vi lib/Devel/REPL.pm
Declare the package (class) name and load the tools I need -
package Devel::REPL;
use Term::ReadLine;
use Moose;
use namespace::clean;
Note that I don't need to explicitly ask for 'strict' and 'warnings' as is normal at the top of a perl file - Moose does this automatically. namespace::clean comes last because it examines the package's namespace at the point it's use'd to figure out what to clean out afterwards - so far, just the stuff that came from Moose but there could easily be more later. I don't need to declare a base class because I get the standard Moose::Object but I do need to load the Pluggable role to get the load_plugin goodness -
with 'MooseX::Object::Pluggable';
Now, according to the Term::ReadLine synopsis, which handily is an extremely primitive REPL in and of itself, I'm going to need at least a term object, a prompt string and an output filehandle, so let's declare those as attributes -
has 'term' => (
is => 'rw', required => 1,
default => sub { Term::ReadLine->new('Perl REPL') }
);has 'prompt' => (
is => 'rw', required => 1,
default => sub { '$ ' }
);has 'out_fh' => (
is => 'rw', required => 1, lazy => 1,
default => sub { shift->term->OUT || \*STDOUT; }
);
The 'rw' means I'll get a getter/setter accessor type for each of these, required prevents them accidentally being set to something undefined, and I've made the 'out_fh' attribute lazy so that it can rely on being defaulted -after- the object's constructed so the call to 'term' will work. I could have set restrictions on what types the values provided to these attributes are by providing the 'isa' option to the has calls but I can't see any advantage to it right now and I might want to pass something odd in for interesting purposes later.
Next step is to create an initial runloop that calls on read, execute and print steps (why execute and not eval? we'll get to that in part 2 :) -
sub run {
my ($self) = @_;
while ($self->run_once) {
# keep looping
}
}sub run_once {
my ($self) = @_;
my $line = $self->read;
return unless defined($line); # undefined value == EOF
my @ret = $self->execute($line);
$self->print(@ret);
return 1;
}
Separating out run and run_once may seem largely pointless at this stage, but later on we may want to hook some sort of action to happen before every step - say incrementing a counter in the prompt (or something more interesting once I think of it :). I imagine a bunch of you are probably muttering 'yagni yagni yagni' under your breath, so in turn I'd like those of you who -are- to imagine me sticking my tongue out at you. Ok, we done now? Good.
run_once itself does, pretty much literally, read then execute then print. The only wrinkle is the return if $line is undefined; traditionally perl filehandles of any sort return the special value undef to indicate EOF, since '' or '0' both evaluate to false but are perfectly valid lines to read (even if they make no sense to the app reading them), and Term::ReadLine behaves just the same. Then at the end if we got that far, we return 1 to indicate success to run() so execution continues.
Of course, we still haven't actually defined the read, execute and print steps, so let's do that now.
sub read {
my ($self) = @_;
return $self->term->readline($self->prompt);
}
Simple enough; term and prompt are both stock accessors so calling them with no arguments returns the value - to set we'd call $self->term($new_term) or similar. Moose will happily let you create separate get_term and set_term methods via the 'reader' and 'writer' options to has, but it's not usual and it's more typing so I'm not going to.
sub execute {
my ($self, $to_exec) = @_;
my @ret = eval $to_exec;
@ret = ("ERROR: $@") if $@;
return @ret;
}
eval is used here in string mode to compile+execute at the same time - this currently means that all code is executing in the Devel::REPL namespace, which we don't really want but it'll do for a start. The return is made in list context in case the code's returning multiple values - it's unlikely to do any harm and having to put [] round code returning more than one thing would be -annoying-. A quick check of $@ afterwards for compile or execution errors and we're good to go.
sub print {
my ($self, @ret) = @_;
my $fh = $self->out_fh;
print $fh "@ret";
}1;
And now we can grab the appropriate filehandle (which will call the lazy default => sub the first time we ask for it) and print the output. Yay. The 1; at the end of the file indicates to perl that the .pm loading ok. So, check syntax -
cain$ perl -c lib/Devel/REPL.pm
lib/Devel/REPL.pm syntax OK
and try running the code (-Ilib tells perl to search 'lib' in the local dir, the -M loads the module and -e provides the code to execute since we don't have a script yet) -
cain$ perl -Ilib -MDevel::REPL -e 'Devel::REPL->new->run;'
$ 2 + 4
6
$ (1 .. 3)
1 2 3
$
cain$
And it lives, it evaluates, and Ctrl-D sends EOF and brings me back to the shell prompt. Lovely. So, a quick svk add + commit later, the first working code is in the repository.
Next time round I'll sort out history and add the first plugin - one to provide a persistant lexical environment so we can carry variables between lines without polluting the Devel::REPL namespace or giving up the joys of compile-time typo checking from 'use strict'. I'll see you there.
=== update, April 25
History handling turned out to be more interesting than I first expected so it and the plugin approach got part 2 all to themselves. Lexical environment handling is now covered by part 3
=== end update, original post follows
