ft SYSHH#7

In this episode of songs you should have heard, let's have some metal, shall we? In particular, the Devin Townsend Band. Some people are flooding YouTube with reaction videos to Devin's performance of Kingdom, which is a good one, but let's have this one instead:

Devin Townsend Band — Accelerated Evolution — Deadhead

Posted Mon 12 Oct 2020 02:05:01 CEST Tags:

One of the esoteric programming languages that a fair number of people have heard of is “Whitespace”. I'm sure this has nothing at all to do with jokes about its source code being so fantastically efficient when printed out. The language was actually meant as a joke (at least that's how one of its creators puts it when he mentions the language in some of his talks). But even though people are aware of the language's existence, they rarely know how it works. Let's change that, because it's really not all that hard.

At its core, the language describes operations on a stack machine that has access to a heap. The machine's only data type is integers of arbitrary size; some operations use these values to encode characters as well. And that's about it. The rest is just a peculiar way to encode operations and values using three ASCII whitespace characters (TAB, SPACE and LINEFEED).

Operations are grouped by function. These groups determine an operation's encoding prefix, which the language spec calls the “Instruction Modification Parameter”. Many of the operations take no arguments, as they create and consume values on the machine's stack. Some, however, do take an argument: an integer or a label. Label arguments are used by flow control operations; integer arguments are used by some of the stack manipulation operations.

Arithmetic Operation: Integer Division

Their encoding is similar: Both use strings of spaces and tabs that are terminated by a linefeed. In labels, the spaces and tabs have no special semantics. At least not within the language specification; more on that later. In integers, tabs encode ones and spaces encode zeroes. Something to note about such integer literals is that they do not use two's complement to encode negative numbers. Instead, they use the literal's leftmost bit as a sign bit: tab means negative, space means positive. That makes encoding arbitrarily wide integers straightforward.

Integer Literals
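To make the encoding concrete, here is a little decoder for such integer literals in Scheme. This is just an illustrative sketch for this post (decode-integer is a made-up name), taking the literal as a string of tab and space characters with the terminating linefeed already stripped:

(define (decode-integer literal)
  ;; First character is the sign bit, the rest is the magnitude,
  ;; most significant bit first; tab is 1, space is 0.
  (let* ((bits (map (lambda (c) (if (char=? c #\tab) 1 0))
                    (string->list literal)))
         (magnitude (let loop ((bs (cdr bits)) (acc 0))
                      (if (null? bs)
                          acc
                          (loop (cdr bs) (+ (* 2 acc) (car bs)))))))
    (if (zero? (car bits))
        magnitude
        (- magnitude))))

(decode-integer " \t \t")    ; space tab space tab  →  5
(decode-integer "\t\t \t")   ; tab   tab space tab  → -5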

When you take a look at actual Whitespace programs, you'll sometimes notice extremely long labels. There seems to be a silent convention to use chunks of eight characters to encode eight bits (with the same characters encoding ones and zeroes as in number literals), which are turned into a positive number that is then mapped to the ASCII encoding (seven bits would have sufficed, but that's not what the programs I've seen use).
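Again just as an illustration (label->string is a made-up name, not taken from any implementation), that convention boils down to something like this, with any left-over bits silently dropped:

(define (label->string label)
  ;; Chop the label into chunks of eight tab/space characters, read
  ;; each chunk as a binary number and map it to an ASCII character.
  (let loop ((cs (string->list label)) (byte 0) (n 0) (out '()))
    (cond ((= n 8) (loop cs 0 0 (cons (integer->char byte) out)))
          ((null? cs) (list->string (reverse out)))
          (else (loop (cdr cs)
                      (+ (* 2 byte) (if (char=? (car cs) #\tab) 1 0))
                      (+ n 1)
                      out)))))

(label->string " \t     \t")   ; 01000001 → "A"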

When you try to implement this language, you'll notice a couple of things your machine implementation needs: A stack, obviously, since Whitespace is a stack-manipulating language. Another stack, used as a call stack, since the language has Call and Return operations. It also needs a heap, mapping addresses to integers. Finally you'll need program memory and a program counter register. You might want a jump table too, to deal with translating labels to addresses. That's not strictly required, though: You could just translate all labels to addresses before loading the program into your machine.
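To make that shopping list concrete, here is one way to sketch the machine state in (Guile) Scheme; the field names are made up for this illustration, and an actual implementation may of course arrange things differently:

(use-modules (srfi srfi-9))

(define-record-type <ws-machine>
  (make-ws-machine stack callstack heap memory pc jumptable)
  ws-machine?
  (stack     ws-stack     set-ws-stack!)      ; the value stack
  (callstack ws-callstack set-ws-callstack!)  ; return addresses for Call/Return
  (heap      ws-heap      set-ws-heap!)       ; maps addresses to integers
  (memory    ws-memory    set-ws-memory!)     ; the parsed program
  (pc        ws-pc        set-ws-pc!)         ; program counter
  (jumptable ws-jumptable set-ws-jumptable!)) ; label → address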

When I had dug deep enough into the language spec to figure this out, I was intrigued enough to actually do yet another implementation of the language. It's called SpaceMan and it is available at gitlab.com/ft/spaceman as well as github.com/ft/spaceman.

I've added an org-mode conversion of the original language homepage, because that one is currently only available via archive.org. When trying some of the more complex examples you can find on the net, I ran into problems: My implementation failed to even parse them. I verified my code for quite some time before concluding that the parser was implemented correctly. So I looked at other implementations, and it turned out most of them implement two additional stack-manipulating operations: Copy and Slide. Apparently, they were added to a later specification of the language. I couldn't find such a spec on the net, though (not that I invested a lot of time — see the update at the end of the post for a resolution to this). However, after implementing these two, spaceman could run the most elaborate examples I could find online, like a Sudoku solver. I've added those two additional operations to the included version of the language spec.

I'm using Megaparsec for parsing. With a couple of utilities put in place, writing the parser becomes rather pleasant:

stackParser :: Parser StackOperation
stackParser = do
  (      try $ imp [ space ])
  (      try $ Push  <$> number [ space ])
    <|> (try $ operation        [ linefeed, space    ] Duplicate)
    <|> (try $ operation        [ linefeed, tabular  ] Swap)
    <|> (try $ operation        [ linefeed, linefeed ] Drop)
    <|> (try $ Copy  <$> number [ tabular,  space    ])
    <|> (try $ Slide <$> number [ tabular,  linefeed ])

When implementing the language's operations, you'll find yourself facing lots of common tasks that manipulate the virtual machine. You put those common tasks into functions, of course, and like any designer of an assembly language worth their salt, you obviously give them slightly cryptic three-letter names. With those, implementing the stack-manipulating operations looks like this:

eval :: WhitespaceMachine -> StackOperation -> IO WhitespaceMachine
eval m (Push n)  = return $ pci $ psh [n] m
eval m Duplicate = return $ pci $ psh h m               where h     = peek 1 m
eval m Swap      = return $ pci $ psh [b,a] $ drp 2 m   where [a,b] = peek 2 m
eval m Drop      = return $ pci $ drp 1 m
eval m (Copy i)  = return $ pci $ psh [n] m             where n = ref  i m
eval m (Slide n) = return $ pci $ psh h $ drp (n+1) m   where h = peek 1 m

Implementing the other groups of operations looks similar. I sort of like it. Each of them would basically fit onto an overhead slide.

As it turns out, editing Whitespace programs is tough work. Doing it directly is best done in a hex editor. But spaceman has a feature that dumps a program's syntax tree to stdout, and those dumps are themselves executable programs. So if you'd like to edit a Whitespace program, you can dump it into a file, edit its AST and then run that program to yield the changed Whitespace program.

“Yet another Whitespace implementation created” achievement unlocked, I guess.

[Update] Andrew Archibald informs me that if you spend a little more time looking for the language specification that includes Slide and Copy, you will find http://compsoc.dur.ac.uk/whitespace/index.php (via archive.org), which contains v0.3 of the Whitespace specification, the version that adds these operations.

Posted Sun 11 Oct 2020 21:16:20 CEST Tags:

Not too many programming languages have the concept of first-class continuations built into them. Scheme is one that does¹, so it will be our vehicle for today's adventure. Now what really is a continuation? In order to get an idea, consider the following simple expression:

(+ 5 4)

Remember that Scheme uses prefix notation², where the first item in a parenthesised list is the operator, which is applied to the arguments: the rest of the list. + is the sum operator, so the above expression evaluates to 9. So far, so good. How does this help with continuations? Well, a continuation is a way of bundling up “what's left to do” with respect to an arbitrary sub-expression within a program.

Imagine the perspective of the literal 5 in the expression above. If you are this five, look around you and try to come up with a function that represents the rest of the computation after you are done evaluating³. So in the expression above, after 5 evaluates to five, what is left to be done in order to finally evaluate to 9? Here's a suggestion:

(define (kont v)
  (+ v 4))

Indeed, after evaluating the 5 literal, the rest of the expression can be represented as a function that takes one value and adds 4 to it. And if you stick the value 5 into that function, it returns 9, as it must.

Once you realise that you need to take the perspective of a part of an expression in order to figure out what the continuation does, it is not hard to work out these functions for arbitrarily complex expressions. It just takes a little bit of practice. But thus far, this is just a theoretical idea. How would you work with continuations? How do you get to them?

In Scheme (and also Standard ML), there is an operator called call/cc, which is short for call-with-current-continuation. Its purpose is to capture the continuation at a certain point in a program and make it available to the programmer. The name call-with-* suggests that the operator needs something it can call. That would be a function. This function has to take one argument, and call/cc will call it with the captured continuation as its only argument. This function parameter is the only argument that call/cc expects, and the return value of the call/cc invocation is the value of the supplied function applied to the captured continuation. Here are two simple examples that use call/cc:

(call/cc (lambda (k) 5)) → 5
(+ (call/cc (lambda (k) 5)) 4) → 9

Since these don't use the continuation at all, it is straightforward to tell the values these expressions evaluate to. Now for something more interesting: Since Scheme allows us to mutate memory⁴, we can just store the continuation in a global variable for later use:

(define kont #f)

(+ (call/cc (lambda (k)
              ;; Store the continuation k in the global
              ;; variable kont that we defined before.
              (set! kont k)
              ;; Ignore the continuation otherwise, and
              ;; just return 5; so this expression boils
              ;; down to (+ 5 4); though we used call/cc
              ;; to extract the continuation from the
              ;; perspective of our value 5.
              5))
   4) → 9

;; Take a look at ‘kont’ — this is how Racket pretty prints it.
kont → #<continuation>

(kont 10) → 14

Remember when we defined a function called kont earlier? We figured out that the continuation right after evaluating the 5 is a function that adds 4 to its one and only argument. Now applying the continuation (stored in kont) to the value 10 results in 14 — just as we predicted.

The captured continuation has a special property that sets it apart from a simple function: When it is invoked, it resurrects its execution context entirely and abandons the current one, no matter where the program went in the meantime. This property can be used to implement all sorts of useful high-level control flow facilities. A simple, easily understood one is “early exit”. Consider this:

(fold * (*) '(1 2 3 4 0 6 7 8 9 10)) → 0

This is an expression that computes the product of all values in the list that is the last argument to fold⁵. Since any integer multiplied by zero is zero, the computation could stop at the fifth element of the input list. However, fold has no concept of exiting early. Since we have call/cc, we can easily add that:

(call/cc (lambda (return)
            (fold (lambda (x acc)
                    (if (zero? x)
                      (return 0)
                      (* x acc)))
                  (*)
                  '(1 2 3 4 0 6 7 8 9 10)))) → 0

It's the same expression as before, but it's executed inside the function that is passed to call/cc. That way, we have access to the surrounding continuation. In the call to fold we pass a function that's more complex than a direct reference to the multiplication function. fold hands the current element of the list to its function argument as the first argument, and the current value of its accumulator as the second one. What we need to do is straightforward: Check whether the current element is zero, and if it is, just invoke the surrounding continuation with 0 as its argument. As mentioned earlier, this resurrects the continuation's execution context and abandons the current one.

Now, in order to see if you understood continuations, at this point in other texts on the subject you get confronted with mind-benders like the following, which is from TSPL⁶, page 64:

(((call/cc identity) identity) "Hey!")

If you evaluate this at a Scheme prompt, it will return the string "Hey!". This might be guessable, but the question is: Why? Let's analyse the expression's form first: At its most primitive, this expression can be summarised like this:

(operator operand)

The operand has the value "Hey!", trivially. And since we know the result of the final evaluation is that same value, we can (barring side effects) reason that operator has to be the identity function⁷. Since that function appears twice in the original expression, it feels like we are on the right track! A guess would be that the second occurrence of identity somehow gets transposed into the operator spot. Maybe we can just replace it with something else, like string-upcase. But if you try, you will notice that things are not quite as easy as they seemed⁸:

(((call/cc identity) string-upcase) "Hey!")
→ string-upcase: contract violation
    expected: string?
    given: #<procedure:string-upcase>

…and not "HEY!" like one might have guessed. Time to take a look at the expression that is our operand:

((call/cc identity) identity)

With our previous strategy we need to look at (op identity), put ourselves in the perspective of op and ask: What function represents the rest of the computation after I am done evaluating myself? And you can follow that strategy pretty mechanically:

(define (k v)
  (v identity))

This is what gets handed in as the argument to identity in (call/cc identity). And since it is identity, all it does is return the continuation it is handed. Which means we end up with this:

((lambda (op) (op identity)) identity)
;; Which can be reduced to:
(identity identity)
;; Which can be further reduced to:
identity

Let's come back to our guesswork from earlier, where we naïvely used string-upcase instead of identity: It would reduce to (string-upcase string-upcase). And now all of a sudden the error message makes sense too: It says string-upcase expected a string argument, but instead got a procedure; and not any procedure, but string-upcase itself.

To summarise the above: The call/cc uses its identity argument to feed the right identity into itself, ultimately returning itself. So (((call/cc identity) identity) "Hey!") reduces to (identity "Hey!") which evaluates to "Hey!".

It's a fair bit mind-bendy if you think it all through, but manageable if you follow the strategy that was presented earlier. What remains to examine is what continuations are useful for beyond clever puzzles. Well, they are a primitive that allows the implementation of a lot of high-level control structures such as early exit (as we have seen earlier), exceptions, co-routines, generators, and more. In a language with first-class continuations, all these features can be implemented in a library without further language support.
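As a taste of that, here is a minimal generator sketch built on nothing but call/cc and set!. The names make-generator, return and resume are made up for this example, and a real library would have to be more careful (think dynamic-wind), but it shows the principle:

(define (make-generator lst)
  (define return #f)   ; where to deliver the next element
  (define resume #f)   ; where to pick the walk back up
  (define (start)
    (let loop ((items lst))
      (if (null? items)
          (return 'done)
          (begin
            (call/cc (lambda (k)
                       (set! resume k)
                       (return (car items))))
            (loop (cdr items))))))
  (lambda ()
    (call/cc (lambda (k)
               (set! return k)
               (if resume (resume #f) (start))))))

(define next (make-generator '(1 2 3)))
(next) → 1
(next) → 2
(next) → 3
(next) → done

Every call to next captures the caller's continuation in return, jumps back into the walk over the list, and the walk in turn escapes to return with the next element while remembering its own position in resume.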

Finally, let's not overlook that there are problems with full, unbounded continuations as well: For details on that, see [Kis12]. To alleviate most of these concerns, we might take a look at delimited continuations. That is, however, a story for another day.

Footnotes

1 See [R4RS] p.28f
2 See [SICP] p.8f
3 Number literals evaluate to themselves — but in general, the subexpression that would be picked could be arbitrarily complex.
4 I know, I know.
5 The * symbol evaluates to the multiplication function and (*) evaluates to the identity element of multiplication, namely 1.
6 According to the author it is “probably the most confusing Scheme program of its size”, a Confusion Density Maximum, if you will.
7 (lambda (x) x)
8 As we will see later, this intuition is not completely wrong, but there is a twist!

References

[SICP] Harold Abelson and Gerald J. Sussman. Structure and Interpretation of Computer Programs. 2nd ed. The MIT Press, 1996. isbn: 0262011530.

[R4RS] William Clinger, Jonathan Rees, et al. Revised⁴ Report on the Algorithmic Language Scheme. New York, NY, USA: ACM Lisp Pointers, 1991, pp. 1–55.

[TSPL] R. Kent Dybvig. The Scheme Programming Language. 4th ed. MIT Press, 2009, pp. I–XII, 1–491. isbn: 978-0-262-51298-5.

[Kis12] Oleg Kiselyov. An argument against call/cc. 2012. url: http://okmij.org/ftp/continuations/against-callcc.html.

Posted Tue 01 Oct 2019 22:23:24 CEST Tags:
ft SYSHH#6

Haven't done one of these in a while. So here goes…

Nine Inch Nails — And All That Could Have Been — Leaving Hope

NIN have a couple of amazing instrumental tracks on their résumé, some very quiet ones too, and it's hard to pick a favourite. But if pressed for an answer, I'd pick this.

Posted Sun 20 Jan 2019 16:59:26 CET Tags:

After being the most horrible package maintainer imaginable for at least the past two years, I've been putting some work into the Debian package for the MRA/MDA combination that is fdm. That package now captures a new version control snapshot from upstream (releases happen seldom; the project is pretty much in maintenance-only mode), which fixes a couple of things and adds some features, like being able to use SSL with Google's IMAP without turning off all kinds of verification.

I've been using different iterations of the new package for a while now and am experiencing zero issues in day-to-day use. So even though this is the first package update in a while, and a couple of things (like the build system) have changed upstream, I'm fairly confident this is an improvement over the previous versions of the Debian package.

Internally, the packaging is pretty much a complete rewrite. It uses modern debhelper features and implements pretty much all items on Debian's wish list (like the machine-readable debian/copyright format and stuff like that).

Thanks go out to Axel, who patiently reviewed my changes and helped me get this package out of the door before buster goes into freeze. This update would not have happened without him.

Posted Sun 20 Jan 2019 15:26:50 CET Tags:

In my day job, I'm an electrical engineer, though I'm mostly involved in writing firmware that runs on bare-metal systems. If you design electrical circuits for a living or have just tinkered with them, you will have heard of E-Series preferred values. In particular, resistors, capacitors and inductors are generally available in ratings derived from these numbers. The basic idea is to fit a number of values into one order of magnitude. For example, in E3 there are three values per decade: 100, 220 and 470; the next value, 1000, starts the next decade. Likewise, in E12 there are twelve such values.

Roughly, E-Series follow an exponential definition: the k-th value in E-N is about 10^(k/N), rounded to a couple of significant digits. The standardised series don't follow that expression exactly, though, so for an actual implementation you either adjust the computed values or simply use the actual tables from the spec.
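For a feel of the idealised definition, here is a quick sketch in Scheme (e-series-ideal is a made-up name for this post, not the library's API): the k-th value of E-N is 10^(k/N), scaled into the decade from 10 to 99 and rounded to two significant digits.

(define (e-series-ideal n)
  ;; k runs from 0 to n-1 within one decade.
  (map (lambda (k)
         (inexact->exact (round (* 10 (expt 10 (/ k n))))))
       (iota n)))

(e-series-ideal 12)
→ (10 12 15 18 22 26 32 38 46 56 68 83)

Compare that with the actual E12 decade (10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82): five of the twelve entries deviate from the naïve formula, which is exactly why you want the tables.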

Now, looking up values in tables is a boring task, especially if the tables are relatively small. Finding combinations, though, is more interesting. For example, if you'd like a resistor rated at 50Ω (which is a rather important value in radio frequency design), you'll notice that the exact value is not available in any of the E-Series. But it's easy to combine two 100Ω resistors in parallel to produce a value of 50Ω. …sure, in an actual design you might use 49.9Ω from E96 or E192, or use a specially made resistor of the desired rating. But you get the idea. Combining components in parallel and series circuits allows the parts from the E-Series to cover lots of ground.
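The two combination formulas themselves are one-liners; a quick sketch (not the library's code):

(define (r-series r1 r2)   (+ r1 r2))               ; resistances add in series
(define (r-parallel r1 r2) (/ (* r1 r2) (+ r1 r2))) ; reciprocals add in parallel

(r-parallel 100 100) → 50
(r-series 47 3.3)    → 50.3

(For capacitors the roles swap: capacitances add in parallel, and their reciprocals add in series.)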

I've written a library that implements E-Series in Scheme. Its main modules are (e-series adjacency), which allows looking up values from an E-Series in the vicinity of a given value, and (e-series combine), which produces combinations of values from a certain E-Series to approximate a given value. Finally there's the top-level (e-series) module, which implements frontends to the other modules, making it easy to use the library at a Scheme REPL.

To see if the library finds a combination from E12 that matches 50Ω:

scheme@(guile-user)> (resistor 12 50)
 ------------+-------------------------+-------------+-------------+--------------
    Desired  |         Actual (Error)  |     Part A  |     Part B  |  Circuit
 ------------+-------------------------+-------------+-------------+--------------
   50.0000Ω  |   50.0000Ω (  exact  )  |   100.000Ω  |   100.000Ω  |  parallel
   50.0000Ω  |   50.0380Ω (+7.605E-4)  |   56.0000Ω  |   470.000Ω  |  parallel
   50.0000Ω  |   49.7000Ω (-6.000E-3)  |   47.0000Ω  |   2.70000Ω  |  series
   50.0000Ω  |   50.3000Ω (+6.000E-3)  |   47.0000Ω  |   3.30000Ω  |  series
 ------------+-------------------------+-------------+-------------+--------------

Those aren't all the combinations that are possible. By default, the module produces tables containing combinations that match the desired value to within 1%. Now, to see values in the vicinity of 50Ω in all E-Series, you can do this:

scheme@(guile-user)> (resistor 50)
 ---------+--------------------------+-------------+--------------------------
  Series  |           Below (Error)  |      Exact  |           Above (Error)
 ---------+--------------------------+-------------+--------------------------
    E3    |   47.0000Ω  (-6.000E-2)  |             |   100.000Ω  (+1.000E+0)
    E6    |   47.0000Ω  (-6.000E-2)  |             |   68.0000Ω  (+3.600E-1)
    E12   |   47.0000Ω  (-6.000E-2)  |             |   56.0000Ω  (+1.200E-1)
    E24   |   47.0000Ω  (-6.000E-2)  |             |   51.0000Ω  (+2.000E-2)
    E48   |   48.7000Ω  (-2.600E-2)  |             |   51.1000Ω  (+2.200E-2)
    E96   |   49.9000Ω  (-2.000E-3)  |             |   51.1000Ω  (+2.200E-2)
    E192  |   49.9000Ω  (-2.000E-3)  |             |   50.5000Ω  (+1.000E-2)
 ---------+--------------------------+-------------+--------------------------

And here you see that E96 and E192 have pretty close matches as single components.

With combinations, the library allows the user to specify arbitrarily complex predicates to choose from the generated combinations. For example, to only return parallel circuits that approximate 50Ω from E12:

scheme@(guile-user)> (resistor 12 50 #:predicate (circuit 'parallel))
...

And to limit those results to the ones with an error of 0.001 or better, here's a way:

scheme@(guile-user)> (resistor 12 50 #:predicate (all-of (max-error 1e-3)
                                                         (circuit 'parallel)))
 ------------+-------------------------+-------------+-------------+--------------
    Desired  |         Actual (Error)  |     Part A  |     Part B  |  Circuit
 ------------+-------------------------+-------------+-------------+--------------
   50.0000Ω  |   50.0000Ω (  exact  )  |   100.000Ω  |   100.000Ω  |  parallel
   50.0000Ω  |   50.0380Ω (+7.605E-4)  |   56.0000Ω  |   470.000Ω  |  parallel
 ------------+-------------------------+-------------+-------------+--------------

There are frontends for inductors and capacitors as well, so you don't have to strain yourself too much over which combination corresponds to which circuit. To find a 9.54μF capacitor approximation from E24 with an error better than 0.002:

scheme@(guile-user)> (capacitor 24 #e9.54e-6 #:predicate (max-error #e2e-3))
 ------------+-------------------------+-------------+-------------+--------------
    Desired  |         Actual (Error)  |     Part A  |     Part B  |  Circuit
 ------------+-------------------------+-------------+-------------+--------------
  9.54000µF  |  9.53000µF (-1.048E-3)  |  9.10000µF  |  430.000nF  |  parallel
  9.54000µF  |  9.55102µF (+1.155E-3)  |  13.0000µF  |  36.0000µF  |  series
  9.54000µF  |  9.52381µF (-1.697E-3)  |  10.0000µF  |  200.000µF  |  series
 ------------+-------------------------+-------------+-------------+--------------

The test-suite and documentation coverage could be better, but the front-end module is easy enough to use, I think.

Posted Sun 09 Dec 2018 23:21:36 CET Tags:

At this point, Scheme is my favourite language to program in. And my preferred implementation (…there are a lot) is GNU Guile. I've written a couple of pieces of software using it: bindings for termios on POSIX systems, an implementation of linear feedback shift registers, a unit test framework that emits TAP output, a system that produces time sheets from Redmine timing exports, an elaborate system to describe and experiment with configurable electronic parts, and a native implementation of the XMMS2 protocol and client library. The latter I've written about before.

Now with Guile there's a lot going on. The project has a small but dedicated team of developers, and they have been improving the system's performance by a lot. For the longest time, Guile only had an interpreter. The last version of that was 1.8 — the version I came in contact with for the first time as well. Shortly after I started to dabble, 2.0 was released, so I didn't have a lot of exposure to 1.8. Version 2.0 added a stack VM that Guile byte-compiled its code to. In 2017 the project released version 2.2, which saw the stack VM replaced by a register VM, and that improved performance dramatically. And now they are preparing a 3.0 release (the 2.9.1 beta release happened on October 10th of this year), which adds just-in-time native code generation on x86-64 via GNU Lightning.

I used xmms2-guile to get an idea of the improvements that the different Guile versions offer. I couldn't be arsed to build myself a 1.8 release, so this won't look at that. The setup is a minimal client connecting to an XMMS2 server and pulling out the information required to display a playlist of about 15300 tracks. All of this is running on an Intel Core i7 with plenty of memory, running XMMS2 version 0.8 DrO_o (git commit: 3bd868a9).

As a point of comparison, I ran XMMS2's command line client to do the same task. It took a mean time of 10.77 seconds. That's about 1420 tracks processed per second. I'm pretty sure the client does a little more than it needs to, but this establishes a ballpark.

With Guile 2.0 the whole affair takes 23.9 seconds. That's about 640 tracks per second. Quite a bit slower. More than twice as slow. I was sure this would be slower than the official client, but the difference is disappointing. So what about 2.2 — can it do better?

Well, in Guile 2.2 the same job takes 6.38 seconds. 2400 tracks per second! Impressive. This is why I think the official CLI client implemented in C does a little more work than it needs to. It shouldn't lose to this, and if it did, it shouldn't be behind by this much. My xmms2-guile client library is a native implementation; it doesn't wrap the official XMMS2 client library that's written in C. I'm way too lazy to do detailed timing analyses of the different libraries and Scheme implementations. The results are reproducible, though.

Now what about the upcoming Guile 3.0? I've used 2.9.1.3-1f678 to be precise. It should execute a little faster, but the 2.2 results are pretty good already. Running the same setup yields a mean time of 5.87 seconds. A bit over 2600 tracks per second.
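By the way, the tracks-per-second figures above are nothing fancy, just the playlist length divided by the wall-clock time:

(map (lambda (seconds) (round (/ 15300 seconds)))
     '(10.77 23.9 6.38 5.87))
→ (1421.0 640.0 2398.0 2606.0)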

I mean sure, this is not an in-depth analysis, there's a local network connection involved and it's a rather specific repetitive task. Still though. Pretty cool when a random test shows that 3.0 is four times as fast as 2.0. ;)

Posted Tue 13 Nov 2018 00:55:34 CET Tags:

I've been lazy on the arename side of things. Life. Jobs. And lots of other interests… But I guess about five and a half years is as good a time as any to release another version.

This version is mostly a bugfix release, but it also contains one major change in the way that add-on code is handled. Previously, the idea was to copy and paste stuff from examples to .arename.hooks. It was a bad idea from the start but that's how it is with programs that start out as a quick hack.

Anyway. Starting with v4.1, all previous examples (and one or two new ones) are now bundled with arename as Perl modules in the ARename::… name space. All of them contain fairly complete documentation, which users can access using perldoc:

% perldoc ARename::ReplaceSpaces

Using them is straightforward:

use ARename::ReplaceSpaces;
ARename::ReplaceSpaces::register();

The new EXTENSIONS section in the manual (man arename) briefly describes the included add-ons; and the full details are available — as mentioned above — via perldoc.

Something else that sucks a little is that GitHub has in the meantime removed a feature from their service that allowed projects to upload tarballs for releases. That is gone entirely, which breaks arename releases a little bit. Previously, I would generate tarballs that contained pre-built modules, manual pages and so on. That's gone now, so tough luck.

If you are a distributor in need of such a tarball, run make release to get a tarball named arename-4.1.tar.gz.

In the future, I might include a VERSION file as a fallback for when the project is not built from a git repository. That would make automatically generated tarballs from release tags via GitHub's web interface more useful, I guess. But for now, that's what it is.

Posted Mon 24 Jul 2017 23:40:05 CEST Tags:

“…done in Scheme.” was an idea I had when I started liking Scheme more and more. XMMS2 has been my preferred audio player for a long time now, and I always wanted to write a client for it that fits my frame of mind better than the available ones.

I started writing one in C and that was okay. But when you're used to using properly extensible applications on a regular basis, you kind of want your audio player to have at least some of those abilities as well. I started adding a Lua interpreter (which I didn't like), then a Perl interpreter (which I had done before in another application, but which is also not a lot of fun). So I threw it all away, setting out to write one in Scheme from scratch.

To interface with the XMMS2 server, I first tried to write a library that wrapped the C library that the XMMS2 project ships. But now I was back to writing C, which I didn't want to do. Someone in #xmms2 on freenode suggested just implementing the whole protocol natively in Scheme. I was a little intimidated by that, because the XMMS2 server supports a ton of IPC calls. But XMMS2 also ships with machine-readable definitions of all of those, which means that you can generate most of the code. Once you've implemented the networking part and have the serialization and de-serialization for the protocol's data types, you're pretty much set. …well, after you've implemented the code that generates your IPC code from XMMS2's definition file.

Most of XMMS2's protocol data types map very well to Scheme. There are strings, integers, floating point numbers, lists, dictionaries. All very natural in Scheme.

And then there are Collections. Collections are a way to interact with the XMMS2 server's media library. You can query the media library using collections. You can generate play lists using collections and perform a lot of operations on them like sorting them in a way you can specify yourself. For more information about collections see the Collections Concept page in the XMMS2 wiki.

Now internally, Collections are basically a tree structure that may be nested arbitrarily. They carry a couple of payload data sets, but they are trees. And implementing a tree in Scheme is not all that hard either. The serialization and de-serialization are also pretty straightforward, since the protocol reuses its own data types to represent the collection data.

What is not quite so cool is the language you end up with to express these collections. Say you want to create a collection that matches four thrash metal bands; with XMMS2's command line client you can do that like so:

xmms2 coll create big-four artist:Slayer \
                        OR artist:Metallica \
                        OR artist:Anthrax \
                        OR artist:Megadeth

To create the same collection with my Scheme library, it would look like this:

(make-collection COLLECTION-TYPE-UNION
    '() '()
    (list (make-collection COLLECTION-TYPE-EQUALS
              '((field . "artist")
                (value . "Slayer"))
              '()
              (list (make-collection COLLECTION-TYPE-UNIVERSE '() '() '())))
          (make-collection COLLECTION-TYPE-EQUALS
              '((field . "artist")
                (value . "Metallica"))
              '()
              (list (make-collection COLLECTION-TYPE-UNIVERSE '() '() '())))
          (make-collection COLLECTION-TYPE-EQUALS
              '((field . "artist")
                (value . "Anthrax"))
              '()
              (list (make-collection COLLECTION-TYPE-UNIVERSE '() '() '())))
          (make-collection COLLECTION-TYPE-EQUALS
              '((field . "artist")
                (value . "Megadeth"))
              '()
              (list (make-collection COLLECTION-TYPE-UNIVERSE '() '() '())))))

…and isn't that just awful? Yes, yes it is. It so is.

In order to rein in this craziness, the library ships a macro that implements a little domain specific language to express collections with. Using that, the above boils down to this:

(collection (∪ (artist = Slayer)
               (artist = Metallica)
               (artist = Anthrax)
               (artist = Megadeth)))

So much better, right? …well, unless you really don't like Unicode characters and the ‘∪’ in there gives you a constant headache… But worry not, you can also use ‘or’ in place of the union symbol if you like. Or ‘UNION’ if you really want to be clear about things.

If you know a bit of Scheme, you may wonder how to pass arguments to those operators that get evaluated at run time. Evidently, the Slayer in there is turned into a "Slayer" string at compile time; the same goes for the artist symbol. These transformations are done to make the language compact enough for users to just type expressions. If you want part of an expression to be evaluated, you have to wrap it in a (| ...) expression:

(let ((key "artist") (value "Slayer"))
  (collection ((| key) = (| value))))

These expressions may be arbitrarily complex.

Finally, to traverse collection tree data structures, the library ships a function called ‘collection-fold’. It implements pre-order, post-order and level-order tree traversal, in both left-to-right and right-to-left directions. So, if you'd like to count the number of nodes in a collection tree, you can do it like this:

(define *big-four* (collection (∪ (artist = Slayer)
                                  (artist = Metallica)
                                  (artist = Anthrax)
                                  (artist = Megadeth))))

(collection-fold + 0 *big-four*)   ;; → 9

This would evaluate to 9.

The library is still at an early stage, but it can control an XMMS2 server, as the ‘cli’ example shipped with the library will show you. There are no high-level primitives to support synchronous and asynchronous server interactions yet, and there is not a whole lot of documentation either. But the library implements all data types the protocol supports, as well as all methods, signals and broadcasts the IPC document defines. The collection DSL supports all collection operators and all attributes one might want to supply.

Feel free to take a look, play around, report bugs. The library's home is with my account on github.

[Update] In a previous version of this post I mixed up the terminology for unions and intersections. The library mixed up the set operators associated with ‘and’ and ‘or’, too, and was fixed as well.

Posted Sun 15 Jan 2017 15:52:39 CET Tags:

Hello dewi users! (Yes, all three of you.)

I started developing dewi in 2010. Back then I did two releases in less than a week. And now the third release comes out more than six years later. There are a number of reasons for the amount of time this took: I made a few mistakes in dewi's design, and fixing them took some effort. Spare time got really scarce. I wasted some time implementing features that should never have been part of dewi (because they were features my previous solution to this problem had). I wanted lots of features that took a while to finish. Because the new release would definitely break backwards compatibility, I was working on a separate branch called “onward” that basically only I used. And finally, since the code in that “onward” branch worked for me (I knew its shortcomings and worked around them), I could not motivate myself to fix all the outstanding issues and add proper documentation for all that had changed. Therefore nothing happened.

In the meantime, other systems like vcsh, rcm or homesick have emerged. Really, there are tons of systems like these out there; these are just a few of the ones that get mentioned to me more often than the others. Some of these systems depend hard on a version control system — usually git — and even have it at the absolute centre of their implementation. Others just made pretty weird design decisions that make me feel uneasy. And yet others are just a little limited in what they can actually do (basically just symlink a file). I still prefer dewi over everything I have looked at in any sort of detail.

Earlier, I mentioned some mistakes in dewi's design. Let's take a look at some of them: First, I should never have depended on ‘make’: Dewi only used it to call the dewi script (from .dewi/bin) and because “make deploy” looks pretty. But there's no feature in make that the basic dewi features actually need. You just end up with a bunch of downsides, like:

  • A superfluous dependency that a target system has to satisfy.
  • Writing properly portable make code is actually not that easy; I always got issues on OpenBSD systems, for example.
  • Using make required a Makefile to exist in each dewified sub-directory, which means you pollute the sub-directory's version control history with changes to that file each time dewi updates it.
  • If someone wants to use make as part of their deployment system, they can still use it on top of a make-less dewi at will.
  • It needlessly complicates the whole system.

You can think of more. It was just wrong to use ‘make’.

And the second major design mistake was keeping the code for dewi in the dot-dewi sub-directory. The idea was to ensure that the deployment process never breaks due to changes in dewi that might come with an updated system version. That's a moot point, however, since nothing keeps someone from installing a fixed version of dewi into ‘~/bin’ and never ever changing it. Dewi can be used from its source directory as well, and its non-standard dependencies (dewi's implementation language, Perl, is installed on virtually all systems I have come across) are optional; there is no need to install anything. Instead, keeping the script in the ‘.dewi/’ directory made the system much less clear and much more confusing.

These issues are both fixed now. Dewi is now a single command line tool that ships all its code. And where you previously did this:

% make deploy

…you are now doing this:

% dewi deploy

To discuss all changes in dewi since version 0.2 would completely go beyond the scope of a single blog post. Instead, let me list a few highlights:

  • You can now apply filters when deploying files (useful if your configuration files carry sensitive information).
  • You can now use multiple input files to deploy into a single destination file.
  • Dewi now lets you run your own code while it works, because it lets you define hooks.
  • Dewi now has better support for deploying large, recursive trees of files.
  • You can now deploy files with full template support, using Perl's Template module (available from CPAN).

Another thing that may be interesting for a few users is the trivial call style of its register() function: It takes a string argument that is the name of a source file. Its effect is to deploy that file using the copy method, after the source file name is turned into a dot-file name. Copying was the default choice of deployment because it works absolutely everywhere.

However, sometimes you may want changes you make to source files to take effect immediately after saving them to disk. You may think “symbolic links!” right now, and you would be right. The trivial call style of register() now allows the user to supply string options. So the canonical example of a Dewifile for a git configuration that should be symlinked to its source looks like this:

register '--symlink', 'gitconfig';
end

For all details, take a look at the CHANGES and UPGRADING files as well as the new and updated manuals dewi(7), dewi(1) and dewifile(5) (see below).

I am planning to release version 1.0 of dewi based on version 0.3, once it has had time to settle a little, has had remaining bugs fixed, if they come up, and has had its documentation polished further. I consider dewi pretty feature-complete right now, which is why after 1.0 it'll go into maintenance mode.

Thanks for listening. ☺


Posted Fri 29 Jul 2016 01:31:03 CEST Tags:

You may have seen this already, but I thought I'd mention it anyway, because I like the level of detail some people put into their work. Like with this list of easter eggs that made it into episode one of season two of Mr. Robot. Once you make it to the website with the in-browser operating system, with a working window manager, a small working shell and a cute impression of an IRC client, you may agree.

Posted Sun 17 Jul 2016 12:04:28 CEST Tags:

If you're using git with zsh, you may have seen this:

% git show HEAD^1
zsh: no matches found: HEAD^1

…and wondered what in the name of all that is holy is going on.

The reason behind this is that zsh signals an error if you provide it with a pattern that doesn't match any files. This is the sane thing to do, especially in interactive shells. If your shell behaves differently, I am sorry.

Your reaction might be “What pattern are you speaking of?” — since there is no asterisk, question mark or similar construct in there. That is true, but zsh has pretty powerful extended patterns, if enabled. They let you pull off lots of things you might otherwise resort to regular expressions for. Characters like “#”, “~” and “^” become special characters and must be quoted if their special meaning is to be suppressed.

So that is what's going on. And sometimes git users get annoyed.

With git, though, you almost never want your shell to do any globbing anyway. Seriously. On the one hand, that is because git more often than not interacts with its database rather than with files, so files don't come into play to start with. And on the other hand, even when you pass stuff to git that looks like a file pattern, often you want git to expand the pattern itself. Like with “git grep”:

% git grep 'foo.*bar' -- '*.h'

This finds all lines that match the regular expression “foo.*bar” in all files that match the pattern “*.h”, no matter which subdirectory a header file is hidden in.

Git is really one of the commands most eligible to have globbing (the expansion of patterns into file names) turned off for. And zsh lets you do that:

alias git='noglob git'

Now when you execute commands like these:

% git show HEAD^1
% git grep foo.*bar -- *.h

…zsh will not expand any patterns into lists of files and instead hands all arguments over to git unchanged. Which is exactly what you want almost every time.

In the odd case where you really do want the shell to do globbing, you can keep it from using the alias you just defined: If you've got the EQUALS option set (which is the default), you can do:

% =git add *.txt

Try “echo =git” to see why that works.

If you don't use that particular option, you can always turn off alias expansion temporarily by prefixing the alias name with a backslash:

% \git add *.txt

And that's that.

Posted Mon 20 Jun 2016 21:55:38 CEST Tags: