by Joachim Breitner (mail@joachim-breitner.de) at May 13, 2012 03:46 PM
by Joachim Breitner (mail@joachim-breitner.de) at May 13, 2012 03:46 PM
Today, it must have been exactly 10 years that I started using Debian. The story of how I came to Debian shows some of its strengths, so I’ll use this occasion to share it.
I spent the first half of 2002 as an high-school exchange student in Wenatchee, USA. I was already a user of Linux at that time: I made my first contact roughly in 1996 and did my first installation at home two years later, but all that time I was dual-booting and my main system was a well-arranged Windows 98. The machine was a regular tower PC, but nevertheless I put the computer into my trunk when I flew to the US. It took away most of the space, and I had to put some of my cloths inside the case.
So I was there, happily using my Windows and my manually set up “Linux From Scratch” until one day the inevitable happened; inevitable at least until you start doing backups: On April 30th, my hard drive crashed, and took the two systems together with 4 years of personal data with it.
Two weeks later I had a new hard drive and was pondering my options. I did plan to install Windows again; at that time Windows XP was just released. But I wanted a German version of Windows, which would be hard to get there. Also, I did not want to use Linux from Scratch any more, and wanted to make a well-founded choice of a distribution. On the other hand, I really wanted to get my machine up and running quickly, to be able to read my mail more comfortably. I had heard that Debian had good support for network installations (downloading a full 700MB CD was something to avoid at that time), so I grabbed some netinst images, burned a CD, and quickly installed Debian.
I was planning to use the system for about two weeks. I did not pay any particular attention to the setup. Heck, I even picked from my Simpsons sidekick machine naming scheme one that I would not miss being used up (“barney”). Nevertheless, I was using this installation for many years (and many upgrades), until I eventually switched to using laptops. In fact, that very installation is still on the machine somewhere and works. I did install Windows XP a few weeks later as well, but hardly used it. So May 12th of 2002 was when I turned into a full-time Linux and Debian user.
I soon became interested in Debian the project and started to contribute. But that is another story for another ten year anniversary blog post, most likely on October 21, 2013...
by nomeata (mail@joachim-breitner.de) at May 12, 2012 10:00 AM
I must say that I do like free groups. At least whenever I play around with some theorem provers, I find myself formalizing free groups in them. For Isabelle, my development of free groups is already part of the Archive of Formal Proofs. Now I became interested in the theorem prover/programming language Agda,so I did it there as well. I was curious how well Agda is suited for doing math, and how comfortable with intuitionalistic logic I’d be.
At first I wanted to follow the same path again and tried to define the free group on the set of fully reduced words. This is the natural way in Isabelle, where the existing setup for groups expects you to define the carrier as a subset of an existing type (the type here being lists of generators and their inverses). But I did not get far, and also I had to start using stuff like DecidableEquivalence, an indication that this might not go well with the intuitionalistic logic. So I changed my approach and defined the free group on all words as elements of the group, with a suitable equivalence relation. This allowed me define the free group construction and show its group properties without any smell of classical logic.
The agda files can be found in my darcs repository, and the HTML export can be browsed: Generators.agda defines the sets-of-generators-and-inverses and FreeGroups.agda (parametrized by the Setoid it is defined over) the reduction relation and the group axioms. Here are some observations I (disclaimer: Agda-beginer) made:
If I were to extend this theory, there are two important facts to be shown: That there is a unique reduced word in every equivalence class (norm_form_uniq), and the universal property of the free group. For the former (started in NormalForm.agda) I’m missing some general lemmas about relations (e.g. that local confluence implies global confluence, and even the reflexive, symmetric, transitive hull is missing in the standard library). For the latter, some general notions such as a group homomorphism need to be developed first.
I planned to compare the two developments, Isabelle and Agda. But as they turned out to show quite things in different orders, this is not really possible any more. One motivation to look at Agda was to see if a dependently typed language frees me from doing lots of set-element-checking (see the “mems” lemma in the Isabelle proof of the Ping-Pong-Lemma). So far I had no such problems, but I did not get far enough yet to actually tell.
Thanks to Helmut Grohne for an educating evening of Agda hacking!
by nomeata (mail@joachim-breitner.de) at May 07, 2012 01:24 PM

by Joachim Breitner (mail@joachim-breitner.de) at April 03, 2012 09:41 AM
by Joachim Breitner (mail@joachim-breitner.de) at March 27, 2012 04:23 PM
Today was the last day of my vacation, which I spent in Amsterdam. I only had a little bit of time to walk through the Grachtengordel (canals) past the Bloemenmarkt (flower market). One can buy all sorts of specialty flower bulbs there.
On the way, I came across a specialized store which basically just sells three different types of gouda: young, medium, and extra-old. What a great idea! Of course I immediately fell for it and left with a noticeably heavier bag.
I finally ended up in Amsterdam’s shady corner, which was certainly interesting. As predicted by Olesya, I saw several school classes exploring the area. Crazy!

On Thursday, I took the train to Bruges in the topmost corner of Belgium. This gem of a city was once a very rich and thriving trade outpost. Almost all of the medieval architecture in Bruges is intact to this day, which seems unbelievable. It is truly gorgeous — I had trouble deciding where to point my camera.
This place of course attracts many visitors. I would not recommend going there on the weekend.
During the last week, I created ghc-heap-view, a library to investigate the actual memory representation of Haskell values. It is inspired by vacuum and the GHCi debugger, but goes beyond them by allowing the user to look inside thunks and functions and see what other values they refer to. Let me demonstrate it by running the included demo:
Here are a four different lists, where the first three are already evaluated.
The first one, l, was defined as a top level constant as
> l = [1,2,3]
and is now found at 0x00000000006d1750/2 (where the /2 is the pointer tag information) and fully evaluated:
ConsClosure {info = StgInfoTable {ptrs = 2, nptrs = 0, tipe = CONSTR_STATIC, srtlen = 1},
ptrArgs = [0x00000000006d16e0/1,0x00000000006d1730/2],
dataArgs = [], descr = "ghc-prim:GHC.Types.:"}
The second one, l2, is locally defined
> let l2 = 4:l
and now found at 0x00007fdce19fe4b0/2. See how the cons-cell references l!
ConsClosure {info = StgInfoTable {ptrs = 2, nptrs = 0, tipe = CONSTR_2_0, srtlen = 1},
ptrArgs = [0x00000000006dca50/1,0x00000000006d1750/2],
dataArgs = [],
descr = "ghc-prim:GHC.Types.:"}
And the binding
> args <- map length `fmap` getArgs
evaluates to the “one”, global empty list at 0x00000000006db640/1:
ConsClosure {info = StgInfoTable {ptrs = 0, nptrs = 0, tipe = CONSTR_NOCAF_STATIC, srtlen = 0},
ptrArgs = [],
dataArgs = [],
descr = "ghc-prim:GHC.Types.[]"}
And now we have, at 0x00007fdce19fe4c8, the concatenation of them, but unevaluated:
> let x = l ++ l2 ++ args
The thunk keeps a reference to l2 and args, but not l, as that is at a static address, unless you are running this in GHCi:
ThunkClosure {info = StgInfoTable {ptrs = 2, nptrs = 0, tipe = THUNK_2_0, srtlen = 1},
ptrArgs = [0x00007fdce19fe4b0/2,0x00000000006db640/1],
dataArgs = []}
Now to some more closure types. m and m' locally bound of type the unboxed type Int#, with values 42 resp. 23.
> let f = \x n -> take (I# m + I# x) n ++ args t = f m' l2
So here is (0x00007fdce1937d50/2), referencing its free variables args and 42:
FunClosure {info = StgInfoTable {ptrs = 1, nptrs = 1, tipe = FUN_1_1, srtlen = 65553},
ptrArgs = [0x00000000006db640/1],
dataArgs = [42]}
And t is a thunk that applies f (also referenced here) to an unboxed value (23) and l2:
ThunkClosure {info = StgInfoTable {ptrs = 2, nptrs = 1, tipe = THUNK, srtlen = 0},
ptrArgs = [0x00007fdce19fe4b0/2,0x00007fdce1937d50/2],
dataArgs = [23]}
Lastly, here is the standard example for self reference:
> let x = id (:) () x
This is what x (0x00007fdce1947940) looks like, at least without -O:
ThunkClosure {info = StgInfoTable {ptrs = 0, nptrs = 0, tipe = THUNK, srtlen = 1},
ptrArgs = [],
dataArgs = []}
So it is unevaluated. Let us evaluate it using seq. Now we have, still at 0x00007fdce1947940:
IndClosure {info = StgInfoTable {ptrs = 1, nptrs = 0, tipe = BLACKHOLE, srtlen = 0},
indirectee = 0x00007fdce194cc98/2}
The thunk was replaced by an indirection. If we look at the target, 0x00007fdce194cc98/2, we see that it is a newly created cons-cell referencing the original location of x:
ConsClosure {info = StgInfoTable {ptrs = 2, nptrs = 0, tipe = CONSTR_2_0, srtlen = 1},
ptrArgs = [0x00000000006db620/1,0x00007fdce1947940],
dataArgs = [],
descr = "ghc-prim:GHC.Types.:"}
After running the garbage collector (performGC), we find that the address of x is now 0x00007fdce19f30d0/2 and that the self-reference is without indirections:
ConsClosure {info = StgInfoTable {ptrs = 2, nptrs = 0, tipe = CONSTR_2_0, srtlen = 1},
ptrArgs = [0x00000000006db620/1,0x00007fdce19f30d0/2],
dataArgs = [],
descr = "ghc-prim:GHC.Types.:"}
The output of ghc-heap-view is not really pretty yet; even the indentation in this blog post was added manually by me, so this really needs a pretty printer providing a nicer, possibly more compact representation, including something like what vacuum provides. Maybe vacuum can be ported to use this library, and also include the thunk’s and function’s references in the output. Maybe also the GHCi debugger can be extended to show more information about unevaluated expressions using this. Internally, the library is not very polished yet either. It only handles those closures types that I have seen so far, and is likely to break horribly if run in a threaded or debugging enabled runtime.
Obviously, this is not standard Haskell 98 code, but rather deep trickery involving the GHC API and some C code. Initially I tried to use the API that vacuum and the GHCi debugger rely on, which is an operation
unpackClosure# :: a -> (# Addr#, Array# b, ByteArray# #)
which takes any Haskell value and returns the address to its info table, the pointers and the non-pointer-data in the closure. Unfortunately, it was not complete in that it was meant only for data closures and will for other closure types, e.g. thunks, return no data and no pointers (as can be seen in the code). So I implemented my own version of this operation:
foreign import prim "slurpClosurezh" slurpClosure# :: Any -> (# Addr#, ByteArray#, Array# b #)
where the returned ByteArray# contains the complete closure, including extra information fields such as the arity of a function. The Array# is again the list of pointers in the closure. At first glance, this is a duplication, as the pointers are of course also contained in the ByteArray#. But as soon as the GHC runtime reigns again, a garbage collector run can happen, the referenced values will move somewhere else, and the words that once were pointers in the ByteArray# become useless. But the corresponding entries in the Array# are updated by the garbage collector, as it knows that these are pointers, and not just words. This way, we get both a faithful copy of the closure on the heap and useful references to the contained data. Here is a demonstration of this effect:
$ ghci -XMagicHash -package ghc-heap-view
GHCi, version 7.4.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
[..]
Loading package ghc-7.4.1 ... linking ... done.
Loading package ghc-heap-view-0.1 ... linking ... done.
Prelude> let {a = [1,2,3,4]; b = 5:a}
Prelude> :m + GHC.HeapView
Prelude GHC.HeapView> rawHeapData <- getClosureRaw b
Prelude GHC.HeapView> rawHeapData
(0x000000004080d658,[1082185320,140040739366568,140040739365928],[0x00007f5dc68626a8,0x00007f5dc6862428])
Prelude GHC.HeapView> System.Mem.performGC
Prelude GHC.HeapView> rawHeapData
(0x000000004080d658,[1082185320,140040739366568,140040739365928],[0x00007f5dc41b3ad8,0x00007f5dc41b3b28])
The function rawHeapData is a thin wrapper around slurpClosure# which turns the primitive array in normal lists. Note that the second component of the triple is unchanged, but the third is updated by the garbage collector. Of course this means that the Show instance for the data type that ghc-heap-view uses to reference values is not referential transparent either.
The foreign function import above is of type “prim”, i.e. does not call a C function but rather a Cmm function. Cmm is a reduced C that GHC uses internally to compile the Haskell code to, and most primitive operations are implemented in this language – although I do quickly call regular C from my Cmm code to do the more complicated stuff, mainly figuring out what words of the closure are pointers.
The knowledgeable reader might notice that I am passing a boxed value of type Any to the foreign function. This is currently not possible with foreign prim functions, and to actually use that code, you need the patch in GHC ticket #5931. But you can use ghc-heap-view without that as well (and the Cabal package will by default use that path), using the following hack to obtain the pointer to a Haskell value on the Heap as an unboxed type that can pass to the primitive operation:
foreign import prim "slurpClosurezh" slurpClosure'# :: Word# -> (# Addr#, ByteArray#, Array# b #) data Ptr' a = Ptr' a aToWord# :: Any -> Word# aToWord# a = case Ptr' a of mb@(Ptr' _) -> case unsafeCoerce# mb :: Word of W# addr -> addr slurpClosure# :: Any -> (# Addr#, ByteArray#, Array# b #) slurpClosure# a = slurpClosure'# (aToWord# a)
This works because a Word and a Ptr' have the same closure layout, only differing in the fact that one stores an a, and the other stores a Word#.
Once we obtained the raw representation of the closure, we do the parsing in Haskell. Using the info table and the raw closure, we have enough information to tell which words have to be replaced by the appropriate pointer (which might already have been updated by the garbage collector) in the pointers list.
This work was supported by a scholarship from the Deutsche Telekom Stiftung.
by nomeata (mail@joachim-breitner.de) at March 13, 2012 09:52 AM
by Joachim Breitner (mail@joachim-breitner.de) at March 04, 2012 07:49 AM
More than two years ago I wrote arbtt, a tool that silently records what programs you are using and allows you to do statistics on that data later, based on rules that you define afterwards, hence the name automatic rule based time tracker. I wasn’t doing much with it recently (the last release has been half a year ago), but it nevertheless was running on my machine and by now has tracked a total time span of 248 days in 350000 records.
Yesterday, I had a use for it again: measuring the time spent creating a certain document with LaTeX. So I added a rule to my categorize.cfg and ran arbtt-stats. I knew that it was not very fast, and that my data set has grown considerably since I last used it. And indeed, it took more than 6 minutes to process the data and spit out the result.
Since I’m currently working on the GHC 7.4.1 transition in Debian anyways, I decided to check what happens if I compile the code with that version of the Haskell compiler, instead of the previous version 7.0.4. And behold: The whole process took merely 17.3 seconds to complete! At first I did not believe it, but the result was identical, both binaries were built with the same option, i.e. no profiling enabled or anything like that. Wouldn’t you also like to have such speed ups for free, just by waiting for someone else to improve their work?
I tried to find out the reason for the speed up and created profiling output from both the old and the new binary. The old binary spends 83% of the time in Categorize.checkRegex, which basically just call Text.Regex.PCRE.Light.match. Since the version of pcre-light is the same in both binaries, I conclude that the Foreign Function Interface that GHC provides to interact with C libraries (libpcre in this case) is much faster now, although I do not find any mention in the release notes. And even if I do not count the 83% time spent in checkRegex, the code from the new compiler is still 2.7 times faster. Thanks, GHC devs, great work!
by nomeata (mail@joachim-breitner.de) at February 26, 2012 04:25 PM
Assume the following situation: You set problem sheets for a class, and at the end of the semester, you have 13 individual LaTeX files, including the preamble (the stuff from \documentclass to \begin{document}). Now you want to provide a file that contains all problems, followed by all problems with solutions. You could just concatenate the resulting PDFs, but that would waste quite a bit of paper, as most problem sheets do not fill the whole page. You could just copy’n’paste the LaTeX code, but that is not neat either.
So you want to include the 13 individual documents in one LaTeX document. LaTeX provides \include and \input commands, but these would require that the content of the individual files, i.e. the stuff between \begin{document} and \end{document}, is put in a file of its own. You cannot just \input the full problem sheet document, as there must be only one \documentclass command, and a few preamble commands are invalid within the document.
There are various solutions suggested, e.g. on Wikibooks and StackOverflow, and there are packages helping with that functionality, such as combine or subfiles. I chose a rather rough, hands-on solution that worked great in my case: The idea is to simply re-define all commands that you have in your preamble to do nothing, including the document environment, before including the other files and constraint to a group. In my case, the preambles were relatively small, as the common definitions of the problem sheets were in a file of their own. I am also using the \foreach command provided by the TikZ package for a convenient loop over all problem sheets:
\foreach \x in {0,...,13} {
{
\excludecomment{solution}
\DeclareDocumentCommand{\documentclass}{om}{}
\DeclareDocumentCommand{\usepackage}{om}{}
\newcommand{\Blattnummer}[1]{}
\DeclareDocumentCommand{\FirstDueDate}{mmm}{}
\DeclareDocumentCommand{\ThisDueDate}{mmm}{}
\newcommand{\SkippedWeeks}[1]{}
\renewenvironment{document}{}{}
\section*{Problem Sheet \x}
\setcounter{problem}{0}
\input{GraphTheoryProblems\x}
}
}
\clearpage
\label{sols}
\foreach \x in {0,...,13} {
{
\newenvironment{solution}{
\par\addvspace{1em}
}{
}
\DeclareDocumentCommand{\documentclass}{om}{}
\DeclareDocumentCommand{\usepackage}{om}{}
\newcommand{\Blattnummer}[1]{}
\DeclareDocumentCommand{\FirstDueDate}{mmm}{}
\DeclareDocumentCommand{\ThisDueDate}{mmm}{}
\newcommand{\SkippedWeeks}[1]{}
\renewenvironment{document}{}{}
\section*{Solution Sheet \x}
\setcounter{problem}{0}
\input{GraphTheoryProblems\x}
}
}
To redefine commands taking optional arguments, using the xparse package is convenient. You can see how I include every file twice, first with the solution environment turned into a comment and then again with the solution environment actually showing its content. I create a section header based on the counter of the foreach loop, but I could have easily redefined \title and use that for a header, if my problem sheets had used that command (They don’t, as they calculate the problem sheet number and the due date based on the number in the filename – very convenient, and maybe worth a blog post of its own). The problem counter is reset for each file, so that the problems are numbered individually. I could have left this out, then the problems would be numbered consecutively across all included problem sheets.
The individual and resulting files can be found on the Graph Theory course website.
by nomeata (mail@joachim-breitner.de) at February 15, 2012 01:45 PM
by Joachim Breitner (mail@joachim-breitner.de) at January 29, 2012 09:44 AM





by Felix Brandt (nospam@example.com) at January 19, 2011 11:53 PM



by Felix Brandt (nospam@example.com) at January 11, 2011 04:09 AM



by Felix Brandt (nospam@example.com) at January 04, 2011 11:00 AM


by Felix Brandt (nospam@example.com) at January 03, 2011 11:00 AM
by Felix Brandt (nospam@example.com) at December 31, 2010 10:59 PM


by Pascal Maillard (nospam@example.com) at October 08, 2008 06:18 PM
Since Tuesday, my new home, at least for the next year but maybe even longer, is Paris! When I first got into our cosy apartment, it was a light shock for me, for I knew it was small but that small? 25 m² for two persons isn't that much, is it? But it is really cute and lovely and beautiful and situated in a very lively area of Paris, called Belleville, so I am very happy to be here. My sweetheart is currently in Brussels but will arrive on the 18th, then we will be able to test if we can live together in one room for a whole year... Luckily we have plenty of places to go out to, first of all the university of course but then bars, clubs, theaters, casinos, coffee shops (no, not here) etc, etc, etc...
By the way, I have to take a picture of the university campus of Jussieu (my university), it is really ugly! Huge buildings in 70s-style with a lot of metal, dirty windows and walls, high concrete pillows... Fortunately, the math department has moved because of the construction works on the campus, they actually found a quite appealing building not far from it. My courses there will start on the 22nd, but next Monday I will already attend a course in physics: Introduction to Quantum Mechanics. Hopefully, I'll be able to understand a bit ![]()
by Pascal Maillard (nospam@example.com) at September 12, 2008 05:42 PM
by Pascal Maillard (nospam@example.com) at August 31, 2008 09:53 AM
by Pascal Maillard (nospam@example.com) at August 24, 2008 04:41 PM
by Pascal Maillard (nospam@example.com) at August 11, 2008 08:27 PM