Mon 27 May 2024

I left graduate school decades ago, but I still love to read academic papers. The field of computer science reinvents its wheels constantly, but academic literature is a great way to mine existing ideas and avoid that problem. It's a way to "stand on the shoulders of giants."

For a long time, I maintained and carefully indexed a collection of actual printed papers. Once it reached the hundreds, that approach became too cumbersome. I ended up throwing away papers in order to avoid having too many, but often regretted doing that when some half-remembered idea popped up again in another context.

Now I have a crude system that meets my needs. I keep notes on the most interesting papers using Org-mode files, and I keep my collection in a Git repo in purely digital form. Every paper appears in the top-level directory, and there's a subdirectory to-read/ for papers I haven't yet read. A little bit of automation helps, too. Now managing almost two thousand papers is no problem.
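
As a rough sketch of that layout, the following shell commands create an empty collection. The location is an assumption made for illustration: unless the variable p is already set (the same variable the program below reads), a throwaway temporary directory is used.

```shell
#!/bin/sh
# Sketch: create an empty paper collection with the layout described above.
# The default location is a throwaway directory, purely for illustration.
set -eu
p="${p:-$(mktemp -d)/papers}"

mkdir -p "$p/to-read"    # papers live in $p/; unread ones are linked in to-read/
git init --quiet "$p"    # the whole collection is a single Git repository
```

Git does not track empty directories, so to-read/ only shows up in the repository's history once something has been added to it.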

Here's the Scheme program, papers, I use for adding new papers:

#!/usr/local/bin/chibi-scheme -r
;;;; -*- mode: scheme -*-

;;;; Expect environment variable <p> to name a directory that holds
;;;; all papers, e.g. "papers/".  A subdirectory "to-read/" that holds
;;;; unread papers must also exist.

;;;; Hard-link specified documents into "$p/" directory.

;;;; If <--to-read> is specified, add symbolic links to them in
;;;; "$p/to-read/", too.

;;;; If <--commit> is specified, use Git to commit each of the new
;;;; documents.  Each document will be committed separately.  It is an
;;;; error if any other file is already staged.

(import (chibi filesystem)
        (chibi pathname)
        (only (chibi process) process->output+error+status)
        (scheme process-context)
        (scheme small)
        (scheme write)
        (only (srfi 1) any remove)
        (srfi 98)
        (only (srfi 130) string-join string-prefix?)
        (srfi 166 base))

(define (echo . rest)
  (apply show #true rest)
  (show #true nl))

(define (echo-command command)
  (apply echo (list "command: " (string-join command " "))))

(define (usage program)
  (echo "Usage: "
        (path-strip-directory program)
        " [--commit] [--to-read] <pathname> ..."
        nl
        "Environment variable p must be set to a directory for holding papers."
        nl
        "It must have a subdirectory called \"to-read/\"."))

(define (run command)
  (let* ((out+err+status (process->output+error+status command))
         (stdout (car out+err+status))
         (stderr (cadr out+err+status))
         (status (caddr out+err+status)))
    (unless (zero? status)
      (echo-command command)
      (write-string stderr)
      (write-string stdout)
      (exit 1))))

(define (run/assert command message)
  (unless (zero? (caddr (process->output+error+status command)))
    (echo-command command)
    (echo message)
    (exit 1)))

(define (act document papers commit? to-read?)
  (let ((filename (path-strip-directory document)))
    (link-file document (path-resolve filename papers))
    (when commit? (run `("git" "add" ,filename)))
    (when to-read?
      (let ((place (path-resolve filename (path-resolve "to-read" papers))))
        (symbolic-link-file (path-resolve filename "..") place)
        (when commit? (run `("git" "add" ,place)))))
    (when commit?
      (run `("git" "commit" "-m" ,filename)))))

(define (switch? string) (string-prefix? "--" string))

(define (main arguments)
  (let* ((program (car arguments))
         (options (cdr arguments))
         (papers (or (get-environment-variable "p")
                     (begin (usage program) (exit 1))))
         (valid-switches '("--commit" "--help" "--to-read")))
    (cond ((member "--help" options) (usage program) (exit 0))
          ((any (lambda (a)
                  (and (switch? a)
                       (not (member a valid-switches))))
                options)
           (usage program)
           (exit 1)))
    (let* ((commit? (member "--commit" options))
           (to-read? (member "--to-read" options))
           (cwd (current-directory))
           (documents (map (lambda (f) (path-resolve f cwd))
                           (remove switch? options))))

      (when (null? documents)
        (usage program)
        (exit 1))
      (change-directory papers)
      (when commit?
        (run/assert `("git" "diff" "--cached" "--quiet")
                    "Error: Files already staged."))
      (for-each (lambda (f) (act f papers commit? to-read?))
                documents))))

For example, to add one paper, link it into to-read/, and commit it to the repo:

papers --commit --to-read /tmp/aim-349.pdf
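
Reading the program above, that invocation amounts to roughly the following shell steps. This is a sketch for illustration, not a replacement for the tool: the function name add_paper is hypothetical, and the demo at the bottom builds a throwaway repository with a dummy file and a made-up Git identity so that it can run anywhere.

```shell
#!/bin/sh
# Sketch of what `papers --commit --to-read FILE` does for one document.
# add_paper is a hypothetical name. FILE must be an absolute pathname, and
# $p must name the papers directory (a Git repo that contains to-read/).
set -eu

add_paper() {
    document=$1
    filename=$(basename "$document")
    cd "$p"
    # Refuse to proceed if anything is already staged.
    git diff --cached --quiet || { echo "Error: Files already staged." >&2; return 1; }
    ln "$document" "$filename"                 # hard link into $p/
    git add "$filename"
    ln -s "../$filename" "to-read/$filename"   # relative symlink in to-read/
    git add "to-read/$filename"
    git commit --quiet -m "$filename"          # one commit per document
}

# Demo in a throwaway directory (every name below is made up).
tmp=$(mktemp -d)
p="$tmp/papers"
mkdir -p "$p/to-read"
git init --quiet "$p"
git -C "$p" config user.name "Example"
git -C "$p" config user.email "example@example.com"
echo "dummy" > "$tmp/aim-349.pdf"
add_paper "$tmp/aim-349.pdf"
git -C "$p" log --format=%s                    # subject of the new commit
```

Because the entry in to-read/ is a relative symbolic link, it keeps working if the repository is moved or cloned elsewhere. A hard link only works when the source file and the papers directory are on the same filesystem.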

I use MIT Scheme for most of my Scheme hacking, but Alex Shinn's Chibi Scheme is wonderful for implementing this kind of tool. It's small, R7RS-Small-compliant, and has many useful libraries. Thank you, Alex!

Fixed on Wed 29 May 2024 to handle relative pathnames.