Friday, April 10, 2015

Dropping pages from a PDF

When I got a journal article through ILLiad, it came with an annoying first page about copyright law. After futzing around with Preview for a while, I gave up and installed a command-line tool instead:

sudo port install pdftk

My foo.pdf was 10 pages, so to get pages 2-10 I just did

pdftk foo.pdf cat 2-10 output out.pdf

Bonus: I wrote a script that reads the output from the dump_data command. One of those lines is NumberOfPages, so I parsed that number and put that into the cat command so I wouldn't have to look it up myself.
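For the curious, here is a minimal sketch of what such a script could look like. It is not the original script: the function names page_count and drop_first_page are made up for illustration, and it assumes that pdftk and awk are on the PATH and that dump_data prints a line of the form "NumberOfPages: 10".

```shell
#!/bin/sh
# Sketch only: page_count and drop_first_page are hypothetical helper names.

# Read `pdftk <file> dump_data` output on stdin and print the page count,
# taken from the line that looks like "NumberOfPages: 10".
page_count() {
  awk '/^NumberOfPages:/ { print $2 }'
}

# Write pages 2 through the last page of the PDF "$1" to the file "$2".
drop_first_page() {
  pages=$(pdftk "$1" dump_data | page_count)
  # Bail out if the page count could not be found.
  [ -n "$pages" ] || return 1
  pdftk "$1" cat "2-$pages" output "$2"
}
```

After sourcing this, drop_first_page foo.pdf out.pdf should do the same thing as the cat command above. pdftk also accepts the keyword end in page ranges, so cat 2-end gets the same result without looking up the page count at all.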

Tuesday, March 31, 2015

Using git with Word

I like git, and I like to use it with most of my projects. I use LaTeX for documents, but sometimes I need to use Word. (Apparently ISME J does not accept submissions in TeX. It only accepts Word, RTF, and, unbelievably, plain txt.)

Word uses a binary format, so git diff normally displays nothing useful. Martin Fenner wrote an incredibly useful blog post that solved this whole problem for me.

I'm on OS X. First, I got pandoc, a tool that converts Word files into simpler text formats that can be more easily piped into diff.

Second, I needed to add some lines to the .gitconfig file in my home directory:

[diff "pandoc"]
  textconv=pandoc --to=markdown
  prompt = false
[alias]
  wdiff = diff --word-diff=color --unified=1

Third, in the git repo, I needed to add a new file, .gitattributes, with these contents:

*.docx diff=pandoc

Et voilà! git diff works just like I would hope.
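If you would rather not edit the config files by hand, the same setup can probably be scripted with git config. This is only a sketch: the function name setup_word_diff is made up, it assumes you run it from the top of the repo, and it registers wdiff as a git alias so that it can be called as git wdiff.

```shell
#!/bin/sh
# Sketch only: setup_word_diff is a hypothetical helper name.
# Run it from the top level of the git repo that holds the Word files.
setup_word_diff() {
  # Same settings as the .gitconfig lines above, written to ~/.gitconfig.
  git config --global diff.pandoc.textconv "pandoc --to=markdown"
  git config --global diff.pandoc.prompt false
  git config --global alias.wdiff "diff --word-diff=color --unified=1"

  # Tell git to use the pandoc diff driver for .docx files in this repo,
  # adding the line only if it is not already there.
  if ! grep -Fqx '*.docx diff=pandoc' .gitattributes 2>/dev/null; then
    echo '*.docx diff=pandoc' >> .gitattributes
  fi
}
```

After calling setup_word_diff, git diff on a .docx file should show the Markdown that pandoc produces, and git wdiff should show the same thing word by word. Running pandoc --to=markdown on the file by hand is a quick way to preview the text git will be comparing.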