Git and Word documents

If you keep Word documents in a git repo and are tired of seeing git's "Binary files differ" message, the following setup fixes it.

$ git diff
diff --git a/hello.docx b/hello.docx
index bc73959bc592..122761966158 100644
Binary files a/hello.docx and b/hello.docx differ

To see what has actually changed in the Word document, do the following:

  • Install docx2txt. You can download it from http://docx2txt.sourceforge.net. Follow the instructions in the INSTALL file to put it somewhere your shell can find it.
  • Create ~/.config/git/attributes, or append to it if it already exists, so that it contains

*.docx diff=word

  • Edit ~/.gitconfig to add the following sections

[diff "word"]
textconv = ~/git_docx2txt.sh

[alias]
wdiff = diff --word-diff=color --unified=1

  • Create ~/git_docx2txt.sh with the following content

#!/bin/bash

docx2txt.pl "$1" -

  • Make the script executable: chmod a+x ~/git_docx2txt.sh
  • After that, git shows text diffs instead of the binary message:

$ git diff
diff --git a/hello.docx b/hello.docx
index bc73959bc592..122761966158 100644
--- a/hello.docx
+++ b/hello.docx
@@ -1 +1 @@
-Hello world
+Hello world: View word diff in git.

$ git wdiff hello.docx
diff --git a/hello.docx b/hello.docx
index bc73959bc592..122761966158 100644
--- a/hello.docx
+++ b/hello.docx
@@ -1 +1 @@
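Hello worldworld: View word diff in git.

(In a terminal, the removed word "world" is shown in red and the added text "world: View word diff in git." in green; the colors do not survive in plain text.)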
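
If you would rather not edit the files by hand, the steps above can be scripted. The following is only a minimal sketch, assuming git is installed and docx2txt.pl is already on your PATH. The function name setup_docx_diff is made up for this example and is not part of git or docx2txt. It uses git config --global instead of editing ~/.gitconfig directly, and it only appends the attributes line if it is not there yet, so running it twice should be harmless.

```shell
#!/bin/bash
# Sketch: apply the docx diff setup from the steps above in one go.
# Assumptions: git is installed, docx2txt.pl is on PATH.
# setup_docx_diff is a hypothetical helper name used only in this example.

setup_docx_diff() {
    # Default location git reads global attributes from.
    local attrs="${XDG_CONFIG_HOME:-$HOME/.config}/git/attributes"
    local wrapper="$HOME/git_docx2txt.sh"

    # 1. Map *.docx to the "word" diff driver (append only if missing).
    mkdir -p "$(dirname "$attrs")"
    touch "$attrs"
    grep -qxF '*.docx diff=word' "$attrs" || echo '*.docx diff=word' >> "$attrs"

    # 2. Write the textconv wrapper script and make it executable.
    printf '%s\n' '#!/bin/bash' 'docx2txt.pl "$1" -' > "$wrapper"
    chmod a+x "$wrapper"

    # 3. Register the diff driver and the wdiff alias in the global git config
    #    (normally ~/.gitconfig).
    git config --global diff.word.textconv '~/git_docx2txt.sh'
    git config --global alias.wdiff 'diff --word-diff=color --unified=1'
}
```

Call setup_docx_diff once after sourcing the script. To check that the attribute is picked up, run git check-attr diff hello.docx inside a repo; it should report the diff attribute as word.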

Happy diff view.
