LaTeX to Word document

Jan 24, 2014
Tested on: Linux Mint 13 (Ubuntu 12.04), Microsoft Office 2010, LibreOffice 3

It is common to typeset scientific documents in \(\LaTeX\). However, some people may still prefer to read and edit the document in word processors like Microsoft Word (.doc or .docx) or LibreOffice (.odt). Here are quick (& dirty) steps to produce reasonable-looking "office" documents from .tex files. The idea is to use HTML as an intermediate format, since word processors can read it. This idea is shown below and has been explored and reported widely.
.tex → .html → .docx
  1. Convert \(\LaTeX\) to .html: There are various ways to achieve this, but htlatex from TeX4ht worked best for me. First install TeX4ht and then run the following on the \(\LaTeX\) document. Note that htlatex runs latex three times before calling the TeX4ht programs, and it writes all output files into the same directory as the .tex file (see below for handling PDF images). The generated files include one .html file (paper.html for the command shown below).
       $ sudo apt-get install tex4ht
       $ htlatex paper.tex
  2. PDF images: If the .tex file uses PDF files as images, htlatex (or the underlying latex run) may fail to convert them correctly. Some discussions on Stack Exchange were very helpful for this issue and provided the solution. First create a configuration file, say myxhtml.cfg, with the following content:
       \Preamble{xhtml}
       \Configure{graphics*}
         {pdf}
         {\Needs{"convert \csname Gin@base\endcsname.pdf \csname Gin@base\endcsname.png"}%
          \Picture[pict]{\csname Gin@base\endcsname.png}%
          \special{t4ht+@File: \csname Gin@base\endcsname.png}
         }
       \begin{document}
       \EndPreamble
     Then run htlatex as shown below. Note that the .tex file should use the correct extension for every included image (or else use \DeclareGraphicsExtensions{.pdf}).
       $ htlatex paper.tex myxhtml
  3. Fix HTML: Sometimes TeX4ht does not write out clean HTML files, i.e. the HTML tags may not be properly nested or closed. This may prevent word processors from importing the HTML. Online editors like Fix My Html and HTML Tidy Online did the trick for me. Use the .html file generated by htlatex as input to these online editors and save the fixed HTML file for the next step.
  4. Import HTML in word processor: Most word processors support importing HTML files. If that does not work (as in my case), you can open the HTML file in a browser and copy-paste the whole page into the word processor (check whether your word processor supports Paste Special for formatted text). If everything works out, you should be able to save the pasted document in .doc, .docx or .odt format.
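Steps 1 to 3 can be scripted. The following POSIX shell sketch assumes a document named paper.tex in the current directory, and that htlatex, ImageMagick's convert (called by the configuration file) and the command-line HTML Tidy program (tidy) are installed. Note that tidy is a swapped-in tool standing in for the online fixers from step 3, not what I used above. The script always writes myxhtml.cfg and only runs the conversion when htlatex and paper.tex are present; step 4 stays manual.

```shell
#!/bin/sh
# Sketch: automate steps 1-3 for a hypothetical document paper.tex.
# Assumes htlatex (TeX4ht), ImageMagick's convert and tidy are installed.
set -u

doc=paper   # hypothetical document name, without the .tex extension

# Step 2: write the TeX4ht configuration that turns PDF images into PNG.
# The quoted 'EOF' keeps the shell from touching backslashes in the body.
cat > myxhtml.cfg <<'EOF'
\Preamble{xhtml}
\Configure{graphics*}
  {pdf}
  {\Needs{"convert \csname Gin@base\endcsname.pdf \csname Gin@base\endcsname.png"}%
   \Picture[pict]{\csname Gin@base\endcsname.png}%
   \special{t4ht+@File: \csname Gin@base\endcsname.png}
  }
\begin{document}
\EndPreamble
EOF

if command -v htlatex >/dev/null 2>&1 && [ -f "$doc.tex" ]; then
  # Step 1: LaTeX -> HTML, using myxhtml.cfg from the current directory.
  htlatex "$doc.tex" myxhtml
  # Step 3: clean up the HTML. tidy exits with 1 on warnings and 2 on
  # errors, so a non-zero status is not treated as fatal here.
  if command -v tidy >/dev/null 2>&1; then
    tidy -q -o "$doc-fixed.html" "$doc.html" || true
  fi
else
  echo "htlatex or $doc.tex not found; only myxhtml.cfg was written"
fi
```

The fixed file (paper-fixed.html in this sketch) is then the input for step 4.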
The above steps produced a good-looking single-column document for me with correct bibliography references. The math equations are embedded as images in the document, which may not be optimal, but the process works for reviewing and editing the text by someone who prefers word processors.

Parent directories in tar archives

Jan 3, 2014
Tested on: Linux Mint 13 Maya (based on Ubuntu 12.04 LTS)

It is common to see the full directory tree or . (dot) at the top of tar archives. Sometimes it is an unnecessary hassle to remove all the parent directories after extraction, or to find the extracted files after they have merged with other files in the extraction directory. In some situations, it may be desirable to create a tar archive with a single top-level directory (with an informative name). There have been many discussions about this. Here is a quick summary with an example for future reference.

Test setup

We will create a test directory with some files in it using the following commands (outputs are also shown):
    $ mkdir -p /tmp/path/to/test/dir_archive
    $ touch /tmp/path/to/test/dir_archive/file{1..10}
    $ ls /tmp/path/to/test/dir_archive/
    file1 file10 file2 file3 file4 file5 file6 file7 file8 file9
    $ tar --version
    tar (GNU tar) 1.26
    Copyright (C) 2011 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later .
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

    Written by John Gilmore and Jay Fenlason.

Full parent directory

The following command creates a tar archive with the full parent directory structure, as shown by the --list command. GNU tar strips the leading / from member names, so the entries start with tmp/.
    $ tar -caf /tmp/test.tar.gz /tmp/path/to/test/dir_archive
    $ tar --list -af /tmp/test.tar.gz
    tmp/path/to/test/dir_archive/
    tmp/path/to/test/dir_archive/file3
    tmp/path/to/test/dir_archive/file10
    tmp/path/to/test/dir_archive/file1
    tmp/path/to/test/dir_archive/file8
    tmp/path/to/test/dir_archive/file4
    tmp/path/to/test/dir_archive/file2
    tmp/path/to/test/dir_archive/file7
    tmp/path/to/test/dir_archive/file5
    tmp/path/to/test/dir_archive/file6
    tmp/path/to/test/dir_archive/file9

Dot (.) as parent directory

The following command creates a tar archive with dot (.) as the parent directory. Note that the tar command ends in a dot (.).
    $ tar --directory /tmp/path/to/test/dir_archive -caf /tmp/test.tar.gz .
    $ tar --list -af /tmp/test.tar.gz
    ./
    ./file3
    ./file10
    ./file1
    ./file8
    ./file4
    ./file2
    ./file7
    ./file5
    ./file6
    ./file9

Directory name as parent directory

The trick to get dir_archive as the parent directory in the tar archive is to use --directory to change to the directory one level up, as shown below. Note the location of dir_archive in the command.
    $ tar --directory /tmp/path/to/test -caf /tmp/test.tar.gz dir_archive
    $ tar --list -af /tmp/test.tar.gz
    dir_archive/
    dir_archive/file3
    dir_archive/file10
    dir_archive/file1
    dir_archive/file8
    dir_archive/file4
    dir_archive/file2
    dir_archive/file7
    dir_archive/file5
    dir_archive/file6
    dir_archive/file9
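To compare the three variants side by side, the following POSIX shell sketch rebuilds them in a scratch directory and prints the first member of each archive, which is the top-level entry a user gets on extraction. It assumes GNU tar with gzip available; the scratch directory from mktemp replaces the fixed /tmp paths used above, and seq replaces the bash-only file{1..10} so the script also runs under plain sh.

```shell
#!/bin/sh
# Sketch: rebuild the three archive variants in a scratch directory and
# print the first member of each (the top-level entry after extraction).
set -eu

base=$(mktemp -d)
src="$base/path/to/test/dir_archive"
mkdir -p "$src"
for i in $(seq 1 10); do touch "$src/file$i"; done   # same as file{1..10}

# 1. Full parent path; GNU tar strips the leading "/" (its notice is hidden).
tar -caf "$base/full.tar.gz" "$src" 2>/dev/null
# 2. Dot as the parent directory.
tar --directory "$src" -caf "$base/dot.tar.gz" .
# 3. Directory name as the parent directory.
tar --directory "$base/path/to/test" -caf "$base/named.tar.gz" dir_archive

for kind in full dot named; do
  echo "$kind: $(tar --list -f "$base/$kind.tar.gz" | head -n 1)"
done
```

The last two lines of output are "dot: ./" and "named: dir_archive/"; the first line shows the scratch path without its leading /.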
Some more notes about the tar command:
  • The above commands can be combined with the exclude directive by adding --exclude='dir_archive/dir_exclude' after the --directory option.
  • Do not mix absolute and relative paths in the tar command for the pathname and exclude options. --directory is an exception: it can be an absolute path while the other paths are defined relative to it.
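Both notes can be combined in one command. The sketch below assumes a hypothetical subdirectory dir_exclude inside dir_archive; only --directory is absolute, and the exclude pattern and member name are relative to it. As before, a mktemp scratch directory stands in for the /tmp paths, and GNU tar with gzip is assumed.

```shell
#!/bin/sh
# Sketch: keep dir_archive as the top-level directory but skip the
# hypothetical subdirectory dir_archive/dir_exclude.
set -eu

base=$(mktemp -d)
mkdir -p "$base/path/to/test/dir_archive/dir_exclude"
touch "$base/path/to/test/dir_archive/keep.txt" \
      "$base/path/to/test/dir_archive/dir_exclude/skip.txt"

# Only --directory is absolute; the exclude pattern and the member name
# are relative to it. --exclude is given before the member it applies to.
tar --directory "$base/path/to/test" \
    --exclude='dir_archive/dir_exclude' \
    -caf "$base/test.tar.gz" dir_archive

tar --list -f "$base/test.tar.gz"
```

The listing should contain only dir_archive/ and dir_archive/keep.txt.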