A simple approach to reference and PDF management

Sep 5, 2014

Managing references and citations is essential for academic publishing and reproducible research. However, managing a large volume of references and citations over the years is no easy task! A number of software tools have been developed specifically for this task and have been discussed many times. In this post I will briefly describe my own approach, which I have been using for 4-5 years now, to manage references, notes and the corresponding files, including PDFs. In particular, my approach aims to be:

  • Scalable - Manage hundreds of references and related files. It should be easy to add new references/notes, and it should require little or no maintenance.
  • Easy formatting - References should be accurate, should handle different character encodings and accents, and should adapt to any reference style specified by the publisher.
  • Portable - Easy synchronization of the whole setup (files, notes etc.) to a new computer/laptop/device.
  • Searchable - Search through titles, authors, previous notes etc.
  • Cross Platform - The same method should work across different operating systems (Linux, Windows, Mac) with no/minimal changes.
  • Work offline - Because Internet access is never as reliable as one wants it to be (also see 'Portable' above).
  • Free and Open - "Free as in beer" and, preferably, "free as in speech" as well (no proprietary format). Being "open" also makes it possible to port the data to some other format/method, if required.

Note that the described method relies on BibTeX, mainly because it can achieve all the above goals and because BibTeX (& LaTeX) is the de facto standard for writing in my discipline (Engineering). Use of BibTeX also makes bibliographies dead simple in LaTeX - absolutely no export/import required! If BibTeX is not acceptable, skip to the last section below for some alternatives.

Approach

The approach itself is fairly simple and is motivated by the Unix philosophy of small tools. The approach has three components:

  1. A directory on our computer/device, say with the name "papers", which will contain all the files related to the references. Basically, this directory is the self-contained repository of all our information.
  2. JabRef for managing the BibTeX (.bib) file, which holds all the reference information, and
  3. A file synchronization tool, like Dropbox or Unison, to synchronize the papers directory to other devices, if required.

So, basically we save our files (PDFs etc.) in a directory, open/read/edit the files using our favorite software, add/edit the reference information via JabRef, and sync to other devices using our favorite sync software. (In theory one could also use any other BibTeX manager, but JabRef is absolutely the best BibTeX manager I know.) For ease of file management, we can create 2-3 sub-directories to organize files, but this is completely optional as JabRef allows more advanced categorization. The image on the right shows a sample directory structure. reference.bib is the BibTeX file which contains all the information related to the references and lies at the heart of this approach. The sub-directories contain PDFs and other files related to the references. In my approach, I use one single .bib file for managing all references, which in my experience keeps everything far more unified than multiple .bib files.

Next, I will outline some bare-minimum steps to get started with this approach. However, feel free to explore JabRef options and make your own customizations.

Setup

  1. Create a papers directory with 1-2 relevant sub-directories. Optionally, throw in a few PDF files of your references in the sub-directories. Avoid spaces in file and directory names; use underscores instead.
  2. Download and install the latest version of JabRef for your platform.
  3. Open JabRef and create a new database (File->New database). Save the database as reference.bib in the papers directory.
  4. In JabRef, go to the menu Options->Preferences and set the following:
    • In the Files tab, check "Open last edited databases at startup" and "Backup old file when saving".
    • In the External programs tab, check "Allow file links relative to each bib file's location" and "Use the bib file location as primary file directory".
  5. Optionally, set up your synchronization/backup for the papers directory.
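
Step 1 can also be scripted, which is handy when recreating the layout on a new machine. The following is only a minimal sketch: the function name create_papers_layout and the sub-directory names "articles" and "books" are my own placeholders, not anything JabRef requires. It does not create reference.bib itself; that is still best done from JabRef as in step 3.

```python
from pathlib import Path


def create_papers_layout(root, subdirs=("articles", "books")):
    """Create the papers directory and its sub-directories.

    Returns the path where reference.bib is expected to live.
    The sub-directory names are placeholders; pick your own.
    """
    # Enforce the "no spaces in names" advice before touching the disk.
    for name in subdirs:
        if " " in name:
            raise ValueError(f"avoid spaces in directory names: {name!r}")

    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name in subdirs:
        (root / name).mkdir(exist_ok=True)

    # The .bib file is not created here; JabRef will save it at this path.
    return root / "reference.bib"
```

Calling create_papers_layout("papers") in your home directory would create papers/articles and papers/books, and return the path to use when saving the database from JabRef.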

Adding references

This is the most important step, as we want our references to be accurate. Let us assume that we have read this paper from Einstein and have saved a copy of the PDF file in the appropriate sub-directory. Next, we want to add it to our reference list and add some notes from our reading. The idea here is that we will be very careful with all the details while adding a reference the first time, so that we can then use it confidently (almost) life-long without any changes. If you are entirely new to JabRef then go through this short tutorial to familiarize yourself.

  • Open reference.bib using JabRef. Go to BibTeX->New entry and then select Article from the entry type dialog.
  • Fill in the details of the paper, starting from the "Required fields" tab. You need not fill in all the fields in the other tabs. You can find more about the fields in BibTeX Help and wikibooks.
  • You can also copy-paste the field details from the paper, the publisher's website, or the listing on Google Scholar. But make sure that the details are correct. For example, the author details of the Einstein paper are wrong on Google Scholar!
  • While selecting the Bibtexkey, be sure that it is unique, as you would never want to change it in the future (the Bibtexkey is the key used to cite the entry from LaTeX). I typically follow the format: firstAuthorLastName_year_four_word_paper_summary for my Bibtexkey (& PDF filenames).
  • To link the PDF (or any other file), go to the "General" tab and add the link to the file. You can add as many files as you want to an entry. This allows easy access to the files from JabRef. See the help section for more details and options.
  • You can also add your review of the paper in the "Review" tab. I prefer adding some "Keywords" and writing my notes in "Comment" in the "General" tab. Also, I prefer to use my favorite 3rd party tools to annotate/highlight the PDF itself.
  • JabRef also supports groups (which are somewhat similar to labels in Gmail) and marking for more advanced categorization. However, I never felt the need for such categorization, because the search in JabRef works well and the keywords pretty much categorize all the references.
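
The key format mentioned above (firstAuthorLastName_year_four_word_paper_summary) is easy to generate and check with a few lines of code. This is a rough sketch under my own assumptions: the helper names are hypothetical, keys are lowercased and stripped of accents so they are also safe as filenames, and the duplicate check uses a simple regular expression rather than a real BibTeX parser, so it may miss unusual entries.

```python
import re
import unicodedata
from collections import Counter


def _ascii_words(text):
    """Split text into lowercase ASCII words, dropping accents and punctuation."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_text = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).split()


def make_bibtex_key(first_author_last_name, year, summary):
    """Build a key like lastname_year_up_to_four_words."""
    parts = _ascii_words(first_author_last_name)
    parts.append(str(year))
    parts.extend(_ascii_words(summary)[:4])  # keep at most four summary words
    return "_".join(parts)


def find_duplicate_keys(bib_text):
    """Return keys that appear more than once in the text of a .bib file.

    Rough heuristic: matches "@type{key," and compares keys exactly as written.
    """
    keys = re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text)
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)
```

For a hypothetical paper by "Doe" from 2014, make_bibtex_key("Doe", 2014, "simple reference management approach") gives doe_2014_simple_reference_management_approach. JabRef can also autogenerate keys, so a helper like this is only useful if you want the same scheme for PDF filenames outside JabRef.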

While adding new entries, JabRef will autocomplete author names and words and can also autogenerate the Bibtexkey. Once you have added enough references, you can link directly to reference.bib from your LaTeX file for the bibliography. You can also put the .bib file under some sort of version control for enhanced security and control. I prefer to create a weekly backup of the entire papers directory on an external hard drive for peace of mind.
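
The weekly backup can be as simple as a dated zip archive of the whole directory. The sketch below is one possible way to do it, not the exact procedure I use; the function name backup_papers and the archive naming scheme are assumptions, and the backup directory should live outside the papers directory (for example on the external drive).

```python
import shutil
from datetime import date
from pathlib import Path


def backup_papers(papers_dir, backup_dir, today=None):
    """Zip the whole papers directory into backup_dir with a dated name.

    Returns the path of the created archive, e.g. papers_2014-09-05.zip.
    """
    papers_dir = Path(papers_dir).resolve()
    backup_dir = Path(backup_dir).resolve()
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = (today or date.today()).isoformat()
    base_name = backup_dir / f"{papers_dir.name}_{stamp}"

    # root_dir/base_dir keep the top-level "papers/" folder inside the archive.
    archive = shutil.make_archive(
        str(base_name),
        "zip",
        root_dir=str(papers_dir.parent),
        base_dir=papers_dir.name,
    )
    return Path(archive)
```

Running this once a week (by hand, or from cron or Task Scheduler) leaves one archive per date, so an older copy of reference.bib can always be recovered.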

Cross-platform

JabRef is written in Java, which makes it cross-platform by design. I frequently use it on desktops (Linux, Windows and Mac). The method also works well on portable Android devices with RefMaster. Basically, RefMaster replaces JabRef on Android, and it supports Dropbox integration. There should be similar apps for managing BibTeX on the iOS platform.

Drawbacks

The only feature I miss in this approach is the ability to search inside the linked files when searching in JabRef. However, it is not a showstopper, especially given the various desktop search tools available.

Also, this approach will not scan your computer for PDF files and auto-magically generate "correct" references. This method is geared towards accuracy and, in my experience, auto-magical metadata extraction from PDFs often does not work well and has a rather large error rate. Careful verification of the reference data while entering it is important and pays off in the long run.

Another drawback could be the steep learning curve for getting comfortable with LaTeX (& BibTeX). I assume that people from my discipline (Engineering) are already using, or will be using, LaTeX. See the next section for alternatives.

Alternatives

If LaTeX (& BibTeX) is not your cup of tea, then the following are some alternatives (but one should consider why not to use LaTeX, and who else uses LaTeX). I ended up with JabRef because most of the following did not have up-to-the-mark BibTeX management, at least for my taste, when I was trying them.

  • Mendeley is a good alternative; it works especially well with office documents and can search inside linked PDFs.
  • citeulike is an online service and works well for collecting references while browsing the Internet.
  • Zotero used to be a browser plugin which offered good features, but I never tried it seriously. The standalone version came much later but looks good.

There are several other software tools which can be used for reference management. Here is a popularity contest of different tools and here is an ironic post comparing them. Everyone has their own preferences, and a given solution may not be the 'best' solution for everyone. The described approach works for me and could be helpful to others.