Automatically generate PDF, ePub, and MOBI versions of Scala Book #1563

@alvinj

This is the start of some notes on how to automatically create PDF, ePub, and MOBI versions of Scala Book. I have already generated first versions of these documents with a manual process (excluding a couple of problems noted below), and this is a writeup of how to automate the process.

To create an ePub document

  • Copy all markdown files and LIST_OF_FILES_IN_ORDER from the website directory (_overviews/scala-book) to a working directory
  • Add a # title tag to each *.md file
    • get the title from the header section
    • prepend a chapter number to each title
  • Remove the header content from all *.md files
  • Transform all <pre> sections to use only four backticks
    • all “fenced code blocks” need to be transformed (scala, java, etc.) to use four backticks
    • actually, I know this is necessary for the PDF, but it might not be necessary for the ePub and MOBI versions
  • Generate a Pandoc command that includes all *.md files in the proper order; this command looks like this:
pandoc -o ScalaBook.epub \
    metadata.txt \
    working/introduction.md \
    working/prelude-taste-of-scala.md \
    working/preliminaries.md \
    50 more lines here ...
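
As an illustration of the last step, here is a minimal sketch of generating that command from the ordered list. It is not the code referred to in this issue. It assumes that LIST_OF_FILES_IN_ORDER holds one Markdown file name per line, and the function name `build_epub_command` is hypothetical.

```shell
#!/bin/sh
# Sketch: print a pandoc command for the ePub from an ordered file list.
# Assumption: the list file has one Markdown file name per line
# (for example "introduction.md"); blank lines are skipped.
build_epub_command() {
    list_file=$1      # path to LIST_OF_FILES_IN_ORDER
    work_dir=$2       # directory holding the transformed *.md files

    printf 'pandoc -o ScalaBook.epub \\\n'
    printf '    metadata.txt'
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        printf ' \\\n    %s/%s' "$work_dir" "$name"
    done < "$list_file"
    printf '\n'
}
```

Something like `build_epub_command LIST_OF_FILES_IN_ORDER working > make-epub.sh` followed by `sh make-epub.sh` would then run the generated command.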

I have code to do all of that; I just need some free time to clean it up and automate it.

To create a MOBI document

There are more elaborate ways to do this, but at the moment this command seems to work, generating the MOBI document from the ePub:

kindlegen ScalaBook.epub

It looks like KindleGen is available for Linux, macOS, and Windows.

To create a PDF document

I currently have a way to do this, but it will take a little time to automate. The first few steps in the process are similar to the ePub process, but you don’t need to add a chapter number:

  • Copy all markdown files and LIST_OF_FILES_IN_ORDER from the website directory (_overviews/scala-book) to a working directory
  • Add a # title tag to each *.md file
    • get the title from the header section
  • Remove the header content from all *.md files
  • Transform all <pre> sections to use only four backticks
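
The header steps shared by the ePub and PDF processes can be sketched as follows. This is an illustration, not the custom Scala scripts mentioned in this issue. It assumes that each file starts with a Jekyll-style header delimited by `---` lines and containing a `title:` field. The function name `transform_header` and the "3. Title" numbering format are hypothetical.

```shell
#!/bin/sh
# Sketch: replace a file's header section with a "# title" line.
# Assumption: the header is delimited by "---" lines and has a "title:" field.
transform_header() {
    src=$1          # input Markdown file
    chapter=$2      # optional chapter number; leave empty for the PDF build

    awk -v chapter="$chapter" '
        NR == 1 && $0 == "---" { in_header = 1; next }   # opening delimiter
        in_header && $0 == "---" {                        # closing delimiter
            in_header = 0
            prefix = (chapter == "") ? "" : chapter ". "
            print "# " prefix title
            next
        }
        in_header {
            if ($0 ~ /^title:/) {
                title = $0
                sub(/^title:[ \t]*/, "", title)
                gsub(/^"|"$/, "", title)    # drop surrounding double quotes
            }
            next                            # discard all other header lines
        }
        { print }                           # body lines pass through unchanged
    ' "$src"
}
```

For example, `transform_header introduction.md 1 > working/introduction.md` would write the ePub variant, and omitting the second argument gives the PDF variant without a chapter number.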

After that the steps are:

  • Convert all of the Markdown files to LaTeX files
  • Generate the PDF using a latexmk command
    • Historically I have manually worked through any issues that come up at this stage
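
A minimal sketch of these two steps is shown below. It rests on two assumptions that this issue does not confirm: that Pandoc performs the Markdown-to-LaTeX conversion, and that a hand-written main file (hypothetically named `ScalaBook.tex`) includes the generated `*.tex` files. The helper names `run` and `build_pdf` are also hypothetical.

```shell
#!/bin/sh
# Sketch: convert each Markdown file to LaTeX, then build the PDF.
# Assumptions: pandoc does the conversion, and a main .tex file
# (hypothetical) pulls in the generated chapter files.
run() {
    # Print the command when DRY_RUN=1, otherwise execute it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_pdf() {
    work_dir=$1     # directory holding the transformed *.md files
    main_tex=$2     # main LaTeX file passed to latexmk

    for md in "$work_dir"/*.md; do
        [ -e "$md" ] || continue            # no matches: skip the literal glob
        run pandoc -f markdown -t latex -o "${md%.md}.tex" "$md"
    done
    run latexmk -pdf "$main_tex"
}
```

Running it first as `DRY_RUN=1; build_pdf working ScalaBook.tex` prints the commands without executing them, which may help when working through LaTeX issues by hand.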

Known problems

Generating the PDF

  • LaTeX doesn’t like the trick I used with the “Prelude” title, so that has to be replaced
  • The PDF-generating process gets stuck on a line somewhere near this:
'', k, v) val keys = m.keys val values = m.values val contains3 =

I haven’t had the time to look into that yet.

Generating the ePub document

  • This process fails on the tables in the following files, so this needs to be looked into:
    • built-in-types.md
    • collections-101.md
  • The {::comment} syntax shows up in the MOBI document, so it’s probably also in the ePub
    • I’ll submit a pull request to delete all comments from Scala Book
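
Until the comments are deleted at the source, they could also be stripped in the working copy. The sketch below assumes that each comment is a block whose `{::comment}` and `{:/comment}` delimiters sit on separate lines. A comment that opens and closes on the same line would not be handled correctly. The function name `strip_kramdown_comments` is hypothetical.

```shell
#!/bin/sh
# Sketch: remove kramdown comment blocks from a Markdown file.
# Assumption: {::comment} and {:/comment} appear on separate lines.
strip_kramdown_comments() {
    # Delete every line from an opening {::comment} through the next
    # {:/comment}, inclusive; all other lines are printed unchanged.
    sed '/{::comment}/,/{:\/comment}/d' "$1"
}
```

Usage would look like `strip_kramdown_comments built-in-types.md > working/built-in-types.md`.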

Tools

So far my “tools” for generating these documents are:

  • Unix shell scripts, including sed commands
  • Some custom Scala scripts
    • I wrote these to remove the Markdown header content, and add # title tags to the resulting Markdown files
  • Pandoc
    • This is used to generate the ePub version; the MOBI version is then generated from the ePub with KindleGen
  • LaTeX
    • This is used to generate the PDF
    • I use a Mac, and I think I installed the tools (several years ago) with MacTeX

What I need

Mostly, all I need to complete this process is some free time on my part; beyond that, I just need to know that the tools listed will be available on the server. Assuming I can work through the problems listed, the whole process is really:

  1. Copy the website Markdown files to a working directory
  2. Transform their header sections
  3. Convert the *.md files to *.tex files, and generate the PDF with the latexmk command
  4. Generate the ePub with the pandoc command
  5. Generate the MOBI with the kindlegen command
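
Checking that the tools are available on the server could be done with something like the following sketch. The function name `missing_tools` is hypothetical, and the tool names passed to it are the commands used in this issue.

```shell
#!/bin/sh
# Sketch: report which required command-line tools are not installed.
missing_tools() {
    # Print each named tool that is not found on the PATH, one per line.
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools pandoc latexmk kindlegen` prints nothing when all three are installed, and otherwise lists the ones that still need to be set up.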

I think the ePub and MOBI files can also use a stylesheet, so that’s something else to be looked into, but I’m more concerned about automating these processes at the moment.
