Converting Books Into Visual Novels Part 0.5: Creating book.txt
Feb 14, 2026
This is the second in a series of posts explaining how I convert books into visual novels. See the full series links over on the website's README.md.
The overall process I've settled on has four main steps: the first edit, the second edit, the image generation, and the third edit. This post will walk through… none of those, since before we can even start generating a pulp.txt file, we need a good book.txt reference file.
From .epub to book.txt
book.txt is the contents of the to-be-converted source novel in plain text form (with basic markdown annotations). As mentioned in Part 0, this file is what the Pulpifier will use as a reference when building the html from the pulp.txt file, as a check that the editing process made no accidental changes to the text. This is book.txt's main long-term role, but in the short term, it's also what we use in the first edit pass to create the starter pulp.txt file.
I use Standard Ebooks .epub files as the basis for all book texts on this site. This isn't a strict requirement — source texts could come from anywhere — but it's convenient to use a single source, where I know the markdown formatting will be consistent: *single asterisks* for italics, **doubled asterisks** for uppercased, ## some number of pound signs for different header levels, etc.
Standard Ebooks doesn't provide .txt files directly, but their .epubs are easily convertible with the command pandoc book.epub -t gfm-raw_html --strip-comments --wrap=none -o book.txt. The --wrap=none flag is for making paragraphs take up whole single lines instead of being split up past some character limit. The --strip-comments I'm not sure is strictly necessary, but I set it just in case it applies to the highlights I make while reading. And then the main flag is the -t gfm-raw_html, which gives us the format with the nice markdown.
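If you're converting more than one book, the command above can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, and it assumes pandoc is installed and on your PATH. The flags are exactly the ones from the command above.

```python
import subprocess


def build_pandoc_command(epub_path, txt_path):
    """Return the pandoc argument list for the .epub -> book.txt conversion."""
    return [
        "pandoc", str(epub_path),
        "-t", "gfm-raw_html",   # GitHub-flavored markdown with raw html disabled
        "--strip-comments",     # drop html comments from the output
        "--wrap=none",          # keep each paragraph on one line
        "-o", str(txt_path),
    ]


def convert_epub(epub_path, txt_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(epub_path, txt_path), check=True)
```

Calling convert_epub("book.epub", "book.txt") would then do the same thing as typing the command by hand.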
Well, mostly nice markdown. There is some manual cleanup work that needs to be done, since the Pulpifier can't automatically convert all types of markdown symbols to html, and some of them call for case-by-case decisions on how best to translate good book formatting into good visual novel formatting. Here are some of the cleanup steps:
book.txt Cleanup
First, as great as Standard Ebooks is as an organization, their giant Imprint headers and Colophon/Uncopyright footers are always the first thing that needs to go. I do credit them for each book with a link to the source .epub, but the right place for that link is not in the book.txt file itself. In its place, I add a simple one-line header with the book title and author name, such that the beginning of all book.txt files ends up looking something like this:
*The Sun Also Rises* by Ernest Hemingway

This book is for **Hadley** and for **John Hadley Nicanor**

“You are all a lost generation.”

—Gertrude Stein in conversation

“One generation passeth away, and another generation cometh; but the earth abideth forever … The sun also ariseth, and the sun goeth down, and hasteth to the place where he arose… The wind goeth toward the south, and turneth about unto the north; it whirleth about continually, and the wind returneth again according to his circuits … All the rivers run into the sea; yet the sea is not full; unto the place from whence the rivers come, thither they return again.”

—Ecclesiastes

## Book I

### I

Robert Cohn was once middleweight boxing champion of Princeton…
That'll show you, Standard Ebooks, for trying to tell me not to do evil in your Uncopyright.
That's what the final cleaned-up product looks like. However, there are several markdown shenanigans that can crop up in the pandoc conversion that need special handling. For instance, sometimes chapters generate with custom html annotations like:
<hgroup>

## II

The Milkman Sets Out on His Travels

</hgroup>
These need to be collapsed into single lines like:
## II The Milkman Sets Out on His Travels
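This collapse is mechanical enough to script. The sketch below is one possible approach, not part of the Pulpifier; the function name is hypothetical, and it assumes each hgroup block contains only the heading and subtitle text.

```python
import re

# Match an <hgroup> ... </hgroup> block, even when it spans several lines.
HGROUP = re.compile(r"<hgroup>\s*(.*?)\s*</hgroup>", re.DOTALL)


def collapse_hgroups(text):
    """Replace each hgroup block with its contents joined onto one line."""
    def join_parts(match):
        parts = [p.strip() for p in match.group(1).splitlines() if p.strip()]
        return " ".join(parts)
    return HGROUP.sub(join_parts, text)
```

Text outside the hgroup blocks is left untouched, so this can be run over the whole file in one pass.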
There are also a variety of different formatting situations that put angle brackets at the start of lines, sometimes with additional html, e.g.:
> Dear Jake,
>
> We got here Friday, Brett passed out on the train, so brought her here for 3 days rest with old friends of ours. We go to Montoya Hotel Pamplona Tuesday, arriving at I don’t know what hour. Will you send a note by the bus to tell us what to do to rejoin you all on Wednesday. All our love and sorry to be late, but Brett was really done in and will be quite all right by Tues. and is practically so now. I know her so well and try to look after her but it’s not so easy. Love to all the chaps,
>
> <footer role="presentation">
>
> Michael.
>
> </footer>
Sometimes the brackets represent text with blockquote margins, and sometimes they represent centered text, and sometimes they represent text meant to float right.
Since the bracket-prefixed lines aren't consistent, I always just remove the brackets, and then later on, as part of editing the pulp.txt file, add in spans with helper classes like 'center' or 'fright', which the Pulpifier strips out before running its sameness comparison against book.txt. There are also some custom modifier tags that can be added to metadata that affect presentation. This just comes down to choosing the css that most closely matches the look of the original .epub within the constraints of the visual novel format.
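Removing the bracket prefixes is easy to automate. Here is a minimal sketch with a hypothetical function name. Dropping leftover tag-only lines (like the footer tags in the letter above) is my assumption about what the cleanup should do with them, so adjust that part if you'd rather handle those by hand.

```python
import re

QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")        # one or more leading "> " markers
TAG_ONLY = re.compile(r"^</?[a-z][^>]*>$")      # a line that is nothing but one html tag


def strip_blockquotes(text):
    """Remove leading '>' markers and any quoted lines holding only an html tag."""
    kept = []
    for line in text.split("\n"):
        stripped = QUOTE_PREFIX.sub("", line)
        # Assumption: tag-only lines inside a quote carry no book text.
        if stripped != line and TAG_ONLY.match(stripped.strip()):
            continue
        kept.append(stripped)
    # Dropping tag lines can leave doubled blank lines; squeeze them back to one.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept))
```

Whether a given passage should end up centered, right-floated, or indented is still a judgment call made later in pulp.txt; this only clears the prefixes out of book.txt.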
Additionally, sometimes the text, bracket-prefixed or otherwise, is generated without blank lines between lines like:
> “One thing’s sure and nothing’s surer
> The rich get richer and the poor get—children.
> In the meantime,
> In between time—”
Since the Pulpifier expects a strict text line -> blank line -> repeat 2-cycle sequence for book.txt's formatting, these need to either have line breaks added, if separate VN-slides are okay, or else merged onto a single line, but with custom <br> tags added in pulp.txt. For this case, since four lines can fit comfortably on one slide, we'd want to collapse it all down into a single line with breaks like this:
“One thing’s sure and nothing’s surer<br>The rich get richer and the poor get—children.<br>In the meantime,<br>In between time—”
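Both halves of this step can be sketched in a few lines: finding the places where a text line directly follows another text line, and merging a run of lines with br-tags. The function names are hypothetical, and whether to merge or to add blank lines remains a per-passage decision.

```python
def find_unbroken_lines(text):
    """Return 1-based numbers of text lines that directly follow another text line."""
    lines = text.split("\n")
    return [
        i + 1
        for i in range(1, len(lines))
        if lines[i].strip() and lines[i - 1].strip()
    ]


def merge_with_br(lines):
    """Join the non-empty lines of one passage into a single <br>-separated line."""
    return "<br>".join(line.strip() for line in lines if line.strip())
```

Running find_unbroken_lines over a freshly converted book.txt gives a to-do list of spots that break the text line, blank line pattern.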
Some other miscellaneous minor fixups that are needed:
Sometimes backslashes appear before certain characters, like dollar signs and square brackets, in the generated text file. These aren't needed, and can all just be deleted.
Sometimes the .epub-to-.txt conversion represents section breaks as super long repeated-dash lines like ------------------------------------------------------------------------. These I just consolidate down to a single em dash, which the Pulpifier knows to display as a long dash for that line.
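These two fixups are plain find-and-replace jobs. A sketch, with two assumptions of mine flagged: it only unescapes dollar signs and square brackets (the characters named above), and it treats any line made of five or more dashes as a section break.

```python
import re

ESCAPED = re.compile(r"\\([$\[\]])")                 # backslash before $, [ or ]
DASH_RULE = re.compile(r"^-{5,}$", re.MULTILINE)     # assumption: 5+ dashes is a break


def tidy_escapes_and_breaks(text):
    """Drop stray backslashes and shrink long dash rules to one em dash."""
    text = ESCAPED.sub(r"\1", text)
    return DASH_RULE.sub("\u2014", text)             # \u2014 is the em dash character
```

If a book turns out to have other escaped characters, they can be added to the character class in ESCAPED.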
And then some books have other niche markdown formatting that needs to be fixed up on a case-by-case basis. For example, one part of The Great Gatsby's book.txt generated with a markdown table, which has no meaning to the Pulpifier. All the markdown needed to be stripped, leaving just the text, with spans and appropriate css classes then picked within pulp.txt to match the look and intent of the original .epub as closely as possible. Ultimately going from this:
|                                                |           |      |
|------------------------------------------------|-----------|------|
| Rise from bed                                  | 6:00      | a.m. |
| Dumbbell exercise and wall-scaling             | 6:15–6:30 | ”    |
| Study electricity, etc.                        | 7:15–8:15 | ”    |
| Work                                           | 8:30–4:30 | p.m. |
| Baseball and sports                            | 4:30–5:00 | ”    |
| Practise elocution, poise and how to attain it | 5:00–6:00 | ”    |
| Study needed inventions                        | 7:00–9:00 | ”    |

> General Resolves
>
> - No wasting time at Shafters or \[a name, indecipherable\]
>
> - No more smokeing or chewing.
>
> - Bath every other day
>
> - Read one improving book or magazine per week
>
> - Save \$5.00 \[crossed out\] \$3.00 per week
>
> - Be better to parents
To this:
Rise from bed 6:00 a.m.<br>Dumbbell exercise and wall-scaling 6:15–6:30 ”<br>Study electricity, etc. 7:15–8:15 ”<br>Work 8:30–4:30 p.m.<br>Baseball and sports 4:30–5:00 ”<br>Practise elocution, poise and how to attain it 5:00–6:00 ”<br>Study needed inventions 7:00–9:00 ”

General ResolvesNo wasting time at Shafters or [a name, indecipherable]<br>No more smokeing or chewing.<br>Bath every other day<br>Read one improving book or magazine per week<br>Save $5.00 [crossed out] $3.00 per week<br>Be better to parents
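A table like the Gatsby schedule can be flattened mechanically if you don't want to do it by hand. This is a rough sketch with a hypothetical function name. It assumes a simple pipe table with no escaped pipes inside cells, and it skips the empty header row and the dashed separator row.

```python
def flatten_table(markdown):
    """Turn a simple markdown pipe table into one <br>-separated line of text."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip rows whose cells are all empty or only dashes/colons.
        if all(set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(" ".join(cell for cell in cells if cell))
    return "<br>".join(rows)
```

The right-aligned time column is lost here on purpose; that presentation gets restored later with spans in pulp.txt.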
God it's stupid having br-tags in the book.txt. It really would be so much cleaner to make custom html formatting entirely take place within the pulp.txt files, but that would necessarily increase parsing complexity in some way, since you'd be breaking the main 2-cycle assumption of paragraph line -> blank line -> repeat, which would be a long-term maintenance annoyance. But this current setup is also annoying. Ah well.
…With the necessary html div and span classes added in the matching pulp.txt file, which happens later on, but ultimately ends up looking like:
Rise from bed <span class='fright'>6:00 a.m.</span><br>Dumbbell exercise and wall-scaling <span class='fright'>6:15–6:30 ”</span><br>Study electricity, etc. <span class='fright'>7:15–8:15 ”</span><br>Work <span class='fright'>8:30–4:30 p.m.</span><br>Baseball and sports <span class='fright'>4:30–5:00 ”</span><br>Practise elocution, poise and how to attain it <span class='fright'>5:00–6:00 ”</span><br>Study needed inventions <span class='fright'>7:00–9:00 ”</span>
m=margin
<div class='upper center'>General Resolves</div>No wasting time at Shafters or [a name, indecipherable]<br>No more smokeing or chewing.<br>Bath every other day<br>Read one improving book or magazine per week<br>Save $5.00 [crossed out] $3.00 per week<br>Be better to parents
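To see why the added spans and divs don't trip the sameness check, here is a guess at what the comparison boils down to. The Pulpifier's actual logic isn't shown in this post, so treat the names and the regex as assumptions: strip span and div tags from a pulp.txt line, leave br-tags alone, and compare the result to the book.txt line.

```python
import re

# Assumption: only span and div tags are presentational helpers; <br> stays.
HELPER_TAG = re.compile(r"</?(?:span|div)\b[^>]*>")


def strip_helper_tags(pulp_line):
    """Remove opening and closing span/div tags, keeping the text between them."""
    return HELPER_TAG.sub("", pulp_line)


def same_text(book_line, pulp_line):
    """True if the pulp line matches the book line once helper tags are removed."""
    return strip_helper_tags(pulp_line) == book_line
```

Because br-tags survive the stripping, they have to be present in both files, which is the reason they end up in book.txt at all.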
That may all seem like a lot of cleanup (with rather less readable results), but these are the exception cases, and the consistent, well-vetted formatting of the Standard Ebooks .epub files still makes this bit of overhead the least time-consuming option. That's especially true once you factor in that some of these cases would need manual cleanup no matter the source, just to make the formatting fit well in visual novel form.
Next Steps
With that done, book.txt is ready for processing and for turning into an actual visual novel. In the next part, I'll walk through the first real step of that process, the first edit, which involves creating a pulp.txt file out of book.txt with name metadata, speaker metadata, expression metadata, and VN-friendly line breaks.