Understanding the Linux "fortune" Utility and How to Read Its .dat Files in Python
$ fortune
Those who have had no share in the good fortunes of the mighty
Often have a share in their misfortunes.
-- Bertolt Brecht, "The Caucasian Chalk Circle"
The Linux fortune utility is a classic command-line tool that displays random humorous, insightful, or philosophical messages when executed. It has been a staple of Unix and Linux systems for decades, offering users a touch of humor, wisdom, or inspiration with each command.
In this article, we'll explore how the fortune utility works, how its data is stored, and how you can use Python to read and extract random fortunes from the .dat files that fortune relies on.
What is the fortune Command?
Source : Distrotech/fortune-mod | manual: fortune
The fortune command prints a random quotation or aphorism from a predefined database. It is often run for amusement at the start of a terminal session, for example from a login shell's startup file.
$ fortune
Those who have had no share in the good fortunes of the mighty
Often have a share in their misfortunes.
-- Bertolt Brecht, "The Caucasian Chalk Circle"
Each run prints a randomly selected fortune from the available databases.
How Does fortune Work?
Source code: Distrotech/fortune-mod
The fortune utility selects random text snippets ("fortunes") from a database file. The fortunes are stored in two files per category:
A plain text file (no extension) containing multiple fortune messages, separated by a delimiter (usually %).
A binary index file (.dat) that holds metadata, including offsets to the different fortune strings within the text file. This allows fortune to quickly retrieve and print a random fortune instead of parsing the entire file.
You can find the data files used by fortune under `/usr/share` or `/usr/share/games` on Linux:
[advaeta@vbox tmp]$ ls /usr/share/games/fortune/
art            drugs          humorists         linux.u8        osfortune.dat  pratchett.dat  tao
art.dat        drugs.dat      humorists.dat     literature      paradoxum      pratchett.u8   tao.dat
art.u8         drugs.u8       humorists.u8      literature.dat  paradoxum.dat  riddles        tao.u8
ascii-art      education      humorix-misc      literature.u8   paradoxum.u8   riddles.dat    translate-me
ascii-art.dat  education.dat  humorix-misc.dat  love            people         riddles.u8     translate-me.dat
ascii-art.u8   ....
Did you notice anything? Each database comes as a pair of files with the same prefix, e.g. "art" and "art.dat".
The source code does not contain any *.dat files. So where do these .dat files, the INDEX FILES, come from? (shushhh... strfile)
Understanding strfile and .dat Files
To optimize lookup speed, fortune does not scan the whole text file each time it is executed. Instead, it relies on an associated binary index file (with a .dat extension) to find where each fortune starts. This index file is generated using the strfile utility.
The strfile Utility
man: strfile
strfile is the command-line tool used to create the .dat index files required by fortune. It scans the text file, identifies the position of each individual fortune (demarcated by a delimiter, usually %), and generates an index file with metadata and offsets.
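To make the indexing step concrete, here is a minimal Python sketch of what a strfile-like indexer does. It is not the real strfile implementation: the function name build_index is hypothetical, the ordering, randomizing, and ROT13 options are ignored, and it assumes the 24-byte big-endian header and 4-byte offsets used by the Python code in this article. It also records one extra offset just past the last fortune, which I believe matches strfile's behavior but should be treated as an assumption.

```python
import struct


def build_index(text: bytes, delim: bytes = b"%") -> bytes:
    """Return .dat-style index bytes for a fortune text (illustrative only)."""
    offsets = [0]   # byte offset where each fortune starts
    lengths = []    # byte length of each fortune
    pos = 0         # current byte position in the text
    start = 0       # start of the fortune currently being scanned
    for line in text.splitlines(keepends=True):
        pos += len(line)
        if line.rstrip(b"\r\n") == delim:     # a line holding only the delimiter
            length = pos - len(line) - start
            start = pos
            if length:
                lengths.append(length)
                offsets.append(pos)           # the next fortune starts here
            else:
                offsets[-1] = pos             # skip an empty entry
    if pos > start:                           # last fortune without a closing delimiter
        lengths.append(pos - start)
        offsets.append(pos)
    # Header: version, count, longest, shortest, flags, delimiter, 3 pad bytes
    header = struct.pack(
        ">5Ic3x", 1, len(lengths),
        max(lengths, default=0), min(lengths, default=0), 0, delim,
    )
    table = struct.pack(f">{len(offsets)}I", *offsets)
    return header + table


sample = b"A\n%\nBB\n%\nCCC\n%\n"
index = build_index(sample)
print(len(index))                        # header size plus 4 bytes per offset
print(struct.unpack(">4I", index[24:]))  # three start offsets and the end offset
```

The key point is that the indexer walks the text once and remembers byte positions, so later readers never have to repeat that scan.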
Using strfile to Generate a .dat File
To create a .dat file from a text file of fortunes:
strfile fortunes fortunes.dat
The fortunes file contains the actual text entries, each separated by a % symbol on its own line.
Running strfile on fortunes generates fortunes.dat, which contains the metadata and offsets.
Once you have both fortunes and fortunes.dat, you can use the fortune command like this:
$ fortune fortunes
This will randomly pick and print a fortune from your custom file.
The generated *.dat file can be customized further: strfile can store the offsets in random order or ordered alphabetically, with an option to ignore case when ordering. Read the manual for more details.
Now, let's understand the structure of a *.dat file.
Structure of a fortunes.dat File
The .dat file is a binary file that contains a header followed by a list of offsets. Each offset marks the start of a fortune in the corresponding fortune text file (the file with the same name but no .dat extension).
Header Structure (21 bytes of fields)
The format of the header is:
#define VERSION 1
unsigned long str_version; /* version number */
unsigned long str_numstr; /* # of strings in the file */
unsigned long str_longlen; /* length of longest string */
unsigned long str_shortlen; /* shortest string length */
#define STR_RANDOM 0x1 /* randomized pointers */
#define STR_ORDERED 0x2 /* ordered pointers */
#define STR_ROTATED 0x4 /* rot-13'd text */
unsigned long str_flags; /* bit field for flags */
char str_delim; /* delimiting character */
Important: str_delim is followed by 3 bytes of padding.
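Breakdown of str_flags:
0x1 (STR_RANDOM): the pointers (offsets) are randomized
0x2 (STR_ORDERED): the pointers (offsets) are ordered
0x4 (STR_ROTATED): the text is ROT13-encoded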
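Can you guess the header size? 21 or ??
The header size is 24 bytes: 5 integers of 4 bytes each + 1 char + 3 bytes of padding. Although the struct declares the fields as unsigned long, each one is stored as a 4-byte value in the file, which is why the Python code in this article reads them with the 'I' format.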
All fields are written in big-endian byte order.
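The size arithmetic can be checked with Python's struct module. In this sketch, the format string ">5Ic3x" is my own encoding of the layout above (five big-endian 32-bit integers, one char, three pad bytes), StrFlags is a hypothetical helper mirroring the STR_* defines, and the header values are made up for illustration.

```python
import struct
from enum import IntFlag

HEADER_FORMAT = ">5Ic3x"  # 5 big-endian uint32, 1 char, 3 pad bytes


class StrFlags(IntFlag):
    """Hypothetical helper mirroring the STR_* defines in the header."""
    RANDOM = 0x1
    ORDERED = 0x2
    ROTATED = 0x4


print(struct.calcsize(HEADER_FORMAT))  # 24

# Made-up header: version 1, 3 strings, longest 40, shortest 12, ordered + rotated
raw = struct.pack(HEADER_FORMAT, 1, 3, 40, 12, 0x2 | 0x4, b"%")
version, numstr, longlen, shortlen, flags, delim = struct.unpack(HEADER_FORMAT, raw)
flags = StrFlags(flags)
print(StrFlags.ROTATED in flags, StrFlags.RANDOM in flags)  # True False
```

Using "x" pad bytes in the format means the padding is skipped automatically, so there is no need to hard-code the number 24 when seeking to the offset table.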
Reading the fortune .dat File in Python
To extract fortunes from a .dat file in Python, we need to read both the header information and the list of offsets, then use those offsets to locate and print random fortunes from the corresponding fortune text file.
How to print header info?
full source code : fortuneHeadersReader.py
#######################
# This program is intentionally made ugly to make it easy to understand
########################
import struct

# "fortunes.dat" is an example path; point it at any strfile-generated index
with open("fortunes.dat", "rb") as datfile:
    # printing headers: five big-endian unsigned 32-bit integers
    headers = ["version", "numstr", "longlen", "shortlen", "flags"]
    for header in headers:
        chunk = datfile.read(4)
        header_int = struct.unpack('>I', chunk)[0]  # reading unsigned Integers
        if header == "flags":
            # each flag is a single bit; bool() shows whether it is set
            print(f"{header:<7} (str_random) : {bool(header_int & 0x00000001)}")
            print(f"{header:<7} (str_ordered): {bool(header_int & 0x00000002)}")
            print(f"{header:<7} (str_rotated): {bool(header_int & 0x00000004)}")
            continue
        print(f"{header:<20} : {header_int:<10}")
    # delimiter: 1 byte
    delim_byte = struct.unpack('>B', datfile.read(1))[0]  # reading Byte
    print(f"{'Delimiter':<20} : {chr(delim_byte)}")
Python Script to Read the .dat File and Print a Random Fortune
Source: fortuneReader.py
import random
import struct

def read_fortune_file(dat_file, fortune_file):
    # Open both files in binary mode: the index stores byte offsets
    with open(dat_file, 'rb') as df, open(fortune_file, 'rb') as ff:
        # Read header (5 unsigned integers + 1 byte delimiter)
        header_format = ">5I"  # Big-endian: 5 unsigned ints (20 bytes total)
        header_size = struct.calcsize(header_format)
        header_data = df.read(header_size)
        if len(header_data) != header_size:
            raise ValueError("Invalid .dat file: header size mismatch.")
        version, num_str, longlen, shortlen, flags = struct.unpack(header_format, header_data)
        # Read the delimiter character
        delim_byte = df.read(1)
        delim_char = chr(struct.unpack('>B', delim_byte)[0])
        print("Header Information:")
        print(f"Version: {version}")
        print(f"Number of Fortunes: {num_str}")
        print(f"Longest Fortune: {longlen} characters")
        print(f"Shortest Fortune: {shortlen} characters")
        print(f"Flags: {flags}")
        print(f"Delimiter: {delim_char} (ASCII: {ord(delim_char)})")
        if num_str == 0:
            raise ValueError("Invalid .dat file: no fortunes indexed.")
        # Remember: 3 bytes of padding follow the delimiter, so len(header) == 24
        # Pick a random fortune
        random_index = random.randint(0, num_str - 1)
        next_offset = 24 + (random_index * 4)
        df.seek(next_offset)
        random_offset = struct.unpack(">I", df.read(4))[0]  # Read as unsigned int
        print("\n---------")
        print("-- Random Quote@", random_offset)
        ff.seek(random_offset)
        # Print lines until a line holding only the delimiter, or end of file.
        # Comparing whole lines keeps a '%' inside a fortune from ending it early.
        for line in ff:
            text = line.decode("utf-8", errors="replace")
            if text.rstrip("\r\n") == delim_char:
                break
            print(text, end='')
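One case the script above does not handle is the STR_ROTATED flag (0x4) from the header, which marks databases whose text is ROT13-encoded. Below is a minimal sketch of undoing that with Python's built-in rot_13 codec. The helper name decode_fortune is hypothetical, and the sketch assumes the fortune has already been read and decoded into a str.

```python
import codecs

STR_ROTATED = 0x4  # flag value from the strfile header


def decode_fortune(text: str, flags: int) -> str:
    """Undo ROT13 when the header flags say the text is rotated."""
    if flags & STR_ROTATED:
        return codecs.decode(text, "rot_13")
    return text


print(decode_fortune("Uryyb, jbeyq!", STR_ROTATED))  # Hello, world!
print(decode_fortune("Hello, world!", 0))            # Hello, world!
```

ROT13 only shifts ASCII letters, so punctuation, digits, and line breaks pass through unchanged.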
Comment below if you don’t understand the code. Happy to help (:
Takeaways from the Fortune Utility
1️⃣ Scalable Design and The Concept of .dat Files for Efficient Parsing
.dat files store structured offset information about the data, which itself lives in a separate text file. This enables the program to quickly jump to any fortune without scanning the entire file.
The .dat format includes:
A header (metadata: version, count, longest/shortest string, flags, delimiter character)
A list of offsets (pointers to where each fortune starts)
The actual fortune texts stay in the companion text file, where the delimiter marks the end of each fortune.
Instead of searching for the nth fortune line-by-line, we can:
Read a random offset from the offset table (the *.dat file).
Seek directly to that byte position in the data file (the fortune text file).
Read until we hit the delimiter.
This is much faster than iterating over every line in a large text file!
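The three steps can be sketched with in-memory stand-ins for the two files. Everything here is illustrative: the sample text, the hand-computed offsets, and the function name nth_fortune are assumptions, and the table below has no header, whereas in a real .dat file the offset table starts after the 24-byte header.

```python
import io
import struct

# Stand-in for the fortune text file: three fortunes separated by "%" lines
text = io.BytesIO(b"A\n%\nBB\n%\nCCC\n%\n")
# Stand-in for the offset table: big-endian 32-bit start offsets
table = io.BytesIO(struct.pack(">3I", 0, 4, 9))


def nth_fortune(n: int, delim: bytes = b"%") -> str:
    table.seek(n * 4)                         # 1. read the nth offset
    (offset,) = struct.unpack(">I", table.read(4))
    text.seek(offset)                         # 2. jump to that byte position
    lines = []
    for line in text:                         # 3. read until the delimiter line
        if line.rstrip(b"\r\n") == delim:
            break
        lines.append(line)
    return b"".join(lines).decode("utf-8")


print(repr(nth_fortune(1)))  # 'BB\n'
```

The work per lookup depends only on the length of the chosen fortune, not on how many fortunes precede it in the file.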
Final Thoughts
The fortune utility uses a .dat file with precomputed offsets for fast random access, making it efficient for large datasets. Understanding the strfile format helps us design smarter and faster programs by enabling direct access to stored text without loading entire files into memory. 🚀