Back to Basics: Creating a Bootloader from Scratch

In this post I will walk through the steps to write the most basic bootable program possible in x86 assembly language. Before we can get any code running we need to gather some tools and resources. Since we are going to write a program that boots a computer, we need a convenient way to test it. We are not going to use floppy disks (I don’t even have a floppy disk drive in my computer) and we are not going to reboot the computer we are working on. Instead we are going to use a virtual computer and file-based images.

Bochs, the IBM PC emulator

We are going to use the x86 instruction set and the IBM PC model. The first determines the instructions we write and the tools we need to translate those instructions into machine code. The second determines the conventions and rules we follow to get our program to execute, specifically the boot process by way of the BIOS.

Bochs is software that emulates an Intel x86 processor and an IBM PC. It runs on many platforms including Windows and it comes with its own BIOS and VGA BIOS. When you run Bochs it looks in the current directory for a configuration file called bochsrc.txt. There is an example of this file in the installation folder. It looks pretty complicated, but the good news is that we can get by with a very simple configuration file:

log: -
floppya: 1_44=boot.img, status=inserted
boot: floppy

The first line sends log output to the screen, the second line sets the floppy disk to a disk image called boot.img and the third line causes the BIOS to boot from the floppy. We can create this bochsrc.txt file in our working folder. If we start Bochs from this folder it will use our configuration.

If we start Bochs now we get two errors. The first occurs inside the simulated PC when it tries to read from the floppy disk: since we specified a non-existent file, it can’t. The second follows because our configuration specifies only one boot medium, so after trying the floppy the machine gives up:

Boot failed: could not read the boot disk

FATAL: No bootable device.

At this point Bochs logs a panic event and gives us the option to kill the simulation. So we need to provide it with a floppy disk image. Any file called boot.img will do, even an empty file. This results in a different error:

Boot failed: not a bootable disk.

We are one step closer, but we need to make our disk image resemble a bootable floppy disk. How can we do that?

The boot sequence

On the IBM PC the BIOS firmware (fixed software that is embedded in the device) is always loaded and executed first. The BIOS can be instructed to inspect several devices to look for a bootable program. You do this on your PC in the BIOS configuration screen. In Bochs you specify the boot sequence in the configuration file: boot: floppy, cdrom, disk. In our case the floppy drive is inspected but it doesn’t pass the test for a bootable disk. So what makes a disk bootable?

There are only three rules for the code on a floppy disk image to be considered bootable: the code must be at the beginning of the device, it must be exactly 512 bytes in size (the first sector), and the last two bytes must have the hexadecimal values 55 and aa respectively.
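These rules are simple enough to check with a few lines of code. The following is a minimal Python sketch, not part of Bochs or any BIOS; the function name looks_bootable and the constants are made up for illustration. It only inspects the first 512 bytes and their last two bytes, as described above.

```python
SECTOR_SIZE = 512                       # size of the first sector in bytes
BOOT_SIGNATURE = bytes([0x55, 0xAA])    # the last two bytes of a bootable sector


def looks_bootable(image: bytes) -> bool:
    """Return True if the first sector of the image ends in 55 aa.

    Hypothetical helper for illustration only.
    """
    if len(image) < SECTOR_SIZE:
        # No complete first sector, for example an empty file.
        return False
    first_sector = image[:SECTOR_SIZE]
    return first_sector[510:512] == BOOT_SIGNATURE
```

To check a file on disk you could read it in binary mode and pass the bytes in, for example looks_bootable(open("boot.img", "rb").read()). An empty boot.img fails the length check, which matches the "not a bootable disk" error we saw earlier.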

We can easily create such a file if we have a hex editor, an editor for binary files. There are many (free) hex editors you can download. Of course we want our program to do something, so we make the first byte of our program an instruction for the processor. For this example I decided to use the HLT instruction. We have to write this instruction in the form of an opcode, a direct machine executable instruction. According to the Intel reference manual the opcode (in hex) for HLT is f4. The file contents should now look like this in your hex editor (abbreviated, the real length is 512 bytes):

0000: f4 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
0010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
*
01e0: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
01f0: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 55 aa

When we save this file as boot.img and start Bochs it will accept our floppy disk as a bootable device and run our code without errors.
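If you don’t have a hex editor at hand, the same 512 bytes can be produced by a short script. This is a sketch in Python under the assumptions of this post (one HLT opcode, zero padding, the 55 aa signature); the function names build_halt_sector and write_image are my own and not part of any tool.

```python
from pathlib import Path

SECTOR_SIZE = 512   # one floppy sector


def build_halt_sector() -> bytes:
    """Build the 512-byte sector shown in the hex dump above."""
    sector = bytearray(SECTOR_SIZE)   # starts out as 512 zero bytes
    sector[0] = 0xF4                  # HLT opcode at offset 0000
    sector[510] = 0x55                # boot signature, first byte (offset 01fe)
    sector[511] = 0xAA                # boot signature, second byte (offset 01ff)
    return bytes(sector)


def write_image(path: str = "boot.img") -> None:
    """Write the sector to a file; boot.img is the name used in bochsrc.txt."""
    Path(path).write_bytes(build_halt_sector())
```

Calling write_image() from the working folder creates boot.img next to bochsrc.txt, so Bochs will pick it up on the next start.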

Going up a level, programming in assembly

Of course, nobody programs directly in opcodes or creates binary files by hand. I just wanted to show how it’s done to illustrate the basic process. We would be better off using an assembler, which translates higher-level instructions to opcodes and has a number of built-in macros and instructions to streamline the process of writing code.

The Netwide Assembler (NASM) is a good choice: it’s free, runs on Windows (and many other platforms) and is pretty popular. With NASM and similar software we can write our code in assembly and create executables from the code. Our bootable program in assembly looks like this:

; Halt the processor
hlt

That’s it. Like our machine code program earlier, this program has just one instruction. But NASM will not create a bootable program from this code alone. We still need to play by the rules of the BIOS for bootable programs, so how do we get NASM to create a file that works?

First of all, we can use pseudo-instructions in NASM to initialize arbitrary data. The db (define byte) pseudo-instruction initializes one or more bytes to a given value. So we could fill the file with a lot of db 0 statements to get the length of 512 bytes. But that’s not very convenient. It’s better to use the times n prefix in NASM, which causes the instruction that follows it to be repeated n times. We combine this with the dw (define word) pseudo-instruction to write the value 0xaa55 at the end of the file. Because x86 stores words in little-endian order, dw 0xaa55 writes the byte 55 followed by the byte aa, which is exactly the order the BIOS expects. The hlt instruction takes one byte, so 509 zero bytes plus the two-byte signature bring the total to 512. Our program would look like this:

hlt
times 509 db 0
dw 0xaa55

This works (set the floppy disk image in Bochs to the output file generated by NASM), but notice that we hardcoded the number of nulls we need to pad our file to 512 bytes. What if we add more instructions to the beginning of the program? We would have to change this number accordingly. It would be better if we could somehow instruct NASM to always repeat until the file size is 512 bytes.

Well, we can do that in NASM using the special token $ in the expression following times. $ evaluates to the position at the beginning of the line on which it occurs. If we label the first instruction, we can use that label’s offset to calculate the amount of padding needed:

start: hlt
times 510-($-start) db 0
dw 0xaa55

Now we can add code after the first instruction and, as long as that code fits in the first 510 bytes, the file size will always be 512 bytes.
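To make the arithmetic behind times 510-($-start) concrete, here is a small Python sketch that mimics the layout NASM produces. It is only a model of the calculation, not how NASM works internally, and the function name pad_boot_sector is made up for this example. The "<H" format packs a 16-bit value in little-endian order, which is how the word 0xaa55 ends up as the bytes 55 aa.

```python
import struct

SIGNATURE = 0xAA55   # the word written by "dw 0xaa55"


def pad_boot_sector(code: bytes) -> bytes:
    """Pad machine code with zeros to 510 bytes and append the signature."""
    padding = 510 - len(code)   # plays the role of 510-($-start)
    if padding < 0:
        # The code is too long for one sector; this sketch simply refuses.
        raise ValueError("code does not fit in 510 bytes")
    # "<H" = little-endian unsigned 16-bit: 0xAA55 becomes the bytes 55 aa.
    return code + bytes(padding) + struct.pack("<H", SIGNATURE)
```

For our one-byte program, pad_boot_sector(b"\xf4") adds 509 zero bytes, the same number we hardcoded before. With a longer program the padding shrinks by the same amount, so the total stays at 512 bytes.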

Conclusion

That’s it for this little excursion: we have booted a simulated IBM PC using a program written in assembly. That it doesn’t do anything useful is not the point; it’s about the process of writing such low-level code and figuring out just how the machine operates when it boots. I find it interesting and quite challenging to think in terms of machine operations. It shows just how much I take for granted when programming in a high-level language like C#.

Of course, this program is not yet complete. First of all, it’s not really a bootloader: it boots, but it doesn’t load anything. At the least, it should load a program that writes Hello World to the screen. I will save that for another post. I hope you found it useful, thanks for reading!