My Operating System Development Experience and Understanding - Part 04

Aug 10, 2021

In this article we are going to talk about segmentation in OS development. From this article on I am also changing how I organize my code: in the last article I implemented all of my C code in the kmain.c file, which gets confusing once there is a lot of code. So I have decided to put the C code for each topic in a file named after its heading and call it from kmain.c.

These are the recreated files for the previous article: https://drive.google.com/file/d/1ZCMLZ37u9ZgwLFxjGUHjDk-7eVfIOnlk/view?usp=sharing

I have deleted the code link from my previous article; you can get the code here.

I have also given you some reference articles to help recreate the code as we go through OS development.

Warning: we still haven’t implemented the kmain function, so don’t try to run your files yet; doing so will produce errors in your OS.

Segmentation

Segmentation in x86 means accessing memory through segments. Segments are portions of the address space, possibly overlapping, each specified by a base address and a limit. To address a byte in segmented memory you use a 48-bit logical address: 16 bits specify the segment and 32 bits specify the offset within that segment. The offset is added to the base address of the segment, and the resulting linear address is checked against the segment's limit. If everything works out (including the access-rights checks, which we ignore for now), the result is a linear address. When paging is disabled, the linear address space is mapped 1:1 onto the physical address space, and physical memory can be accessed.
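The translation described above can be sketched in C. This is only an illustration under simplifying assumptions: the struct segment type and the logical_to_linear function are hypothetical names, access-rights checks are left out, and the limit is treated as the highest valid byte offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment: only its base and limit. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* highest valid offset within the segment */
};

/* Translate an offset inside `seg` into a linear address.
 * Returns false when the offset is past the limit (a real CPU would
 * raise a fault instead). Access-rights checks are ignored here. */
bool logical_to_linear(const struct segment *seg, uint32_t offset,
                       uint32_t *linear)
{
    assert(seg != NULL && linear != NULL);
    if (offset > seg->limit)
        return false;
    *linear = seg->base + offset;
    return true;
}
```

With a segment whose base is 0x00100000 and whose limit is 0xFFFF, offset 0x1234 translates to the linear address 0x00101234, while offset 0x10000 is rejected.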

To enable segmentation, you need to set up a table that describes each segment: a segment descriptor table. In x86 there are two types of descriptor tables: the Global Descriptor Table (GDT) and Local Descriptor Tables (LDT). An LDT is set up and managed by user-space processes, and every process has its own LDT. LDTs can be used if you need a more complex segmentation model; we won’t use them. The GDT is shared by everyone; it is global.

1. Accessing memory

In real mode you use a logical address in the form A:B to address memory. This is translated into a physical address using the equation:

Physical address = (A * 0x10) + B

In pure real mode the registers used for addressing are limited to 16 bits. 16 bits can represent any integer from 0 to 65,535 (64k - 1). This means that if we fix A and let B vary, we can address a 64k area of memory. This 64k area is called a segment.

A = a 64k segment
B = offset within the segment

The base address of a segment is the (A * 0x10) portion of the equation I showed. It should be obvious that segments can overlap.

E.g. the segment 0x1000 has a base address of 0x10000 and occupies the physical address range 0x10000 -> 0x1FFFF. The segment 0x1010 has a base address of 0x10100 and occupies the physical address range 0x10100 -> 0x200FF.

As you can see we could use either segment to reach physical addresses between 0x10100 and 0x1FFFF since the segments overlap.
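The real-mode equation and the overlap example can be checked with a small C sketch. The function names real_mode_physical and offset_in_segment are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: physical = (segment * 0x10) + offset. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Find the offset at which `physical` appears inside `segment`.
 * The address must lie inside that segment's 64k window. */
uint16_t offset_in_segment(uint16_t segment, uint32_t physical)
{
    uint32_t base = (uint32_t)segment << 4;
    assert(physical >= base && physical - base <= 0xFFFFu);
    return (uint16_t)(physical - base);
}
```

For example, real_mode_physical(0x1000, 0x0100) and real_mode_physical(0x1010, 0x0000) both give 0x10100, which is the same byte reached through two overlapping segments.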

The x86 line of processors has six segment registers (CS, DS, ES, FS, GS, SS). They are totally independent of one another.

CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES - Extra Segment
FS & GS - General Purpose Segments

DS, ES, FS, GS, SS are used to form addresses when you want to read/write to memory. They don’t always have to be explicitly encoded, because some processor operations assume that certain segment registers will be used.

E.g.

MOV [SI], AX will write the word contained in ax to the address DS:SI

MOV ES:[DI], AX will write the word contained in ax to the address es:di

CMPSB will compare the byte at DS:SI to the byte at ES:DI, set the zero flag if they are equal and decrement/increment SI and DI according to the state of the direction flag.

As you can see, often the segment register being used is not contained in the instruction, but there is one being used. EVERY time you form an address on an x86 processor there will be a segment register involved.

In most cases, there is no need to explicitly specify the segment to use when accessing memory. The processor has six 16-bit segment registers: CS, SS, DS, ES, GS, and FS. The register CS is the code segment register and specifies the segment to use when fetching instructions. The register SS is used whenever the stack is accessed (through the stack pointer ESP), and DS is used for other data accesses. The operating system is free to use the registers ES, GS, and FS however it wants. The following example shows implicit use of the segment registers:

func:
mov eax, [esp+4]
mov ebx, [eax]
add ebx, 8
mov [eax], ebx
ret

The example above can be compared with the following one, which makes the use of the segment registers explicit:

func:
mov eax, [ss:esp+4]
mov ebx, [ds:eax]
add ebx, 8
mov [ds:eax], ebx
ret

You do not have to use SS for the stack segment selector or DS for the data segment selector. You could store the stack segment selector in DS and vice versa. However, in order to use the implicit style shown above, the segment selectors must be stored in their intended registers.

2. GDT

A GDT or LDT is an array of 8-byte segment descriptors.
The first descriptor in the GDT is always a null descriptor and can never be used to access memory.

The GDT needs at least two segment descriptors (plus the null descriptor), because a descriptor contains more information than just the base and limit fields.

The two fields most relevant to us are the Type field and the Descriptor Privilege Level (DPL) field.

A segment's Type field cannot be both writable and executable at the same time. Therefore, two segments are required:
One segment for executing code, to be placed in CS (Type is Execute-only or Execute-Read).
One segment for reading and writing data (Type is Read/Write), to be placed in the other segment registers.
The DPL specifies the privilege level required to use the segment.
x86 allows four privilege levels (PL), 0 to 3, where PL0 is the most privileged. Most operating systems (such as Linux and Windows) use only PL0 and PL3.

However, some operating systems (such as MINIX) use all levels. The kernel should be able to do anything, so it uses segments with the DPL set to 0 (also known as kernel mode). The current privilege level (CPL) is determined by the segment selector in CS. We therefore need two segments in addition to the null descriptor: a kernel code segment and a kernel data segment, both with DPL 0.

Please note that these segments overlap: both cover the entire linear address space. In our minimal setup we use segmentation only to get privilege levels.

3. Loading the GDT

The GDT is loaded into the processor with the lgdt assembly instruction, which takes the address of a structure specifying the start and size of the GDT. As shown in the following example, the easiest way to encode this information is with a "packed struct":

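/* lgdt reads the 16-bit size first, then the 32-bit address */
struct gdt {
unsigned short size;    /* size of the table in bytes, minus 1 */
unsigned int address;   /* address of the first descriptor */
} __attribute__((packed));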

If the content of the EAX register is the address of the structure, the GDT can be loaded with the assembly code as shown below:

lgdt [eax]
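As a sketch of how the structure passed to lgdt could be filled in, the following C code assumes the operand layout from the Intel manual: a 16-bit limit (table size in bytes minus one) followed by a 32-bit base address. The names gdt_ptr and make_gdt_ptr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Operand read by lgdt: 16-bit limit first, then the 32-bit base. */
struct gdt_ptr {
    uint16_t size;     /* size of the table in bytes, minus one  */
    uint32_t address;  /* linear address of the first descriptor */
} __attribute__((packed));

/* The packed attribute must leave no padding between the fields. */
static_assert(sizeof(struct gdt_ptr) == 6, "lgdt operand is 6 bytes");
static_assert(offsetof(struct gdt_ptr, address) == 2, "base follows limit");

/* Hypothetical helper: describe a table of `count` 8-byte descriptors
 * that starts at linear address `table_address`. */
struct gdt_ptr make_gdt_ptr(uint32_t table_address, unsigned count)
{
    assert(count >= 1 && count <= 8192);  /* a GDT holds at most 8192 entries */
    struct gdt_ptr p;
    p.size = (uint16_t)(count * 8u - 1u);
    p.address = table_address;
    return p;
}
```

For the three descriptors used here (null, code, data) the size field comes out as 3 * 8 - 1 = 23.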

It might be easier to make this instruction available from C, the same way we did with the in and out assembly instructions. After loading the GDT, the segment registers need to be loaded with their corresponding segment selectors. A segment selector is 16 bits wide: bits 0-1 hold the Requested Privilege Level (RPL), bit 2 is the table indicator (0 for the GDT, 1 for an LDT), and bits 3-15 hold the index of the descriptor within the table.

The offset in the segment selector is added to the start of the GDT to get the address of the segment descriptor: 0x08 for the first descriptor and 0x10 for the second, since each descriptor is 8 bytes. The Requested Privilege Level (RPL) should be 0, because the kernel of the operating system should execute in privilege level 0. Loading the segment selector registers is easy for the data registers. An immediate value cannot be moved directly into a segment register, so copy the correct offset through a general-purpose register:

mov ax, 0x10 ; segment registers cannot be loaded from an immediate
mov ds, ax
mov ss, ax
mov es, ax
.
.
.

This code goes in the gdt.s file.

To load CS, we must do a “far jump”:

; code here uses the previous cs
jmp 0x08:flush_cs ; specify cs when jumping to flush_cs
flush_cs:
; now we’ve changed cs to 0x08

A far jump is a jump where we explicitly specify the full 48-bit logical address: the segment selector to use and the absolute address to jump to. It first sets CS to 0x08 and then jumps to flush_cs using its absolute address.
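The selector values 0x08 and 0x10 used in this section can be derived from the selector layout (RPL in bits 0-1, table indicator in bit 2, descriptor index in bits 3-15). The sketch below shows that calculation in C; make_selector is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit segment selector.
 *   index   - position of the descriptor in the table (0 to 8191)
 *   use_ldt - table indicator: 0 selects the GDT, 1 selects an LDT
 *   rpl     - requested privilege level (0 to 3) */
uint16_t make_selector(uint16_t index, unsigned use_ldt, unsigned rpl)
{
    assert(index < 8192);
    assert(use_ldt <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (use_ldt << 2) | rpl);
}
```

Because the index is shifted left by three bits, a selector with the table indicator and RPL both 0 is simply the index times 8, which is the byte offset of the descriptor: make_selector(1, 0, 0) is 0x08 for the code segment and make_selector(2, 0, 0) is 0x10 for the data segment.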

Operations that affect segment registers

Apart from CS, the segment registers can be loaded from a general-purpose register (mov ds, ax) or from the top of the stack (pop ds).

CS is the only segment register that cannot be altered directly. The only time (I’m sure I’m missing one) CS is altered is when the code switches execution into another segment. The only instructions that can do this are:

Far Jump

Here the new value for CS is encoded in the jump instruction. E.g. JMP 0x10:0x100 says to load CS with segment 0x10 and IP with 0x100. CS:IP is the logical address of the instruction to be executed.

Far Call

This is exactly the same as a far jump, but the current values of CS/IP are pushed onto the stack before executing at the new position.

INT

The processor reads the new value of CS/IP from the Interrupt Vector Table and then executes what is effectively a far call after pushing EFLAGS onto the stack.

Far Return

Here the processor pops the return segment/offset from the stack into CS/IP and switches execution to that address.

IRET

This is exactly the same as a far return apart from the processor popping EFLAGS off the stack in addition to CS/IP.

Apart from these cases no instruction alters the value of CS.

In the next article I will cover interrupts and input.

Thanks

Abdullah M.R.M