0xAX/linux-insides
Introduction
This is the sixth part of the Kernel booting process
series. In the previous part we saw the end of the kernel boot process. But we skipped some important advanced parts.
As you may remember, the entry point of the Linux kernel is the start_kernel
function from the main.c source code file, which starts executing at the LOAD_PHYSICAL_ADDR
address. This address depends on the CONFIG_PHYSICAL_START
kernel configuration option, which is 0x1000000
by default:
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EXPERT || CRASH_DUMP)
default "0x1000000"
---help---
This gives the physical address where the kernel is loaded.
...
...
...
This value may be changed during kernel configuration, but the load address can also be selected as a random value. For this purpose the CONFIG_RANDOMIZE_BASE
kernel configuration option should be enabled during kernel configuration.
In this case the physical address at which the Linux kernel image is decompressed and loaded will be randomized. This part considers the case when this option is enabled and the load address of the kernel image is randomized for security reasons.
Initialization of page tables
Before the kernel decompressor starts to look for a random memory range where the kernel will be decompressed and loaded, the identity mapped page tables should be initialized. If a bootloader used the 16-bit or 32-bit boot protocol, we already have page tables. But in any case, we may need new pages on demand if the kernel decompressor selects a memory range outside of them. That’s why we need to build new identity mapped page tables.
Yes, building identity mapped page tables is one of the first steps of load address randomization. But before we consider it, let’s try to remember how we got to this point.
In the previous part, we saw the transition to long mode and the jump to the kernel decompressor entry point, the extract_kernel
function. The randomization stuff starts here with the call of the:
void choose_random_location(unsigned long input,
unsigned long input_size,
unsigned long *output,
unsigned long output_size,
unsigned long *virt_addr)
{}
function. As you can see, this function takes the following five parameters:
- input;
- input_size;
- output;
- output_size;
- virt_addr.
Let’s try to understand what these parameters are. The first parameter, input,
came from the parameters of the extract_kernel
function from the arch/x86/boot/compressed/misc.c source code file:
asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
unsigned char *input_data,
unsigned long input_len,
unsigned char *output,
unsigned long output_len)
{
...
...
...
choose_random_location((unsigned long)input_data, input_len,
(unsigned long *)&output,
max(output_len, kernel_total_size),
&virt_addr);
...
...
...
}
This parameter is passed from assembler code:
leaq input_data(%rip), %rdx
from the arch/x86/boot/compressed/head_64.S. The input_data
is generated by the little mkpiggy program. If you have compiled the Linux kernel source code yourself, you can find the file generated by this program at linux/arch/x86/boot/compressed/piggy.S. In my case this file looks like:
.section ".rodata..compressed","a",@progbits
.globl z_input_len
z_input_len = 6988196
.globl z_output_len
z_output_len = 29207032
.globl input_data, input_data_end
input_data:
.incbin "arch/x86/boot/compressed/vmlinux.bin.gz"
input_data_end:
As you can see, it contains four global symbols. The first two, z_input_len
and z_output_len,
are the sizes of the compressed image (vmlinux.bin.gz) and of the uncompressed kernel. The third is our input_data,
and as you can see it points to the compressed Linux kernel image, vmlinux.bin.gz, which is the kernel in raw binary format (all debugging symbols, comments and relocation information are stripped). And the last one, input_data_end,
points to the end of the compressed Linux image.
So, the first parameter of the choose_random_location
function is the pointer to the compressed kernel image that is embedded into the piggy.o
object file.
The second parameter of the choose_random_location
function is the z_input_len
that we have just seen.
The third and fourth parameters of the choose_random_location
function are the address where the decompressed kernel image should be placed and the length of the decompressed kernel image, respectively. The address where the decompressed kernel is put comes from arch/x86/boot/compressed/head_64.S: it is the address of startup_32
aligned to a 2 megabyte boundary. The size of the decompressed kernel comes from the same piggy.S:
it is z_output_len (more precisely, the call above passes the larger of output_len and kernel_total_size).
The last parameter of the choose_random_location
function is the virtual address of the kernel load address. As you can see, by default it coincides with the default physical load address:
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
which depends on kernel configuration:
#define LOAD_PHYSICAL_ADDR ((CONFIG_PHYSICAL_START
+ (CONFIG_PHYSICAL_ALIGN - 1))
& ~(CONFIG_PHYSICAL_ALIGN - 1))
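The macro rounds CONFIG_PHYSICAL_START up to the next multiple of CONFIG_PHYSICAL_ALIGN. A minimal sketch of the same arithmetic as a plain C function follows; the name align_up is hypothetical, and the mask trick assumes the alignment is a power of two (the example values assume a 2 MiB alignment, 0x200000).

```c
#include <stdint.h>

/* Hypothetical helper: round x up to the next multiple of align.
 * Same expression as LOAD_PHYSICAL_ADDR; align must be a power of two,
 * otherwise ~(align - 1) is not a valid mask. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + (align - 1)) & ~(align - 1);
}
```

With a 2 MiB alignment, the default 0x1000000 is already aligned and is returned unchanged, while a start of 0x1100000 would be rounded up to 0x1200000.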
Now that we have considered the parameters of the choose_random_location
function, let’s look at its implementation. This function starts by checking for the nokaslr
option in the kernel command line:
if (cmdline_find_option_bool("nokaslr")) {
warn("KASLR disabled: 'nokaslr' on cmdline.");
return;
}
and if the option was given, we exit from the choose_random_location
function and the kernel load address will not be randomized. The related command line options can be found in the kernel documentation:
kaslr/nokaslr [X86]
Enable/disable kernel and module base offset ASLR
(Address Space Layout Randomization) if built into
the kernel. When CONFIG_HIBERNATION is selected,
kASLR is disabled by default. When kASLR is enabled,
hibernation will be disabled.
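The check above only needs to know whether nokaslr appears as a whole word on the command line. A simplified sketch of that test follows, assuming words are separated by one or more spaces; has_bool_option is a hypothetical name, not the kernel’s cmdline_find_option_bool(), whose parser differs in details.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical simplified stand-in for cmdline_find_option_bool():
 * returns true if `option` appears as a whole space-separated word. */
static bool has_bool_option(const char *cmdline, const char *option)
{
	size_t len = strlen(option);
	const char *p = cmdline;

	while (*p) {
		while (*p == ' ')
			p++;                    /* skip separators */
		const char *word = p;
		while (*p && *p != ' ')
			p++;                    /* advance to the end of the word */
		if ((size_t)(p - word) == len && strncmp(word, option, len) == 0)
			return true;            /* exact whole-word match */
	}
	return false;
}
```

Matching whole words matters: a prefix match would wrongly treat a word such as nokaslrx as nokaslr.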
Let’s assume that we did not pass nokaslr
to the kernel command line and the CONFIG_RANDOMIZE_BASE
kernel configuration option is enabled.
The next step is the call of the:
initialize_identity_maps();
function, which is defined in the arch/x86/boot/compressed/pagetable.c source code file. This function starts with the initialization of mapping_info,
an instance of the x86_mapping_info
structure:
mapping_info.alloc_pgt_page = alloc_pgt_page;
mapping_info.context = &pgt_data;
mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sev_me_mask;
mapping_info.kernpg_flag = _KERNPG_TABLE | sev_me_mask;
The x86_mapping_info
structure is defined in the arch/x86/include/asm/init.h header file and looks like this:
struct x86_mapping_info {
void *(*alloc_pgt_page)(void *);
void *context;
unsigned long page_flag;
unsigned long offset;
bool direct_gbpages;
unsigned long kernpg_flag;
};
This structure provides information about memory mappings. As you may remember from the previous part, we already set up initial page tables from 0 up to 4G.
For now we may need to access memory above 4G
to load the kernel at a random position. So, the initialize_identity_maps
function initializes a memory region for any new page tables that may be needed. First of all, let’s look at the definition of the x86_mapping_info
structure.
The alloc_pgt_page
is a callback function that will be called to allocate space for a page table entry. The context
field is, in our case, an instance of the alloc_pgt_data
structure, which will be used to track allocated page tables. The page_flag
and kernpg_flag
fields are page flags. The first represents flags for PMD
or PUD
entries. The second, kernpg_flag,
represents flags for kernel pages which can be overridden later. The direct_gbpages
field represents support for huge pages, and the last field, offset,
represents the offset between kernel virtual addresses and physical addresses up to the PMD
level.
The alloc_pgt_page
callback just validates that there is space for a new page and allocates the new page:
entry = pages->pgt_buf + pages->pgt_buf_offset;
pages->pgt_buf_offset += PAGE_SIZE;
in the buffer from the:
struct alloc_pgt_data {
unsigned char *pgt_buf;
unsigned long pgt_buf_size;
unsigned long pgt_buf_offset;
};
structure and returns the address of the new page. The last goal of the initialize_identity_maps
function is to initialize pgt_buf_size
and pgt_buf_offset.
As we are only in the initialization phase, the initialize_identity_maps
function sets pgt_buf_offset
to zero:
pgt_data.pgt_buf_offset = 0;
and pgt_data.pgt_buf_size
will be set to 77824
or 69632
depending on which boot protocol was used by the bootloader (64-bit or 32-bit). The same goes for pgt_data.pgt_buf.
If a bootloader loaded the kernel at startup_32,
pgt_data.pgt_buf
will point to the end of the page table which was already initialized in arch/x86/boot/compressed/head_64.S:
pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
where _pgtable
points to the beginning of this page table. Otherwise, if a bootloader used the 64-bit boot protocol and loaded the kernel at startup_64,
the early page tables should have been built by the bootloader itself and _pgtable
will just be overwritten:
pgt_data.pgt_buf = _pgtable;
Now that the buffer for new page tables is initialized, we may return to the choose_random_location
function.
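The alloc_pgt_page callback is a simple bump allocator over pgt_buf. A minimal user-space sketch follows, assuming 4096-byte pages and reusing the alloc_pgt_data layout shown above; the bounds check is written as offset + PAGE_SIZE > size so that a partial page at the end of the buffer is never handed out. With the 77824-byte buffer mentioned above, such an allocator could return 77824 / 4096 = 19 page tables before failing.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL  /* assumed page size for this sketch */

/* Same fields as struct alloc_pgt_data above. */
struct alloc_pgt_data {
	unsigned char *pgt_buf;
	unsigned long pgt_buf_size;
	unsigned long pgt_buf_offset;
};

/* Sketch of a bump allocator: hand out the next PAGE_SIZE chunk of the
 * buffer and advance the offset, or return NULL once it is exhausted. */
static void *alloc_pgt_page(void *context)
{
	struct alloc_pgt_data *pages = context;
	unsigned char *entry;

	if (pages->pgt_buf_offset + PAGE_SIZE > pages->pgt_buf_size)
		return NULL;  /* no room for another page table */

	entry = pages->pgt_buf + pages->pgt_buf_offset;
	pages->pgt_buf_offset += PAGE_SIZE;
	return entry;
}
```

Nothing is ever freed: the decompressor only needs these tables until the kernel takes over, so a bump allocator is enough.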
Avoid reserved memory ranges
After the identity page tables are initialized, we may start to choose a random location where to put the decompressed kernel image. But as you may guess, we can’t choose just any address. There are some reserved addresses in memory ranges. Such addresses are occupied by important things like initrd, the kernel command line and so on. The
mem_avoid_init(input, input_size, *output);
function will help us do this. All unsafe memory regions will be collected in the:
struct mem_vector {
unsigned long long start;
unsigned long long size;
};
static struct mem_vector mem_avoid[MEM_AVOID_MAX];
array, where MEM_AVOID_MAX
comes from the mem_avoid_index
enum, which represents different types of reserved memory regions:
enum mem_avoid_index {
MEM_AVOID_ZO_RANGE = 0,
MEM_AVOID_INITRD,
MEM_AVOID_CMDLINE,
MEM_AVOID_BOOTPARAMS,
MEM_AVOID_MEMMAP_BEGIN,
MEM_AVOID_MEMMAP_END = MEM_AVOID_MEMMAP_BEGIN + MAX_MEMMAP_REGIONS - 1,
MEM_AVOID_MAX,
};
Both are defined in the arch/x86/boot/compressed/kaslr.c source code file.
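Once mem_avoid is filled, a candidate location is only acceptable if it does not intersect any of its entries. A sketch of such an overlap test over half-open ranges follows, assuming unused entries have size 0; ranges_overlap and overlaps_any are hypothetical helpers written for illustration, not code quoted from kaslr.c.

```c
#include <stdbool.h>

struct mem_vector {
	unsigned long long start;
	unsigned long long size;
};

/* True if [a.start, a.start + a.size) and [b.start, b.start + b.size)
 * share at least one byte. */
static bool ranges_overlap(const struct mem_vector *a, const struct mem_vector *b)
{
	if (a->start >= b->start + b->size)
		return false;  /* a lies entirely above b */
	if (a->start + a->size <= b->start)
		return false;  /* a lies entirely below b */
	return true;
}

/* Hypothetical helper: does the candidate collide with any avoided region?
 * Entries with size 0 are treated as unused and skipped. */
static bool overlaps_any(const struct mem_vector *cand,
			 const struct mem_vector *avoid, int n)
{
	for (int i = 0; i < n; i++)
		if (avoid[i].size && ranges_overlap(cand, &avoid[i]))
			return true;
	return false;
}
```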
Let’s look at the implementation of the mem_avoid_init
function. The main goal of this function is to store information about the reserved memory regions described by the mem_avoid_index
enum in the mem_avoid
array and to create new pages for such regions in our new identity mapped buffer. Many parts of the mem_avoid_init
function are similar, so let’s take a look at one of them:
mem_avoid[MEM_AVOID_ZO_RANGE].start = input;
mem_avoid[MEM_AVOID_ZO_RANGE].size = (output + init_size) - input;
add_identity_map(mem_avoid[MEM_AVOID_ZO_RANGE].start,
mem_avoid[MEM_AVOID_ZO_RANGE].size);
At the beginning, the mem_avoid_init
function tries to avoid the memory region that is used for the current kernel decompression. We fill an entry of the mem_avoid
array with the start and size of this region and call the add_identity_map
function, which should build identity mapped pages for this region. The add_identity_map
function is defined in the arch/x86/boot/compressed/kaslr.c source code file and looks like this:
void add_identity_map(unsigned long start, unsigned long size)
{
unsigned long end = start + size;
start = round_down(start, PMD_SIZE);
end = round_up(end, PMD_SIZE);
if (start >= end)
return;
kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt,
start, end);
}
As you can see, it aligns the memory region to a 2 megabyte boundary and checks the given start and end addresses.
In the end it just calls the kernel_ident_mapping_init
function from the arch/x86/mm/ident_map.c source code file and passes the mapping_info
instance that was initialized above, the address of the top level page table, and the addresses of the memory region for which a new identity mapping should be built.
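The effect of the rounding in add_identity_map can be checked in isolation. The sketch below assumes PMD_SIZE is 2 MiB (as the text says) and a 64-bit unsigned long; pmd_span is a hypothetical helper that returns the aligned bounds and the number of 2 MiB pages they cover.

```c
#define PMD_SIZE (1UL << 21)  /* 2 MiB: the amount one PMD entry maps */

static unsigned long round_down_pmd(unsigned long x)
{
	return x & ~(PMD_SIZE - 1);
}

static unsigned long round_up_pmd(unsigned long x)
{
	return (x + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
}

/* Hypothetical helper: compute the 2 MiB-aligned [*out_start, *out_end)
 * that add_identity_map() would pass on, and return how many 2 MiB
 * pages it covers (0 if the range is empty). */
static unsigned long pmd_span(unsigned long start, unsigned long size,
			      unsigned long *out_start, unsigned long *out_end)
{
	unsigned long end = start + size;

	start = round_down_pmd(start);
	end = round_up_pmd(end);
	*out_start = start;
	*out_end = end;
	if (start >= end)
		return 0;
	return (end - start) / PMD_SIZE;
}
```

Note that a region only a few bytes long that straddles a 2 MiB boundary still costs two large pages.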
The kernel_ident_mapping_init
function sets default flags for new pages if they were not given:
if (!info->kernpg_flag)
info->kernpg_flag = _KERNPG_TABLE;
and starts to build new 2-megabyte (because of the PSE
bit in mapping_info.page_flag)
page entries (PGD -> P4D -> PUD -> PMD
in the case of 5-level page tables or PGD -> PUD -> PMD
in the case of 4-level page tables) related to the given addresses.
for (; addr < end; addr = next) {
p4d_t *p4d;
next = (addr & PGDIR_MASK) + PGDIR_SIZE;
if (next > end)
next = end;
p4d = (p4d_t *)info->alloc_pgt_page(info->context);
if (!p4d)
return -ENOMEM;
result = ident_p4d_init(info, p4d, addr, next);
if (result)
return result;
}
First of all, here we find the next entry of the Page Global Directory
for the given address and, if it is greater than the end
of the given memory region, we set it to end.
After this we allocate a new page with the x86_mapping_info
callback that we already considered above and call the ident_p4d_init
function. The ident_p4d_init
function will do the same, but for the lower-level page directories (p4d
-> pud
-> pmd
).
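The loop walks the region one PGDIR_SIZE chunk at a time, and each address selects one entry per level. A sketch of both computations follows, assuming 4-level paging with 9 index bits per level (PGDIR_SHIFT 39, PUD_SHIFT 30, PMD_SHIFT 21); the SK_ macros and the helper names are hypothetical stand-ins, not the kernel’s headers.

```c
#include <stdint.h>

/* Assumed 4-level x86_64 layout; with 5-level paging PGDIR_SHIFT
 * becomes 48 and a P4D level sits at shift 39. */
#define SK_PGDIR_SHIFT 39
#define SK_PUD_SHIFT   30
#define SK_PMD_SHIFT   21
#define SK_PTRS        512  /* 9 index bits per level */

struct pt_index { unsigned pgd, pud, pmd; };

/* Entry used at each level to translate addr with a 2 MiB page,
 * so the walk stops at the PMD level. */
static struct pt_index pt_index_of(uint64_t addr)
{
	struct pt_index ix = {
		.pgd = (addr >> SK_PGDIR_SHIFT) & (SK_PTRS - 1),
		.pud = (addr >> SK_PUD_SHIFT) & (SK_PTRS - 1),
		.pmd = (addr >> SK_PMD_SHIFT) & (SK_PTRS - 1),
	};
	return ix;
}

/* Count the chunks the outer loop visits: [addr, end) is cut at every
 * PGDIR_SIZE boundary, using the same `next` computation. */
static unsigned count_pgd_chunks(uint64_t addr, uint64_t end)
{
	const uint64_t size = 1ULL << SK_PGDIR_SHIFT;
	const uint64_t mask = ~(size - 1);
	unsigned chunks = 0;

	for (uint64_t next; addr < end; addr = next) {
		next = (addr & mask) + size;
		if (next > end)
			next = end;
		chunks++;
	}
	return chunks;
}
```

For example, the default load address 0x1000000 (16 MiB) falls into PGD entry 0, PUD entry 0 and PMD entry 8.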
That is all.
New page entries related to the reserved addresses are now in our page tables. This is not the end of the mem_avoid_init
function, but the other parts are similar. It just builds pages for initrd, the kernel command line and so on.
Now we may return to the choose_random_location
function.
Physical address randomization
After the reserved memory regions are stored in the mem_avoid
array and identity mapping pages are built for them, we select the minimal available address from which to choose a random memory region to decompress the kernel:
min_addr = min(*output, 512UL << 20);
As you can see, it will be no larger than 512
megabytes. This 512
megabyte value was selected just to avoid unknown things in lower memory.
The next step is to select random physical and virtual addresses to load the kernel. The first is the physical address:
random_addr = find_random_phys_addr(min_addr, output_size);
The find_random_phys_addr
function is defined in the same source code file:
static unsigned long find_random_phys_addr(unsigned long minimum,
unsigned long image_size)
{
minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
if (process_efi_entries(minimum, image_size))
return slots_fetch_random();
process_e820_entries(minimum, image_size);
return slots_fetch_random();
}
The main goal of the process_efi_entries
function is to find all suitable memory ranges in the accessible memory to load the kernel. If the kernel is compiled and run on a system without EFI support, we continue to search for such memory regions in the e820 regions. All found memory regions will be stored in the
struct slot_area {
unsigned long addr;
int num;
};
#define MAX_SLOT_AREA 100
static struct slot_area slot_areas[MAX_SLOT_AREA];
array. The kernel decompressor should select a random index of this array, and that will be the random place where the kernel is decompressed. This selection is done by the slots_fetch_random
function. The main goal of the slots_fetch_random
function is to select a random memory range from the slot_areas
array via the kaslr_get_random_long
function:
slot = kaslr_get_random_long("Physical") % slot_max;
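The slot bookkeeping can be sketched as follows. This assumes CONFIG_PHYSICAL_ALIGN is 2 MiB, that each region’s start address is already aligned, and that a region yields one slot per aligned start address at which the whole image still fits; store_region and fetch_slot are hypothetical names, and the random number is passed in as an argument instead of coming from kaslr_get_random_long so the result is deterministic.

```c
#define SK_PHYSICAL_ALIGN 0x200000UL  /* assumed CONFIG_PHYSICAL_ALIGN */
#define SK_MAX_SLOT_AREA  100

struct slot_area {
	unsigned long addr;  /* first (aligned) address of the region */
	int num;             /* number of aligned start addresses that fit */
};

static struct slot_area slot_areas[SK_MAX_SLOT_AREA];
static int slot_area_index;
static unsigned long slot_max;  /* total slots over all areas */

/* Record a free region [addr, addr + size) if the image fits in it. */
static void store_region(unsigned long addr, unsigned long size,
			 unsigned long image_size)
{
	if (size < image_size || slot_area_index == SK_MAX_SLOT_AREA)
		return;
	slot_areas[slot_area_index].addr = addr;
	slot_areas[slot_area_index].num = (size - image_size) / SK_PHYSICAL_ALIGN + 1;
	slot_max += slot_areas[slot_area_index].num;
	slot_area_index++;
}

/* Reduce the random number modulo the total slot count, then walk the
 * areas until the slot index falls inside one of them. */
static unsigned long fetch_slot(unsigned long random)
{
	unsigned long slot;

	if (slot_max == 0)
		return 0;  /* nothing suitable was found */
	slot = random % slot_max;
	for (int i = 0; i < slot_area_index; i++) {
		if (slot >= (unsigned long)slot_areas[i].num) {
			slot -= slot_areas[i].num;  /* skip this whole area */
			continue;
		}
		return slot_areas[i].addr + slot * SK_PHYSICAL_ALIGN;
	}
	return 0;
}
```

Taking the random number modulo the total slot count, rather than first picking a region, gives every slot roughly the same chance regardless of which region it lives in.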
The kaslr_get_random_long
function is defined in the arch/x86/lib/kaslr.c source code file and it just returns a random number. Note that the random number is obtained in different ways depending on the kernel configuration and system capabilities (for example, from the time stamp counter, rdrand and so on).
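The kernel’s exact mixing code is not shown here. As an illustration only, one simple way to fold several entropy readings into a single 64-bit value is to XOR them together and then spread the bits with a 64x64 to 128-bit multiply whose high half is added back into the low half. The sketch below assumes a GCC or Clang compiler (for unsigned __int128); mix_entropy and its constant are hypothetical and are not claimed to match kaslr_get_random_long.

```c
#include <stdint.h>

/* Hypothetical sketch: combine n entropy readings into one value.
 * Here the readings are plain arguments; in a real boot path they would
 * come from sources such as rdrand or the time stamp counter. */
static uint64_t mix_entropy(const uint64_t *sources, int n)
{
	/* Arbitrary odd 64-bit constant chosen for this illustration. */
	const uint64_t mix_const = 0x9e3779b97f4a7c15ULL;
	uint64_t random = 0;

	for (int i = 0; i < n; i++)
		random ^= sources[i];  /* fold each source in */

	/* Keep the full 128-bit product and add its high half to the low
	 * half, so that high input bits also influence low output bits. */
	unsigned __int128 product = (unsigned __int128)random * mix_const;
	return (uint64_t)product + (uint64_t)(product >> 64);
}
```

XOR keeps the result at least as unpredictable as the best single source, as long as the sources are independent.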
That’s all. At this point a random memory range has been selected.
Virtual address randomization
After a random memory region is selected by the kernel decompressor, new identity mapped pages will be built for this region on demand:
random_addr = find_random_phys_addr(min_addr, output_size);
if (*output != random_addr) {
add_identity_map(random_addr, output_size);
*output = random_addr;
}
From this point, output
stores the base address of the memory region where the kernel will be decompressed. But so far, as you may remember, we have randomized only the physical address. The virtual address should be randomized too in the case of the x86_64 architecture:
if (IS_ENABLED(CONFIG_X86_64))
random_addr = find_random_virt_addr(LOAD_PHYSICAL_ADDR, output_size);
*virt_addr = random_addr;
As you can see, in the case of a non-x86_64
architecture, the randomized virtual address will coincide with the randomized physical address. The find_random_virt_addr
function calculates the number of virtual memory ranges that can hold the kernel image and calls the kaslr_get_random_long
function that we already saw in the previous case, when we tried to find a random physical
address.
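The slot arithmetic of find_random_virt_addr can be sketched in the same way. This assumes CONFIG_PHYSICAL_ALIGN is 2 MiB and takes the upper limit (KERNEL_IMAGE_SIZE in the kernel) and the random number as plain arguments; pick_virt_addr is a hypothetical name, the early return when the image does not fit is an addition of this sketch, and the 1 GiB limit used in the example is only a sample value.

```c
#define SK_PHYSICAL_ALIGN 0x200000UL  /* assumed CONFIG_PHYSICAL_ALIGN */

/* Count the aligned start addresses in [minimum, limit) at which an
 * image of image_size still fits, then pick one of them. */
static unsigned long pick_virt_addr(unsigned long minimum,
				    unsigned long image_size,
				    unsigned long limit,
				    unsigned long random)
{
	unsigned long slots;

	if (minimum + image_size > limit)
		return minimum;  /* no room to randomize: keep the default */

	slots = 1 + (limit - minimum - image_size) / SK_PHYSICAL_ALIGN;
	return (random % slots) * SK_PHYSICAL_ALIGN + minimum;
}
```

With minimum = 16 MiB, a 32 MiB image and a 1 GiB limit there are 489 slots; the highest one starts at 0x3E000000, so the image ends exactly at the limit.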
From this moment we have both the randomized base physical (*output)
and virtual (*virt_addr)
addresses for the decompressed kernel.
That is all.
Conclusion
This is the end of the sixth and the last part about the Linux kernel booting process. We will not see posts about kernel booting anymore (maybe updates to this and previous posts), but there will be many posts about other kernel internals.
The next chapter will be about kernel initialization, and we will see the first steps in the Linux kernel initialization code.
If you have any questions or suggestions, write me a comment or ping me on twitter.
Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to linux-insides.