The address in the processor's memory space where execution starts is described in the chip/processor documentation. That starting behavior is the same for that chip no matter what the size or purpose of the board it is soldered to: embedded, single board computer, laptop motherboard, desktop motherboard, toaster, etc.
The schematic for the board will show where power-on reset goes to or comes from. If power-on reset is managed by a programmable device then you may not have access to all the gory details to figure out what is going on, but for an embedded development board the vendor should have provided enough timing information for you to do your job. Some devices, like flash, may require a minimum period of time between power reaching some percentage of full power (say 75% of 3.3V for example) and the first read. Other devices require X number of clock cycles on the clock input before reset is released, that sort of thing. FPGAs and similar devices that load their hardware designs from a prom of some sort need a clock, a power-on reset, a period of time to load the design, etc. And that FPGA may be the memory controller for the processor, so before you can boot the processor you may need that FPGA up and running to route your flash or ram requests. You might go so far as to have that FPGA initialize dram so that it looks like sram to the processor. The processor reset would have to be held off until all that happens. Some voltage regulators or other devices have a power-good output that indicates power is regulated to the desired voltage and ready to use. A device that manages the reset may wait for that power-good signal before it releases reset on the processor. Where I am headed with all of this is that a number of things have to happen before a processor can boot.
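The kind of decision a reset-managing device makes can be modeled in a few lines of C. This is only a sketch: the structure names, fields, and thresholds are all hypothetical, and on a real board this logic usually lives in hardware or a supervisor chip, with the actual numbers taken from the datasheets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the board; real signals come from the schematic. */
struct board_state {
    bool     power_good;          /* regulator's power-good output */
    bool     fpga_configured;     /* FPGA finished loading its design */
    uint32_t ms_since_power;      /* time since the rail crossed its threshold */
    uint32_t clocks_since_power;  /* clock cycles seen on the clock input */
};

/* Hypothetical limits; real values come from each part's datasheet. */
struct reset_rules {
    uint32_t min_ms_before_flash_read;
    uint32_t min_clocks_before_release;
};

/* Returns true only when every precondition for releasing reset holds. */
bool may_release_reset(const struct board_state *s, const struct reset_rules *r)
{
    assert(s != NULL && r != NULL);
    return s->power_good
        && s->fpga_configured
        && s->ms_since_power >= r->min_ms_before_flash_read
        && s->clocks_since_power >= r->min_clocks_before_release;
}
```

The processor stays in reset until every condition is true at the same time, which is why one slow part (an FPGA still loading, say) delays the whole boot.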
Once the chip/processor's reset is released, it is as described in the other answers.
Depending on the processor there may be strap options on the pins of the processor that select the boot configuration. These can affect the starting address for execution and are described in the processor's documentation. The processor might always boot from a known address, or from a table of addresses (vector table, exception table, etc, it goes by many names) indexed by the event that happened: reset, interrupt, wakeup from low power mode/sleep, etc. The processor/chip documentation will describe the bootup process and/or how you figure out what or where the first instructions are that execute.
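The table-of-addresses idea can be modeled in C as an array of function pointers indexed by event. This is a host-side sketch, not how any particular processor does it: the event numbering and handler names are made up, and real hardware performs the lookup itself rather than calling a function like take_event().

```c
#include <assert.h>
#include <stddef.h>

typedef void (*vector_fn)(void);

/* Hypothetical event numbering; a real processor defines its own order. */
enum boot_event { EV_RESET, EV_INTERRUPT, EV_WAKEUP, EV_COUNT };

int last_event = -1;  /* records which handler ran, for demonstration only */

void on_reset(void)     { last_event = EV_RESET; }
void on_interrupt(void) { last_event = EV_INTERRUPT; }
void on_wakeup(void)    { last_event = EV_WAKEUP; }

/* One entry per event, in the order the (hypothetical) processor expects. */
const vector_fn vectors[EV_COUNT] = { on_reset, on_interrupt, on_wakeup };

/* Models the hardware: look up the entry for the event and jump to it. */
void take_event(enum boot_event ev)
{
    assert((unsigned)ev < (unsigned)EV_COUNT);
    vectors[ev]();
}
```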
Once the processor finally starts to execute instructions, then depending on the system you might need to enable a usb or pci interface quickly so that a host can enumerate the device/board. If you have dram you likely have to initialize a dram controller, on chip or off, somewhere between the processor and the dram. If that dram or other memory has some sort of error detection or correction, you likely have to write to all of that memory to initialize the ECC tags. That ram has to be up and running before you can use it for your stack or to initialize .bss and .data. Some devices boot with peripherals inside the chip disabled (like rs232 ports, usb, etc) and depending on your application you may wish to just turn them on in the boot code before main() is called. There may be leds on the board indicating certain things, and sometimes there is code in the startup for that.
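Initializing ECC-protected memory usually amounts to writing every location once before anything reads it. A minimal sketch, assuming 32-bit ECC words and a hypothetical helper name (ecc_fill); the real word width and any controller setup come from the memory controller documentation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write every word once so each stored check value matches its data.
   Assumption: the ECC word is 32 bits wide. */
void ecc_fill(volatile uint32_t *base, size_t words, uint32_t pattern)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++)
        base[i] = pattern;  /* write only: a read before this could report an ECC error */
}
```

On a target you would call this with the base address and size of the ram region from the memory map, before setting up the stack in that ram.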
If your compiler generates .data or .bss segments (which are a software thing BTW, the hardware has no notion of this), most programmers would expect those sections of memory to be initialized before main() runs (the C standard requires it for objects with static storage duration). For .bss that means zero it out, for .data that means fill it in with initial values. This is usually for C or other high level languages. For assembler you might be managing this yourself, though the linker may not care whether the objects being linked were C or assembler and may create .data, .bss, etc sections of memory independent of the source language.
for example:
const int a=27;
int b=28;
int c;
The variable a would be placed in read-only memory alongside the executable code (the .text segment, or a read-only data section such as .rodata, depending on the toolchain). The variable b would be placed in .data because before main() is called you as a programmer would expect that memory location to contain that value. And c would be in the .bss segment because as a programmer you would hope/expect that memory location to be zero before main() is called.
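What the startup code does for b and c can be sketched in C. This is a host-runnable simulation under stated assumptions: on a real target the section boundaries come from linker-script symbols (whose names vary by toolchain), whereas here plain arrays with hypothetical names (rom_data_image, ram_data, ram_bss) stand in for flash and ram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for memory regions. On a real target these
   boundaries would come from linker-script symbols, not C arrays. */
const uint32_t rom_data_image[1] = { 28 };  /* initial value of b, kept in flash */
uint32_t ram_data[1] = { 0xDEADBEEFu };     /* b at run time; junk until startup runs */
uint32_t ram_bss[1]  = { 0xDEADBEEFu };     /* c at run time; junk until startup runs */

/* Copy the .data initial values from rom to ram, one word at a time. */
void copy_data(uint32_t *dst, const uint32_t *src, size_t words)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];
}

/* Zero the .bss region, one word at a time. */
void zero_bss(uint32_t *dst, size_t words)
{
    assert(dst != NULL);
    for (size_t i = 0; i < words; i++)
        dst[i] = 0;
}

/* The part of startup that must finish before main() is called. */
void simulated_startup(void)
{
    copy_data(ram_data, rom_data_image, sizeof ram_data / sizeof ram_data[0]);
    zero_bss(ram_bss, sizeof ram_bss / sizeof ram_bss[0]);
}
```

After simulated_startup() returns, the location standing in for b holds 28 and the one standing in for c holds zero, which is the state a C program expects on entry to main().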
The startup code also initializes the stack pointer or pointers as the case may be, it may enable the cache and do various other things.
A lot of things CAN happen between reset and main(), and it seems like we assume that is the root of your question.
Since you mention embedded you may wish to take over the startup code. The compiler tools will usually have startup code for the target processor, but it is unlikely to match your embedded processor's memory map, at least when using a generic compiler like gcc. A PIC C compiler, a Rabbit Semiconductor compiler, or some other compiler that is specific to a single family of chips requires you to specify the chip when compiling, and that tells the compiler the memory map so that it can manage initializing these areas. Gcc/binutils supports linker scripts and has many pre-built startup modules for various processors. Depending on the nature of the target processor and the individual that created that open source code, you might be able to manipulate some of these things without having to write your own or modify the startup code. You may wish to write your own anyway, as it might be easier to modify/write than to figure out all the knobs you have to turn to get the generic one to do what you want.
You may choose to make your startup code simpler and create programming rules for the project or at least the bootloader, that all variables must be initialized before use.
for example instead of:
const int a=27;
int b=28;
int c;
something like this:
const int a=27;
int b;
int c;
void embedded_main ( void )
{
b = 28;
c = 0;
...
}
If you do this the startup code no longer has to zero .bss or copy .data from rom to ram before calling embedded_main(). Note that some compilers add extra (sometimes unused) code to the binary if they see a function named main(). If you understand what the compiler is doing, what that code is, and whether you need it, you can rename main() to anything else to avoid bloating your binary and consuming flash/rom.
Not initializing the variable c in the above before using it (assuming it to be zero) is bad form and some compilers warn you about using a possibly uninitialized variable. So you should be initializing it anyway.
You can get a pretty good performance gain on boot by not preparing .bss and .data. If you are a PCI target, for example, you might need every millisecond you can get to be up and running before the host comes around to enumerate. You don't save any prom by initializing variables at runtime: the .data segment is replaced by code in the .text segment, and depending on the processor it probably consumes more flash to do it this way. So you may gain performance and portability of the startup code, and relief from a lot of headaches in that startup code and the linker scripts, but if you are not the developer of the high level code you call, you may get grief from those developers. It is a trade off and depends heavily on the system and what you plan to do with it.
My arm startup code often looks like this for example:
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
b hang
reset:
ldr sp,=0x2000C000
bl notmain
hang: b hang
For the cortex-m3, where the initial stack pointer value is stored at address zero, just before the rest of the exception vector table, you could do something like this:
.cpu cortex-m3
.thumb
.word 0x40080000
.word _start
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
hang: b .
.thumb_func
.global _start
_start:
bl notmain
b hang
.end
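The layout used in the cortex-m3 listing above (first word is the initial stack pointer value, second word is the reset entry point, the rest point at hang) can be modeled on a host in C. This is only an illustration: struct cpu, build_table() and simulated_reset() are hypothetical names, the handlers merely record that they ran, and a real hang would loop forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*handler_t)(void);

/* Mirrors the assembly listing: one stack pointer word, the reset
   entry, then 18 entries that all point at hang. */
struct vector_table {
    uintptr_t initial_sp;
    handler_t reset;
    handler_t others[18];
};

struct cpu { uintptr_t sp; };  /* hypothetical model of the core's state */

int reset_calls = 0;
void reset_handler(void) { reset_calls++; }  /* stands in for _start */
void hang_handler(void)  { }                 /* stands in for hang */

/* Fill the table with the same values the assembly listing uses. */
void build_table(struct vector_table *t)
{
    assert(t != NULL);
    t->initial_sp = 0x40080000u;
    t->reset = reset_handler;
    for (size_t i = 0; i < sizeof t->others / sizeof t->others[0]; i++)
        t->others[i] = hang_handler;
}

/* Models what the core does at reset: load the stack pointer from the
   first word, then start executing at the address in the second. */
void simulated_reset(struct cpu *cpu, const struct vector_table *t)
{
    assert(cpu != NULL && t != NULL && t->reset != NULL);
    cpu->sp = t->initial_sp;
    t->reset();
}
```

Because the hardware loads the stack pointer from the table, the reset code in that listing can call notmain directly without an explicit load of sp.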
Look in the source for newlib or glibc for files named crt0.S or Start.S to find boot code for the various processors supported (usually in assembler, and as a result there will be a file for each processor type supported by that compiler/library).