You say "boot straight into windows", so I assume you are using a physical PC. A note for the future: always use an emulator for development! It's just easier. I like Bochs for OS development because it has nice debugging features. Now, on to the possible solution.
Many BIOSes are buggy and break the informal IBM PC convention for the 0x7C00 load address.
This can cause a lot of problems with memory addresses in your assembled code. So make the beginning look like this:
[BITS 16] ;tell the assembler that this is 16-bit code
[ORG 0x7C00] ;tell the assembler where the code will be loaded when it runs on your machine. It uses this to compute the absolute addresses of labels and such.
jmp word 0:flush ;#FAR jump so that CS is set to 0. (The first argument is the segment to jump to. The argument after the `:` is the offset to jump to.)
;# Without the far jmp, CS could be `0x7C0` or something similar, which means that where the assembler thinks the code is loaded and where your computer actually loaded it are different. That in turn messes up the absolute addresses of labels.
flush: ;#We jump here, but we do it with an ABSOLUTE segment:offset. This resets the segment and offset of where our code is loaded.
mov BP,0 ;#use BP as a temp register
mov DS,BP ;#you cannot assign a literal number to a segment register. You have to load it into a general-purpose register first.
mov ES,BP ;#do the same here too
;#Without setting DS and ES, they could still hold the old 0x7C0, which would mess up absolute address calculations for data.
See, some BIOSes load at 0x07C0:0000, while most load at 0x0000:7C00 (which is considered the proper way). It is the same flat address, but the different segment settings can really screw up absolute memory addresses. So let's remove the "magic" of the assembler and see what it looks like (note that I don't guarantee the addresses here are completely correct; I don't know the size of all the opcodes):
jmp word 0:0x7C05 ;# 0x7C05 is the address of the `flush` label (the 16-bit far jmp itself takes 5 bytes)
...
So, we jump to an absolute address.
Now then, what happens when we don't do this?
Take this program for example:
mov ax,[mydata]
hlt
mydata: dw 500 ;#just some data
This disassembles to something like
mov ax,[0x7C04]
Oh, well, it uses absolute addressing, so how could that go wrong? Well, what if DS is actually 0x7C0? Then instead of reading from 0:0x7C04 (flat address 0x7C04), as the assembler expected, it will read from 0x7C0:0x7C04 (flat address 0x7C00 + 0x7C04 = 0xF804), which is not the same flat address.
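Putting the pieces together, here is a minimal sketch of what a complete boot sector could look like with the far jump and the segment setup in place. The `start` and `hang` labels, the `hang` loop, the padding and the 0xAA55 boot signature are my additions for illustration and are not taken from your code, so treat this as a starting point rather than a drop-in fix.

```nasm
[BITS 16]
[ORG 0x7C00]            ; labels are computed as if the code sits at 0x0000:0x7C00

start:
    jmp word 0:flush    ; far jump: forces CS to 0 no matter which
                        ; segment:offset pair the BIOS used to get here

flush:
    mov bp, 0           ; segment registers cannot take an immediate,
    mov ds, bp          ; so go through a general-purpose register
    mov es, bp          ; DS = ES = 0 now matches the ORG above

    mov ax, [mydata]    ; reads from 0x0000:mydata, as the assembler assumed

hang:                   ; assumption: nothing else to do, so just idle
    hlt
    jmp hang            ; an interrupt wakes the CPU from hlt, so loop back

mydata: dw 500          ; just some data

times 510 - ($ - $$) db 0   ; pad the sector up to byte 510
dw 0xAA55                   ; boot signature in the last two bytes
```

Assuming you save it as `boot.asm` (the file names are hypothetical), `nasm -f bin boot.asm -o boot.bin` should give you a 512-byte image you can try in Bochs before touching real hardware.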
I hope this helps you understand. It's a complicated topic, though, and it takes a while of low-level programming to fully understand.