Background
My board incorporates an STM32 microcontroller with an SD/MMC card on SPI, and samples analogue data at 48 ksps. I am using the Keil Real-time Library RTX kernel and ELM FatFs.
I have a high-priority task that captures analogue data via DMA in blocks of 40 samples (40 x 16-bit). The data is passed via a queue of length 128 (about 107 ms of sample buffering) to a second, low-priority task that collates sample blocks into a 2560-byte buffer (a multiple of both the 512-byte SD sector size and the 80-byte size of a 40-sample block). When this buffer is full (32 blocks, or approximately 27 ms), the data is written to the file system.
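The buffering figures above can be checked with a few lines of C. This is only a sketch of the arithmetic: the macro and function names are made up for illustration and are not taken from the actual firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the description above; all names are illustrative. */
#define SAMPLE_RATE_HZ     48000u  /* 48 ksps */
#define SAMPLES_PER_BLOCK  40u     /* one DMA block */
#define BYTES_PER_SAMPLE   2u      /* 16-bit samples */
#define QUEUE_LENGTH       128u    /* blocks queued between the two tasks */
#define WRITE_BUFFER_BYTES 2560u   /* collation buffer in the writer task */
#define SECTOR_BYTES       512u    /* SD sector size */

/* Time covered by n_blocks sample blocks, in microseconds (rounded down). */
static uint32_t blocks_to_us(uint32_t n_blocks)
{
    uint64_t samples = (uint64_t)n_blocks * SAMPLES_PER_BLOCK;
    return (uint32_t)(samples * 1000000u / SAMPLE_RATE_HZ);
}

/* Number of sample blocks that fill the write buffer exactly. */
static uint32_t blocks_per_write(void)
{
    uint32_t block_bytes = SAMPLES_PER_BLOCK * BYTES_PER_SAMPLE;
    assert(WRITE_BUFFER_BYTES % block_bytes == 0u);
    return WRITE_BUFFER_BYTES / block_bytes;
}

/* Number of whole SD sectors in one buffer write. */
static uint32_t sectors_per_write(void)
{
    assert(WRITE_BUFFER_BYTES % SECTOR_BYTES == 0u);
    return WRITE_BUFFER_BYTES / SECTOR_BYTES;
}
```

With these figures, one block lasts about 833 us, the full queue covers about 106.7 ms, and each write consists of 32 blocks (about 26.7 ms of data) spread over 5 sectors.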
Observation
By instrumenting the code, I can see that the data is written every 32 blocks and that each write takes about 6 ms. This is sustained until (on FAT16) the file size reaches 1 MB, at which point a single write operation takes 440 ms. That is far longer than the roughly 107 ms of buffering, so the queue fills and logging is aborted. If I format the card as FAT32, the file size before the 'long-write' event is 4 MB.
The fact that the file size at which this occurs changes between FAT16 and FAT32 suggests to me that it is not a limitation of the card, but rather something the file system does at the 1 MB or 4 MB boundary that takes additional time.
It also appears that my tasks are being scheduled in a timely manner, and that the time is consumed in the ELM FatFs code only at the 1 MB (or 4 MB for FAT32) boundary.
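To make the failure mechanism concrete, here is a rough sketch of how many blocks the capture task produces while the writer is stalled, compared against the queue capacity. The names are hypothetical, and it assumes the queue is close to empty when the write starts, which may not hold exactly in the real system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_RATE_HZ    48000u  /* 48 ksps */
#define SAMPLES_PER_BLOCK 40u     /* one DMA block */
#define QUEUE_LENGTH      128u    /* blocks queued between the two tasks */

/* Blocks the capture task produces while the writer is stalled (rounded down). */
static uint32_t blocks_during_stall(uint32_t stall_ms)
{
    uint64_t samples = (uint64_t)stall_ms * SAMPLE_RATE_HZ / 1000u;
    return (uint32_t)(samples / SAMPLES_PER_BLOCK);
}

/* True if the queue can absorb the stall, given how many blocks it already holds. */
static bool queue_survives(uint32_t stall_ms, uint32_t already_queued)
{
    assert(already_queued <= QUEUE_LENGTH);
    return blocks_during_stall(stall_ms) <= QUEUE_LENGTH - already_queued;
}
```

A normal 6 ms write lets about 7 blocks accumulate, which the queue absorbs easily. A 440 ms write corresponds to 528 blocks, roughly four times the queue length, so under these assumptions the overflow is unavoidable.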
The question
Can anyone offer an explanation if not a solution? Is it a FAT issue, or rather specific to ELM's FatFs code perhaps?
I have considered using multiple files, but in my experience FAT does not handle large numbers of files in a single directory very well, so I expect this would simply fail as well. Not using a file system at all and writing to the card raw would be a possibility, but ideally I'd like to read the data on a PC with standard drivers and no special software.
Added: It occurred to me to try compiler optimisations to get the write time down; this seems to have an effect, but the write times became much more variable. At -O2 I did get an 8 MB file, but the results were inconsistent. I am now not sure whether there is a direct correlation between the file size and the point at which it fails; I have seen it fail in this way at various file lengths on no particular boundary. Maybe it is a card performance issue.
Added2: I further instrumented the code and applied a divide-and-conquer approach. What I found probably renders the question obsolete, and means that all previous observations are erroneous or red herrings. I finally narrowed it down to an instance of a multi-sector write (CMD25) where occasionally the "wait ready" polling of the card takes 174 ms for the first three sectors out of a block of 5. The timeout for wait ready is set to 500 ms, so it would happily busy-wait for that long. Using CMD24 (single-sector write) iteratively is much slower in the general case, at 140 ms per sector, rather than just occasionally. So it seems to be a behaviour of the card after all. I shall endeavour to try a range of cards, both SD and MMC.
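For anyone wanting to reproduce this kind of measurement, below is a minimal sketch of a "wait ready" poll with a timeout that also records how long each wait took. It is not the ELM FatFs driver code: `wait_ready`, `wait_stats_t` and the two function-pointer hooks are hypothetical names, and the `sim_*` functions are a toy stand-in for the card that assumes each SPI exchange takes 1 ms. It relies only on the SPI-mode convention that a busy card holds its data line low and returns 0xFF once it is ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks; a real port would supply board-specific ones. */
typedef uint8_t  (*spi_xfer_fn)(uint8_t out);  /* exchange one byte over SPI */
typedef uint32_t (*tick_ms_fn)(void);          /* free-running millisecond counter */

typedef struct {
    uint32_t last_wait_ms;  /* duration of the most recent wait */
    uint32_t max_wait_ms;   /* longest wait seen so far */
    uint32_t timeouts;      /* number of waits that hit the timeout */
} wait_stats_t;

/* Poll until the card returns 0xFF (ready) or timeout_ms elapses. */
static bool wait_ready(spi_xfer_fn xfer, tick_ms_fn now,
                       uint32_t timeout_ms, wait_stats_t *stats)
{
    assert(xfer != NULL && now != NULL && stats != NULL);

    uint32_t start = now();
    uint32_t elapsed;
    bool ready;

    do {
        ready = (xfer(0xFF) == 0xFF);
        elapsed = now() - start;  /* unsigned subtraction tolerates counter wrap */
    } while (!ready && elapsed < timeout_ms);

    stats->last_wait_ms = elapsed;
    if (elapsed > stats->max_wait_ms) {
        stats->max_wait_ms = elapsed;
    }
    if (!ready) {
        stats->timeouts++;
    }
    return ready;
}

/* Toy card model for trying the function off-target (assumption: 1 ms per byte). */
static uint32_t sim_time_ms;
static uint32_t sim_busy_until_ms;

static uint32_t sim_now(void) { return sim_time_ms; }

static uint8_t sim_xfer(uint8_t out)
{
    (void)out;
    sim_time_ms++;  /* each exchange advances simulated time by 1 ms */
    return (sim_time_ms >= sim_busy_until_ms) ? 0xFFu : 0x00u;
}
```

Calling `wait_ready(sim_xfer, sim_now, 500, &stats)` with the simulated card busy for 174 ms returns true and records a 174 ms wait. Logging `max_wait_ms` from the real driver in the same way is one means of separating card stalls from file system overhead.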