Background

My board incorporates an STM32 microcontroller with an SD/MMC card on SPI and samples analogue data at 48ksps. I am using the Keil Real-Time Library RTX kernel and ELM FatFs.

I have a high-priority task that captures analogue data via DMA in blocks of 40 samples (40 x 16bit); the data is passed via a queue of length 128 (which constitutes about 107ms of sample buffering) to a second, low-priority task that collates sample blocks into a 2560-byte buffer (this being a multiple of both the 512-byte SD sector size and the 40-sample block size). When this buffer is full (32 blocks or approx 27ms), the data is written to the file system.
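
For illustration, a minimal sketch of the collation task described above; the queue call is a hypothetical stand-in for the RTX mailbox actually used, and error handling is omitted:

    #include <stdint.h>
    #include <string.h>
    #include "ff.h"                              /* ELM FatFs */

    #define SAMPLES_PER_BLOCK  40U
    #define BLOCKS_PER_BUFFER  32U               /* 32 x 40 x 2 bytes = 2560 bytes */
    #define BUFFER_BYTES       (BLOCKS_PER_BUFFER * SAMPLES_PER_BLOCK * sizeof(uint16_t))

    /* Hypothetical stand-in for the RTX mailbox receive: blocks until the
       high-priority DMA capture task has queued a 40-sample block. */
    extern const uint16_t *queue_receive_block(void);

    void logger_task(FIL *file)
    {
        static uint16_t buffer[BLOCKS_PER_BUFFER * SAMPLES_PER_BLOCK];
        UINT written;
        unsigned int block = 0;

        for (;;) {
            const uint16_t *samples = queue_receive_block();
            memcpy(&buffer[block * SAMPLES_PER_BLOCK], samples,
                   SAMPLES_PER_BLOCK * sizeof(uint16_t));

            if (++block == BLOCKS_PER_BUFFER) {
                block = 0;
                /* 2560 bytes is 5 whole 512-byte sectors, so FatFs can pass
                   this straight through to the card driver. */
                f_write(file, buffer, BUFFER_BYTES, &written);
            }
        }
    }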

Observation

By instrumenting the code, I can see that every 32 blocks the data is written and that the write takes about 6ms. This is sustained until (on FAT16) the file size reaches 1MB, when the write operation takes 440ms, by which time the queue fills and logging is aborted. If I format the card as FAT32, the file size before the 'long write' event is 4MB.

The fact that the file size at which this occurs changes between FAT16 and FAT32 suggests to me that it is not a limitation of the card, but rather something that the file system does at the 1MB or 4MB boundary that takes additional time.

It also appears that my tasks are being scheduled in a timely manner, and that the time is consumed in the ELM FatFs code only at the 1MB (or 4MB for FAT32) boundary.

The question

Can anyone offer an explanation, if not a solution? Is it a FAT issue, or is it perhaps specific to ELM's FatFs code?

I have considered using multiple files, but in my experience FAT does not handle large numbers of files in a single directory very well, and that approach would simply fail too. Not using a file system at all and writing to the card raw would be a possibility, but ideally I'd like to read the data on a PC with standard drivers and no special software.

Added: It occurred to me to try compiler optimisation to get the write time down; this seems to have an effect, but the write times were much more variable. At -O2 I did get an 8MB file, but the results were inconsistent. I am now not sure whether there is a direct correlation between the file size and the point at which it fails; I have seen it fail in this way at various file lengths on no particular boundary. Maybe it is a card performance issue.

Added 2: I instrumented the code further and applied a divide-and-conquer approach. This observation probably renders the question obsolete and all previous observations erroneous or red herrings. I finally narrowed it down to an instance of a multi-sector write (CMD25) where occasionally the 'wait ready' polling of the card takes 174ms for the first three sectors out of a block of 5. The timeout for wait-ready is set to 500ms, so it will happily busy-wait for that long. Using CMD24 (single-sector write) iteratively is much slower in the general case - 140ms per sector rather than just occasionally - so it seems to be a behaviour of the card after all. I shall endeavour to try a range of SD and MMC cards.
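
For reference, the problem area is the busy-wait that an SPI/MMC driver performs between the data blocks of a CMD25 write. The sketch below uses hypothetical names (spi_exchange, ms_ticks) rather than those of the actual driver, and includes the instrumentation used to catch the worst case:

    #include <stdint.h>

    /* Hypothetical stand-ins for the low-level SPI transfer and a
       free-running millisecond tick; the real driver has its own. */
    extern uint8_t spi_exchange(uint8_t out);
    extern volatile uint32_t ms_ticks;

    static uint32_t max_busy_ms;                 /* instrumentation: worst case seen */

    /* After each 512-byte data packet of a CMD25 multi-block write the card
       holds its data-out line low while it programs flash; poll until it
       returns 0xFF or the 500ms timeout expires. */
    static int wait_ready(void)
    {
        uint32_t start = ms_ticks;

        do {
            if (spi_exchange(0xFF) == 0xFF) {    /* card no longer busy */
                uint32_t waited = ms_ticks - start;
                if (waited > max_busy_ms)
                    max_busy_ms = waited;        /* records the 174ms outliers */
                return 1;
            }
        } while ((ms_ticks - start) < 500U);     /* 500ms timeout */

        return 0;                                /* card stuck busy */
    }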

+3  A: 

The first thing to try could be quite easy: increase the queue depth to 640. That would give you about 533ms of buffering and should survive at least this particular FS event.

The second thing to look at is the configuration of ELM FatFs. Many embedded file systems are very stingy with buffer usage by default. I've seen one that used a single 512-byte block buffer for all operations, and it crawled for certain FS transactions. We gave it a couple of kB and the thing became orders of magnitude faster.
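
In ELM FatFs the relevant knob is the _FS_TINY option in the configuration header (its name and location vary slightly between FatFs revisions); the excerpt below shows the non-tiny setting, which gives each open file its own sector buffer:

    /* Excerpt from the FatFs configuration header. */

    #define _FS_TINY    0   /* 0: each open FIL object carries its own 512-byte
                               sector buffer.
                               1: all open files share the single buffer inside
                               the FATFS object, which saves RAM but forces many
                               more re-reads of FAT and data sectors. */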

Both of the above are dependent on whether you have more RAM available, of course.

A third option would be to preallocate a huge file and then just overwrite the data during data collection. That would eliminate a number of expensive cluster allocation and FAT manipulation operations.
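
With the FatFs API, one way to do that (assuming the revision in use supports expanding a file by seeking past its end in write mode) is roughly as follows; the file name and size are examples only:

    #include "ff.h"                              /* ELM FatFs */

    #define LOG_PREALLOC_BYTES  (8UL * 1024UL * 1024UL)     /* example size */

    /* Create the log file once and stretch it to its final size, so that all
       cluster allocation happens here rather than during capture. */
    FRESULT log_preallocate(FIL *fp)
    {
        FRESULT res = f_open(fp, "capture.bin", FA_WRITE | FA_CREATE_ALWAYS);

        if (res == FR_OK) res = f_lseek(fp, LOG_PREALLOC_BYTES); /* allocate clusters up front */
        if (res == FR_OK) res = f_sync(fp);                      /* commit the size and FAT chain */
        if (res == FR_OK) res = f_lseek(fp, 0);                  /* rewind, ready for logging */
        return res;
    }

If the file is later reopened for a logging run, it should be opened with FA_WRITE | FA_OPEN_EXISTING rather than FA_CREATE_ALWAYS, since the latter truncates the file and discards the preallocation.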

EDIT: Since compiler optimization affected this, you must also consider the possibility that it is a multi-threading issue. Are there other threads running that could disturb the lower-priority reader thread? You should also try changing the buffering there to something other than a multiple of the sample size and flash block size, in case you're hitting some kind of system resonance.

Amardeep
Yes, increasing the queue depth would be a solution - if only I had sufficient RAM to allow that! The part has 64KB of RAM; a 640 x 40 x 16-bit queue would be 51KB, and it is not the only thing running. I have increased it to 128 for this issue, but I really need it to be much lower; a queue of 8 is sufficient until this extended write occurs. Tried option 3 already - no effect. Will look at option 2 and report. Thanks.
Clifford
Are you sure #3 was done in a way that wouldn't require cluster reallocation? In other words, did you open the file for 'modify' instead of 'write'? Opening for 'write' would zero it and start cluster allocation all over again.
Amardeep
W.r.t. option 2, the options are to use a sector buffer per file or a shared sector buffer. I am using the former, but have only one file open in any case.
Clifford
In that case I'd look into another FAT file system implementation. That one is going to be a real system bottleneck.
Amardeep
For #3, yes, the file was always opened for update (which is why I erroneously thought it was *always* 1MB - that was just the 'high-tide' mark). I have performed further tests which probably render all previous observations obsolete; I have added them to the original question.
Clifford
Re edit: I have moved off this problem now, but thinking about it, there are other threads, and one of them could account for this. This is a team project, and the particular thread was written by someone else; I never really considered it to be a problem, but it may be taking more time than I expected, and it does run at a priority between the two. May revisit this; thanks, I should have thought of that myself.
Clifford
Checked the 'third thread': it is never in the running state for more than 85 microseconds, so I am back to inherent card behaviour being the cause.
Clifford