Here is a snippet of the file /proc/self/smaps:
00af8000-00b14000 r-xp 00000000 fd:00 16417 /lib/ld-2.8.so
Size: 112 kB
Rss: 88 kB
Pss: 1 kB
Shared_Clean: 88 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 88 kB
Swap: 0 kB
00b14000-00b15000 r--p 0001c000 fd:00 16417 /lib/ld-2.8.so
Size: 4 kB
Rss: 4 kB
Pss: 4 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 4 kB
Referenced: 4 kB
Swap: 0 kB
It shows that this process (self) has /lib/ld-2.8.so mapped, and lists two of the many byte ranges mapped into memory.
The first range is 112 kB, of which 88 kB (22 pages of 4 kB) is resident. Those pages are shared and clean, that is, they have not been written to. Given the r-xp permissions, this is probably code.
The second range of 4 kB (a single page) is not shared and is dirty: the process has written to it since it was memory mapped from the file on disk. This is probably data.
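To pick out such ranges mechanically, the smaps text can be parsed and filtered on Private_Dirty. The following is only a minimal sketch: the names parse_smaps and privately_dirty are made up for illustration, and SAMPLE is an abbreviated copy of the snippet above rather than a live read of /proc/self/smaps.

```python
import re

# Mapping header, e.g.
# 00b14000-00b15000 r--p 0001c000 fd:00 16417 /lib/ld-2.8.so
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+(\S+)\s+(\d+)\s*(.*)$"
)
# Counter line, e.g. "Private_Dirty: 4 kB"
FIELD = re.compile(r"^(\w+):\s+(\d+)\s+kB$")


def parse_smaps(text):
    """Return one dict per mapping; counters are stored in kB."""
    mappings = []
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            start, end, perms, offset, _dev, _inode, path = header.groups()
            mappings.append({
                "start": int(start, 16),
                "end": int(end, 16),
                "perms": perms,
                "offset": int(offset, 16),  # offset into the backing file
                "path": path,
            })
            continue
        field = FIELD.match(line)
        if field and mappings:
            # Attach the counter to the most recent mapping header.
            mappings[-1][field.group(1)] = int(field.group(2))
    return mappings


def privately_dirty(mappings):
    """Keep only mappings that have at least one private dirty page."""
    return [m for m in mappings if m.get("Private_Dirty", 0) > 0]


# Abbreviated copy of the snippet shown in this question.
SAMPLE = """\
00af8000-00b14000 r-xp 00000000 fd:00 16417 /lib/ld-2.8.so
Size: 112 kB
Rss: 88 kB
Private_Dirty: 0 kB
00b14000-00b15000 r--p 0001c000 fd:00 16417 /lib/ld-2.8.so
Size: 4 kB
Rss: 4 kB
Private_Dirty: 4 kB
"""

if __name__ == "__main__":
    for m in privately_dirty(parse_smaps(SAMPLE)):
        print(f"{m['start']:08x}-{m['end']:08x} {m['perms']} "
              f"{m['Private_Dirty']} kB {m['path']}")
```

On a real system the same functions could be fed the contents of /proc/self/smaps (or /proc/PID/smaps) instead of SAMPLE.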
But what is in that memory?
How do you convert the memory range 00b14000-00b15000 into useful information such as the line number of the file in which a large static structure is declared?
The technique will need to take account of prelinking and address space randomization, such as that from Exec Shield, and also of separate debugging symbols.
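One part of this that seems tractable is the first step, turning a runtime address into an offset within the backing file. Each smaps header line already gives the mapping's start address and its file offset, so the arithmetic does not depend on where prelinking or randomization placed the library. A minimal sketch, with file_offset as a hypothetical helper name:

```python
def file_offset(addr, map_start, map_end, map_offset):
    """Translate a runtime address inside a file-backed mapping
    into a byte offset within the backing file."""
    if not map_start <= addr < map_end:
        raise ValueError("address is outside the mapping")
    return addr - map_start + map_offset


if __name__ == "__main__":
    # An address 0x10 bytes into the dirty page from the snippet:
    # 00b14000-00b15000 r--p 0001c000 fd:00 16417 /lib/ld-2.8.so
    off = file_offset(0x00B14010, 0x00B14000, 0x00B15000, 0x0001C000)
    print(hex(off))  # 0x1c010
```

This only gets as far as a file offset. Going from there to a symbol and a source line would still need the ELF program headers (for example as printed by `readelf -l`) to convert the offset to a link-time virtual address, plus the symbol and debug information, which may live in a separate debuginfo file. That part is not shown here.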
(The motivation is to identify popular libraries that create dirty memory and to fix them, for example by declaring structures const.)