Is there a way to determine whether an instance of org.apache.poi.hwpf.model.ListData belongs to a numbered list or a bulleted list?

I am using Apache POI's org.apache.poi.hwpf.HWPFDocument class to read the contents of a Word document in order to generate HTML. I can identify the list items in the document by checking whether the paragraph I am working with is an instance of org.apache.poi.hwpf.usermodel.ListEntry. I cannot find a way to determine whether the corresponding org.apache.poi.hwpf.model.ListData belongs to a bulleted list or a numbered list.

A: 

I think I have found the answer to my own question.

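// listTables is the ListTables object obtained from HWPFDocument.getListTables()
ListEntry aListEntry = (ListEntry) aParagraph;
ListData listData = listTables.getListData(aListEntry.getIlfo());
// Reads the format of the level at index numLevels(), not necessarily the paragraph's own level
int numberFormat = listData.getLevel(listData.numLevels()).getNumberFormat();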

The number format returns 23 for bullet points and 0 for numbered lists. I dare say that there are multiple format numbers that can be interpreted as either bullet points or numbered lists but at least I can now identify them!
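For the HTML generation mentioned in the question, the check above could be wrapped in a small helper. This is only a sketch: the class ListKind and its methods are hypothetical names, not part of Apache POI, and treating every value other than 23 as an ordered list is an assumption that may not hold for every document.

```java
// Hypothetical helper; ListKind and its methods are illustrative names, not Apache POI API.
final class ListKind {

    /** Number format value observed for bullet points in the answer above. */
    static final int NFC_BULLET = 23;

    private ListKind() {
    }

    /** Returns true when the number format marks a bulleted list. */
    static boolean isBullet(int numberFormat) {
        return numberFormat == NFC_BULLET;
    }

    /**
     * Chooses the HTML list element for a number format.
     * Assumption: anything that is not a bullet is rendered as an ordered list.
     */
    static String htmlTag(int numberFormat) {
        return isBullet(numberFormat) ? "ul" : "ol";
    }
}
```

The value passed in would be the result of getNumberFormat(), for example ListKind.htmlTag(numberFormat).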

leighgordy
A: 

I recently posted another way to determine the list type. Unfortunately, that approach only worked in a few of my tests.

I can now confirm that leighgordy's way of determining the list type works.

A: 

Hi all, I am now facing the same problem when converting bullets and numbering from a document to HTML. I have not found a complete solution. Please help me with this problem...

Thanks, Jetti

jetti
A: 

import java.io.FileInputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.ListData;
import org.apache.poi.hwpf.model.ListLevel;
import org.apache.poi.hwpf.model.ListTables;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class ListTest {

    public static void main(String[] args) {

        String filename = "/some/path/to/ListTest.doc";

        try {
            POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(filename));
            HWPFDocument doc = new HWPFDocument(fs);
            // Get a table of all the lists in this document
            ListTables listtables = doc.getListTables();
            Paragraph para;

            Range range = doc.getRange();
            for (int x = 0; x < range.numParagraphs(); x++) {
                para = range.getParagraph(x);

                // When non-zero, (1-based) index into the pllfo
                // identifying the list to which the paragraph belongs
                if (para.getIlfo() != 0) {
                    // Get the list this paragraph belongs to
                    ListData listdata = listtables.getListData(para.getIlfo());
                    // Now get all the levels for this list
                    ListLevel[] listlevel = listdata.getLevels();
                    // Find the list level info for our paragraph
                    ListLevel level = listlevel[para.getIlvl()];
                    System.out.print("Text: \"" + para.text() + "\"");
                    // List level for this paragraph
                    System.out.print("\tListLevel: " + para.getIlvl());
                    // Additional text associated with list symbols
                    System.out.print("\tgetNumberText: \"" + level.getNumberText() + "\"");
                    // Format value (nfc) for the style of list symbols
                    System.out.println("\tgetNumberFormat: " + level.getNumberFormat());
                } else {
                    // Not a list paragraph: print an empty line
                    System.out.println();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

nfc value   Numbering scheme

15          Single-byte character
16          Kanji numbering 3 (dbnum3)
17          Kanji numbering 4 (dbnum4)
18          Circle numbering (circlenum)
19          Double-byte Arabic numbering
20          46 phonetic double-byte Katakana characters (*aiueo*dbchar)
21          46 phonetic double-byte Katakana characters (*iroha*dbchar)
22          Arabic with leading zero (01, 02, 03, ..., 10, 11)
23          Bullet (no number at all)
24          Korean numbering 2 (ganada)
25          Korean numbering 1 (chosung)
26          Chinese numbering 1 (gb1)
27          Chinese numbering 2 (gb2)
28          Chinese numbering 3 (gb3)
29          Chinese numbering 4 (gb4)
30          Chinese Zodiac numbering 1
31          Chinese Zodiac numbering 2
32          Chinese Zodiac numbering 3
33          Taiwanese double-byte numbering 1
34          Taiwanese double-byte numbering 2
35          Taiwanese double-byte numbering 3
36          Taiwanese double-byte numbering 4
37          Chinese double-byte numbering 1
38          Chinese double-byte numbering 2
39          Chinese double-byte numbering 3
40          Chinese double-byte numbering 4
41          Korean double-byte numbering 1
42          Korean double-byte numbering 2
43          Korean double-byte numbering 3
44          Korean double-byte numbering 4
45          Hebrew non-standard decimal
46          Arabic Alif Ba Tah
47          Hebrew Biblical standard
48          Arabic Abjad style
49          Hindi vowels
50          Hindi consonants
51          Hindi numbers
52          Hindi descriptive (cardinals)
53          Thai letters
54          Thai numbers
55          Thai descriptive (cardinals)
56          Vietnamese descriptive (cardinals)
57          Page Number format - # -
58          Lower case Russian alphabet
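
When generating HTML, a few of the nfc values listed above might be translated to a CSS list-style-type. The sketch below is only an illustration: NfcCss and listStyleType are hypothetical names, the choice of CSS keywords is my own approximation rather than anything defined by Apache POI or the Word format, and every value not listed falls back to "decimal" as an assumption.

```java
// Hypothetical lookup; NfcCss is an illustrative name, not Apache POI API.
final class NfcCss {

    private NfcCss() {
    }

    /**
     * Maps an nfc value (as returned by getNumberFormat()) to a CSS list-style-type.
     * Only a handful of values are covered; the rest fall back to "decimal".
     */
    static String listStyleType(int nfc) {
        switch (nfc) {
            case 20:
                return "katakana";             // Katakana, aiueo order (approximation)
            case 21:
                return "katakana-iroha";       // Katakana, iroha order (approximation)
            case 22:
                return "decimal-leading-zero"; // 01, 02, 03, ...
            case 23:
                return "disc";                 // Bullet (no number at all)
            default:
                return "decimal";              // Assumed fallback for unmapped values
        }
    }
}
```

The result could be written into the generated markup as a style attribute on the list element, for example style="list-style-type: disc".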

Jamshid Asatillayev