The word break rule file | ansaurus

tags:

icu
java


Q:

The word break rule file

IBM has apparently open-sourced their ICU source code for Unicode and Globalization support, part of which is a text boundary locator for detecting where breaks can be located in text.

However, the break detection stuff relies on rules and I cannot locate the rules files anywhere.

Where can I get the word break rules text files for com.ibm.icu.text.BreakIterator and com.ibm.icu.text.RuleBasedBreakIterator?

+2 A:

http://www.icu-project.org/ holds all the source code for icu4j which IBM has released under an open source license. This includes the boundary analysis stuff like dictionary- and rule-based break iterators.

However, there doesn't appear to be a text file suitable for perusing. I'm not sure that IBM would have released their rule set as open source (since it's a pretty big technological advantage to them). Instead, the idea is to create your own rule set, a tutorial for which is here.

That same tutorial states that you can dump the default rules by running:

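import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.RuleBasedBreakIterator;
import java.util.Locale;
RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator)
    BreakIterator.getWordInstance(Locale.getDefault());
String defaultRules = rbbi.toString();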
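If it helps to see what a word break iterator actually does with its rules, here is a minimal sketch. It is an assumption-laden stand-in: it uses the JDK's own java.text.BreakIterator rather than ICU's com.ibm.icu.text.BreakIterator so that it runs without any extra jars, and the JDK class does not expose its rules the way the snippet above does. The class name WordBreakSketch and the helper words() are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of ICU or the JDK.
public class WordBreakSketch {

    // Splits text at word boundaries and keeps only the segments that
    // start with a letter or digit (dropping spaces and punctuation).
    static List<String> words(String text, Locale locale) {
        // JDK stand-in for com.ibm.icu.text.BreakIterator.getWordInstance(...)
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Where are the rules?", Locale.ENGLISH));
    }
}
```

Each call to next() returns the offset of the following boundary, so consecutive pairs of offsets delimit one segment. Swapping the java.text import for the com.ibm.icu.text one should give the ICU behaviour, though I have not checked that the boundaries come out identical.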

paxdiablo 2009-02-18 06:52:31
