UtfNormal Class Reference
[UtfNormal]

Unicode normalization routines for working with UTF-8 strings.

Static Public Member Functions

static cleanUp ($string)
 The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.
static toNFC ($string)
 Convert a UTF-8 string to normal form C, canonical composition.
static toNFD ($string)
 Convert a UTF-8 string to normal form D, canonical decomposition.
static toNFKC ($string)
 Convert a UTF-8 string to normal form KC, compatibility composition.
static toNFKD ($string)
 Convert a UTF-8 string to normal form KD, compatibility decomposition.
static quickIsNFC ($string)
 Returns true if the string is _definitely_ in NFC.
static quickIsNFCVerify (&$string)
 Returns true if the string is _definitely_ in NFC.
static NFC ($string)
static NFD ($string)
static NFKC ($string)
static NFKD ($string)
static fastDecompose ($string, $map)
 Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).
static fastCombiningSort ($string)
 Sorts combining characters into canonical order.
static fastCompose ($string)
 Produces canonically composed sequences, i.e. normal form C or KC.
static placebo ($string)
 Used only by the benchmark, to measure how long it takes to iterate through a string without doing anything of substance.

Static Private Member Functions

static loadData ()
 Load the basic composition data if necessary.


Detailed Description

Unicode normalization routines for working with UTF-8 strings.

Currently assumes that input strings are valid UTF-8!

Not as fast as I'd like, but should be usable for most purposes. UtfNormal::toNFC() will bail early if given ASCII text or text it can quickly determine is already normalized.

All functions can be called statically.

See description of forms at http://www.unicode.org/reports/tr15/
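
As a quick illustration of the four forms named above, the sketch below uses Python's standard `unicodedata` module as a stand-in. It is not the UtfNormal code (the PHP source is not shown on this page), and the helper names `forms` and `codepoints` are made up for the example.

```python
import unicodedata

def forms(text):
    """Return the four Unicode normal forms of text, keyed by form name."""
    return {name: unicodedata.normalize(name, text)
            for name in ("NFC", "NFD", "NFKC", "NFKD")}

def codepoints(text):
    """Render a string as a list of U+XXXX labels for easy comparison."""
    return ["U+%04X" % ord(ch) for ch in text]

# Sample: precomposed e-acute (U+00E9) followed by the "fi" ligature (U+FB01).
sample = "\u00e9\ufb01"
result = forms(sample)
for name, value in result.items():
    print(name, codepoints(value))
```

The canonical forms (C and D) leave the ligature alone and differ only in whether the accent is precomposed. The compatibility forms (KC and KD) additionally replace the ligature with the plain letters "f" and "i".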

Definition at line 63 of file UtfNormal.php.


Member Function Documentation

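static UtfNormal::cleanUp ( string  )  [static]

The ultimate convenience function! Clean up invalid UTF-8 sequences, and convert to normal form C, canonical composition.

References NFC(), and quickIsNFCVerify().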

static UtfNormal::fastCombiningSort ( string  )  [static]

Sorts combining characters into canonical order.

This is the final step in creating decomposed normal forms D and KD.

Access:
private
Parameters:
$string String: a valid, decomposed UTF-8 string. Input is not validated.
Returns:
string a UTF-8 string with combining characters sorted in canonical order

Definition at line 547 of file UtfNormal.php.

References $i, $n, $out, $utfCombiningClass, and loadData().

Referenced by NFD(), and NFKD().
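
Canonical ordering can be sketched as a stable sort of each run of combining marks by combining class. The Python below is an illustration only, not the PHP implementation: it reads the classes from the standard `unicodedata` module rather than from `$utfCombiningClass`, and `combining_sort` is a hypothetical name.

```python
import unicodedata

def combining_sort(text):
    """Stable-sort each run of combining marks by canonical combining class."""
    out = []
    run = []  # pending combining marks (class != 0) since the last starter
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter (class 0) ends the current run; flush it in sorted order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    # Flush any marks left at the end of the string.
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# U+0301 (acute, class 230) stored before U+0323 (dot below, class 220).
unsorted = "a\u0301\u0323"
print([hex(ord(c)) for c in combining_sort(unsorted)])
```

The sort must be stable: marks with the same combining class keep their original relative order, because swapping them would change the meaning of the text.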

static UtfNormal::fastCompose ( string  )  [static]

Produces canonically composed sequences, i.e. normal form C or KC.

Access:
private
Parameters:
$string String: a valid UTF-8 string in sorted normal form D or KD. Input is not validated.
Returns:
string a UTF-8 string with canonical precomposed characters used where possible

Definition at line 600 of file UtfNormal.php.

References $i, $n, $out, $utfCanonicalComp, $utfCombiningClass, and loadData().

Referenced by NFC(), and NFKC().
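
The phrase "where possible" matters: a base letter and its marks are merged only when Unicode defines a precomposed character for them. The sketch below shows this with Python's `unicodedata.normalize` standing in for the composition pass. It is an assumption-laden illustration rather than the algorithm in UtfNormal.php, and `compose` is a hypothetical wrapper.

```python
import unicodedata

def compose(decomposed):
    """Compose a canonically decomposed string (stand-in for a composition pass)."""
    return unicodedata.normalize("NFC", decomposed)

# A precomposed e-acute exists, so the pair collapses to one code point.
e_acute = compose("e\u0301")

# No precomposed q-acute exists, so the pair is kept as it is.
q_acute = compose("q\u0301")

# Two marks can be absorbed one after another: o + dot below + circumflex.
o_dot_circumflex = compose("o\u0323\u0302")

print(len(e_acute), len(q_acute), len(o_dot_circumflex))
```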

static UtfNormal::fastDecompose ( string,
map 
) [static]

Perform decomposition of a UTF-8 string into either D or KD form (depending on which decomposition map is passed to us).

Input is assumed to be *valid* UTF-8. Invalid input will break it.

Access:
private
Parameters:
$string String: valid UTF-8 string
$map Array: hash of expanded decomposition map
Returns:
string a UTF-8 string decomposed, not yet normalized (needs sorting)

Definition at line 487 of file UtfNormal.php.

References $i, $n, $out, $t, and loadData().

Referenced by NFD(), and NFKD().
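
A map-driven decomposition can be sketched as a per-character lookup. In the Python below, `CANONICAL_MAP` is a tiny hypothetical stand-in for an expanded decomposition map (the real tables are far larger) and `decompose` is an illustrative name, not the PHP function. The example also shows why the result still needs the combining sort.

```python
import unicodedata

# Hypothetical two-entry "expanded" map: each key maps to its full decomposition.
CANONICAL_MAP = {
    "\u00e1": "a\u0301",  # a-acute -> a + combining acute
    "\u00e9": "e\u0301",  # e-acute -> e + combining acute
}

def decompose(text, mapping):
    """Replace each mapped character by its decomposition; leave the rest as-is."""
    return "".join(mapping.get(ch, ch) for ch in text)

# a-acute followed by a combining dot below (U+0323, class 220).
raw = decompose("\u00e1\u0323", CANONICAL_MAP)
print([hex(ord(c)) for c in raw])

# The acute (class 230) now precedes the dot below (class 220), so the
# output is decomposed but not yet in canonical order.
needs_sorting = raw != unicodedata.normalize("NFD", "\u00e1\u0323")
```

Passing a compatibility map instead of a canonical one would give the KD variant of the same pass.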

static UtfNormal::loadData (  )  [static, private]

Load the basic composition data if necessary.

Definition at line 166 of file UtfNormal.php.

References $utfCombiningClass.

Referenced by fastCombiningSort(), fastCompose(), fastDecompose(), NFD(), quickIsNFC(), and quickIsNFCVerify().

static UtfNormal::NFC ( string  )  [static]

Parameters:
$string string
Returns:
string
Access:
private

Definition at line 438 of file UtfNormal.php.

References fastCompose(), and NFD().

Referenced by cleanUp(), CleanUpTest::doTestDoubleBytes(), CleanUpTest::doTestTripleBytes(), toNFC(), and CleanUpTest::XtestAllChars().

static UtfNormal::NFD ( string  )  [static]

Parameters:
$string string
Returns:
string
Access:
private

Definition at line 447 of file UtfNormal.php.

References $utfCanonicalDecomp, fastCombiningSort(), fastDecompose(), and loadData().

Referenced by NFC(), and toNFD().

static UtfNormal::NFKC ( string  )  [static]

Parameters:
$string string
Returns:
string
Access:
private

Definition at line 459 of file UtfNormal.php.

References fastCompose(), and NFKD().

Referenced by toNFKC().

static UtfNormal::NFKD ( string  )  [static]

Parameters:
$string string
Returns:
string
Access:
private

Definition at line 468 of file UtfNormal.php.

References $utfCompatibilityDecomp, fastCombiningSort(), and fastDecompose().

Referenced by NFKC(), and toNFKD().

static UtfNormal::placebo ( string  )  [static]

Used only by the benchmark, to measure how long it takes to iterate through a string without doing anything of substance.

Parameters:
$string string
Returns:
string

Definition at line 732 of file UtfNormal.php.

References $i, and $out.

static UtfNormal::quickIsNFC ( string  )  [static]

Returns true if the string is _definitely_ in NFC.

Returns false if not or uncertain.

Parameters:
$string String: a valid UTF-8 string. Input is not validated.
Returns:
bool

Definition at line 179 of file UtfNormal.php.

References $i, $n, $utfCheckNFC, $utfCombiningClass, and loadData().

Referenced by toNFC().
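
The "definitely, or else uncertain" contract can be sketched with a deliberately conservative test. The Python below assumes just one cheap rule, that text made only of Latin-1 characters (U+0000..U+00FF) is unaffected by NFC, as UAX #15 states. The real quickIsNFC() consults `$utfCheckNFC` and `$utfCombiningClass` and recognizes far more input. Both function names are hypothetical.

```python
import unicodedata

def quick_is_nfc(text):
    """Return True only when text is certainly NFC; False means 'not sure'."""
    # Assumption for this sketch: only the cheap Latin-1 range test is applied.
    return all(ord(ch) <= 0xFF for ch in text)

def to_nfc(text):
    """Normalize to NFC, skipping the expensive path when the quick check passes."""
    if quick_is_nfc(text):
        return text
    return unicodedata.normalize("NFC", text)

print(quick_is_nfc("caf\u00e9"))   # precomposed e-acute is within Latin-1
print(quick_is_nfc("cafe\u0301"))  # combining acute is outside Latin-1
```

A False result is not a claim that the string is denormalized. It only means the caller has to run the full normalization to find out.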

static UtfNormal::quickIsNFCVerify ( &$string  )  [static]

Returns true if the string is _definitely_ in NFC.

Returns false if not or uncertain.

Parameters:
$string String: a UTF-8 string, altered on output to be valid UTF-8 safe for XML.

Definition at line 219 of file UtfNormal.php.

References $i, $n, $utfCheckNFC, $utfCombiningClass, is(), and loadData().

Referenced by cleanUp().
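
The combination used by cleanUp(), repairing invalid UTF-8 and then converting to NFC, can be sketched as follows. This Python illustration makes two assumptions that this page does not confirm: invalid bytes are replaced with U+FFFD, and no extra XML-specific filtering is done. `clean_up` is a hypothetical name, and unlike quickIsNFCVerify() it returns a new string instead of altering its argument.

```python
import unicodedata

def clean_up(raw):
    """Decode bytes as UTF-8, replacing invalid sequences, then convert to NFC."""
    # Assumption: each invalid byte becomes U+FFFD REPLACEMENT CHARACTER.
    text = raw.decode("utf-8", errors="replace")
    return unicodedata.normalize("NFC", text)

# "cafe" + combining acute (valid UTF-8), a space, then a stray 0xFF byte.
cleaned = clean_up(b"cafe\xcc\x81 \xff")
print([hex(ord(c)) for c in cleaned])
```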

static UtfNormal::toNFC ( string  )  [static]

Convert a UTF-8 string to normal form C, canonical composition.

Fast return for pure ASCII strings; some lesser optimizations for strings containing only known-good characters.

Parameters:
$string String: a valid UTF-8 string. Input is not validated.
Returns:
string a UTF-8 string in normal form C

Definition at line 103 of file UtfNormal.php.

References NFC(), and quickIsNFC().

static UtfNormal::toNFD ( string  )  [static]

Convert a UTF-8 string to normal form D, canonical decomposition.

Fast return for pure ASCII strings.

Parameters:
$string String: a valid UTF-8 string. Input is not validated.
Returns:
string a UTF-8 string in normal form D

Definition at line 119 of file UtfNormal.php.

References NFD().
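
The ASCII fast return can be sketched in a few lines. Python's `str.isascii()` (3.7+) and `unicodedata` stand in for the PHP code here, and `to_nfd` is a hypothetical name. For NFD the shortcut has to stop at ASCII, since even Latin-1 letters such as e-acute are changed by decomposition.

```python
import unicodedata

def to_nfd(text):
    """Convert to NFD, returning pure ASCII input untouched."""
    if text.isascii():
        # ASCII is unaffected by every normalization form, so skip the work.
        return text
    return unicodedata.normalize("NFD", text)

plain = "plain text"
print(to_nfd(plain) is plain)
print([hex(ord(c)) for c in to_nfd("\u00e9")])
```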

static UtfNormal::toNFKC ( string  )  [static]

Convert a UTF-8 string to normal form KC, compatibility composition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters:
$string String: a valid UTF-8 string. Input is not validated.
Returns:
string a UTF-8 string in normal form KC

Definition at line 136 of file UtfNormal.php.

References NFKC().
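
The information loss is easy to demonstrate. The sketch below uses Python's `unicodedata` as a stand-in for toNFKC() (an assumption, since the PHP source is not shown here), with sample strings invented for the example.

```python
import unicodedata

# "x" + superscript two (U+00B2), a space, then "fi" ligature (U+FB01) + "t".
original = "x\u00b2 \ufb01t"
folded = unicodedata.normalize("NFKC", original)
print(folded)

# Plain ASCII text that was never superscripted or ligated folds to the same
# result, so the original cannot be recovered from the folded string.
lookalike = unicodedata.normalize("NFKC", "x2 fit")
collision = folded == lookalike

# Canonical normalization, by contrast, leaves the original untouched.
unchanged_by_nfc = unicodedata.normalize("NFC", original) == original
```

Because distinct inputs collapse to one output, compatibility forms suit search keys and comparisons better than stored text.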

static UtfNormal::toNFKD ( string  )  [static]

Convert a UTF-8 string to normal form KD, compatibility decomposition.

This may cause irreversible information loss; use judiciously. Fast return for pure ASCII strings.

Parameters:
$string String: a valid UTF-8 string. Input is not validated.
Returns:
string a UTF-8 string in normal form KD

Definition at line 153 of file UtfNormal.php.

References NFKD().


The documentation for this class was generated from the following file:

UtfNormal.php

Generated on Sat Sep 5 02:08:50 2009 for MediaWiki by  doxygen 1.5.9