Source for file position.php
* Locate a byte index given a UTF-8 character index
* @version $Id: position.php,v 1.1 2007/09/09 20:39:51 pitlinz Exp $
//--------------------------------------------------------------------
* Given a string and a character index in the string, in
* terms of the UTF-8 character position, returns the byte
* index of that character. Can be useful when you want to
* use PHP's native string functions, but be warned: locating
* the byte can be expensive
* Takes a variable number of parameters: the first must be
* the search string, followed by 1 to n UTF-8 character positions
* to obtain byte indexes for. It is more efficient to search
* the string for multiple characters at once than to make
* repeated calls to this function
* @author Chris Smith<chris@jalakai.co.uk>
* @param string string to locate index in
* @return mixed - int if only one character position is given, array if more
// trivial byte index, character offset pair
// use a short piece of str to estimate bytes per character
// $i (& $j) -> byte indexes into $str
// $c -> character offset into $str
// deal with arguments from lowest to highest
foreach ($args as $offset) {
if ($offset == 0) { $result[] = 0; continue; }
// ensure no endless looping
if ( ($c - $prev[1]) == 0 ) {
// Hack: gone past end of string
$j = $i + (int)(($offset-$c) * ($i - $prev[0]) / ($c - $prev[1]));
// correct to utf8 character boundary
// save the index, offset for use next iteration
// determine new character offset
$error = abs($c-$offset);
// ready for next time around
// once within 7 characters of the target it is faster to iterate over the string
} while ( ($error > 7) && --$safety_valve) ;
if ($error && $error <= 7) {
if ( count($result) == 1 ) {
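The loop above guesses the target byte index by linear interpolation between two known (byte index, character offset) pairs. The following is a minimal sketch of that single step only. The function name `estimate_byte_index` is hypothetical and is not part of position.php. Returning `$i` unchanged when the two points share a character offset is also an assumption made for the sketch.

```php
<?php
// Hypothetical helper (not part of position.php): one interpolation step.
// $prev is array(byte index, character offset) for the previous known point;
// $i and $c are the byte index and character offset of the current known point.
function estimate_byte_index($offset, $i, $c, array $prev) {
    $chars = $c - $prev[1];
    if ($chars == 0) {
        // No characters between the two points, so there is no slope to project.
        return $i;
    }
    // Average bytes per character between the points, projected to $offset.
    return $i + (int)(($offset - $c) * ($i - $prev[0]) / $chars);
}

// 300 bytes hold 150 characters (2 bytes per character on average),
// so character 400 is estimated to start at byte 800.
echo estimate_byte_index(400, 300, 150, array(0, 0)), "\n";
```

The estimate can land in the middle of a multi-byte character, which is why the listing corrects it to a character boundary before using it.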
//--------------------------------------------------------------------
* Given a string and any byte index, returns the byte index
* of the start of the current UTF-8 character, relative to supplied
* position. If the current character begins at the same place as the
* supplied byte index, that byte index will be returned. Otherwise
* this function will step backwards, looking for the index where
* the current UTF-8 character begins
* @author Chris Smith<chris@jalakai.co.uk>
* @param int byte index in the string
* @return int byte index of start of current UTF-8 character
if ($idx >= $limit) return $limit;
// Binary value for any byte after the first in a multi-byte UTF-8 character
// will be like 10xxxxxx so & 0xC0 can be used to detect this kind
// of byte - assuming well formed UTF-8
while ($idx && ((ord($str[$idx]) & 0xC0) == 0x80)) $idx--;
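The backward scan above can be tried in isolation with the sketch below. The function name `locate_current_chr_sketch`, the guard for non-positive indexes, and the assumption that `$limit` is the byte length of `$str` are all illustrative and are not taken from the listing. Like the original, it assumes well-formed UTF-8.

```php
<?php
// Hypothetical name; a sketch built around the fragment above.
function locate_current_chr_sketch($str, $idx) {
    if ($idx <= 0) return 0;            // assumed guard, not shown in the listing
    $limit = strlen($str);              // assumed meaning of $limit: byte length
    if ($idx >= $limit) return $limit;
    // Continuation bytes look like 10xxxxxx; step back until a lead byte is found.
    while ($idx && ((ord($str[$idx]) & 0xC0) == 0x80)) $idx--;
    return $idx;
}

$s = "a\xC3\xA9b";                      // "aéb": bytes a | C3 A9 | b
// Byte 2 is the second byte of "é", which starts at byte 1.
echo locate_current_chr_sketch($s, 2), "\n";
```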
//--------------------------------------------------------------------
* Given a string and any byte index, returns the byte index
* of the start of the next UTF-8 character, relative to supplied
* position. If the next character begins at the same place as the
* supplied byte index, that byte index will be returned.
* @author Chris Smith<chris@jalakai.co.uk>
* @param int byte index in the string
* @return int byte index of start of next UTF-8 character
if ($idx >= $limit) return $limit;
// Binary value for any byte after the first in a multi-byte UTF-8 character
// will be like 10xxxxxx so & 0xC0 can be used to detect this kind
// of byte - assuming well formed UTF-8
while (($idx < $limit) && ((ord($str[$idx]) & 0xC0) == 0x80)) $idx++;
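The forward scan can be exercised the same way. In this sketch the name `locate_next_chr_sketch`, the guard for non-positive indexes, and the reading of `$limit` as the byte length of `$str` are assumptions for illustration only. Well-formed UTF-8 is assumed.

```php
<?php
// Hypothetical name; a sketch built around the fragment above.
function locate_next_chr_sketch($str, $idx) {
    if ($idx <= 0) return 0;            // assumed guard, not shown in the listing
    $limit = strlen($str);              // assumed meaning of $limit: byte length
    if ($idx >= $limit) return $limit;
    // Skip continuation bytes (10xxxxxx) until the next lead byte or the end.
    while (($idx < $limit) && ((ord($str[$idx]) & 0xC0) == 0x80)) $idx++;
    return $idx;
}

$s = "a\xE2\x82\xACb";                  // "a€b": bytes a | E2 82 AC | b
// Byte 2 is inside "€", so the next character ("b") starts at byte 4.
echo locate_next_chr_sketch($s, 2), "\n";
```

An index that already points at the first byte of a character is returned unchanged, matching the description in the comment block.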
Documentation generated on Thu, 08 Jan 2009 17:47:58 +0100 by phpDocumentor 1.4.0a2