1 unstable release

0.2.0 Jun 30, 2022

#36 in #charset


173 downloads per month
Used in encoding-next

MIT license

20KB
266 lines

Interface to the character encoding.

Raw incremental interface

Methods whose names start with raw_ constitute the raw incremental interface, the lowest-level API available for encoders and decoders. This interface divides the entire input into four parts:

  • Processed bytes do not affect the future result.
  • Unprocessed bytes may affect the future result and, depending on future input, can become part of a problematic sequence.
  • The problematic byte is the first byte that causes an error condition.
  • Remaining bytes have been neither processed nor read, so the caller should feed them again.

The following figure illustrates an example of successive raw_feed calls:

    1st raw_feed   :2nd raw_feed   :3rd raw_feed
    ----------+----:---------------:--+--+---------
              |    :               :  |  |
    ----------+----:---------------:--+--+---------
    processed  unprocessed             |  remaining
                                  problematic
    
Since these parts can span multiple input sequences passed to raw_feed, raw_feed returns two offsets (one of them optional) with which the caller can track the problematic sequence. The first offset (the first usize in the returned tuple) points to the first unprocessed byte, or is zero when the unprocessed bytes started before the current call. (The first unprocessed byte can also be at offset 0 of the current input, which makes no difference to the caller.) The second offset (the upto field in the CodecError struct), if any, points to the first remaining byte.
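
To make the two offsets concrete, here is a minimal, self-contained sketch. It does not use this crate's real traits: ToyDecoder, its toy encoding (ASCII plus two-byte UTF-8 sequences), the String output, the usize type chosen for upto, and the cause field are all assumptions made for illustration. Only the raw_feed name, the "usize plus optional error" return shape, and the upto field name come from the description above.

```rust
/// Hypothetical stand-in for the crate's error type. Only the `upto` field
/// name comes from the text; its type and the `cause` field are assumptions.
#[derive(Debug, PartialEq)]
struct CodecError {
    upto: usize,
    cause: &'static str,
}

/// Toy decoder for ASCII plus two-byte UTF-8 sequences (lead 0xC2..=0xDF).
#[derive(Default)]
struct ToyDecoder {
    /// A lead byte still waiting for its continuation byte.
    lead: Option<u8>,
}

impl ToyDecoder {
    fn raw_feed(&mut self, input: &[u8], output: &mut String) -> (usize, Option<CodecError>) {
        // Offset of the first unprocessed byte in `input`. It stays 0 while a
        // lead byte carried over from an earlier call is still pending.
        let mut first_unprocessed = 0;
        for (i, &b) in input.iter().enumerate() {
            match self.lead.take() {
                Some(lead) if (0x80..=0xBF).contains(&b) => {
                    let cp = (u32::from(lead & 0x1F) << 6) | u32::from(b & 0x3F);
                    output.push(char::from_u32(cp).expect("two-byte range is always valid"));
                    first_unprocessed = i + 1;
                }
                Some(_) => {
                    // The pending lead byte is the problem; `b` was not
                    // consumed, so the remaining bytes start at `i`.
                    let err = CodecError { upto: i, cause: "incomplete sequence" };
                    return (first_unprocessed, Some(err));
                }
                None if b < 0x80 => {
                    output.push(char::from(b));
                    first_unprocessed = i + 1;
                }
                None if (0xC2..=0xDF).contains(&b) => self.lead = Some(b),
                None => {
                    // `b` itself is the problematic byte; remaining starts after it.
                    let err = CodecError { upto: i + 1, cause: "invalid byte" };
                    return (first_unprocessed, Some(err));
                }
            }
        }
        (first_unprocessed, None)
    }
}
```

With this sketch, feeding b"a\xC3" returns (1, None): the lead byte at offset 1 is still unprocessed. Feeding b"\xA9" next completes the character and also returns (1, None), but the offset now equals the input length, so nothing is left unprocessed.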

If the caller needs to recover from the error using the problematic sequence, it has to track the unprocessed bytes across calls: it starts saving them when the first offset is less than the input length, appends any new unprocessed bytes while the first offset is zero, and, when the first offset becomes non-zero, discards the saved bytes and saves only the new unprocessed bytes (if the first offset is less than the input length). The caller then checks for the error condition and can use the saved unprocessed bytes for error recovery.
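
The bookkeeping described above can be sketched as a small helper. UnprocessedTracker and observe are hypothetical names, not part of this crate. The sketch also assumes the second offset is a usize that satisfies first <= upto <= input.len(), which the real interface may not guarantee (the slice below would panic otherwise).

```rust
/// Hypothetical helper that accumulates the unprocessed bytes reported by
/// successive `raw_feed` calls.
#[derive(Default)]
struct UnprocessedTracker {
    saved: Vec<u8>,
}

impl UnprocessedTracker {
    /// `first` is the first offset returned by the call on `input`;
    /// `upto` is the second offset, present only when an error was reported.
    /// On error, returns the saved bytes up to the first remaining byte.
    fn observe(&mut self, input: &[u8], first: usize, upto: Option<usize>) -> Option<Vec<u8>> {
        if first > 0 {
            // A non-zero first offset means everything saved so far
            // has been processed, so it can be discarded.
            self.saved.clear();
        }
        // Save the new unprocessed bytes; stop at the first remaining byte
        // if there was an error (assumes first <= upto <= input.len()).
        let end = upto.unwrap_or(input.len());
        self.saved.extend_from_slice(&input[first..end]);
        // On error, hand the collected bytes to the caller and reset.
        upto.map(|_| std::mem::take(&mut self.saved))
    }
}
```

The caller would invoke observe once after every raw_feed call and inspect the returned bytes only when an error was reported.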
Alternatively, if the caller only wants to replace the problematic sequence with a fixed string (such as U+FFFD), it can simply discard the unprocessed bytes and emit the fixed string on an error. It still has to feed the input bytes starting at the second offset again.
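
The replacement strategy can be sketched as a short loop. Everything named here is hypothetical: decode_with_replacement takes a closure standing in for raw_feed, CodecError is reduced to an upto field of type usize, and ascii_feed is a toy strict-ASCII decoder included only to exercise the loop. Any end-of-input handling the real interface may require is left out.

```rust
/// Hypothetical error shape: only the `upto` field name comes from the text;
/// using `usize` for it is a simplification made for this sketch.
struct CodecError {
    upto: usize,
}

/// Decodes `input` with `feed` (a stand-in for `raw_feed`), emitting U+FFFD
/// for every reported error and re-feeding from the second offset.
/// Assumes `feed` makes progress after reporting an error.
fn decode_with_replacement<F>(mut feed: F, input: &[u8]) -> String
where
    F: FnMut(&[u8], &mut String) -> (usize, Option<CodecError>),
{
    let mut out = String::new();
    let mut rest = input;
    // The first offset is ignored: the unprocessed bytes are simply dropped.
    while let (_, Some(err)) = feed(rest, &mut out) {
        out.push('\u{FFFD}');
        // Feed again, starting at the first remaining byte.
        rest = &rest[err.upto..];
    }
    out
}

/// Toy strict-ASCII feed used only to exercise the loop above.
fn ascii_feed(input: &[u8], out: &mut String) -> (usize, Option<CodecError>) {
    for (i, &b) in input.iter().enumerate() {
        if b >= 0x80 {
            // `b` is the problematic byte; remaining bytes start after it.
            return (i, Some(CodecError { upto: i + 1 }));
        }
        out.push(char::from(b));
    }
    (input.len(), None)
}
```

For example, decode_with_replacement(ascii_feed, b"a\xFFb\x80") yields "a", U+FFFD, "b", U+FFFD: each non-ASCII byte is replaced and decoding resumes right after it.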

No runtime deps