1 unstable release

0.2.0 Jun 30, 2022

#35 in #charset


116 downloads per month
Used in encoding-next

MIT license

20KB
266 lines

  • Interface to the character encoding.
  • Raw incremental interface

  • Methods whose names start with raw_ constitute the raw incremental interface, the lowest-level API available for encoders and decoders.
  • This interface divides the entire input into four parts:
    • Processed bytes do not affect the future result.
    • Unprocessed bytes may affect the future result and can be part of a problematic sequence, depending on the future input.
    • The problematic byte is the first byte that causes an error condition.
    • Remaining bytes have been neither processed nor read, so the caller should feed any remaining bytes again.
  • The following figure illustrates an example of successive raw_feed calls:

    1st raw_feed   :2nd raw_feed   :3rd raw_feed
    ----------+----:---------------:--+--+---------
              |    :               :  |  |
    ----------+----:---------------:--+--+---------
    processed  unprocessed             |  remaining
                                  problematic

  • Since these parts can span multiple input sequences passed to raw_feed, raw_feed returns two offsets (one of them optional) with which the caller can track the problematic sequence.
  • The first offset (the first usize in the tuple) points to the first unprocessed byte, or is zero when the unprocessed bytes started before the current call. (The first unprocessed byte can also be at offset 0, which makes no difference to the caller.)
  • The second offset (the upto field in the CodecError struct), if any, points to the first remaining byte.
  • If the caller needs to recover from the error using the problematic sequence, it starts saving the unprocessed bytes when the first offset is less than the input length, appends any new unprocessed bytes while the first offset is zero, and discards the saved bytes when the first offset becomes non-zero, again saving new unprocessed bytes if the first offset is less than the input length. The caller then checks for the error condition and can use the saved unprocessed bytes for error recovery.
  • Alternatively, if the caller only wants to replace the problematic sequence with a fixed string (like U+FFFD), it can simply discard the unprocessed bytes and emit the fixed string on an error. It still has to feed the input bytes starting at the second offset again.
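
The fixed-string replacement strategy described above can be sketched as follows. This is a minimal illustration only, not the crate's code: `ToyError`, `toy_raw_feed`, and `decode_replacing` are hypothetical names, the toy decoder handles only ASCII (so it never carries unprocessed bytes over to the next call), and the `usize` type chosen for `upto` is an assumption. Only the shape of the return value, an offset plus an optional error carrying `upto`, follows the description.

```rust
/// Hypothetical stand-in for the crate's `CodecError`; only `upto` is modeled.
struct ToyError {
    /// Offset of the first remaining byte, relative to the current input.
    upto: usize,
}

/// Toy ASCII-only decoder shaped like `raw_feed` (an assumption, not the crate's code).
/// Returns the offset of the first unprocessed byte and an optional error.
fn toy_raw_feed(input: &[u8], output: &mut String) -> (usize, Option<ToyError>) {
    match input.iter().position(|&b| b >= 0x80) {
        Some(i) => {
            // Bytes before `i` are processed; byte `i` is the problematic byte.
            output.extend(input[..i].iter().map(|&b| b as char));
            (i, Some(ToyError { upto: i + 1 }))
        }
        None => {
            // Every byte is valid ASCII, so the whole input is processed.
            output.extend(input.iter().map(|&b| b as char));
            (input.len(), None)
        }
    }
}

/// Decodes `input`, emitting U+FFFD for each problematic sequence.
fn decode_replacing(input: &[u8]) -> String {
    let mut output = String::new();
    let mut remaining = input;
    loop {
        // The first offset is ignored: this strategy never inspects the
        // unprocessed bytes, it only reacts to the error.
        let (_first_unprocessed, err) = toy_raw_feed(remaining, &mut output);
        match err {
            Some(e) => {
                output.push('\u{FFFD}');
                // Feed the bytes starting at the second offset again.
                remaining = &remaining[e.upto..];
            }
            None => break,
        }
    }
    output
}
```

Under these assumptions, `decode_replacing(b"ab\xffcd")` returns `"ab\u{FFFD}cd"`. A caller that needs the problematic bytes themselves would additionally have to buffer unprocessed bytes across calls, as described in the recovery procedure above.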

No runtime deps