1 stable release: 1.0.0 (Nov 3, 2024)
#499 in Text processing
26KB
989 lines
dbxcase
This is an implementation of text case-folding which matches how Dropbox handles file paths.
Dropbox was originally implemented in Python 2.5 (the current version at the time) and used its unicode.lower() function to compare paths case-insensitively. Python 2.5 is long gone, but the behavior of this function has been preserved to maintain backwards compatibility.
Python 2.5's case-folding is based on the Unicode 4.1.0 character database, but it does not implement the case-folding algorithm that the Unicode standard recommends. Instead, it simply applies the "simple lowercase mapping", a 1:1 character mapping that takes no context into account. And of course, it lacks many characters added since 2003.
As a result, it differs in several ways from any modern to_lowercase() function like the one included in the Rust standard library. These differences are important if proper interoperation with the Dropbox API is desired.
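To make those differences concrete, here is a small sketch that uses only the Rust standard library (not this crate) to show three behaviors of a modern to_lowercase() that a context-free 1:1 mapping over Unicode 4.1 data cannot reproduce. The helper name std_lower is an assumption made for illustration; U+1E9E is used as an example of a character added after Unicode 4.1 (it arrived in Unicode 5.1).

```rust
/// Illustrative helper (not part of dbxcase): the standard library's
/// lowercase mapping of a single `char`, collected into a `String`.
fn std_lower(c: char) -> String {
    c.to_lowercase().collect()
}

fn main() {
    // 1. One-to-many: U+0130 (İ) lowercases to two code points,
    //    'i' followed by U+0307 COMBINING DOT ABOVE.
    assert_eq!(std_lower('\u{0130}'), "i\u{0307}");
    assert_eq!(std_lower('\u{0130}').chars().count(), 2);

    // 2. Context: `str::to_lowercase` turns a word-final capital sigma
    //    into the final form ς (U+03C2) rather than σ (U+03C3).
    //    Input is "ΟΔΟΣ", output is "οδος".
    let upper = "\u{039F}\u{0394}\u{039F}\u{03A3}";
    assert_eq!(upper.to_lowercase(), "\u{03BF}\u{03B4}\u{03BF}\u{03C2}");

    // 3. Newer characters: U+1E9E (ẞ) was added after Unicode 4.1,
    //    and the standard library lowercases it to ß (U+00DF).
    assert_eq!(std_lower('\u{1E9E}'), "\u{00DF}");
}
```

Any of these three cases could make two paths compare differently under to_lowercase() than under the Python 2.5 behavior described above.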
lib.rs:
This crate implements the case-folding rules used by Dropbox for file paths.
It's a recreation of what unicode.lower() did in Python 2.5 (which was the current version of Python at the time of Dropbox's founding).
Every character that has a "simple lowercase mapping" property in the Unicode 4.1 character database is replaced with the corresponding character.
This is different from a proper lowercasing, where at least one upper-case codepoint (U+0130, "Latin Capital Letter I with Dot Above") maps to two lower-case codepoints. It also uses a very old version of Unicode which lacks many characters added since 2003.
The mapping is hardcoded, but the code can be regenerated manually from the Unicode database using a program included in the codebase.
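As a rough sketch of the technique described above (not the crate's actual code or API), a per-character 1:1 fold over a hardcoded table could look like the following. The table SIMPLE_LOWER is a hypothetical four-entry excerpt whose values are taken from the simple lowercase field of the Unicode character database, and the function names simple_lower_char, simple_lower, and paths_equal_ignore_case are assumptions made for illustration.

```rust
/// Hypothetical excerpt of a "simple lowercase mapping" table, sorted by
/// source character. A real table would hold every character that has
/// the property in Unicode 4.1; these four entries are only examples.
const SIMPLE_LOWER: &[(char, char)] = &[
    ('\u{0130}', 'i'),        // İ -> i (one code point, no combining dot)
    ('\u{0394}', '\u{03B4}'), // Δ -> δ
    ('\u{039F}', '\u{03BF}'), // Ο -> ο
    ('\u{03A3}', '\u{03C3}'), // Σ -> σ (never the final form ς)
];

/// Maps one character through the table; characters without an entry
/// (including any added after the table was built) pass through unchanged.
fn simple_lower_char(c: char) -> char {
    if c.is_ascii() {
        return c.to_ascii_lowercase();
    }
    match SIMPLE_LOWER.binary_search_by_key(&c, |&(from, _)| from) {
        Ok(i) => SIMPLE_LOWER[i].1,
        Err(_) => c,
    }
}

/// Folds a whole string one character at a time, ignoring context.
fn simple_lower(s: &str) -> String {
    s.chars().map(simple_lower_char).collect()
}

/// Compares two paths case-insensitively under the sketch's mapping.
fn paths_equal_ignore_case(a: &str, b: &str) -> bool {
    simple_lower(a) == simple_lower(b)
}

fn main() {
    // U+0130 becomes a single 'i' instead of 'i' + U+0307.
    assert_eq!(simple_lower("\u{0130}"), "i");
    // "ΟΔΟΣ" becomes "οδοσ": the last letter is σ, not final ς.
    assert_eq!(
        simple_lower("\u{039F}\u{0394}\u{039F}\u{03A3}"),
        "\u{03BF}\u{03B4}\u{03BF}\u{03C3}"
    );
    // U+1E9E is absent from the table, so it is left as-is.
    assert_eq!(simple_lower_char('\u{1E9E}'), '\u{1E9E}');
    assert!(paths_equal_ignore_case("/Photos/A.TXT", "/photos/a.txt"));
}
```

Because each input character yields exactly one output character, the folded string always has the same number of characters as the input, which is not true of a full lowercasing.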
Dependencies
~0–600KB
~11K SLoC