fractal-analysis
Various Rust and Futhark functions that help perform Z-Box Merging fractal analysis
Welcome to the fractal-analysis crate! Here you will find the necessary tools for the analysis of fractal sets, based on the “Z-box merging” algorithm.
How to use
I have tried to create ergonomic functions for the following data-sets:
- Images, colour or gray-scale
- Sounds (to be added later)
- Electro-encephalograms (to be added later)
- MRI images (to be added later)
For each of those, you will find ready-made functions that give you useful results directly; just call, e.g., example_code_here and you will get the result.
Caveat
Currently (March 2021) the documentation and testing are still unfinished. They are both provided on a best-effort basis; the user's understanding is appreciated.
How to extend to other uses
Oh, how I wish there were a way for me to design an API that said “Oh, just do zbox_merge(input) for whatever input and it'll work!”.
Sadly, it is nowhere near as simple as that. The basic library will get you a long way, but there is serious work you'll need to do beforehand. The simplest way would be to just send the crate maintainer an e-mail and ask directly, but in case s/he is unreachable, here is what you'll need to do:
Choose your dependent and independent dimensions
First, decide what counts as a “separate dimension” for your use-case. Is each channel of your EEG a separate dimension that depends on the time-stamp, or are the channels a sampling of a two-dimensional scalp? What about the colour channels of your image? That will give you a first idea of what to do, and determine the limits of the result you will end up with.
Of particular note: You will want the number of dimensions to be independent of the measurement method. For instance: For the colour channels, we chose each to be a separate dimension in part because all colour images have those exact same 3 channels. But for the EEGs we chose to consider them as a 2-D grid, or else the result would depend on the number of electrodes and measurements between different setups would have no immediate way to be compared. As a result, about half the mental effort was spent in deciding exactly how to subdivide the electrode placement into quadrants!
Count the number of bits you have available
For each coordinate, count the number of values it can take. Take the smallest of those numbers, find its base-2 logarithm, and discard the fractional part. The result is the number of bits to which each of your coordinates will need to be normalised.
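For instance, here is a minimal sketch of that computation in plain Rust (the helper is hypothetical, not part of this crate's API):

```rust
/// Hypothetical helper: the number of bits to which each coordinate
/// should be normalised, given how many distinct values each one can take.
fn bits_available(value_counts: &[u64]) -> u32 {
    let smallest = value_counts.iter().copied().min().expect("no coordinates given");
    // Base-2 logarithm with the fractional part discarded.
    smallest.ilog2()
}

fn main() {
    // A 1024-level coordinate and a 256-level colour channel:
    // the smallest count is 256, and floor(log2(256)) = 8 bits.
    assert_eq!(bits_available(&[1024, 256]), 8);
}
```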
Choose your data-types
This is only important if you really care about efficiency. The default choices ought to be optimal for most cases, but under special circumstances they might not be.
Example 1: If you have 5 coordinates of 12 bits each, each Coor will be a u16, and as a result the default Key will be a u128. But the keys can actually fit within a u64; the compiler just doesn't know that. In that case, you might prefer to explicitly cast all keys to u64s.
Example 2: If you need 48 bits for your key, and you're running the code on a CPU with a 16-bit architecture, you might prefer to implement a custom-made 48-bit data-type (3×16) instead of letting the code use u64s by default. The lindel crate permits that.
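As an illustration of Example 1, here is a hypothetical sketch of the cast (the helper name is ours; morton_encoding is the same crate used in the img_funs code further below):

```rust
use morton_encoding::morton_encode;

/// Hypothetical helper: five coordinates of 12 significant bits each,
/// already normalised into u16s.
fn key_from_coords(coords: [u16; 5]) -> u64 {
    // Five u16s make an 80-bit key, so `morton_encode` returns a u128...
    let wide: u128 = morton_encode(coords);
    // ...but only 5 × 12 = 60 of its bits can ever be non-zero,
    // so the cast down to u64 loses nothing.
    wide as u64
}
```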
Create a “key-extraction” function
The key-extraction consists of two parts: normalisation and linearisation.
The normalisation is already provided for you, and only requires you to provide, for each coordinate, the min/max values and the number of bits to which it will be normalised. However, please bear in mind that you will need to extract the independent coordinates together with the dependent ones!
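For instance, a minimal sketch of the normalisation step on its own, using create_normaliser as it appears in the img_funs code further below (the exact rounding behaviour is the crate's own):

```rust
// Squeeze a coordinate ranging over [0, 1024) into 8 bits;
// the number of bits is implied by the u8 output type.
let norm = create_normaliser::<u32, u8>(0, 1024).unwrap();
let squeezed: u8 = norm(700); // roughly 700/1024 of the u8 range
```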
As for the linearisation, two separate methods are already provided, thanks to the lindel crate: Morton encoding (the “z-index”) and Hilbert encoding (the “Hilbert index”). If you care about speed and/or have at least one independent coordinate, the z-index will serve you better. If you have no independent coordinates, and you can afford the small performance hit, a Hilbert index will easily allow you to examine almost every possible subdivision of your data-set independently (a short sketch of both encodings follows the list below). Still, please note the following:
- You will need to do that manually, as the program can't perform this operation automatically.
- You could theoretically use a Hilbert index everywhere, but in subdividing it you might end up omitting parts of the independent coordinates whose values are too large.
- While a z-index is much quicker to compute than a Hilbert index, for large data-sets the most expensive operation is the sorting, so perhaps the difference won't be important for the O(N) part of the program.
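For reference, here is a minimal sketch of the two encodings as exposed through lindel (the function names follow that crate's documentation; treat the exact signatures as an assumption):

```rust
use lindel::{hilbert_index, morton_encode};

fn main() {
    let point = [3u8, 5, 7]; // three 8-bit coordinates make a 24-bit key
    let z: u32 = morton_encode(point); // z-index: plain bit interleaving, fast
    let h: u32 = hilbert_index(point); // Hilbert index: better locality, slower
    println!("z-index: {z}, Hilbert index: {h}");
}
```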
Choose a sorting algorithm
If you don't have a great concern for speed, you could use the default methods defined in std or rayon. If you need to squeeze out every last bit of performance, on the other hand, you might prefer to implement a radix sort or something to that effect.
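For instance, a minimal sketch of the default options (rayon is an assumed optional dependency):

```rust
fn main() {
    let mut keys: Vec<u64> = vec![0b1011, 0b0010, 0b1110, 0b0001]; // toy keys
    // The std default; an unstable sort is fine, as equal keys are interchangeable.
    keys.sort_unstable();
    // Or in parallel, with the `rayon` crate:
    // use rayon::prelude::*;
    // keys.par_sort_unstable();
    assert!(keys.windows(2).all(|w| w[0] <= w[1]));
}
```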
And you're finished!
…finished with the preliminary work, that is. Now all that's left is to write the actual code: extract the keys, sort, extract the amounts of common leading bits between each consecutive pair, get the results from them, drain the useless bits, and use a regression function to get the result. Here is some example code, copied from the img_funs module:
```rust
let width = input.width() as usize;
let height = input.height() as usize;
let get_key_from_sample = |(x, y, rgb): (u32, u32, &image::Rgb<u8>)| -> u64 {
    // The provided normaliser simplifies things slightly.
    let norm_x = create_normaliser::<u32, u8>(0, input.width()).unwrap();
    let norm_y = create_normaliser::<u32, u8>(0, input.height()).unwrap();
    // Extract both independent and dependent coordinates.
    let arr = [norm_x(x), norm_y(y), rgb[0], rgb[1], rgb[2]];
    // Linearise using `morton_encoding`.
    morton_encoding::morton_encode(arr)
};
let clzs = get_clzs(input.enumerate_pixels(), get_key_from_sample);
let (tmp, lacun) = get_results_from_clzs(clzs);
let tmp = drain_useless_bits::<_, 64, 40>(tmp);
let lacun = drain_useless_bits::<_, 64, 40>(lacun);
finalise_results(tmp, lacun, width * height, 8)
```
Unless of course you need more than 256 bits for each key, in which case you'll need to change the rest of the functions so that they operate on u16s instead.
Interpreting the results
Fractal dimension
The first and most important thing to understand with regard to the fractal dimension is the limits inside which its log-log plot will fall. Those create a quadrilateral, whose four edges are defined as follows:
- Edge 1 is vertical, for *x* equal to the number of bits you have available. You can't subdivide your data-set more finely than the resolution that each coordinate has.
- Edge 2 is horizontal, for *y* equal to the number of samples you have available. Subdivide all you might, you will never get more populated boxes than the number of samples (e.g. pixels) that exist in the data-set.
- Edge 3 is diagonal, and has a slope equal to the number of independent coordinates, if any.
- Edge 4 is diagonal as well; its slope is equal to the total number of coordinates available, independent as well as dependent. A data-set constrained within *N* dimensions can never have a fractal dimension above *N*.
The most immediate result of these four constraints is that, especially if the log-log plot rises abruptly at first, it will reach a plateau as soon as it hits the total number of samples, past which it will be horizontal. As such, the horizontal part of the plot must be excised before calculating the fractal dimension, otherwise the estimate will be artificially low.
The second thing to understand is that, although an ideally fractal shape's log-log plot will be linear, in practice data-sets will be scale-dependent, leading to non-linearities in the log-log plot. In every such instance we've found, the log-log plot is concave downward. Therefore, the user has two choices:
- To take the simple linear regression of the log-log plot, and interpret its slope as the fractal dimension. The scale-dependence, if any, may be found from the mean square error between the line and the actual plot. (A minimal sketch of this option follows below.)
- To take a parabolic regression of the log-log plot, interpret the linear parameter as the fractal dimension, and the second-order parameter as the scale-dependence.
Neither has been tried with any real rigour; the user is cordially invited to share the results, if any.
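As a minimal sketch of the first option, in plain Rust (no crates; the function name is ours): remember to excise the horizontal plateau, as described above, before feeding the points in.

```rust
/// Simple least-squares line through (x, y) points; returns (slope, intercept).
/// Applied to the log-log plot, the slope is the fractal-dimension estimate,
/// and the mean square error against the line measures the scale-dependence.
fn linear_regression(points: &[(f64, f64)]) -> (f64, f64) {
    let n = points.len() as f64;
    let sx: f64 = points.iter().map(|p| p.0).sum();
    let sy: f64 = points.iter().map(|p| p.1).sum();
    let sxx: f64 = points.iter().map(|p| p.0 * p.0).sum();
    let sxy: f64 = points.iter().map(|p| p.0 * p.1).sum();
    let slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    let intercept = (sy - slope * sx) / n;
    (slope, intercept)
}
```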
Lacunarity
The way the lacunarity is defined in the available literature, it appears to be a measure that differs at each scale. It is measured here for the user's convenience, but the interpretation of the results must necessarily be left up to the user.