fractal-analysis
Various Rust and Futhark functions that help perform Z-Box Merging fractal analysis
Welcome to the fractal-analysis crate! Here you will find the necessary tools for the analysis of fractal sets, based on the “Z-box merging” algorithm.
How to use
I have tried to create ergonomic functions for the following data-sets:
- Images, colour or grey-scale
- Sounds (to be added later)
- Electro-encephalograms (to be added later)
- MRI images (to be added later)
For each of those, you will find ready-made functions that can give you useful results: just write, e.g., `example_code_here`, and you will get the result.
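For instance, a call would look roughly like the following; the function name is the placeholder above, so check the relevant module (e.g. `img_funs`) for the real entry point:

```rust
// Hypothetical usage; the actual function name and return type live in the
// `img_funs` module, so treat this only as the general shape of a call.
let img = image::open("photo.png").unwrap().to_rgb8();
let results = img_funs::example_code_here(&img);
```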
Caveat
Currently (March 2021) the documentation and testing are still unfinished. They are both provided on a best-effort basis; the user's understanding is appreciated.
How to extend to other uses
Oh, how I wish there were a way for me to design an API that said “Oh, just do `zbox_merge(input)` for whatever input and it'll work!”
Sadly, it is nowhere near as simple as that. The basic library will get you a long way, but there is serious work you'll need to do beforehand. The simplest way would be to just send the crate maintainer an e-mail and ask directly, but in case s/he is unreachable, here is what you'll need to do:
Choose your dependent and independent dimensions
First, decide what counts as a “separate dimension” for your use-case. Is each channel of your EEG a separate dimension that depends on the time-stamp, or are the channels a sampling of a two-dimensional scalp? What about the colour channels of your image? That will give you a first idea of what to do, and determine the limits of the result you will end up with.
Of particular note: you will want the number of dimensions to be independent of the measurement method. For instance: for the colour channels, we chose each to be a separate dimension in part because all colour images have those exact same 3 channels. But for the EEGs we chose to consider them as a 2-D grid, or else the result would depend on the number of electrodes, and measurements from different setups would have no immediate way to be compared. As a result, about half the mental effort was spent in deciding exactly how to subdivide the electrode placement into quadrants!
Count the number of bits you have available
For each coordinate, count the number of values it can take. Take the smallest of those counts, take its base-2 logarithm, and discard the fractional part. The result is the number of bits to which each of your coordinates will need to be normalised.
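As a minimal sketch of that arithmetic (the helper function is ours, not part of the crate):

```rust
/// Number of bits to normalise every coordinate to: the floor of the
/// base-2 logarithm of the smallest value-count among the coordinates.
/// (`bits_available` is a hypothetical helper, not part of the crate.)
fn bits_available(value_counts: &[u64]) -> u32 {
    let smallest = value_counts
        .iter()
        .copied()
        .min()
        .expect("need at least one coordinate");
    smallest.ilog2() // floor(log2(smallest)); panics if smallest == 0
}

// E.g. a 640x480 RGB image: x takes 640 values, y takes 480, each channel 256.
assert_eq!(bits_available(&[640, 480, 256, 256, 256]), 8);
```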
Choose your data-types
This is only important if you really care about efficiency. The default choices ought to be optimal for most cases, but under special circumstances they might not be.
Example 1: If you have 5 coordinates of 12 bits each, each `Coor` will be a `u16`, and as a result the default `Key` will be a `u128`. But the keys can actually fit within a `u64`; the compiler just doesn't know that. In that case, you might prefer to explicitly cast all keys to `u64`s.
Example 2: If you need 48 bits for your key, and you're running the code on a 16-bit architecture CPU, you might prefer to implement custom-made 48-bit data-types (3×16) instead of letting the code use `u64`s by default. The `lindel` crate permits that.
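To illustrate Example 1, here is a sketch using the `morton_encoding` crate, assuming (as in the full example further below) that `morton_encode` accepts a fixed-size array and lets you choose any sufficiently wide key type; the concrete values are made up:

```rust
use morton_encoding::morton_encode;

// Five coordinates of 12 bits each, stored in `u16`s.
let coords: [u16; 5] = [0x0FFF, 0x0ABC, 0x0123, 0x0456, 0x0789];
// The default key for five u16s must hold 5 * 16 = 80 bits, hence a u128...
let wide_key: u128 = morton_encode(coords);
// ...but only 5 * 12 = 60 bits are ever set, so the cast to u64 is lossless.
let key = wide_key as u64;
```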
Create a “key-extraction” function
The key-extraction consists of two parts: normalisation and linearisation.
The normalisation is already provided for you; it only requires you to provide, for each coordinate, the min/max values and the number of bits to which it will be normalised. However, please bear in mind that you will need to extract the independent coordinates together with the dependent ones!
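For instance, reusing `create_normaliser` as it appears in the full example further below (the concrete range here is made up):

```rust
// Normalise a coordinate ranging over [0, 640) down to 8 bits;
// the output type parameter (`u8`) sets the target bit-width.
let norm_x = create_normaliser::<u32, u8>(0, 640).unwrap();
let n: u8 = norm_x(320); // roughly the middle of the normalised range
```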
As for the linearisation, two separate methods are already provided, thanks to the `lindel` crate. Those are Morton encoding (“z-index”) and Hilbert encoding (“Hilbert index”). If you care about speed and/or have at least one independent coordinate, the z-index will serve you better. If you have no independent coordinates, and you can afford the small performance hit, a Hilbert index will easily allow you to examine almost every possible subdivision of your data-set independently. Still, please note the following:
- You will need to do that manually, as the program can't perform this operation automatically.
- You could theoretically use a Hilbert index everywhere, but in subdividing it you might end up omitting parts of the independent coordinates whose values are too large.
- While a z-index is much quicker to compute than a Hilbert index, for large data-sets the most expensive operation is the sorting, so perhaps the difference won't be important for the O(N) part of the program.
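As a sketch of the two linearisations, assuming `lindel` exposes `morton_encode` and `hilbert_index` with matching array-based signatures (check the crate's documentation before relying on this):

```rust
use lindel::{hilbert_index, morton_encode};

// Three 8-bit coordinates linearised both ways into a 24-bit-capable key.
let point: [u8; 3] = [183, 90, 2];
let z: u32 = morton_encode(point); // plain bit interleaving; very cheap
let h: u32 = hilbert_index(point); // dearer, but better at locality
```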
Choose a sorting algorithm
If you don't have a great concern for speed, you could choose the default methods defined in `std` or `rayon`. If you need to squeeze every last bit of performance, on the other hand, you might prefer to implement a radix sort or something to that effect.
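For instance, a drop-in parallel sort with `rayon` (the key values below are dummies):

```rust
use rayon::prelude::*;

// Sorting dominates the run-time on large data-sets, so a parallel
// unstable sort is a cheap first improvement over `sort_unstable`.
let mut keys: Vec<u64> = (0u64..1_000_000)
    .map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15)) // dummy keys
    .collect();
keys.par_sort_unstable();
```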
And you're finished!
…finished with the preliminary work, that is. Now all that's left
is to write the actual code: extract the keys, sort, extract the
amounts of common leading bits between each consecutive pair,
get the results from them, drain the useless bits, and use a
regression function to get the result. Here is some example code,
copied from the `img_funs` module:
```rust
let width = input.width() as usize;
let height = input.height() as usize;
let get_key_from_sample = |(x, y, rgb): (u32, u32, &image::Rgb<u8>)| -> u64 {
    let norm_x = create_normaliser::<u32, u8>(0, input.width()).unwrap();
    // The provided normaliser simplifies things slightly.
    let norm_y = create_normaliser::<u32, u8>(0, input.height()).unwrap();
    let arr = [norm_x(x), norm_y(y), rgb[0], rgb[1], rgb[2]];
    // Extract both independent and dependent coordinates...
    morton_encoding::morton_encode(arr)
    // ...and linearise using `morton_encoding`.
};
// Extract the keys, sort them, and take the number of common
// leading bits between each consecutive pair.
let clzs = get_clzs(input.enumerate_pixels(), get_key_from_sample);
let (tmp, lacun) = get_results_from_clzs(clzs);
// 64-bit keys, of which only 5 * 8 = 40 bits carry information.
let tmp = drain_useless_bits::<_, 64, 40>(tmp);
let lacun = drain_useless_bits::<_, 64, 40>(lacun);
// Regress over width * height samples, normalised to 8 bits each.
finalise_results(tmp, lacun, width * height, 8)
```
Unless of course you need more than 256 bits for each key, in which case you'll need to change the rest of the functions so that they operate on `u16`s instead.
Interpreting the results
Fractal dimension
The first and most important thing to understand about the fractal dimension is the limits inside which its log-log plot will fall. These limits form a quadrilateral, whose four edges are defined as follows:
- Edge 1 is vertical, at x equal to the number of bits you have available. You can't subdivide your data-set more finely than the resolution of each coordinate.
- Edge 2 is horizontal, at y equal to the number of samples you have available. Subdivide all you might, you will never get more populated boxes than the number of samples (e.g. pixels) in the data-set.
- Edge 3 is diagonal, and has a slope equal to the number of independent coordinates, if any.
- Edge 4 is diagonal as well; its slope is equal to the total number of coordinates, independent as well as dependent. A data-set constrained within N dimensions can never have a fractal dimension above N.

For the RGB-image example above, say: Edge 1 sits at x = 8 bits, Edge 2 at y = width × height, Edge 3 has slope 2 (for x and y), and Edge 4 has slope 5.
The most immediate result of these four constraints is that, especially if the log-log plot rises abruptly at first, it will reach a plateau as soon as it hits the total number of samples, past which it will be horizontal. As such, the horizontal part of the plot must be excised before calculating the fractal dimension, else the result will be artificially low.
The second thing to understand is that, although an ideally fractal shape's log-log plot will be linear, in practice data-sets will be scale-dependent, leading to non-linearities in the log-log plot. In every such instance we've found, the log-log plot is concave downward. Therefore, the user has two choices:
- To take the simple linear regression of the log-log plot, and interpret its slope as the fractal dimension. The scale-dependence, if any, may be found from the mean square error between the line and the actual plot.
- To take a parabolic regression of the log-log plot, interpret the linear parameter as the fractal dimension, and the second-order parameter as the scale-dependence.
Neither has been tried with any real rigour; the user is cordially invited to share the results, if any.
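As a self-contained sketch of the first option, here is a plain least-squares fit over the log-log points (a hypothetical helper, not part of the crate); remember to excise the horizontal plateau before calling it:

```rust
/// Simple linear regression over log-log points `(x, y)`.
/// Returns `(slope, mse)`: the slope is read as the fractal dimension,
/// and the mean square error hints at scale-dependence.
/// (A hypothetical helper, not part of the crate.)
fn fit_log_log(points: &[(f64, f64)]) -> (f64, f64) {
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;
    let cov: f64 = points.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    let var_x: f64 = points.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    let slope = cov / var_x;
    let intercept = mean_y - slope * mean_x;
    let mse = points
        .iter()
        .map(|p| (p.1 - (slope * p.0 + intercept)).powi(2))
        .sum::<f64>()
        / n;
    (slope, mse)
}
```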
Lacunarity
The way the lacunarity was defined in the available literature, it appears to be a measure that's different for each scale. It's measured here for the user's convenience, but the interpretation of the results must necessarily be left up to the user.