Unified Neuromorphic Datasets Repository
Getting Started
Install the undr module
pip3 install undr
Generate a default configuration file
python3 -m undr init
The generated undr.toml file is written in TOML (https://github.com/toml-lang/toml). It lists the datasets to be downloaded or streamed, so it needs to be adjusted to your needs.
The line directory = 'datasets'
specifies the directory where downloaded files are stored (relative to the configuration file). All the files generated by undr (directory indexes, downloaded data, temporary files...) are stored in this directory.
Datasets are listed as [[datasets]] entries with three mandatory properties: name, url, and mode. The optional server_type property is used internally to speed up the download process. To discard a dataset, you can either remove it from the configuration file or comment out all its lines with # signs.
mode changes the download strategy on a per-dataset basis, with three possible values:
- 'remote' only downloads the dataset's file index. The undr Python package can be used to process the dataset files as if they were on your hard drive by streaming them from the server. This option is particularly useful for large datasets that do not fit on your disk, but it requires a fast internet connection since files are re-downloaded every time they are processed.
- 'local' downloads all the dataset files locally but does not decompress them (most datasets are stored as Brotli archives). The undr Python library transparently decompresses files in memory when you read them, making this option a good trade-off between disk usage and processing speed.
- 'local-decompressed' downloads all the dataset files locally and decompresses them. Decompressed files use a relatively inefficient plain binary format, so this option requires vast amounts of disk space (3 to 5 times as much as the Brotli archives). On the other hand, the plain binary format facilitates processing with other languages such as Matlab or C++.
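Below is a minimal undr.toml sketch built from the properties described above; the dataset name and URL are hypothetical placeholders, not a real server entry:

directory = 'datasets'

[[datasets]]
# hypothetical entry for illustration; use a real dataset name and URL
name = 'example-dataset'
url = 'https://example.com/datasets/example-dataset/'
mode = 'remote'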
undr also supports hybrid configurations where only part of a dataset is downloaded or decompressed. You may also use local directories without a server. See [NOT DOCUMENTED YET] for details.
Download the datasets
python3 -m undr install
This command downloads the datasets' file indexes. If the mode is 'local' or 'local-decompressed', it also downloads the dataset files (and possibly decompresses them).
This command can be interrupted at any time with CTRL + C. Re-running it will resume the download where it left off.
Generate a BibTeX file
python3 -m undr bibtex --output datasets.bib
The UNDR project does not claim authorship of the datasets. Please use this file to cite the original articles.
Python module
pip3 install undr
Python APIs
API name | Complexity | Configurability | Parallel processing | Progress display |
---|---|---|---|---|
loop | simple | high | no | no |
map | simple | low | yes | yes |
task | complex | high | yes | yes |
All three approaches support progress persistence. Progress persistence slightly increases the code complexity but makes it possible to resume processing after a network or power failure.
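As an illustration of the idea (not part of the undr API), the following minimal Python sketch persists progress in a checkpoint file; the checkpoint format and the process_file callback are assumptions made for the example:

import json
from pathlib import Path

CHECKPOINT = Path("progress.json")  # hypothetical checkpoint file

def load_done() -> set:
    # Resume from a previous run if a checkpoint exists.
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done: set, name: str) -> None:
    # Persist progress after every file so a crash loses at most one item.
    done.add(name)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def process_all(files, process_file):
    done = load_done()
    for name in files:
        if name in done:
            continue  # already processed in a previous run
        process_file(name)
        mark_done(done, name)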
Dataset format specification
-index.json
rationale
- prepend a special character to make sure the index file is ordered first in ASCII
- use an unreserved URL character to avoid escaping problems
- out of the unreserved URL characters (-, ., _, and ~, see https://www.rfc-editor.org/rfc/rfc3986#section-2):
  - . would result in a hidden file on UNIX systems
  - _ comes after alpha-numerical characters in ASCII
  - ~ is a shortcut for the user's home directory in many shells
Many command-line programs treat -index.json as a flag, hence a command such as cat -index.json returns an error. Prepending ./ to the filename avoids the problem: cat ./-index.json.
Dataset mirrors
Example configuration
Apache
<VirtualHost *:80>
Alias / /path/to/local/directory/
<Directory "/path/to/local/directory/">
Require all granted
Options +Indexes
</Directory>
</VirtualHost>
To use another port, remember to edit /etc/apache2/ports.conf as well.
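For example, to serve on port 8080 (a sketch; adjust the path to your mirror), the virtual host and ports.conf must use the same port:

# /etc/apache2/ports.conf
Listen 8080

# virtual host configuration
<VirtualHost *:8080>
    Alias / /path/to/local/directory/
    <Directory "/path/to/local/directory/">
        Require all granted
        Options +Indexes
    </Directory>
</VirtualHost>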
Nginx
server {
listen 80;
location / {
alias /path/to/local/directory/;
autoindex on;
sendfile on;
tcp_nopush on;
sendfile_max_chunk 1m;
}
}
Upload a dataset
- python3 -m undr check-conformity /path/to/dataset
- Caveat: An UNDR server can provide multiple compressed files (different formats) for each resource. The Python UNDR library always picks the best compression (smallest encoded size). check-conformity only checks the best compression and will not report errors for other compressions.
- For macOS users (.DS_Store), add the following to ~/.zshrc:
# rmdsstore removes .DS_Store files recursively
rmdsstore() {
if [ $# -eq 0 ]; then
printf 'usage: rmdsstore directory\n' >&2
return 1
fi
find "$1" -name ".DS_Store" -delete -print
}
Run rmdsstore /path/to/dataset before running python3 -m undr check-conformity /path/to/dataset.
Contribute
cd python
black . # format the source code (see https://github.com/psf/black)
pyright . # check types (see https://github.com/microsoft/pyright)
python3 -m pip install -e . # local installation
Publish the module
- Bump the version number in setup.py.
- Install twine:
pip3 install twine
- Upload the source code to PyPI:
rm -rf dist
python3 setup.py sdist
python3 -m twine upload dist/*
Build the app
- Copy the UNDR library to the app build tree
python3 app/interface-prebuild.py
- Package the Python app using Cubuzoa
cd /path/to/cubuzoa
python3 cubuzoa.py build /path-to-undr/app/python --os linux --version '==3.8'
python3 cubuzoa.py build /path-to-undr/app/python --os 'macos|windows' --version '==3.9'
or build only for your platform
cd app/interface
mkdir local-build
cd local-build
pyinstaller --distpath ../build --add-data ../undr/-index_schema.json:undr --add-data ../undr/undr_default.toml:undr --add-data ../undr/undr_schema.json:undr -n interface-cp39-macosx -y ../interface.py
- Delete the UNDR library copy
rm -rf app/python/undr
- Build the Electron app
cd app
npm run release # or npm run watch for continuous development
Download with existing CLI
wget --no-parent --recursive --level=inf http://localhost:5432/dvs09/
find . -iname '*.br' | while read filename; do brotli -d -j "$filename"; done;