3 releases (stable)
new 1.0.1 | May 15, 2024 |
---|---|
1.0.0 | May 14, 2024 |
0.4.0 | Aug 15, 2021 |
#116 in Cryptography
143 downloads per month
97KB
1.5K
SLoC
giant-squid
An alternative MWA ASVO client. For general help on using the MWA ASVO, please visit: MWA ASVO wiki.
NOTE FOR HPC USERS
Please read this wiki article if you are running giant-squid on HPC systems.
giant-squid
was originally created as a library to do MWA ASVO related tasks
in the Haskell programming language (now available in Rust). However, it's not
just a library; the giant-squid
executable acts as an alternative to the
manta-ray-client and may better
suit users for a few reasons:
-
By default,
giant-squid
stream untars the downloads from ASVO. In other words, rather than downloading a potentially large (> 100 GiB!) tar file and then untarring it yourself (thereby occupying double the space of the original tar and performing a very expensive IO operation), it is possible to get the files without performing an untar using--keep-zip
-
giant-squid
does not require a CSV file to submit jobs; this is instead handled by command line arguments. -
For any commands that accept obsids or job IDs, it is possible use text files instead. These files are unpacked as if you had typed them out manually, and each entry of the text file(s) are checked for validity (all ints and all 10-digits long); any exceptions are reported and the command fails.
-
One can ask
giant-squid
to print their ASVO queue as JSON; this makes parsing the state of your jobs in another programming language much simpler. -
By default,
giant-squid
will validate the hash of the archive. You can skip this check with--skip-hash
Usage
Print help text
giant-squid -h
This also applies to all of the subcommands, e.g.
giant-squid download -h
Print the giant-squid
version
giant-squid --version
giant-squid -V
(Useful if things are changing over time!)
List ASVO jobs
giant-squid list
giant-squid l
List ASVO jobs in JSON
the following commands are equivalent:
giant-squid list --json
giant-squid list -j
giant-squid l -j
Example output:
giant-squid list -j
{"325430":{"obsid":1090528304,"jobId":325430,"jobType":"DownloadVisibilities","jobState":"Ready","files":[{"fileName":"1090528304_vis.zip","fileSize":10762878689,"fileHash":"ca0e89e56cbeb05816dad853f5bab0b4075097da"}]},"325431":{"obsid":1090528432,"jobId":325431,"jobType":"DownloadVisibilities","jobState":"Ready","files":[{"fileName":"1090528432_vis.zip","fileSize":10762875021,"fileHash":"9d9c3c0f56a2bb4e851aa63cdfb79095b29c66c9"}]}}
jobType
is allowed to be any of:
Conversion
DownloadVisibilities
DownloadMetadata
DownloadVoltage
CancelJob
jobState
is allowed to be any of:
Queued
Processing
Ready
Error: Text
(e.g. "Error: some error message")Expired
Cancelled
Example reading this in Python:
$ giant-squid list -j > /tmp/asvo.json
$ ipython
Python 3.8.0 (default, Oct 23 2019, 18:51:26)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import json
In [2]: with open("/tmp/asvo.json", "r") as h:
...: q = json.load(h)
...:
In [3]: q.keys()
Out[3]: dict_keys(['216087', '216241', '217628'])
Filter ASVO job listing
giant-squid list
takes an optional list of identifiers that can be used to filter the job listing,
these identifiers can either be a list of jobIDs or a list of obsIDs, but not both.
Additionally, the --states
and --types
options can be used to further filter the output.
These both taks a comma-separated, case-insensitive list of values from the jobType
and
jobState
lists above. These can be provided in TitleCase
, UPPERCASE
, lowercase
,
kebab-case
, snake_case
, or even SPoNgeBOb-CAse
example: show only jobs that match both of the following conditions:
- obsid is
1234567890
or1234567891
- jobType is
DownloadVisibilities
,DownloadMetadata
orCancelJob
- jobState is
Processing
orQueued
giant-squid list \
--types dOwNlOaD__vIsIbIlItIeS,download-metadata,CANCELJOB \
--states pRoCeSsInG,__Q_u_e_u_e_D__ \
1234567890 1234567891
Example: manual hash validation with Bash and jq
This example demonstrates how it is possible to stream the output of giant-squid list -j
into
jq
. This is the equivalent of what giant-squid download
does,
but with the extra overhead of storing the tar to disk (-k
).
set -eux
giant-squid list -j --types download_visibilities --states ready \
| jq -r '.[]|[.jobId,.files[0].fileUrl//"",.files[0].fileSize//"",.files[0].fileHash//""]|@tsv' \
| tee ready.tsv
while read -r jobid url size hash; do
# note: it's a good idea to check you have enough disk space here using $size.
wget $url -O ${jobid}.tar --progress=dot:giga --wait=60 --random-wait
sha1=$(sha1sum ${jobid}.tar | cut -d' ' -f1)
if [ "\$sha1" != "\$hash" ]; then
echo "Download failed, hash mismatch. Expected $hash, got $sha1"
exit 1
fi
tar -xf ${jobid}.tar
do < ready.tsv
Download ASVO jobs
To download job ID 12345 to your current directory '.':
giant-squid download 12345
# or
giant-squid d 12345
To download obsid 1065880128 to your current directory '.':
giant-squid download 1065880128
# or
giant-squid d 1065880128
(giant-squid
differentiates between job IDs and obsids by the length of the
number specified; 10-digit numbers are treated as obsids. If the ASVO ever
serves up more than a billion jobs, you have permission to be upset with me. The
same applies if this code is still being used in the year 2296.)
Text files containing job IDs or obsids may be used too.
You can specify the directory to download to by providing the download_dir
parameter
to the download
subcommand. Ommitting this will default to your current dir .
.
To download obsid 1065880128 to your /tmp
directory:
giant-squid download --download-dir /tmp 1065880128
# or
giant-squid d -d /tmp 1065880128
By default, giant-squid
will perform stream unzipping. Disable this with -k
(or --keep-zip
).
The MWA ASVO provides a SHA-1 of its downloads. giant-squid
will verify the integrity
of your download by default. Give a --skip-hash
to the download
command to skip.
Jobs which were submitted with the /astro or /scratch data delivery option behave differently than jobs submitted with the acacia data delivery option. When attempting to download an /astro job or /scratch job, if the path of the job (eg /astro/mwaops/asvo/12345) is reachable from the current host, it will be moved to the current working directory. Otherwise, it will be skipped.
Submit ASVO jobs
Visibility downloads
A "visibility download job" refers to a job which provides a zip containing gpubox files, a metafits file and cotter flags for a single obsid.
To submit a visibility download job for the obsid 1065880128:
giant-squid submit-vis 1065880128
# or
giant-squid sv 1065880128
Text files containing obsids may be used too.
If you want to check that your command works without actually submitting the
obsids, then you can use the --dry-run
option (short version -n
).
You can choose whether to have your files tarred up and uploaded to Pawsey's Acacia (default),
or you can request that the files be left on Pawsey's /astro or /scratch filesystem. The second option requires
that your Pawsey group be set in your ASVO account, please contact an admin to request this. To submit
a job with the /astro or /scratch delivery option, set the environment variable GIANT_SQUID_DELIVERY=astro|scratch.
Alternatively specify -d astro|scratch|acacia
.
Conversion downloads
To submit a conversion job for obsid 1065880128:
giant-squid submit-conv 1065880128
# or
giant-squid sc 1065880128
Text files containing obsids may be used too.
The default conversion options can be found by running the help text:
giant-squid submit-conv -h
To change the default conversion options and/or specify more options, specify comma-separated key-value pairs like so:
giant-squid submit-conv 1065880128 -p avg_time_res=0.5,avg_freq_res=10
If you want to check that your command works without actually submitting the
obsids, then you can use the --dry-run
option (short version -n
). More
messages (including what giant-squid
uses for the conversion options) can be
accessed with -v
(or --verbose
). e.g.
$ giant-squid submit-conv 1065880128 -nv -p avg_time_res=0.5,avg_freq_res=10
20:40:24 [INFO] Would have submitted 1 obsids for conversion, using these parameters:
{"output": "uvfits", "job_type": "conversion", "flag_edge_width": "160", "avg_freq_res": "10", "avg_time_res": "0.5"}
You can choose whether to have your files tarred up and uploaded to Pawsey's Acacia (default), or you can request that the files be left on Pawsey's /astro filesystem. The second option requires that your Pawsey group be set in your ASVO account, please contact an admin to request this. To submit a job with the /astro option, set the environment variable GIANT_SQUID_DELIVERY=astro.
Metadata downloads
A "metadata download job" refers to a job which provides a zip containing a metafits file and cotter flags for a single obsid.
To submit a visibility download job for the obsid 1065880128:
giant-squid submit-meta 1065880128
# or
giant-squid sm 1065880128
Text files containing obsids may be used too.
If you want to check that your command works without actually submitting the
obsids, then you can use the --dry-run
option (short version -n
).
You can choose whether to have your files tarred up and uploaded to Pawsey's Acacia (default), or you can request that the files be left on Pawsey's /astro filesystem. The second option requires that your Pawsey group be set in your ASVO account, please contact an admin to request this. To submit a job with the /astro option, set the environment variable GIANT_SQUID_DELIVERY=astro.
Voltage downloads
A "voltage download job" refers to a job which provides the raw voltages for one or more obsids.
To submit a voltage download job for the obsid 1065880128:
giant-squid submit-volt --delivery astro --offset 0 --duration 8 1065880128
# or
giant-squid sv -d astro -o 0 -u 8 1065880128
Text files containing obsids may be used too.
If you want to check that your command works without actually submitting the
obsids, then you can use the --dry-run
option (short version -n
).
Unlike other jobs, you cannot choose to have your files tarred up and uploaded to Pawsey's Acacia for remote
download, as the data is generally too large. If you are in the mwaops
or mwavcs
Pawsey groups and you have asked an MWA ASVO admin to
set the pawsey group in your MWA ASVO profile, you can request that the files be left on Pawsey's /astro filesystem. To submit
a job with the /astro option, set the environment variable GIANT_SQUID_DELIVERY=astro or pass -d astro
.
Resubmitting jobs
By default, the MWA ASVO server will not allow you to submit a new job which is has the exact same settings/parameters as an existing job in your queue (except errored jobs). You can, however override this behaviour by specifying --allow-resubmit
(short version -r
) on any job submission.
Download performance
By default, when downloading, giant-squid
will store 100 MiB of the download
in memory before writing to disk. This is friendlier on disks (especially those
belonging to supercomputers!), and can make downloads faster.
The amount of data to cache before writing can be tuned by setting
GIANT_SQUID_BUF_SIZE
. e.g.
export GIANT_SQUID_BUF_SIZE=50
giant-squid download 12345
would use 50 MiB of memory to cache the download before writing.
Installation
Pre-compiled
Have a look at the GitHub releases page.
Building from crates.io
-
Install Rust
-
Run
cargo install mwa_giant_squid
-
The final executable will be at
~./cargo/bin/giant-squid
-
This destination can be configured with the
CARGO_HOME
environment variable.
-
Building from source
-
Install Rust
-
Clone this repo and
cd
into itgit clone https://github.com/MWATelescope/giant-squid && cd giant-squid
-
Run
cargo install --path .
-
The final executable will be at
~./cargo/bin/giant-squid
-
This destination can be configured with the
CARGO_HOME
environment variable.
-
Docker
You can run giant-squid using docker
docker run mwatelescope/giant-squid:latest -h
Other
The Haskell code is still available on chj's GitLab. Switching to Rust means that the code is more efficient and the code is easier to read (sorry Haskell. I love you, but you're weird).
Dependencies
~8–23MB
~349K SLoC