owoof

github crates.io docs.rs

A glorified query-builder inspired by Datomic that uses a datalog-like format for querying and modifying information around a SQLite database.

This is a pet project and probably shouldn't be used for anything serious.

This is implemented as a Rust library. It is documented; you can read the source or find the documentation published on docs.rs.

There are two Rust executable targets: one provides a command-line interface (shown below) and another can be used for importing data from a CSV file.
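
For a rough idea of the storage model: information ends up as entity-attribute-value triples in SQLite, and queries are patterns over those triples. The sketch below shows that idea directly with rusqlite; the triples table, its columns, and the index are illustrative assumptions, not owoof's actual schema or API.

    // Illustrative only: a toy entity-attribute-value ("triples") table,
    // not owoof's actual schema or API.
    use rusqlite::{params, Connection, Result};

    fn main() -> Result<()> {
        let db = Connection::open_in_memory()?;
        db.execute_batch(
            "CREATE TABLE triples (e TEXT, a TEXT, v TEXT);
             CREATE INDEX triples_ave ON triples (a, v, e);",
        )?;
        // One datom per row: entity, attribute, value.
        db.execute(
            "INSERT INTO triples (e, a, v) VALUES (?1, ?2, ?3)",
            params![
                "#4aa95e29-8d45-470b-98a7-ee39aae1b9c9",
                ":pet/name",
                "Garfield"
            ],
        )?;
        // A pattern like '?pet :pet/name ?name' becomes a lookup by attribute.
        let name: String = db.query_row(
            "SELECT v FROM triples WHERE a = ?1 AND e = ?2",
            params![":pet/name", "#4aa95e29-8d45-470b-98a7-ee39aae1b9c9"],
            |row| row.get(0),
        )?;
        println!("{name}"); // Garfield
        Ok(())
    }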

CLI

Compile it with cargo build --features cli --bin cli.

The CLI can be used to initialize new database files, assert/create, retract/remove, or query information.

Here are some examples:

$ echo '[{":db/attribute": ":pet/name"},
         {":pet/name": "Garfield"},
         {":pet/name": "Odie"},
         {":pet/name": "Spot"},
         {":db/attribute": ":person/name"},
         {":db/attribute": ":person/starship"},
         {":person/name": "Jon Arbuckle"},
         {":person/name": "Lieutenant Commander Data",
          ":person/starship": "USS Enterprise (NCC-1701-D)"}]' \
      | owoof assert
[
  "#45e9d8e9-51ea-47e6-8172-fc8179f8fbb7",
  "#4aa95e29-8d45-470b-98a7-ee39aae1b9c9",
  "#2450b9e6-71a4-4311-b93e-3920eebb2c06",
  "#c544251c-a279-4809-b9b6-7d3cd68d2f2c",
  "#19a4cba1-6fc7-4904-ad36-e8502445412f",
  "#f1bf032d-b036-4633-b6f1-78664e44603c",
  "#e7ecd66e-222f-44bc-9932-c778aa26d6ea",
  "#af32cfdb-b0f1-4bbc-830f-1eb83e4380a3"
]

$ echo '[{":db/attribute": ":pet/owner"},
         {":db/id": "#4aa95e29-8d45-470b-98a7-ee39aae1b9c9",
          ":pet/owner": "#e7ecd66e-222f-44bc-9932-c778aa26d6ea"},
         {":db/id": "#2450b9e6-71a4-4311-b93e-3920eebb2c06",
          ":pet/owner": "#e7ecd66e-222f-44bc-9932-c778aa26d6ea"},
         {":db/id": "#c544251c-a279-4809-b9b6-7d3cd68d2f2c",
          ":pet/owner": "#af32cfdb-b0f1-4bbc-830f-1eb83e4380a3"}]' \
      | owoof assert
[
  "#ffc46ae2-1bde-4c08-bfea-09db8241aa2b",
  "#4aa95e29-8d45-470b-98a7-ee39aae1b9c9",
  "#2450b9e6-71a4-4311-b93e-3920eebb2c06",
  "#c544251c-a279-4809-b9b6-7d3cd68d2f2c"
]

$ owoof  '?pet :pet/owner ?owner' \
  --show '?pet :pet/name' \
  --show '?owner :person/name'
[
  [
    { ":pet/name": "Garfield" },
    { ":person/name": "Jon Arbuckle" }
  ],
  [
    { ":pet/name": "Odie" },
    { ":person/name": "Jon Arbuckle" }
  ],
  [
    { ":pet/name": "Spot" },
    { ":person/name": "Lieutenant Commander Data" }
  ]
]

$ owoof '?person :person/starship "USS Enterprise (NCC-1701-D)"' \
        '?pet :pet/owner ?person' \
        '?pet :pet/name ?n'
[
  "Spot"
]

# Or, suppose you know someone's name and their pet's name but don't know the attribute
# that relates them...  (But also this doesn't use indexes well so don't do it.)

$ owoof '?person :person/name "Lieutenant Commander Data"' \
        '?pet ?owner ?person' \
        '?pet :pet/name "Spot"' \
 --show '?owner :db/attribute'
[
  { ":db/attribute": ":pet/owner" }
]
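
Under the hood, a multi-pattern query like the starship example above amounts to a self-join over a triples table, one alias per pattern, with shared variables becoming equality constraints. The sketch below hand-writes that join against the toy triples(e, a, v) table from earlier; it is only an illustration, not the SQL owoof actually generates.

    // Illustrative only: roughly the self-join a three-pattern query implies
    // over the toy triples(e, a, v) table; not owoof's generated SQL.
    use rusqlite::{Connection, Result};

    fn starship_pets(db: &Connection) -> Result<Vec<String>> {
        let mut stmt = db.prepare(
            "SELECT n.v
               FROM triples p, triples o, triples n
              WHERE p.a = ':person/starship'
                AND p.v = 'USS Enterprise (NCC-1701-D)'
                AND o.a = ':pet/owner' AND o.v = p.e -- ?pet :pet/owner ?person
                AND n.a = ':pet/name'  AND n.e = o.e -- ?pet :pet/name ?n",
        )?;
        stmt.query_map([], |row| row.get::<_, String>(0))?.collect()
    }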

The next example uses data imported from the goodbooks-10k dataset (the import steps are described below).

$ owoof '?r :rating/score 1' \
        '?r :rating/book ?b' \
        '?b :book/authors "Dan Brown"' \
 --show '?r :rating/user' \
 --show '?b :book/title' \
 --limit 5
[
  [
    { ":rating/user": 9 },
    { ":book/title": "Angels & Demons  (Robert Langdon, #1)" }
  ],
  [
    { ":rating/user": 58 },
    { ":book/title": "The Da Vinci Code (Robert Langdon, #2)" }
  ],
  [
    { ":rating/user": 65 },
    { ":book/title": "The Da Vinci Code (Robert Langdon, #2)" }
  ],
  [
    { ":rating/user": 80 },
    { ":book/title": "The Da Vinci Code (Robert Langdon, #2)" }
  ],
  [
    { ":rating/user": 89 },
    { ":book/title": "The Da Vinci Code (Robert Langdon, #2)" }
  ]
]

Importing goodbooks-10k

  1. Initialize an empty database.

    $ owoof init
    
  2. Import books & --output a copy of the data with the :db/id column for each imported row.

    $ owoof-csv --output -- \
          :book/title \
          :book/authors \
          :book/isbn \
          :book/avg-rating\ average_rating \
          < goodbooks-10k/books.csv \
          > /tmp/imported-books
    
  3. Import ratings, using mlr to join the ratings with the imported books.

    $ mlr --csv join \
          -f /tmp/imported-books \
          -j book_id \
          < goodbooks-10k/ratings.csv \
      | owoof-csv -- \
          ':rating/book :db/id' \
          ':rating/score rating' \
          ':rating/user user_id'
    
  4. That takes some time (probably minutes), but then you can do something like:

    $ owoof '?calvin :book/title "The Complete Calvin and Hobbes"' \
            '?rating :rating/book ?calvin' \
            '?rating :rating/score 1' \
            '?rating :rating/user ?u' \
            '?more-great-takes :rating/user ?u' \
            '?more-great-takes :rating/book ?b' \
            '?more-great-takes :rating/score 5' \
     --show '?b :book/title :book/avg-rating' \
     --asc  '?b :book/avg-rating'
    

    And it should spit out some answers.

TODO/Caveats

  • Testing is not extensive at this point.

    The schema should be enforced (no deleting attributes that are in use, for example), but I haven't done the work to verify this, so there might be some surprises.

  • Performance is not super reliable.

    Version 0.2 adds partial indexes over specific attributes, which has helped a lot with search performance. However, there is no index on values, and some queries are affected by this more than others, so performance is inconsistent. (A sketch of the partial-index idea follows this list.)

    The current difficulty with a values index is that SQLite's query planner will prefer it in cases where it shouldn't. It isn't a good index and should be a last resort; it's also huge.

  • This is not feature-rich yet. Constraints only express equality; there is no support yet for constraints over ranges or ones involving logical operations, and honestly I haven't tested how well those would perform with the schema changes made in 0.2.
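
The partial indexes mentioned in the performance caveat are a SQLite feature: an index restricted by a WHERE clause to rows with one particular attribute. Sketched against the toy triples table from earlier (illustrative only, not owoof's actual 0.2 schema):

    // Illustrative only: a partial index scoped to one attribute on the toy
    // triples(e, a, v) table from the sketch above; not owoof's actual schema.
    use rusqlite::{Connection, Result};

    fn index_pet_name(db: &Connection) -> Result<()> {
        db.execute_batch(
            "CREATE INDEX IF NOT EXISTS triples_pet_name
                ON triples (e, v)
             WHERE a = ':pet/name';",
        )
    }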

Internal TODOs

  • Create DontWoof off the Connection.

  • The Select borrowing Network is a bit weird. I tried to split it off but it was still weird. Not sure what to do about that. One consideration is that pushing a Select onto a Query only borrows from the Network. Maybe this could be relaxed?

  • Test reference counting? Add a clean-up that removes soups with zero rc and runs pragma optimize.

  • Maybe add some sort of update thing to shorthand retract & assert?

  • The :db/id attribute is kind of silly since the entity and value are the same for triplets of that attribute.

    It's useful for object forms / mappings, like {":db/id": ...}. But maybe there is a cleverer way to group by something? (Like some sort of primary key associated with every form that the database stores ... 🤔)

See Also

My blog post associated with version 0.1 of this software: https://froghat.ca/blag/dont-woof

License

This is licensed under the Apache License, Version 2.0.
