28 releases (13 breaking)

0.14.0 Dec 18, 2024
0.13.0 Nov 2, 2024
0.12.0 Sep 19, 2024
0.11.0 Jun 28, 2024
0.3.5 Dec 23, 2023

#50 in Database implementations

Download history 10/week @ 2024-09-11 138/week @ 2024-09-18 21/week @ 2024-09-25 9/week @ 2024-10-02 7/week @ 2024-10-09 1/week @ 2024-10-16 133/week @ 2024-10-30 26/week @ 2024-11-06 2/week @ 2024-11-13 7/week @ 2024-11-20 1/week @ 2024-11-27 30/week @ 2024-12-04 56/week @ 2024-12-11 144/week @ 2024-12-18

230 downloads per month

MIT license

290KB
2K SLoC

dfsql

  • Revision: the standalone count command is replaced with len, so make sure to replace (count) and col "count" with len and col "len" respectively.
    • the unary count <col> command is unaffected.

Install

cargo install dfsql

How to run

dfsql --input your.csv --output a-new.csv
# ...or
dfsql -i your.csv -o a-new.csv

REPL

  • exit/quit: exit the REPL loop.
    exit
    
  • undo: undo the previous successful operation.
    undo
    
  • reset: reset all the changes and go back to the original data frame.
    reset
    
  • schema: show column names and types of the data frame.
    schema
    
  • save: save the current data frame to a file.
    save a-new.csv
    

Statements

  • select
    select <expr>*
    
    select last_name first_name
    
    • Select columns "last_name" and "first_name" and collect them into a data frame.
  • Group by
    group (<col> | <var>)* agg <expr>*
    
    group first_name agg (count)
    
    • Group the data frame by column "first_name" and then aggregate each group with the count of the members.
  • filter
    filter <expr>
    
    filter first_name = "John"
    
  • limit
    limit <int>
    
    limit 5
    
  • reverse
    reverse
    
  • sort
    sort ((asc | desc | ()) <col>)*
    
    sort icpsr_id
    
  • use
    use <var>
    
    use other
    
    • Switch to the data frame called other.
  • join
    (left | right | inner | full) join <var> on <col> <col>?
    
    left join other on id ID
    
    • left join the data frame called other on my column id and its column ID

Expressions

  • col: reference to a column.
    col : (<str> | <var>) -> <expr>
    
    select col first_name
    
  • exclude: remove columns from the data frame.
    exclude : <expr>* -> <expr>
    
    select exclude last_name first_name
    
  • literal: literal values like 42, "John", 1.0, and null.
  • binary operations
    select a * b
    
    • Calculate the product of columns "a" and "b" and collect the result.
  • unary operations
    select -a
    
    select sum a
    
    • Sum all values in column "a" and collect the scalar result.
  • alias: assign a name to a column.
    alias : (<col> | <var>) <expr> -> <expr>
    
    select alias product a * b
    
    • Assign the name "product" to the product and collect the new column.
  • conditional
    <conditional> : if <expr> then <expr> (if <expr> then <expr>)* otherwise <expr> -> <expr>
    
    select if class = 0 then "A" if class = 1 then "B" else null
    
  • cast: cast a column to either type str, int, or float.
    cast : <type> <expr> -> <expr>
    
    select cast str id
    
    • Cast the column "id" to type str and collect the result.

Dependencies

~26–59MB
~1M SLoC