1 unstable release

0.1.0 Apr 21, 2022


MIT license

200KB
4K SLoC


This is a crate (Schema-Based Random Data GENerator, i.e. SBRD GEN) that can generate random dummy data based on a schema. It is available both as a library and as a CLI tool.

See About Schema for schema and schema generators, and List of generators that can be specified for generators and their builders.

This program uses serde to parse the schema and format the generated results.

When used as a library

If you want to use it as a library, there are two ways: How to generate with a single generator and How to combine multiple generators with a schema.

How to generate with a single generator

Use a single generator when you do not need to combine the results of several generators. The same data can also be generated with How to combine multiple generators with a schema.

The steps are as follows:

  1. Create a builder with one of the new_xx constructors of GeneratorBuilder (where xx depends on the generator). If the generator should also be able to generate null, call nullable on the builder.
  2. Convert the builder into a generator with build.
  3. Generate dummy data by passing a random number generator and a Value Context to the generator.

Here is a concrete example:

use rand::thread_rng;
use sbrd_gen::builder::GeneratorBuilder;
use sbrd_gen::value::DataValueMap;

fn main() {
    // 1. Builder for an integer generator over 0..=100 that can also generate null.
    let builder = GeneratorBuilder::new_int(Some((0..=100).into())).nullable();
    // 2. Convert the builder into a generator.
    let generator = builder.build().unwrap();
    // 3. Generate a value from a random number generator and an empty Value Context.
    let generated_value = generator.generate(&mut thread_rng(), &DataValueMap::new()).unwrap();

    println!("generated: {}", generated_value);
}

How to combine multiple generators with a schema

If you want to use multiple generators, you can use this method.

The procedure is as follows:

  1. Prepare a list of ParentGeneratorBuilder containing the generators you want to use. Generators are executed from the top of this list to the bottom, so a Script or Format (both of which replace keys with previously generated values) can only refer to keys of generators declared above it. If you declare them in the wrong order, this substitution will not work properly.
  2. Prepare the list of keys to output, chosen from the generators you use.
  3. Construct a SchemaBuilder from the list of keys to output and the list of generators.
  4. Build the SchemaBuilder to convert it into a Schema.
  5. Call generate on the resulting Schema to produce dummy data, or write the data to a Writer with one of the write_xx methods (where xx varies) of the GeneratedValueWriter trait.
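Why declaration order matters can be shown with a tiny model of the steps above. This is a sketch, not sbrd-gen's API: GenFn and generate_record are hypothetical names, and values are plain i64 for brevity. Each generator sees only the values produced above it, and only the keys listed for output are returned.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a generator: it can read the values generated so far.
type GenFn = fn(&HashMap<String, i64>) -> i64;

/// Runs `generators` from top to bottom, storing each result in a context map,
/// then returns only the values for `keys`, in the order the keys are listed.
fn generate_record(keys: &[&str], generators: &[(&str, GenFn)]) -> Vec<(String, i64)> {
    let mut context: HashMap<String, i64> = HashMap::new();
    for (key, generate) in generators {
        let value = generate(&context);
        context.insert((*key).to_string(), value);
    }
    keys.iter()
        .filter_map(|k| context.get(*k).map(|v| (k.to_string(), *v)))
        .collect()
}

fn main() {
    let generators: [(&str, GenFn); 3] = [
        ("base", |_| 10),
        // Refers to "base", which is declared (and therefore generated) earlier.
        ("double", |ctx| ctx.get("base").copied().unwrap_or(0) * 2),
        // Generated but not output, because it is not in the key list.
        ("hidden", |_| 99),
    ];
    let record = generate_record(&["double", "base"], &generators);
    println!("{:?}", record); // [("double", 20), ("base", 10)]
}
```

If "double" were declared above "base", the lookup would fail and fall back to 0, which mirrors the ordering caveat in step 1.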

See all_builder.rs for a complete example.

When used as a CLI tool

When used as a CLI tool, sbrd-gen generates dummy data from the schema file whose path you specify. The CLI lets you specify the file format of the schema file, the number of dummy data sets to output, and the output format. For details, see the CLI help.

How to install

There are several ways to install, the most common being Install using Cargo and Install from GitHub release page.

Install using Cargo

With the cargo command available, run the following commands. If sbrd-gen --help prints a help message, the installation was successful.

cargo install sbrd-gen
sbrd-gen --help

Install from GitHub release page

To install from the GitHub release page, download the desired version from here. Extract the downloaded archive and add the directory containing the binary to your PATH. If sbrd-gen --help prints a help message, the installation was successful.

How to use CLI tool

Once the executable (e.g. sbrd-gen.exe on Windows) is on your PATH, run it with the syntax sbrd-gen [OPTIONS] <SCHEMA_FILE_PATH>. The arguments and options are described below; the same information is shown by sbrd-gen --help.

Arguments

  • <SCHEMA_FILE_PATH> : File path of the file containing the schema to be used for generation.

Options

  • Parser
    • Long form : --parser <PARSER_TYPE>
    • Short form : -p <PARSER_TYPE>
    • Description : Specifies the type of parser used to read the schema file.
    • Available values : yaml, json
    • Default : yaml
  • Output type
    • Long form : --type <OUTPUT_TYPE>
    • Short form : -t <OUTPUT_TYPE>
    • Description : Specifies the output format.
    • Available values : yaml, json, csv, tsv
    • Default : json
  • Number of outputs
    • Long form : --num <COUNT>
    • Short form : -n <COUNT>
    • Description : Specifies how many dummy data sets (each consisting of the keys listed in the schema) to generate.
    • Default : 10
  • No key header
    • Form : --no-header
    • Description : Omits the key header from the output.
  • Schema parsing only
    • Form : --dry-run
    • Description : Parses the schema and exits without generating dummy data.
  • Help
    • Long form : --help
    • Short form : -h
    • Description : Prints the help message.
  • Version
    • Long form : --version
    • Short form : -V
    • Description : Prints the version.

About Schema

A schema is a Map (KVS) with two entries: keys, a sequence of the keys to output, and generators, a sequence of generator builders. The supported formats are YAML and JSON.

For example descriptions, see all.yaml and all.json.

Value Context

When dummy data is generated from a schema, the generators in the schema are executed from the top down. Each generated value is stored in a Map (KVS) data structure called the Value Context, so at any point the Value Context contains the key/value pairs of the generators that have already generated successfully. The Value Context is used to look up the values of the keys to be output, and to replace each "{key}" (with no spaces between the braces and the key) in a Script or Format with the value associated with that key before the Script or Format is evaluated.
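To make the substitution rule concrete, here is a minimal sketch of "{key}" replacement with "{{" and "}}" escapes. It is not the crate's Evaluator: substitute is a hypothetical function, context values are pre-formatted strings, and the "{key:<format-option>}" form is deliberately not handled.

```rust
use std::collections::HashMap;

/// Replaces every "{key}" in `template` with the value of `key` in `context`.
/// "{{" and "}}" become literal braces. Unknown keys and unbalanced braces are errors.
/// Format options ("{key:<format-option>}") are not supported in this sketch.
fn substitute(template: &str, context: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Escaped braces.
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            // Start of a key reference: read up to the closing brace.
            '{' => {
                let mut key = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(k) => key.push(k),
                        None => return Err(format!("unclosed '{{' in {template:?}")),
                    }
                }
                let value = context
                    .get(key.as_str())
                    .ok_or_else(|| format!("key {key:?} is not in the context"))?;
                out.push_str(value);
            }
            '}' => return Err(format!("unmatched '}}' in {template:?}")),
            _ => out.push(c),
        }
    }
    Ok(out)
}

fn main() {
    let mut context = HashMap::new();
    context.insert("name", "Alice".to_string());
    context.insert("age", "30".to_string());
    // Prints: Alice is 30 {years}
    println!("{}", substitute("{name} is {age} {{years}}", &context).unwrap());
}
```

Because '{' always starts a key reference unless it is doubled, a key whose name contains a brace cannot be referenced, which matches the caveat given for Format and Script.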

List of options for parent generator

A parent generator is specified as a Map (KVS) consisting of a key and builder options. The corresponding struct is ParentGeneratorBuilder.

Key

A string that identifies the generator, specified under the key name key.

It is also used as a substitution key when evaluating Script or Format.

Builder

You can specify the generator options listed in List of generators that can be specified. Type determines which generator is built, and the other options are interpreted as described for that generator.

List of generators that can be specified

Generators that can be specified as a schema or a single generator are as follows.

String constructor (build_string module)

This module consists of a collection of generators that assemble strings based on the results generated by other generators.

Distribution system (distribution module)

This module consists of a collection of generators that generate random numbers based on a distribution function.

  • normal generator
    • Description : Generator that generates random numbers according to a normal distribution.
    • Remarks : Parameters accepts the mean (mean, Real-number) and the standard deviation (std_dev, Real-number), which default to 0.0 and 1.0 respectively.
    • Struct : NormalGenerator
    • Type : dist-normal
    • Required options : Type, Parameters
    • Available options : Type, Nullable, Parameters
    • Generate value type : Real-number
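For intuition about what the normal generator produces, the sketch below draws normally distributed numbers with a given mean and std_dev. It is not how sbrd-gen is implemented: it swaps in the Box-Muller transform and a hand-written xorshift64 PRNG so that it runs without external crates. XorShift64 and sample_normal are hypothetical names.

```rust
use std::f64::consts::PI;

/// Minimal xorshift64 PRNG (shift triple 13, 7, 17). The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform f64 in (0, 1]. Zero is excluded so that ln() stays finite.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }
}

/// Box-Muller transform: two uniform samples give one standard normal sample,
/// which is then scaled by `std_dev` and shifted by `mean`.
fn sample_normal(rng: &mut XorShift64, mean: f64, std_dev: f64) -> f64 {
    let u1 = rng.next_f64();
    let u2 = rng.next_f64();
    let z = (-2.0 * u1.ln()).sqrt() * (2.0 * PI * u2).cos();
    mean + std_dev * z
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15); // any non-zero seed
    // Uses the documented defaults: mean 0.0, std_dev 1.0.
    let samples: Vec<f64> = (0..5).map(|_| sample_normal(&mut rng, 0.0, 1.0)).collect();
    println!("{:?}", samples);
}
```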

Evaluation system (eval module)

This module consists of a collection of generators that evaluate a given expression and output a value.

  • eval generator
    • Description : Generator that outputs the result of evaluating the specified Script.
    • Remarks : None
    • Struct : EvalGenerator
    • Type : eval-int(Integer), eval-real(Real-number), eval-bool(Boolean), eval-string(String)
    • Required options : Type, Script
    • Available options : Type, Nullable, Script
    • Generate value type : Integer(eval-int), Real-number(eval-real), Boolean(eval-bool), String(eval-string)

Sequential change system (incremental module)

This module consists of a collection of generators that change sequentially, such as increasing by a certain amount each time they are executed.

  • increment id generator
    • Description : Generator whose value advances by the step of the specified Increment on each generation, starting from the initial value of that Increment.
    • Remarks : By default, Increment starts at 1 and increases by 1.
    • Struct : IncrementIdGenerator
    • Type : increment-id
    • Required options : Type
    • Available options : Type, Nullable, Increment (Integer)
    • Generate value type : Integer
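The increment id behavior can be sketched as a small stateful struct. This assumes the first generation returns the initial value and each later one adds step; IncrementId is a hypothetical name, not the crate's IncrementIdGenerator.

```rust
/// Sketch of an increment-id style generator.
/// Assumption: the first call returns `initial`, each later call adds `step`.
struct IncrementId {
    next: i64,
    step: i64,
}

impl IncrementId {
    /// `step` is optional, as in the Increment option; it defaults to 1.
    fn new(initial: i64, step: Option<i64>) -> Self {
        IncrementId { next: initial, step: step.unwrap_or(1) }
    }

    fn generate(&mut self) -> i64 {
        let value = self.next;
        self.next += self.step;
        value
    }
}

impl Default for IncrementId {
    /// Documented default: start at 1 and increase by 1.
    fn default() -> Self {
        IncrementId::new(1, None)
    }
}

fn main() {
    let mut ids = IncrementId::default();
    let first_three: Vec<i64> = (0..3).map(|_| ids.generate()).collect();
    println!("{:?}", first_three); // [1, 2, 3]
}
```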

Primitive (primitive module)

This module consists of a collection of generators that generate basic values.

  • int generator
    • Description : Generator that generates Integer with the specified Range, where the Default range is between the minimum value of i16 (-32768) and the maximum value of i16 (32767).
    • Remarks : None
    • Struct : IntGenerator
    • Type : int
    • Required options : Type
    • Available options : Type, Nullable, Range (Integer)
    • Generate value type : Integer
  • real generator
    • Description : Generator that generates a Real-number in the specified Range, where the Default range is between the minimum value of i16 (-32768) and the maximum value of i16 (32767).
    • Remarks : The larger the absolute value of the generated value, the fewer digits it has after the decimal point; the smaller the absolute value, the more digits.
    • Struct : RealGenerator
    • Type : real
    • Required options : Type
    • Available options : Type, Nullable, Range (Real-number)
    • Generate value type : Real-number
  • bool generator
    • Description : Generator that generates true or false, each with 50% probability.
    • Remarks : None
    • Struct : BoolGenerator
    • Type : bool
    • Required options : Type
    • Available options : Type, Nullable
    • Generate value type : Boolean
  • date time generator
    • Description : This generator generates date and time in the format specified by Format.
    • Remarks : Date-times in Range are written in the format "%Y-%m-%d %H:%M:%S", which is also the default for Format. See here for the format specifiers. The default Range is from 1900-01-01 00:00:00 (inclusive) to 2151-01-01 00:00:00 (exclusive); an unspecified bound takes its default value.
    • Struct : DateTimeGenerator
    • Type : date-time
    • Required options : Type
    • Available options : Type, Nullable, Range (DateTime-String), Format
    • Generate value type : String
  • date generator
    • Description : This generator generates date in the format specified by Format.
    • Remarks : Dates in Range are written in the format "%Y-%m-%d", which is also the default for Format. See here for the format specifiers. The default Range is from 1900-01-01 (inclusive) to 2151-01-01 (exclusive); an unspecified bound takes its default value.
    • Struct : DateGenerator
    • Type : date
    • Required options : Type
    • Available options : Type, Nullable, Range (Date-String), Format
    • Generate value type : String
  • time generator
    • Description : This generator generates time in the format specified by Format.
    • Remarks : Times in Range are written in the format "%H:%M:%S", which is also the default for Format. See here for the format specifiers. The default Range is from 00:00:00 to 23:59:59, both inclusive; an unspecified bound takes its default value.
    • Struct : TimeGenerator
    • Type : time
    • Required options : Type
    • Available options : Type, Nullable, Range (Time-String), Format
    • Generate value type : String
  • always null generator
    • Description : Generator that always generates null.
    • Remarks : None
    • Struct : AlwaysNullGenerator
    • Type : always-null
    • Required options : Type
    • Available options : Type, Nullable
    • Generate value type : Null

Child Generator Random Selection System (random_children module)

This module consists of a collection of generators that generate values using a List of child generators.

  • case when generator
    • Description : Generator that evaluates the Condition of each child generator in declaration order and generates the value with the first child generator whose Condition is true.
    • Remarks : Provide a default child generator (one without Condition) in case no Condition is true.
    • Struct : CaseWhenGenerator
    • Type : case-when
    • Required options : Type, List of child generators with Condition specified
    • Available options : Type, Nullable, List of child generators with Condition specified
    • Generate value type : Generate value type of the child generator used for generation
  • random child generator
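The case when selection rule can be sketched as a first-match search. This assumes the first child whose Condition holds is used, and that a child without Condition always matches, as described under Condition. Conditions here are plain Rust functions rather than evaluated Scripts; Child, Cond, case_when, and is_adult are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical condition type: a predicate over the Value Context.
type Cond = fn(&HashMap<&str, i64>) -> bool;

/// Hypothetical child generator: an optional condition plus a fixed value to produce.
struct Child {
    condition: Option<Cond>,
    value: &'static str,
}

/// Returns the value of the first child whose condition holds. A child without a
/// condition always matches, so placed last it acts as the default branch.
fn case_when(children: &[Child], context: &HashMap<&str, i64>) -> Option<&'static str> {
    children
        .iter()
        .find(|child| child.condition.map_or(true, |cond| cond(context)))
        .map(|child| child.value)
}

fn is_adult(ctx: &HashMap<&str, i64>) -> bool {
    ctx.get("age").map_or(false, |&age| age >= 20)
}

fn main() {
    let children = [
        Child { condition: Some(is_adult as Cond), value: "adult" },
        Child { condition: None, value: "minor" }, // default branch
    ];
    let mut context = HashMap::new();
    context.insert("age", 25);
    println!("{:?}", case_when(&children, &context)); // Some("adult")
}
```

Without the default child, a context that matches no Condition yields None, which is why the Remarks recommend including one.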

Value alternative system (random_values module)

This module consists of a collection of generators that generate values using Character list, List of Values, and External file path.

List of generator options

The following options can be specified to build a generator. The available options vary from generator to generator; options that a generator does not use are ignored.

Type

  • Description : Type of generators listed in List of generators that can be specified, used to identify the type of generator.
  • Remarks : None
  • Struct : GeneratorType
  • Key name : type
  • Value type : String

Nullable

  • Description : A flag indicating whether the generator can generate null in addition to its regular values. If true, null can be generated. Default is false.
  • Remarks : None
  • Struct : bool
  • Key name : nullable
  • Value type : Boolean

Format

  • Description : For each key/value pair (key, value) in the Value Context, every occurrence of "{key}" or "{key:<format-option>}" (with no spaces between the braces, the key, and the colon) in the format is replaced by the formatted value, and the result is evaluated as a String. The characters '{' and '}' can be escaped by doubling them ("{{" and "}}"). Because '{' and '}' are parsed preferentially as key delimiters, a key containing '{' or '}' cannot be referenced this way; such keys can still be used as output keys without any problem.
  • Remarks : Please refer to the API documentation of the Evaluator in this sbrd-gen crate for more information on syntax.
  • Struct : String
  • Key name : format
  • Value type : String

Script

  • Description : For each key/value pair (key, value) in the Value Context, every occurrence of "{key}" or "{key:<format-option>}" (with no spaces between the braces, the key, and the colon) in the script is replaced by the formatted value, and the result is then evaluated as an expression. The characters '{' and '}' can be escaped by doubling them ("{{" and "}}"). Because '{' and '}' are parsed preferentially as key delimiters, a key containing '{' or '}' cannot be referenced this way; such keys can still be used as output keys without any problem.
  • Remarks : Please refer to the API documentation of the Evaluator in this sbrd-gen crate for more information on syntax and expressions.
  • Struct : String
  • Key name : script
  • Value type : String

Separator

  • Description : A string used for delimitation in string construction, etc.
  • Remarks : None
  • Struct : String
  • Key name : separator
  • Value type : String

Range

  • Description : This option is used to specify the range of the number of iterations and the range of values to be generated.
  • Remarks : The six available value types for ranges are Integer, Real-number, String, DateTime-String, Date-String, and Time-String. Refer to the respective Primitive generators for specifying date/time-related values.
  • Struct : ValueBound
  • Key name : range
  • Value type : Map (KVS) with the keys start and end, whose values are of the range's value type, and include_end, a flag indicating whether the end value is included. All keys are optional, and include_end defaults to true.
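How an integer Range resolves can be sketched as follows. This assumes that, as stated for the date/time generators, a missing bound falls back to the generator's default (the i16 limits given for the int generator) and that include_end defaults to true. IntBound is a hypothetical name, not the crate's ValueBound.

```rust
/// Hypothetical mirror of an integer Range: every field is optional.
#[derive(Default)]
struct IntBound {
    start: Option<i64>,
    end: Option<i64>,
    include_end: Option<bool>,
}

// Defaults from the int generator description: the i16 range.
const DEFAULT_START: i64 = i16::MIN as i64; // -32768
const DEFAULT_END: i64 = i16::MAX as i64; // 32767

impl IntBound {
    /// Fills in missing fields with defaults and checks whether `x` lies in the range.
    fn contains(&self, x: i64) -> bool {
        let start = self.start.unwrap_or(DEFAULT_START);
        let end = self.end.unwrap_or(DEFAULT_END);
        if self.include_end.unwrap_or(true) {
            start <= x && x <= end
        } else {
            start <= x && x < end
        }
    }
}

fn main() {
    let percent = IntBound { start: Some(0), end: Some(100), include_end: None };
    println!("{} {}", percent.contains(100), percent.contains(101)); // true false
}
```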

Increment

  • Description : Option to specify the initial value and the amount of change in the value that will be updated each time the generator is run.
  • Remarks : The six available value types are Integer, Real-number, String, DateTime-String, Date-String, and Time-String. Values are written the same way as in Range.
  • Struct : ValueStep
  • Key name : increment
  • Value type : Map (KVS) with the required key initial, the initial value, and the optional key step, the amount of change, both of the value type.

List of child generators

  • Description : This option specifies a sequence of generators, each configured with the options in List of generator options. A generator specified here is called a child generator; unlike a parent generator, it also accepts the options in List of options for child generator.
  • Remarks : None
  • Struct : Vec<ChildGeneratorBuilder>
  • Key name : children
  • Value type : Sequence of child generators

Character list

  • Description : Option to enumerate characters for random selection.
  • Remarks : None
  • Struct : String
  • Key name : chars
  • Value type : String

List of Values

  • Description : Option to enumerate values for random selection.
  • Remarks : Available value types are Integer, Real-number, and String.
  • Struct : Vec<DataValue>
  • Key name : values
  • Value type : Sequence consisting of Integer, Real-number, or String type

External file path

  • Description : This option specifies the path of a file that lists the values for random selection, one value per line. The path can be absolute or relative to the schema file.
  • Remarks : None
  • Struct : PathBuf
  • Key name : filepath
  • Value type : String
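Resolving filepath and reading one value per line can be sketched with the standard library alone. This assumes a relative filepath is joined onto the directory that contains the schema file; resolve_filepath, parse_values, and load_values are hypothetical names, not sbrd-gen functions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `filepath` against the directory of `schema_path`.
/// Path::join keeps `filepath` unchanged when it is already absolute.
fn resolve_filepath(schema_path: &Path, filepath: &str) -> PathBuf {
    let base = schema_path.parent().unwrap_or_else(|| Path::new(""));
    base.join(filepath)
}

/// Splits file contents into candidate values, one per line ("\n" or "\r\n").
fn parse_values(contents: &str) -> Vec<String> {
    contents.lines().map(str::to_string).collect()
}

/// Reads the value file for a schema; I/O errors are passed to the caller.
fn load_values(schema_path: &Path, filepath: &str) -> io::Result<Vec<String>> {
    let contents = fs::read_to_string(resolve_filepath(schema_path, filepath))?;
    Ok(parse_values(&contents))
}

fn main() {
    let path = resolve_filepath(Path::new("schemas/user.yaml"), "names.txt");
    println!("{}", path.display());
    println!("{:?}", parse_values("Alice\nBob\nCarol\n"));
    if let Err(e) = load_values(Path::new("schemas/user.yaml"), "names.txt") {
        println!("could not read values: {e}");
    }
}
```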

Parameters

  • Description : This option is used to specify the parameters needed to construct the distribution function. See each generator in Distribution system for the keys and values to specify.
  • Remarks : None
  • Struct : DataValueMap<String>
  • Key name : parameters
  • Value type : Map(KVS)

List of options for child generator

The child generator can specify the options listed below in addition to the options that can be specified by the generator.

Condition

  • Description : Condition that decides whether this child generator is used. If specified, it is evaluated like Script to yield true or false; if not specified, it is always true.
  • Remarks : None
  • Struct : String
  • Key name : condition
  • Value type : String

Weight

  • Description : Option to specify the weight for random selection of child generators. The higher the weight, the more often it is selected; Default weight is 1.
  • Remarks : None
  • Struct : Weight
  • Key name : weight
  • Value type : Integer (non-negative)
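Weighted selection of child generators can be sketched with cumulative weights: a roll drawn uniformly from 0 up to (but not including) the total weight is walked through the weights until it falls inside one. This is one common technique and not necessarily how sbrd-gen does it; pick_weighted and total_weight are hypothetical names, and weights are modeled as u32.

```rust
/// Picks an index by weight. `roll` should be drawn uniformly from 0..total_weight.
/// A child with weight 0 is never chosen.
fn pick_weighted(weights: &[u32], roll: u64) -> Option<usize> {
    let mut remaining = roll;
    for (index, &weight) in weights.iter().enumerate() {
        let w = u64::from(weight);
        if remaining < w {
            return Some(index);
        }
        remaining -= w;
    }
    None // roll was outside 0..total_weight
}

fn total_weight(weights: &[u32]) -> u64 {
    weights.iter().map(|&w| u64::from(w)).sum()
}

fn main() {
    // Default weight is 1; the second child is three times as likely.
    let weights: [u32; 3] = [1, 3, 0];
    let total = total_weight(&weights);
    // Enumerating every possible roll shows each index is hit `weight` times.
    let mut hits = vec![0; weights.len()];
    for roll in 0..total {
        if let Some(i) = pick_weighted(&weights, roll) {
            hits[i] += 1;
        }
    }
    println!("{:?}", hits); // [1, 3, 0]
}
```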

LICENSE

MIT
