#indentation #tokenizer #parser #text-input

indentation_flattener

From indented input, generate plain output with indentation PUSH and POP codes

1 unstable release

Uses old Rust 2015

0.1.0 Apr 22, 2017

#1737 in Text processing

GPL-3.0 license

21KB
575 lines

Indentation flattener

A small and simple library that transforms indented text into text with extra "PushIndent" and "PopIndent" symbols.

This project is based on indent_tokenizer

This format is more convenient to use with PEG grammars (which is how I intend to use it).

Usage

Add to Cargo.toml [source, toml]

[dependencies]
pending...

See example below

Changes

None

Indentation format

Tabs are not valid for indentation grouping.

An indentation "Push" or "Pop" is inserted on each change of indentation level.

Let's see by example.

.Simple valid input

.....
...
    ....        new indent level
        ....    new indent level
        ....
    ....        one back indent level
    ....
        ....    new indent level
....            two back indent level
....
    ....        new indent level

Indentation groups can have any number of spaces

.Valid indentation different spaces

.....
  ....              level1  <--
        ....        level2
  ....              level1  <--
  ....
....
....
      ....          level1  <--
            ....    level2  <--

It's not a good idea to indent the same level with different numbers of spaces, but it is allowed when you are creating a new level.

In this example, the last level1 and the last level2 are indented with more spaces than the previous ones.

.Invalid indentation

.....
...
    ....
        ....
       ....     <--  incorrect indentation
    ....        <--  correct previous indent level
    ....
....
....
    ....

To go back a level, the indentation has to match the indentation previously used for that level.

As we saw in the previous example, when increasing the level any indentation is allowed.
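The two rules above (a deeper indentation always opens a new level, while going back must land exactly on a column that is still open) can be sketched with a stack of columns. This is only an illustration under those stated rules, not the code in lib.rs; `Change` and `indent_change` are hypothetical names.

```rust
/// Result of comparing a line's indentation with the open levels.
/// (Hypothetical type, not part of the crate's API.)
#[derive(Debug, PartialEq)]
enum Change {
    Same,
    Push,
    Pop(usize), // number of levels closed
}

/// `stack` holds the column of every open level, innermost last.
/// Column 0 (the root) is implicit and never stored.
fn indent_change(stack: &mut Vec<usize>, indent: usize) -> Result<Change, String> {
    let current = stack.last().copied().unwrap_or(0);
    if indent == current {
        return Ok(Change::Same);
    }
    if indent > current {
        // Increasing the level: any larger column is accepted.
        stack.push(indent);
        return Ok(Change::Push);
    }
    // Going back: the column must be the root or match an open level exactly.
    let keep = if indent == 0 {
        Some(0)
    } else {
        stack.iter().position(|&col| col == indent).map(|i| i + 1)
    };
    match keep {
        Some(keep) => {
            let pops = stack.len() - keep;
            stack.truncate(keep);
            Ok(Change::Pop(pops))
        }
        // The stack is left untouched when the indentation is invalid.
        None => Err(format!("indentation {} does not match any open level", indent)),
    }
}
```

Feeding it the columns 0, 4, 8, 7 from the invalid example yields `Same`, `Push`, `Push` and then an error, because column 7 was never opened.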

.Start line indicator

|.....
    |.....
    |......
        |......

You can start lines with |, but it's optional.

.Start line indicator is optional

.....
     |1234
     1234
     ......
        ......

But if you combine lines with and without | on the same level, the columns will not line up.

It is useful when you need to start a line with spaces.

.I want to start a line with spaces

.....
     | .....     <- This line starts with a space
     |  ......   <- Starting with 2 spaces
     |.....      <- starts with no spaces
     .....      <- starting with no spaces
     ...        <- starting with no spaces

.My line starts with a |

.....
    ||.....     This line starts with a `|`
    |......     This one starts with `.`

.....
.....
    ||.....     This line starts with a `|`
    ......     This one starts with `.`

A line is empty when it has no content or contains only spaces (no matter how many).

Empty lines produce a new line character and no change of indentation level.

.Empty lines

.....
    .....
    .....
    .....
    .....     next line is empty

    .....     next line is empty

.....
.....         next line is empty

What if I want to represent empty lines?

.Representing empty lines

.....
    .....
    .....     There is a new line after (same indent level)

    .....
    .....     There is a new line after (explicitly marked)
    |
    .....     three new lines after
    |
    |
    |
.....   Two new lines at end of document
|
|

| is quite useful if you need to represent empty lines at the end of the document.

What if I want to represent spaces at end of line?

Spaces at the end of a line are not removed, so you don't need to do anything about them.

But it can be useful to mark them explicitly, because some editors strip trailing whitespace, or simply so that you can see them.

.Representing spaces at end line

.....
    .....
    .....
    .....
    This line keeps 2 spaces at the end  |
    and you know it

    Next line is properly indented and only has spaces
    |   |

In fact, you can write | at the end of every line. It will be removed.

The following strings are equivalent.

.| is optional at end of line

.....|
    .....|
    .....|
    .....|


.....
    .....
    .....
    .....

But I might need a pipe | at the end of a line

.pipe at end of line

.....
    .....
    .....
    .....
    This line ends with a pipe||
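The `|` rules described in this section (one optional marker at the start of the content, one optional marker at the end, both removed) can be sketched as follows. This is a minimal illustration, assuming the leading indentation spaces have already been removed from the line; `strip_markers` is a hypothetical helper, not a function of the crate, and inputs not covered by the examples above (such as a line containing only `||`) may be treated differently by the real implementation.

```rust
/// Remove one optional leading `|` and one optional trailing `|`.
/// `content` is the line with its indentation spaces already removed.
/// (Hypothetical helper, not part of the crate's API.)
fn strip_markers(content: &str) -> &str {
    // A leading `|` only marks where the line starts, so drop it.
    let content = content.strip_prefix('|').unwrap_or(content);
    // A trailing `|` only marks where the line ends, so drop it too.
    content.strip_suffix('|').unwrap_or(content)
}
```

With this sketch `||.....` becomes `|.....`, `This line ends with a pipe||` keeps a single trailing pipe, `|   |` becomes three spaces, and a lone `|` becomes an empty line.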

Output format

The output will be a string with codes PUSH_INDENT and POP_INDENT

.From lib.rs [source, rust]

    const PUSH_INDENT: char = 0x02 as char;
    const POP_INDENT: char = 0x03 as char;

The spaces that mark indentation are removed from the output.

See examples below.

Because the system works line by line, every line with content ends with an end-of-line character.
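To illustrate how a consumer might read this format, the sketch below walks a flattened string, tracks the nesting depth from the two codes, and rebuilds an indented text. It is an assumption-laden example and not part of the crate: `reindent` and its `width` parameter are hypothetical, and the constants are redeclared locally with the same values as in lib.rs.

```rust
const EOL: char = '\n';
const PUSH_INDENT: char = '\u{2}'; // same value as 0x02 as char
const POP_INDENT: char = '\u{3}'; // same value as 0x03 as char

/// Rebuild indented text from flattened text, using `width` spaces per level.
/// (Hypothetical helper, not part of the crate's API.)
fn reindent(flat: &str, width: usize) -> String {
    let mut depth = 0usize;
    let mut at_line_start = true;
    let mut out = String::new();
    for ch in flat.chars() {
        match ch {
            PUSH_INDENT => depth += 1,
            // saturating_sub keeps the sketch from panicking on unbalanced input
            POP_INDENT => depth = depth.saturating_sub(1),
            EOL => {
                out.push(EOL);
                at_line_start = true;
            }
            other => {
                if at_line_start {
                    // Indent only lines that have content; empty lines stay empty.
                    out.push_str(&" ".repeat(depth * width));
                    at_line_start = false;
                }
                out.push(other);
            }
        }
    }
    out
}
```

Note that this round trip is lossy for lines whose content itself starts with spaces or with `|`, since the sketch does not re-insert the start-of-line marker.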

API

It uses concrete types rather than general types (such as String, u32 or usize).

Constants:: [source, rust]

const EOL: char = '\n';
const PUSH_INDENT: char = 0x02 as char;
const POP_INDENT: char = 0x03 as char;

Concrete types:: [source, rust]

#[derive(Debug, PartialEq, Copy, Clone)]
pub struct LineNum(u32);

#[derive(Debug, PartialEq, Clone, Eq)]
pub struct SLine(String);

#[derive(Debug, PartialEq, Clone, Eq, Default)]
pub struct SFlattedText(String);

Function to call:: [source, rust]

pub fn flatter(input: &str) -> Result<SFlattedText, Error>

Error type:: [source, rust]

#[derive(Debug, PartialEq)]
pub struct Error {
    pub line: LineNum,
    pub desc: String,
}

That's all.

Look into lib.rs

Examples

You can look into tests.rs, where there are several tests.

.Simple example [source, rust]

# this input...

0
    01
    02
        020
        021
        023
            0230
            0231

# produces...
0
\u{2}01
02
\u{2}020
021
023
\u{2}0230
0231
\u{3}\u{3}\u{3}

As you can see, the indentation has been replaced by codes that mark PUSH_INDENT and POP_INDENT.

[NOTE] All lines end with a new line. If the last line does not have one, the system will insert it.
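The transformation in the simple example can be reproduced with a much reduced flattener. The sketch below is not the crate's implementation: it assumes spaces-only indentation, ignores the `|` markers entirely, and reports errors as a hypothetical `(line number, description)` pair with 1-based line numbers. `flatten_sketch` is an illustrative name.

```rust
const EOL: char = '\n';
const PUSH_INDENT: char = '\u{2}';
const POP_INDENT: char = '\u{3}';

/// Simplified flattener: spaces-only indentation, no `|` handling.
/// (Hypothetical function, not the crate's `flatter`.)
fn flatten_sketch(input: &str) -> Result<String, (u32, String)> {
    let mut levels: Vec<usize> = Vec::new(); // columns of the open levels
    let mut out = String::new();
    for (idx, line) in input.lines().enumerate() {
        let content = line.trim_start_matches(' ');
        if content.is_empty() {
            // Empty line: emit a new line only, no change of level.
            out.push(EOL);
            continue;
        }
        if content.starts_with('\t') {
            return Err((idx as u32 + 1, "tabs are not valid for indentation".to_string()));
        }
        let indent = line.len() - content.len();
        let current = levels.last().copied().unwrap_or(0);
        if indent > current {
            // Any deeper column opens a new level.
            levels.push(indent);
            out.push(PUSH_INDENT);
        } else {
            // Close levels until we reach the column of this line.
            while levels.last().map_or(false, |&col| col > indent) {
                levels.pop();
                out.push(POP_INDENT);
            }
            if levels.last().copied().unwrap_or(0) != indent {
                return Err((idx as u32 + 1, "indentation does not match a previous level".to_string()));
            }
        }
        out.push_str(content);
        out.push(EOL); // every line with content ends with a new line
    }
    // Close every level that is still open at the end of the input.
    for _ in 0..levels.len() {
        out.push(POP_INDENT);
    }
    Ok(out)
}
```

The POP codes are written before the content of the line that goes back a level, and any levels still open at the end of the input are closed after the last new line, which matches the shape of the output shown above.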

.Complex example [source, rust]

    let flat = flatter("
0
     || 01a
     01b
     01c

     02a
     02b

        |020a
        ||020b

        |  021a
        |021b
1a
1b
    11a
    ||11b
    11c

    12a  ||
    |12b  ||
2a
    21a
    21b
    |
    |

")
        .unwrap();

    assert!(flat ==
            SFlattedText::from("
0
\u{2}| 01a
01b
01c

02a
02b

\u{2}020a
|020b

  021a
021b
\u{3}\u{3}1a
1b
\u{2}11a
|11b
11c

12a  |
12b  |
\u{3}2a
\u{2}21a
21b



\u{3}"));

More examples in tests.rs

No runtime deps