35 releases

0.7.0	Jan 21, 2024
0.5.5	Aug 3, 2023
0.5.4	Jul 23, 2023
0.4.1	Mar 6, 2023
0.0.36	~~Mar 28, 2021~~

#765 in Network programming

Used in 18 crates

MIT license

1MB
12K SLoC

uSIEM

Framework definitions that allow building a custom SIEM. Code you own SIEM like you code your own web page.

Motivation

I have seen many times updates in rules that break things or inneficient regular expressions that slow down the entire SIEM. With the aproach of uSIEM, all changes to the SIEM are traced and can be reversed using a version control system like GIT. Also you can have a team designing rules with SIGMA in a different repository and join the SIEM code and the rules using CI/CD tools like Jenkins that can run a integration test to check if the new changes breaks the system and then deploy the new version.

Some benchmarks (single-thread):

Log Source	Events/second
Suricata(JSON)	261780
OpnSense Firewall	750127

You can see a more updated document here: https://github.com/u-siem/parser-benchmarks

Architecture

                                                        |---------------|                
                                                 |----> | GatheringNode |------------------>|
                                                 |      |---------------|                   |
                                                 |                                          |
|---------|       |------------|       |------------------|       |--------------|          |
|InputNode| ----> | ParsingNode| ----> |  Enrichment Node | ----> | IndexingNode |          |
|---------|       |------------|       |------------------| |     |--------------|          |
                                                            |                               |
                                                            |   |----------|       |--------------|       |----------|           
                                                            --> | RuleNode | ----> | AlertingNode | ----> | SoarNode |
                                                                |----------|       |--------------|       |----------|

Note: It has changed quite a lot and is still changing.

Node Types

Kernel

The kernel will be executed in a single thread with high priority and will be responsible of scaling the number of nodes of each type if it detects a congestion in a element. It will alse route messages between nodes.

Commander

This component will accept commands from a user and sent it to be routed by the kernel to specific nodes.

InputNode

It ingests logs and process them. We can support elasticsearch type (Like API-REST) or syslog.

ParsingNode

This node will be the most important and will be able to parse the logs sent to him by the input node. There will be a special node still being designed to process logs like MySQL or Linux logs that are multiline.

EnrichmentNode

It adds information about the IP, if its in a blocklist, if its a AmazonWebServices/Azure/GoogleCloud IP, if the IP has never been seen it then it contacts the GatheringNode to find information about that IP. It adds the tag "never_seen_ip" in that cases. It uses datasets to access information in a non-blocking form. See https://1drv.ms/p/s!AvHbai5ZA14wjV9J4rbBlSWyIw0t?e=AgBWNf

GatheringNode

Consults feeds or databases like AbuseIP/Virus total to know if the IP is malicios or not, the same with domains and Hashes. It then updates the appropiated Dataset with the info to enrich future logs.

IndexingNode

Send logs to index in the database (elasticsearch/SQLite/Postgres...) and queries them when asked.

RuleNode

Set conditions for logs and fires alerts when the conditions matches. If a Rule is battletested, then we can tell the SOAR node to do things. https://github.com/Neo23x0/sigma/tree/master/rules/windows

AlertNode

Creates alerts and sents them to another SIEM or stores them locally to use the native UI. Uses templates for the alerts.

SoarNode

Do actions automatically, like blocking IPs, domains... OpnSense supports blocking IPs with a simple API-REST call, the same for Cortex XDR. For PaloAlto: https://panos.pan.dev/docs/apis/panos_dag_qs Work in progress: define a custom trait that can be used with a common component as to simplify design. So we only need to import a library that defines the actions to be done (like an API call) and works in any custom SOAR component.

An idea: Apply multiple simple rules (like DarkTrace) does to calculate the threat score associated with the event. That score is added to the total score of a user in a period of time (Slicing Window). It will be implemented in redis wit a ScoreSet of users-scores in periods of 15 min with each removed after 24 hours by default.

Datasets

The Datasets are similar to the QRadar reference sets. They store information such as IP, IOC ... and can be populated almost in real time from the information extracted from the logs.

Internal Events

Each component works like a single entity. It can receive specific commands like STOP_COMPONENT, ISOLATE_IP, LOG_QUERY or specific commands only intended to be used in that component; sent responses to the commands received; process a log; sent notifications (the logging system of uSIEM); receive Dataset updates or fire Alerts. See: https://github.com/u-siem/u-siem-core/blob/main/src/components/common.rs

Normalziation

To simplify the design and enforce a common way of doing things, it has been designed a object that contains the basic information of a log and that maps the events to normalized fields: https://github.com/u-siem/u-siem-core/blob/main/src/events/mod.rs#L26

    pub fn set_event(&mut self, event: SiemEvent) {
        match &event {
            SiemEvent::Firewall(fw) => {
                self.add_field(field_dictionary::SOURCE_IP, SiemField::IP(fw.source_ip().clone()));
                self.add_field(field_dictionary::DESTINATION_IP, SiemField::IP(fw.destination_ip().clone()));
                self.add_field(field_dictionary::SOURCE_PORT, SiemField::U32(fw.source_port as u32));
                self.add_field(field_dictionary::DESTINATION_PORT, SiemField::U32(fw.destination_port as u32));
                self.add_field(field_dictionary::EVENT_OUTCOME, SiemField::Text(Cow::Owned(fw.outcome().to_string())));
                self.add_field(field_dictionary::IN_INTERFACE, SiemField::Text(Cow::Owned(fw.in_interface().to_string())));
                self.add_field(field_dictionary::OUT_INTERFACE, SiemField::Text(Cow::Owned(fw.out_interface().to_string())));
                self.add_field(field_dictionary::SOURCE_BYTES, SiemField::U32(fw.out_bytes));
                self.add_field(field_dictionary::DESTINATION_BYTES, SiemField::U32(fw.in_bytes));
                self.add_field(field_dictionary::NETWORK_TRANSPORT, SiemField::Text(Cow::Owned(fw.network_protocol().to_string())));
            },

Another normalization aspect that is really interesting is in the categorization of WebProxys: https://github.com/u-siem/u-siem-core/blob/main/src/events/webproxy.rs#L85

Thus all the categories of the different manufacturers will be reduced to the following:

pub enum WebProxyRuleCategory {
    Abortion,
    MatureContent,
    Alcohol,
    AlternativeSpirituality,
    ArtCulture,
    Auctions,
    AudioVideoClips,
    Trading,
    Economy,
    Charitable,
    ...

TODO List

Log structure
Design of components with local horizontal scalability: The kernel must be able to increase the number of threads assigned to a components if there is a congestion in the event queue.
Prometheus metrics: processed events of each type, errors, users connected, number of querys...
Event types with fields.
Custom error system
Components and kernel interfaces. The components must be registered in the kernel, but the kernel in another instance must be able to know that they exist.
SIEM inter-Kernel channels, to allow horizontal scaling (Redis channels). Depends on the kernel implementation.
Datasets that allow Real-Time log enhancement.
User role design. Inspiration from PaloAlto Cortex XDR
Behavour Engine component: Inspiration from Darktrace that uses a lot of small rules to generate a threat score for the event and increse the total score for the user.
Active Directory integration for enrich event logs related to users.
Desing a lightweight components that works as agents to get logs from Cloud related sources Agent Ingestion Components
SIGMA rules support.
Enforced storage schema for logs. Each source extracts multiple fields with different names. In elastic its not recomended to have more than 1000 fields. Also, it must allow renaming of fields because ECS uses dots in the field names but the majority of databases cant.
GDPR included for logs: An analyst does not need to know information about users, such as the websites they visit or the emails they receive. Integrated into the Storage Schema, like the url related fields must be stored encrypted for WebProxy events or the email.subject, email.files or email.source.user.name for Mail events.
Mantaince calendar. Used to disable alerting of events related to device configurations, like FortiSIEM does.
Internacionalization of texts
User login and third party integration

Dependencies

~4–6.5MB
~115K SLoC