A custom CSV -> DB uploader program.
Trust me, you'll need speed when uploading 5M records.
Parallelized in a two-step process (looped):
- We buffer records in an array as we read and parse them (e.g. 1000 records). This is the reader (main thread).
- Once that array fills up, we push the asynchronous upload future/task onto a stack to be executed (e.g. by 4 uploader threads).
Warning! The parallelization between threads (step 2) is still a work in progress. I'm still reading up on the
tokio library lol. :)
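The two-step pipeline above can be sketched with plain `std::thread` and a channel (step 2 is slated to move to tokio tasks). Everything here is illustrative, not the crate's actual API: the batch size, worker count, and `upload_all` name are made up for the demo, and the "upload" is just a counter where a real implementation would run a bulk INSERT.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

const BUFFER_SIZE: usize = 4; // the crate buffers ~1000 records per batch
const UPLOADERS: usize = 2;   // the crate targets ~4 uploader threads

/// Reader loop (step 1): buffer records, then hand full batches to
/// the uploader pool (step 2). Returns how many records the workers
/// processed in total.
fn upload_all(records: Vec<String>) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<String>>();
    let rx = Arc::new(Mutex::new(rx));

    // Step 2: uploader workers pull full batches off the shared channel.
    let workers: Vec<_> = (0..UPLOADERS)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut uploaded = 0;
                loop {
                    // Hold the lock while pulling the next batch; a real
                    // implementation might use tokio tasks instead.
                    let batch = match rx.lock().unwrap().recv() {
                        Ok(b) => b,
                        Err(_) => break, // channel closed: reader finished
                    };
                    // A real uploader would run a bulk INSERT here.
                    uploaded += batch.len();
                }
                uploaded
            })
        })
        .collect();

    // Step 1: the reader (main thread) buffers records as it "parses".
    let mut buffer = Vec::with_capacity(BUFFER_SIZE);
    for record in records {
        buffer.push(record); // stand-in for reading and parsing a CSV row
        if buffer.len() == BUFFER_SIZE {
            tx.send(std::mem::take(&mut buffer)).unwrap();
        }
    }
    if !buffer.is_empty() {
        tx.send(buffer).unwrap(); // flush the final partial batch
    }
    drop(tx); // close the channel so the workers exit their loops

    workers.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let records: Vec<String> = (0..10).map(|i| format!("record-{i}")).collect();
    println!("uploaded {} records", upload_all(records));
}
```

Dropping the sender to signal completion keeps the shutdown logic simple: each worker exits its loop the first time `recv` returns an error.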
As a secondary goal, we normalize the data while we parse it.
This is highly variable and dependent on two things:
- The DB and the Data Types it uses.
- The datasets we're uploading and the type of data we've seen so far.
So our current process is:
- Parse to JSON data types
- Drop any empty String values
- Parse "False" -> false, "True" -> true
- Replace ' inside Strings with " and try parsing again (some datasets quote their strings that way)
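The normalization steps above might look roughly like this. This is a sketch, not the crate's code: the `Value` enum and `normalize` function are hypothetical stand-ins (the real pass targets JSON data types), and the number-parsing step is an assumption about what "parse to JSON data types" covers.

```rust
/// Hypothetical stand-in for a JSON-like value type.
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Num(f64),
    Str(String),
}

/// Normalize one raw CSV field. Returns None for empty Strings,
/// which are dropped rather than uploaded.
fn normalize(raw: &str) -> Option<Value> {
    let s = raw.trim();
    if s.is_empty() {
        return None; // drop any empty String values
    }
    // Parse "True" -> true, "False" -> false.
    match s {
        "True" => return Some(Value::Bool(true)),
        "False" => return Some(Value::Bool(false)),
        _ => {}
    }
    // Parse numeric fields into a JSON-style number.
    if let Ok(n) = s.parse::<f64>() {
        return Some(Value::Num(n));
    }
    // Some datasets quote strings with ' instead of ": swap the
    // quote character and strip it so the value parses cleanly.
    let swapped = s.replace('\'', "\"");
    Some(Value::Str(swapped.trim_matches('"').to_string()))
}
```

For example, `normalize("True")` yields a boolean, `normalize("")` yields `None`, and `normalize("'hello'")` yields the plain string `hello`.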
Supported DBs (for now)