3 unstable releases

| Version | Date |
|---|---|
| 0.1.0 | Jul 28, 2024 |
| 0.0.2 | Jul 5, 2024 |
| 0.0.1 | Jun 23, 2024 |

#120 in Science
1MB, 2K SLoC
Moleco
Moleco stands for molecule to color. It generates a unique color swatch for a given substance based on its InChI notation. It can also generate a color identification for mixtures using MInChI notation.
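The README does not describe how an InChI string is turned into colors. As a rough illustration only, here is a minimal Rust sketch that assumes a deterministic string hash (64-bit FNV-1a, an arbitrary choice for this example and not necessarily what moleco uses) reduced to a hue angle. `inchi_to_hue` is a hypothetical name, not part of moleco's API.

```rust
/// 64-bit FNV-1a hash (illustrative choice, not necessarily moleco's).
fn fnv1a_64(input: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for byte in input.bytes() {
        // FNV-1a: XOR the byte in first, then multiply by the prime.
        hash ^= byte as u64;
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical helper: map an InChI string to a hue in [0, 360) degrees.
fn inchi_to_hue(inchi: &str) -> u16 {
    (fnv1a_64(inchi) % 360) as u16
}

fn main() {
    let caffeine = "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3";
    // The same input always yields the same hue.
    assert_eq!(inchi_to_hue(caffeine), inchi_to_hue(caffeine));
    println!("caffeine hue: {} degrees", inchi_to_hue(caffeine));
}
```

Because the hash is deterministic, the same InChI always yields the same hue, while different strings can still land on the same hue, which is one source of the collisions discussed under Questions.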
How to run
moleco generate "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3" --print
That will generate a color swatch for caffeine.
Installation
For now, you can only install it with cargo, the Rust package manager.
cargo install moleco
Support for mixtures
In nature you are much more likely to encounter mixtures than single substances, so MInChI is supported as well. For example, you can generate a swatch for toothpaste:
moleco generate "MInChI=0.00.1S/C12H26O4S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-16-17(13,14)15;/h2-12H2,1H3,(H,13,14,15);/q;+1/p-1&C3H8O3/c4-1-3(6)2-5/h3-6H,1-2H2&C7H5NO3S.Na/c9-7-5-3-1-2-4-6(5)12(10,11)8-7;/h1-4H,(H,8,9);/q;+1/p-1&Ca.H3O4P.2H2O/c;1-5(2,3)4;;/h;(H3,1,2,3,4);2*1H2/q+2;;;/p-2&FH2O3P.2Na/c1-5(2,3)4;;/h(H2,2,3,4);;/q;2*+1/p-2&H2O/h1H2/n{6&2&&5&3&4&1}/g{215wf-3&25wf-2&1wf-2&8wf-3&2wf-3&5wf-1&15wf-3}" --print
NOTE: Printing may not be supported (well) in all terminals, so results may vary, but saved images will be correct.
or dishwashing liquid:
moleco generate "MInChI=0.00.1S/C12H26O4S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-16-17(13,14)15;/h2-12H2,1H3,(H,13,14,15);/q;+1/p-1&C18H30O3S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-17-13-15-18(16-14-17)22(19,20)21;/h13-16H,2-12H2,1H3,(H,19,20,21);/q;+1/p-1&ClH.Na/h1H;/q;+1/p-1&H2O/h1H2/n{4&{2&4}&&{1&4}&3}/g{807wf-3&{6pp1&4pp1}117wf-3&1wf-2&{27pp0&73pp0}66wf-3&}" --print
or a solution of 9-Borabicyclo[3.3.1]nonane in undefined amounts of hexanes:
moleco generate "MInChI=0.00.1S/C6H12/c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C8H15B/c1-3-7-5-2-6-8(4-1)9-7/h7-9H,1-6H2/n{5&{2&3&4&1}}/g{4mr-1&{&&&}}" --print
or, if you are a fan, you can generate béchamel sauce:
moleco generate "MInChI=0.00.1S/C12H17N4OS.ClH/c1-8-11(3-4-17)18-7-16(8)6-10-5-14-9(2)15-12(10)13;/h5,7,17H,3-4,6H2,1-2H3,(H2,13,14,15);1H/q+1;/p-1&C17H20N4O6/c1-7-3-9-10(4-8(7)2)21(5-11(23)14(25)12(24)6-22)15-13(18-9)16(26)20-17(27)19-15/h3-4,11-12,14,22-25H,5-6H2,1-2H3,(H,20,26,27)/t11-,12+,14-/m0/s1&C19H19N7O6/c20-19-25-15-14(17(30)26-19)23-11(8-22-15)7-21-10-3-1-9(2-4-10)16(29)24-12(18(31)32)5-6-13(27)28/h1-4,8,12,21H,5-7H2,(H,24,29)(H,27,28)(H,31,32)(H3,20,22,25,26,30)/t12-/m0/s1&C20H30O/c1-16(8-6-9-17(2)13-15-21)11-12-19-18(3)10-7-14-20(19,4)5/h6,8-9,11-13,21H,7,10,14-15H2,1-5H3/b9-6+,12-11+,16-8+,17-13+&C27H44O/c1-19(2)8-6-9-21(4)25-15-16-26-22(10-7-17-27(25,26)5)12-13-23-18-24(28)14-11-20(23)3/h12-13,19,21,24-26,28H,3,6-11,14-18H2,1-2,4-5H3/b22-12+,23-13-/t21-,24+,25-,26+,27-/m1/s1&C27H46O/c1-18(2)7-6-8-19(3)23-11-12-24-22-10-9-20-17-21(28)13-15-26(20,4)25(22)14-16-27(23,24)5/h9,18-19,21-25,28H,6-8,10-17H2,1-5H3/t19-,21+,22+,23-,24+,25+,26+,27-/m1/s1&C6H5NO2/c8-6(9)5-2-1-3-7-4-5/h1-4H,(H,8,9)&C8H10NO6P/c1-5-8(11)7(3-10)6(2-9-5)4-15-16(12,13)14/h2-3,11H,4H2,1H3,(H2,12,13,14)&C9H17NO5/c1-9(2,5-11)7(14)8(15)10-4-3-6(12)13/h7,11,14H,3-5H2,1-2H3,(H,10,15)(H,12,13)/t7-/m0/s1&Ca/q+2&Na/q+1/n{{{{&}&6&11&&4}&{{&}&&&1&2&7&9&8&3&}}&{{&}&6&11&&&4&5&10}}/g{{{{56wf-2&25wf-3}8wf-1&3wf-3&1wf-2&125wf-4&}466wf-3&{{56wf-4&168wf-3}725wf-3&187wf-4&137wf-3&447wf-8&215wf-8&6365wf-8&1008wf-8&341wf-8&49wf-8&9wf-3}534wf-3}1pp1&{{6wv-1&2wv-2}2wv-2&8wv-5&48wv-5&48wv-3&36wv-3&&&}9pp1}" --print
Motivation
The idea was to create a color code for containers holding specific substances that is easy to tell apart:
...and if you change the form factor, it is still easy, as long as you know the color codes:
(As you may notice, oxygen and argon have similar swatches, both primary and complementary, so be careful with those two. Such collisions are inevitable, so be creative with the design and use patterns and accents to avoid introducing confusion.)
How to generate InChI or MInChI?
For simple substances you can use PubChem; you can also try searching for "<substance name> InChI", which should turn it up. For mixtures you can use the MInChI demo.
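PubChem's PUG REST service can also return the InChI for a compound name as plain text. A minimal Rust sketch that only builds the request URL (no network access), assuming the `compound/name/<name>/property/InChI/TXT` path of PUG REST; `pubchem_inchi_url` and `percent_encode` are hypothetical helpers, not part of moleco.

```rust
/// Percent-encode one URL path segment, keeping only RFC 3986 unreserved characters.
fn percent_encode(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else (spaces, slashes, non-ASCII bytes) becomes %XX.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: PUG REST URL that returns the InChI for a compound name.
fn pubchem_inchi_url(name: &str) -> String {
    format!(
        "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{}/property/InChI/TXT",
        percent_encode(name)
    )
}

fn main() {
    // Fetch the printed URL with any HTTP client (e.g. curl) and pass the
    // returned InChI to `moleco generate "<InChI>" --print`.
    println!("{}", pubchem_inchi_url("caffeine"));
}
```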
How mixture bar sizes are calculated
First of all, the values on the mixture bar (shown at the bottom of mixture swatches) are on a logarithmic scale. This can be a drawback: for two ethanol solutions, one 40% and the other 70%, it is hard to tell which is which:
Not really a difference.
But that was not the goal. The goal was to quickly distinguish between solutions containing small amounts of potentially harmful chemicals. Consider ethanol solutions again: one is 40% ethanol in water, the other is 40% ethanol and 0.1% Bitrex (denatonium benzoate) in water.
Now it is easy to tell them apart, even when the extra substance is present only in trace amounts.
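The exact formula moleco uses for bar sizes is not given here. A minimal Rust sketch of the general idea, assuming each segment's width is proportional to log10(fraction / 1 ppm), clamped at zero and then normalized; the floor `FLOOR` and the function name `log_bar_widths` are hypothetical.

```rust
/// Hypothetical floor: amounts at or below 1 ppm get a zero-width segment.
const FLOOR: f64 = 1e-6;

/// Turn fractions of the whole (0.0..=1.0) into relative bar widths on a log scale.
/// Each weight is log10(fraction / FLOOR), clamped at zero, then normalized to sum to 1.
fn log_bar_widths(fractions: &[f64]) -> Vec<f64> {
    let weights: Vec<f64> = fractions
        .iter()
        .map(|&f| (f / FLOOR).log10().max(0.0))
        .collect();
    let total: f64 = weights.iter().sum();
    if total == 0.0 {
        return vec![0.0; fractions.len()];
    }
    weights.iter().map(|w| w / total).collect()
}

fn main() {
    // Two ethanol/water solutions: 40% and 70% ethanol.
    println!("{:.3?}", log_bar_widths(&[0.40, 0.60]));
    println!("{:.3?}", log_bar_widths(&[0.70, 0.30]));
    // 40% ethanol, 0.1% Bitrex, rest water.
    println!("{:.3?}", log_bar_widths(&[0.40, 0.001, 0.599]));
}
```

With this kind of scaling, the 40% and 70% ethanol segments end up with similar widths, while 0.1% Bitrex still gets a clearly visible segment instead of a sliver.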
Order of color swatches
Order is not guaranteed. Moleco tries to keep the original order of substances in the mixture, i.e. the one given in the command (the MInChI demo, see links below, uses a specific order for substances). However, a substance in the middle of the notation may have a missing or unestimated concentration; in such a case its swatch is moved to the end of the bar, so that the primary colors of the substances visibly match the bar colors.
A good example of this behavior is the dishwashing liquid image: if you decipher the notation, you will see that the third substance (sodium chloride) has a missing concentration, so it is moved to the end of the bar, behind the water swatch. (You can find the full notation in the examples above.)
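The ordering rule amounts to a stable partition. A minimal Rust sketch, assuming each component carries an optional concentration (`None` meaning missing or unestimated); the `Component` type and `bar_order` function are hypothetical, not moleco's internals.

```rust
/// A mixture component as listed in the notation (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Component {
    name: String,
    /// `None` stands for a missing or unestimated concentration.
    concentration: Option<f64>,
}

/// Keep the notation order, but move components without a concentration to the end.
/// `sort_by_key` is a stable sort, so the relative order inside each group is preserved.
fn bar_order(components: &[Component]) -> Vec<Component> {
    let mut ordered = components.to_vec();
    // `false` sorts before `true`, so components with a concentration come first.
    ordered.sort_by_key(|c| c.concentration.is_none());
    ordered
}

fn main() {
    // Placeholder names and values, in notation order.
    let components = vec![
        Component { name: "A".to_string(), concentration: Some(0.5) },
        Component { name: "B".to_string(), concentration: None },
        Component { name: "C".to_string(), concentration: Some(0.3) },
    ];
    let names: Vec<String> = bar_order(&components).into_iter().map(|c| c.name).collect();
    println!("{:?}", names); // ["A", "C", "B"]
}
```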
Unknown and unestimated capacity
Sometimes you will not provide every concentration in a mixture, as in this 37% solution of formaldehyde in water:
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37wf-2&}" --print
Here it is easy to calculate the remaining amount of water: about 63% (not precisely, and not in the molar sense, but since bar sizes are logarithmic, small uncertainties can be ignored). But what if there are two solvents, such as water and methanol, and no concentrations are given for them? Then the total remaining amount can be estimated, but not the exact amount of each solvent. In such a case the remainder is marked as unknown.
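A minimal Rust sketch of this rule, assuming all given values are plain fractions of the whole (the VP, MB and MR cases described below are ignored here); `Remainder` and `estimate_remainder` are hypothetical names.

```rust
/// How the unspecified part of a mixture is filled in (hypothetical type).
#[derive(Debug, PartialEq)]
enum Remainder {
    /// Every component has a concentration; nothing to fill in.
    Complete,
    /// Exactly one component lacks a concentration, so it receives the rest.
    Assigned { index: usize, fraction: f64 },
    /// Several components lack a concentration: the total rest is known, the split is not.
    Unknown { fraction: f64 },
}

/// `fractions` holds one entry per component, as a fraction of the whole (0.0..=1.0).
fn estimate_remainder(fractions: &[Option<f64>]) -> Remainder {
    let specified: f64 = fractions.iter().flatten().sum();
    // Clamp at zero in case the given values already add up to more than 100%.
    let rest = (1.0 - specified).max(0.0);
    let missing: Vec<usize> = fractions
        .iter()
        .enumerate()
        .filter(|(_, f)| f.is_none())
        .map(|(i, _)| i)
        .collect();
    match missing.as_slice() {
        [] => Remainder::Complete,
        [index] => Remainder::Assigned { index: *index, fraction: rest },
        _ => Remainder::Unknown { fraction: rest },
    }
}

fn main() {
    // 37% formaldehyde in water: water gets the remaining ~63%.
    println!("{:?}", estimate_remainder(&[Some(0.37), None]));
    // 37% formaldehyde with methanol and water, neither given: the rest is unknown.
    println!("{:?}", estimate_remainder(&[Some(0.37), None, None]));
}
```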
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2/n{1&3&2}/g{37wf-2&&}" --print
Furthermore, if you use ratios (VP) in the notation and leave out the concentration of at least one ingredient, the remaining amount is marked as unestimated.
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37vp0&}" --print
The same applies to molar concentrations per liter and per kilogram (MB and MR): if you use them at all, the bar will show an extra unestimated and unknown component. This is because moleco does not calculate molar masses or volumes (it has no internal database of substances), so it assumes that something extra remains. You may wonder why this is not handled as in the range notation case (see the next paragraph) and left in the hands of the user: the reason is that MB and MR are currently always wrong. For a quick workaround, simply replace them with VP notation, or take the longer route and convert them to another notation that is fully supported.
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37mb0&63mb0}" --print
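The concentration tokens in the `/g` layer look like `37wf-2` or `4vp1`. A minimal Rust sketch, assuming each token is a number, a two-letter unit code and a signed exponent, read as mantissa × 10^exponent (consistent with `37wf-2` meaning 37% in the formaldehyde example), and mapping units to the behavior described in this section. The type and function names are hypothetical, and nested groups, ranges and empty tokens are not handled.

```rust
/// A parsed concentration token such as "37wf-2" (hypothetical type).
#[derive(Debug, PartialEq)]
struct Concentration {
    value: f64,
    unit: String,
}

/// How this README says each unit affects the rest of the bar (hypothetical enum).
#[derive(Debug, PartialEq)]
enum Handling {
    /// wf, as in the 37% formaldehyde example: the remainder can be estimated.
    Absolute,
    /// vp: a ratio; if any ingredient lacks a value, the rest is unestimated.
    Ratio,
    /// mb / mr: an extra unestimated and unknown part is always shown.
    Unsupported,
    /// Any other unit code: not covered by this sketch.
    Other,
}

/// Parse "<mantissa><two-letter unit><exponent>" as mantissa × 10^exponent.
fn parse_concentration(token: &str) -> Option<Concentration> {
    // ASCII-only input keeps the byte slicing below on character boundaries.
    if !token.is_ascii() {
        return None;
    }
    let unit_start = token.find(|c: char| c.is_ascii_alphabetic())?;
    let unit_end = unit_start + 2;
    let unit = token.get(unit_start..unit_end)?;
    if !unit.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    let mantissa: f64 = token[..unit_start].parse().ok()?;
    let exponent: i32 = token[unit_end..].parse().ok()?;
    Some(Concentration {
        value: mantissa * 10f64.powi(exponent),
        unit: unit.to_string(),
    })
}

fn handling(unit: &str) -> Handling {
    match unit {
        "wf" => Handling::Absolute,
        "vp" => Handling::Ratio,
        "mb" | "mr" => Handling::Unsupported,
        _ => Handling::Other,
    }
}

fn main() {
    for token in ["37wf-2", "4vp1", "37mb0"] {
        if let Some(c) = parse_concentration(token) {
            println!("{token}: {} {} -> {:?}", c.value, c.unit, handling(&c.unit));
        }
    }
}
```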
Extra concentration notes
In the case of range notation, such as "10:20", only the higher amount is taken into account. This is because moleco tries to estimate unknown/unestimated substances, and if the maximum possible amount exceeds the available capacity, it is assumed that the user knows what they are doing. If you want to show an extra substance because you know there is some, you can always add it as a separate, unmarked substance. See the examples below: the second one shows an extra substance because one extra group is added to the indexing and concentration notation.
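Keeping only the upper bound of a range can be sketched in Rust as follows. This is a hypothetical helper, not moleco's parser; it assumes a bare "low:high" value and leaves unit codes and exponents out.

```rust
/// Hypothetical helper: reduce a value such as "10:20" to the amount that is used,
/// i.e. the higher bound. Plain values such as "37" pass through unchanged.
/// Unit codes and exponents are left out of this sketch.
fn effective_amount(value: &str) -> Option<f64> {
    match value.split_once(':') {
        Some((low, high)) => {
            let low: f64 = low.trim().parse().ok()?;
            let high: f64 = high.trim().parse().ok()?;
            // Take the higher bound even if the range is written in reverse.
            Some(low.max(high))
        }
        None => value.trim().parse().ok(),
    }
}

fn main() {
    println!("{:?}", effective_amount("10:20"));
    println!("{:?}", effective_amount("37"));
}
```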
moleco generate "MInChI=0.00.1S/C2H6O/c1-2-3/h3H,2H2,1H3&H2O/h1H2/n{1&2}/g{4vp1&6vp1}" --print-only
vs
moleco generate "MInChI=0.00.1S/C2H6O/c1-2-3/h3H,2H2,1H3&H2O/h1H2/n{1&2&}/g{4vp1&6vp1&}" --print-only
The results look like this:
Questions
Why no support for molar mass and volume?
That would require incorporating a database of substances and their properties. This is well beyond the initial scope of this project, but it could be considered in the future.
Are there collisions?
Yes, a lot. Of the more than 117 million unique InChI strings that you can fetch from https://pubchem.ncbi.nlm.nih.gov/, only a little over 80 million produce unique swatches. And those are only exact collisions; they do not account for the fact that if a hue in a swatch differs by only 1 degree, the difference is too small for the human eye to detect, even though technically there is no collision. Be warned: if you want to differentiate between two substances with similar swatches, be creative with the design.
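Near-collisions like this can be checked with the angular distance between hues on the color wheel. A minimal Rust sketch; the 5 degree threshold `MIN_VISIBLE_DIFFERENCE_DEG` is an arbitrary assumption (the README only says 1 degree is too small to notice), and both function names are hypothetical.

```rust
/// Shortest angular distance between two hues on the 0..360 degree color wheel.
fn hue_distance(a: f64, b: f64) -> f64 {
    let d = (a - b).rem_euclid(360.0);
    d.min(360.0 - d)
}

/// Hypothetical threshold below which two hues are treated as "looking the same".
/// The README only says that 1 degree is too small to notice; 5 is an arbitrary pick.
const MIN_VISIBLE_DIFFERENCE_DEG: f64 = 5.0;

fn looks_like_collision(a: f64, b: f64) -> bool {
    hue_distance(a, b) < MIN_VISIBLE_DIFFERENCE_DEG
}

fn main() {
    // 359 and 1 degrees are only 2 degrees apart across the 0/360 boundary.
    println!("{}", looks_like_collision(359.0, 1.0));
    // Complementary hues are as far apart as possible.
    println!("{}", looks_like_collision(0.0, 180.0));
}
```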
Why no support for InChIKey?
The initial idea was to create a system that is unique for every substance, and InChIKey already had some confirmed collisions, so it was not considered. Reality turned out to be more brutal (see above), but by then it was too late to include InChIKey.
Why the shape?
A diamond divided into four parts was the initial idea. When creating a color swatch you usually get 4 or 5 colors, but 4 makes it easy to generate nice complementary hues, and the diamond shape looks nice. To avoid confusion with the NFPA 704 marking, cutouts were introduced, hence the "flower" shape.
An orientation mark was introduced as well, to avoid confusion in the case of a single-compound mark.
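One way to read "4 is easy to generate" is a palette of four hues spaced 90 degrees apart, so that each hue's complement (180 degrees away) is also in the set. A minimal Rust sketch under that assumption; moleco's actual palette rule may differ, and `four_hue_palette` is a hypothetical name.

```rust
/// Hypothetical: derive four hues from a base hue by rotating 90 degrees each time,
/// so the set contains the base hue and its complement (base + 180 degrees).
fn four_hue_palette(base_hue: u16) -> [u16; 4] {
    let base = base_hue % 360;
    [base, (base + 90) % 360, (base + 180) % 360, (base + 270) % 360]
}

fn main() {
    let palette = four_hue_palette(300);
    println!("{:?}", palette); // [300, 30, 120, 210]
}
```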
How to recognize the substance?
It may be challenging to recognize the substance from its color swatch after some time, so keep the name of the substance or its InChI notation somewhere close if you are using just the swatch. If you have the original image file, though, the original substance is saved in its EXIF metadata.
References
InChI and MInChI
- https://jcheminf.biomedcentral.com/articles/10.1186/s13321-015-0068-4
- https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0357-4
- http://molmatinf.com/minchidemo/
- https://github.com/IUPAC/MInChI_demo
Color spaces
PubChem resources
Dependencies: ~18–30MB, ~437K SLoC