bin+lib moleco

Tool to generate color swatches for chemical compounds

3 unstable releases

0.1.0 Jul 28, 2024
0.0.2 Jul 5, 2024
0.0.1 Jun 23, 2024

MIT/Apache and LGPL-3.0-or-later

1MB
2K SLoC

Moleco

Moleco stands for "molecule to color". It generates a unique color swatch for a given substance based on its InChI notation. It can also generate a color identification for a mixture using MInChI notation.

How to run

moleco generate "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3" --print

That will generate a color swatch for caffeine.

caffeine

Installation

For now you can only install it with cargo, the Rust package manager.

cargo install moleco

Support for mixtures

Of course, in nature you are much more likely to encounter mixtures than single substances, so MInChI is supported as well. You can generate a swatch for toothpaste:

moleco generate "MInChI=0.00.1S/C12H26O4S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-16-17(13,14)15;/h2-12H2,1H3,(H,13,14,15);/q;+1/p-1&C3H8O3/c4-1-3(6)2-5/h3-6H,1-2H2&C7H5NO3S.Na/c9-7-5-3-1-2-4-6(5)12(10,11)8-7;/h1-4H,(H,8,9);/q;+1/p-1&Ca.H3O4P.2H2O/c;1-5(2,3)4;;/h;(H3,1,2,3,4);2*1H2/q+2;;;/p-2&FH2O3P.2Na/c1-5(2,3)4;;/h(H2,2,3,4);;/q;2*+1/p-2&H2O/h1H2/n{6&2&&5&3&4&1}/g{215wf-3&25wf-2&1wf-2&8wf-3&2wf-3&5wf-1&15wf-3}" --print

toothpaste

NOTE: Printing may not be supported (well) in all terminals, so results may vary, but saved images will be correct.

or dishwashing liquid:

moleco generate "MInChI=0.00.1S/C12H26O4S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-16-17(13,14)15;/h2-12H2,1H3,(H,13,14,15);/q;+1/p-1&C18H30O3S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-17-13-15-18(16-14-17)22(19,20)21;/h13-16H,2-12H2,1H3,(H,19,20,21);/q;+1/p-1&ClH.Na/h1H;/q;+1/p-1&H2O/h1H2/n{4&{2&4}&&{1&4}&3}/g{807wf-3&{6pp1&4pp1}117wf-3&1wf-2&{27pp0&73pp0}66wf-3&}" --print

dishwashing liquid

or a solution of 9-borabicyclo[3.3.1]nonane in unspecified amounts of hexanes:

moleco generate "MInChI=0.00.1S/C6H12/c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C8H15B/c1-3-7-5-2-6-8(4-1)9-7/h7-9H,1-6H2/n{5&{2&3&4&1}}/g{4mr-1&{&&&}}" --print

borabicyclononane in hexanes

or, if you are a fan, you can generate bechamel sauce:

moleco generate "MInChI=0.00.1S/C12H17N4OS.ClH/c1-8-11(3-4-17)18-7-16(8)6-10-5-14-9(2)15-12(10)13;/h5,7,17H,3-4,6H2,1-2H3,(H2,13,14,15);1H/q+1;/p-1&C17H20N4O6/c1-7-3-9-10(4-8(7)2)21(5-11(23)14(25)12(24)6-22)15-13(18-9)16(26)20-17(27)19-15/h3-4,11-12,14,22-25H,5-6H2,1-2H3,(H,20,26,27)/t11-,12+,14-/m0/s1&C19H19N7O6/c20-19-25-15-14(17(30)26-19)23-11(8-22-15)7-21-10-3-1-9(2-4-10)16(29)24-12(18(31)32)5-6-13(27)28/h1-4,8,12,21H,5-7H2,(H,24,29)(H,27,28)(H,31,32)(H3,20,22,25,26,30)/t12-/m0/s1&C20H30O/c1-16(8-6-9-17(2)13-15-21)11-12-19-18(3)10-7-14-20(19,4)5/h6,8-9,11-13,21H,7,10,14-15H2,1-5H3/b9-6+,12-11+,16-8+,17-13+&C27H44O/c1-19(2)8-6-9-21(4)25-15-16-26-22(10-7-17-27(25,26)5)12-13-23-18-24(28)14-11-20(23)3/h12-13,19,21,24-26,28H,3,6-11,14-18H2,1-2,4-5H3/b22-12+,23-13-/t21-,24+,25-,26+,27-/m1/s1&C27H46O/c1-18(2)7-6-8-19(3)23-11-12-24-22-10-9-20-17-21(28)13-15-26(20,4)25(22)14-16-27(23,24)5/h9,18-19,21-25,28H,6-8,10-17H2,1-5H3/t19-,21+,22+,23-,24+,25+,26+,27-/m1/s1&C6H5NO2/c8-6(9)5-2-1-3-7-4-5/h1-4H,(H,8,9)&C8H10NO6P/c1-5-8(11)7(3-10)6(2-9-5)4-15-16(12,13)14/h2-3,11H,4H2,1H3,(H2,12,13,14)&C9H17NO5/c1-9(2,5-11)7(14)8(15)10-4-3-6(12)13/h7,11,14H,3-5H2,1-2H3,(H,10,15)(H,12,13)/t7-/m0/s1&Ca/q+2&Na/q+1/n{{{{&}&6&11&&4}&{{&}&&&1&2&7&9&8&3&}}&{{&}&6&11&&&4&5&10}}/g{{{{56wf-2&25wf-3}8wf-1&3wf-3&1wf-2&125wf-4&}466wf-3&{{56wf-4&168wf-3}725wf-3&187wf-4&137wf-3&447wf-8&215wf-8&6365wf-8&1008wf-8&341wf-8&49wf-8&9wf-3}534wf-3}1pp1&{{6wv-1&2wv-2}2wv-2&8wv-5&48wv-5&48wv-3&36wv-3&&&}9pp1}" --print

bechamel sauce
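The strings above are long, so it can help to pull one apart before passing it to moleco. The following is a minimal Python sketch, not part of moleco itself. It assumes the layout visible in the examples: a `MInChI=<version>/` prefix, component InChI bodies joined by `&`, then an `/n{...}` indexing section and a `/g{...}` concentration section. The function name `split_minchi` is made up for illustration.

```python
def split_minchi(minchi: str) -> dict:
    """Split a MInChI string into its visible parts (layout assumed from the examples)."""
    prefix, sep, body = minchi.partition("/")
    if not sep or not prefix.startswith("MInChI="):
        raise ValueError("not a MInChI string")

    # The indexing and concentration sections start with "/n{" and "/g{".
    n_pos = body.find("/n{")
    g_pos = body.find("/g{")
    if n_pos == -1 or g_pos == -1 or g_pos < n_pos:
        raise ValueError("missing /n{...} or /g{...} section")

    return {
        "version": prefix[len("MInChI="):],
        "components": body[:n_pos].split("&"),   # one InChI body per substance
        "indexing": body[n_pos + 2:g_pos],       # e.g. "{1&2}"
        "concentration": body[g_pos + 2:],       # e.g. "{37wf-2&}"
    }


parts = split_minchi("MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37wf-2&}")
for index, component in enumerate(parts["components"], start=1):
    print(index, component)
```

Numbering the components this way makes it easier to see which substance each index in the `/n{...}` section points at.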

Motivation

The idea was to create color codes for containers holding specific substances, so that they are easy to distinguish:

Cylinders with technical gases

...and if you change the form factor, it is still easy, as long as you know the color codes:

Cans with technical gases

(As you may notice, oxygen and argon have similar swatches, primary and complementary, so you must be careful with those two. Such collisions are inevitable, so be creative with the design: create patterns and use accents so that you don't introduce confusion.)

How to generate InChI or MInChI?

For simple substances you can use PubChem. You can also try searching for "substance name InChI" and you should find it. For mixtures you can use the MInChI demo.
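If you prefer to script the PubChem lookup, the sketch below builds a request against PubChem's PUG REST service and reads the InChI as plain text. It is an illustration only and not part of moleco. The helper names `pubchem_inchi_url` and `fetch_inchi` are made up, and the exact response for a given name (for example several lines when a name matches more than one compound) should be checked against the PubChem documentation.

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_inchi_url(name: str) -> str:
    """Build the PUG REST URL that returns the InChI for a compound name as text."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/InChI/TXT"


def fetch_inchi(name: str) -> str:
    """Fetch the InChI string(s) for a compound name (needs network access)."""
    with urlopen(pubchem_inchi_url(name), timeout=30) as response:
        return response.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Pass the printed string to: moleco generate "<InChI>" --print
    print(fetch_inchi("caffeine"))
```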

How mixture bar sizes are calculated

First of all, the values on the mixture bar (shown at the bottom for mixtures) are on a logarithmic scale. This may be problematic: if you consider two solutions of ethanol, one 40% and the other 70%, it's hard to tell which is which:

ethanol 40%

ethanol 70%

Not really a difference.

But that was not the goal. The goal was to quickly distinguish between solutions containing small amounts of potentially harmful chemicals. Consider again a solution of ethanol: one is 40% ethanol in water, the other is 40% ethanol and 0.1% Bitrex (denatonium benzoate) in water.

ethanol 40%

ethanol 40% with bitrex

Now it's easy to tell the difference, even when there are only trace amounts of extra substances.
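To see why a logarithmic scale behaves this way, here is a small Python sketch. The weighting formula is an assumption chosen for illustration (log10 of each fraction relative to a hypothetical floor of 1e-6). It is not moleco's actual formula, which this document does not specify.

```python
import math

FLOOR = 1e-6  # hypothetical smallest fraction that still gets any width


def log_shares(fractions):
    """Return each component's share of the bar under an assumed log10 weighting."""
    weights = [math.log10(max(f, FLOOR) / FLOOR) for f in fractions]
    total = sum(weights)
    if total == 0:
        # Everything is at or below the floor: fall back to equal shares.
        return [1.0 / len(fractions)] * len(fractions)
    return [w / total for w in weights]


ethanol_40 = log_shares([0.40, 0.60])             # ethanol, water
ethanol_70 = log_shares([0.70, 0.30])             # ethanol, water
with_bitrex = log_shares([0.40, 0.001, 0.599])    # ethanol, Bitrex, water

print([round(s, 3) for s in ethanol_40])
print([round(s, 3) for s in ethanol_70])
print([round(s, 3) for s in with_bitrex])
```

Under this assumed weighting the ethanol share is about 49% in the first solution and about 52% in the second, which is hard to tell apart, while the 0.1% Bitrex takes roughly a fifth of the bar instead of a thousandth.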

Order of color swatches

The order is not guaranteed. Moleco tries to keep the original order of substances in the mixture, that is, the one given in the command (the MInChI demo (see links below) uses a specific order for substances). It may happen, though, that a substance in the middle of the notation has a missing or unestimated concentration. In that case its swatch is moved to the end of the bar, so that the primary colors of the substances visibly match the bar colors.

A good example of this behavior is the image of dishwashing liquid: if you decipher the notation, you will see that the third substance (sodium chloride) has a missing concentration, so it is moved to the end of the bar, behind the water swatch. (You can find the full notation in the examples above.)

dishwashing liquid
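The reordering described above amounts to a stable partition: substances with a concentration keep their relative order, and those without one go to the end. A minimal Python sketch of that idea follows. The data layout and the name `order_swatches` are hypothetical and do not reflect moleco's internals.

```python
def order_swatches(substances):
    """Keep the given order, but move entries without a concentration to the end.

    `substances` is a list of (name, concentration) pairs, where a missing
    concentration is represented by None (an assumption of this sketch).
    """
    with_amount = [s for s in substances if s[1] is not None]
    without_amount = [s for s in substances if s[1] is None]
    return with_amount + without_amount


# Illustrative mixture with a missing value in the middle of the notation.
mixture = [("surfactant", 0.066), ("sodium chloride", None), ("water", 0.807)]
print(order_swatches(mixture))
```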

Unknown and unestimated capacity

Sometimes you will not provide every concentration in a mixture, as in this 37% solution of formaldehyde in water:

moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37wf-2&}" --print

37% formaldehyde in water

Here it is easy to calculate the remaining amount of water (not precisely, and not in a molar sense, but since sizes are logarithmic we can skip small uncertainties): it is ~63%. But what if there are two solvents, such as water and methanol, given without their concentrations? Then it is possible to estimate the total remaining amount, but not the exact amount of each solvent. In that case the remaining compound is marked as unknown.

moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2/n{1&3&2}/g{37wf-2&&}" --print

37% formaldehyde in water and methanol
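The two cases above can be expressed as a small decision rule. The Python sketch below works in whole percent and is only an illustration of the described behavior. The labels and the function name `classify_remainder` are made up, and moleco's real implementation may differ.

```python
def classify_remainder(percents):
    """Decide what happens to the part of the mixture not covered by given values.

    `percents` holds one entry per substance: a percentage, or None when the
    concentration was left out of the notation.
    """
    known = sum(p for p in percents if p is not None)
    missing = percents.count(None)
    remainder = max(0, 100 - known)

    if missing == 0:
        return ("complete", remainder)
    if missing == 1:
        # A single gap can simply take whatever is left.
        return ("assigned", remainder)
    # Several gaps: the total is known, the split between them is not.
    return ("unknown", remainder)


print(classify_remainder([37, None]))        # formaldehyde in water
print(classify_remainder([37, None, None]))  # formaldehyde in water and methanol
```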

Furthermore, if you use ratio notation (VP) and don't provide the concentration of at least one ingredient, the remaining amount is marked as unestimated.

moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37vp0&}" --print

37% formaldehyde in water unestimated

The same applies to the molar notations per kilogram and per liter, MB and MR: if you use them at all, the bar will show an extra unestimated and unknown compound. This is because moleco does not calculate molar masses and volumes (it contains no internal database of substances), so it assumes that there is something extra as a result. You may wonder why this is not treated like range notation (see the next section) and left in the hands of the user: the reason is that MB and MR are currently always wrong. If you want a quick workaround, simply replace them with VP notation, or take the longer route and actually convert those notations to another one that is fully supported.

moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37mb0&63mb0}" --print

37% formaldehyde in water molar
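For the longer route, the conversion has to be done by hand because it needs a molar mass that moleco does not know. The Python sketch below converts a molality to a weight fraction for a single solute. It assumes that MB means moles of solute per kilogram of solvent, and that you look up the molar mass yourself. The function name is made up for illustration.

```python
def weight_fraction_from_molality(molality, molar_mass):
    """Convert mol of solute per kg of solvent into a weight fraction.

    molality   -- mol of solute per 1 kg of solvent (assumed meaning of MB)
    molar_mass -- g/mol of the solute, supplied by the user
    """
    solute_grams = molality * molar_mass  # grams of solute per 1000 g of solvent
    return solute_grams / (1000.0 + solute_grams)


# Formaldehyde (CH2O) has a molar mass of about 30.03 g/mol.
fraction = weight_fraction_from_molality(molality=5.0, molar_mass=30.03)
print(f"{fraction:.3f}")  # use the rounded value with the wf unit instead of mb
```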

Extra concentration notes

In the case of range notation, such as "10:20", only the higher amount is taken into account. This is because moleco tries to estimate unknown/unestimated substances, and if the maximum possible amount exceeds the potential capacity, it is assumed that the user knows what they are doing. If you want to show an extra substance because you know there is one, you can always add it as a separate, unmarked substance. See the examples below: the second one shows an extra substance because one extra group is added to both the indexing and the concentration notation.

moleco generate "MInChI=0.00.1S/C2H6O/c1-2-3/h3H,2H2,1H3&H2O/h1H2/n{1&2}/g{4vp1&6vp1}" --print-only

vs

moleco generate "MInChI=0.00.1S/C2H6O/c1-2-3/h3H,2H2,1H3&H2O/h1H2/n{1&2&}/g{4vp1&6vp1&}" --print-only

results look like

37% formaldehyde in water 37% formaldehyde in water open bar
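The "higher amount wins" rule for ranges can be sketched as a small token parser in Python. This is an illustration under stated assumptions, not moleco's parser: it assumes a token is an integer mantissa, a two-letter unit and a power-of-ten exponent (as in `37wf-2` or `4vp1` above), and that a range is written in the mantissa position as `low:high`.

```python
import re
from decimal import Decimal

# mantissa (optionally "low:high"), two-letter unit, power-of-ten exponent
TOKEN = re.compile(r"^(?:(\d+):)?(\d+)([a-z]{2})(-?\d+)$")


def parse_concentration(token):
    """Return (value, unit) for one concentration token, using the upper bound of a range."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognized concentration token: {token!r}")
    _low, high, unit, exponent = match.groups()
    # Only the higher amount of a range is taken into account.
    value = Decimal(high).scaleb(int(exponent))
    return value, unit


print(parse_concentration("37wf-2"))
print(parse_concentration("10:20wf-2"))
print(parse_concentration("4vp1"))
```

`Decimal` is used here so that values such as 37 x 10^-2 stay exact instead of picking up binary floating-point noise.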

Questions

Why no support for molar mass and volume?

That would require incorporating a database of substances and their properties. This is way beyond the initial scope of this project, but could be considered in the future.

Are there collisions?

Yes, a lot. Out of over 117 million unique InChI strings that you can fetch from https://pubchem.ncbi.nlm.nih.gov/, only a little over 80 million produce unique swatches. And those are exact collisions. They do not include the cases where a hue in one swatch differs from another by only 1 degree, which is too little for the human eye to detect even though technically there is no collision. Be warned, and if you want to differentiate between two substances with similar swatches, be creative with the design.
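If you want to check two swatches for a near-collision yourself, comparing hues has to account for the hue circle wrapping around at 360 degrees. The Python sketch below shows that comparison. The 5-degree threshold is an arbitrary assumption chosen for illustration, not a value taken from moleco.

```python
def hue_distance(a, b):
    """Smallest angle in degrees between two hues on the 0-360 color wheel."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def looks_alike(a, b, threshold=5.0):
    """Flag two hues as hard to tell apart (the threshold is a hypothetical choice)."""
    return hue_distance(a, b) <= threshold


print(hue_distance(359.0, 0.0))   # hues on either side of the wrap-around point
print(looks_alike(120.0, 121.0))  # 1 degree apart
print(looks_alike(10.0, 200.0))
```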

Why no support for InChIKey?

Initially the idea was to create a system that is unique for every substance, and InChIKey already had some confirmed collisions, so it was not considered. Reality turned out to be more brutal (see above), but by then it was too late to include InChIKey.

Why the shape?

A diamond divided into four parts was the initial idea. When creating a color swatch you will usually get 4 or 5 colors, but for a nice complementary hue, 4 is easy to generate, and a diamond shape looks nice. To avoid confusion with NFPA 704 markings, cutouts were introduced, hence the "flower" shape.

An orientation mark is included as well, to avoid confusion in the case of a single-compound mark.

How to recognize the substance?

It may be challenging to recognize a substance from its color swatch after some time, so be sure to keep the name of the substance or its InChI notation somewhere close if you are using just the swatch. If you have the original image file, though, the original substance is saved in its EXIF metadata.

References

InChI and MInChI

Color spaces

PubChem resources
