[Registry] array handling
Kai-Uwe Behrmann
2015-02-11 10:36:23 UTC
Hello,

How can I find the size of an array for a given key? To explain a bit: I
have a JSON document with arrays and need to know how many elements an
array has at its top-level key. I then want to give the array keys their
proper path names, including the #number array index.

kind regards
Kai-Uwe
Markus Raab
2015-02-11 13:32:02 UTC
Hello,
Post by Kai-Uwe Behrmann
how can the size of an array be found for a key? To explain a bit, I
have a JSON document with arrays and need to know how many array
elements are at top key. I want then to give the array keys their proper
path name including the #number array address.
Currently, you have to iterate over all array entries to find out how many
there are. Because this might be impractical for applications, we are
considering storing the number as the value of the array key, e.g.:

user/array = 3
user/array/#0 = some value
user/array/#1 = other value
user/array/#2 = last entry

What do you think?

best regards,
--
Markus Raab http://www.libelektra.org
Technische Universität Wien ***@markus-raab.org
Institut für Computersprachen Phone: (+431) 58801/185185
Argentinierstr. 8, 1040 Wien, Austria FAX: (+431) 58801/18598

DVR 0005886
Kai-Uwe Behrmann
2015-02-11 17:21:07 UTC
Post by Markus Raab
Post by Kai-Uwe Behrmann
how can the size of an array be found for a key? To explain a bit, I
have a JSON document with arrays and need to know how many array
elements are at top key. I want then to give the array keys their proper
path name including the #number array address.
Currently, you have to exhaustively iterate over all array entries to know the
number of entries. Because this might be unpractical for applications, we
/* fragment: assumes an open KDB handle and an existing KeySet ks */
Key * key = keyNew( "user/array/#0", KEY_END );
rc = kdbGet( handle, ks, key );
Key * search = ksLookupByName( ks, "user/array/#0", 0 );
if(search)
{
  /* do something with #0 */
}

So ksLookupByName() should be sufficient to determine the array length
by looking up
user/array/#0
user/array/#1
user/array/#2
user/array/#3 // will fail, so we have 3 entries in user/array
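
The probing idea can be sketched without Elektra at all. What follows is a minimal, self-contained model and not Elektra's API: the key set is assumed to be a plain array of key names, and has_key() and array_size_by_probing() are hypothetical helpers standing in for a lookup such as ksLookupByName(). Indices are written as plain decimal numbers, which is an assumption that only matches the examples in this thread for #0 to #9.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a key set lookup: returns 1 if `name`
 * occurs in the array `names` of length `n`, 0 otherwise. */
static int has_key(const char *const *names, size_t n, const char *name)
{
    assert(names != NULL && name != NULL);
    for (size_t k = 0; k < n; ++k)
        if (strcmp(names[k], name) == 0)
            return 1;
    return 0;
}

/* Probe base/#0, base/#1, ... until the first lookup fails.
 * The index of the first miss is the number of array elements.
 * Assumes `base` is short enough to fit into the probe buffer. */
static size_t array_size_by_probing(const char *const *names, size_t n,
                                    const char *base)
{
    char probe[256];
    for (size_t i = 0;; ++i) {
        snprintf(probe, sizeof probe, "%s/#%zu", base, i);
        if (!has_key(names, n, probe))
            return i;
    }
}
```

The cost is one lookup per element plus one failing lookup, which is the overhead a size stored in the array key would avoid.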
Post by Markus Raab
user/array = 3
user/array/#0 = some value
user/array/#1 = other value
user/array/#2 = last entry
What do you think?
Given that '#' is used for the array index, the following might be better (?):
/user/array = #3

That would make the method with ksLookupByName() obsolete for array size
detection.

kind regards
Kai-Uwe
Kai-Uwe Behrmann
2015-02-12 20:19:10 UTC
Post by Kai-Uwe Behrmann
Post by Markus Raab
user/array = 3
user/array/#0 = some value
user/array/#1 = other value
user/array/#2 = last entry
What do you think?
/user/array = #3
How can a user detect that creating a new array, e.g. user/array/#0,
will work as expected?

br
Markus Raab
2015-02-12 21:36:31 UTC
Hello,
Post by Kai-Uwe Behrmann
Post by Kai-Uwe Behrmann
Given that '#' is used for the array index, following might be better
(?): /user/array = #3
How can a user detect that creating a new array, e.g. user/array/#0,
will work as expected?
I am not sure what you mean. Why would users who create an array not know
what they created? Arrays are a convention, and everyone creating an array
should follow it.

You are right, however, that an API is missing that makes creating arrays
easy. I propose a

ksAppendArray(KeySet*, Key *);

that renames the key so that it becomes a new member of the array, adds the
key, and updates the value of the array key. For example, if we have:

user/array = #0
user/array/#0

and after executing:

ksAppendArray(ks, keyNew("user/array", KEY_VALUE, "x", KEY_END));

we get:

user/array = #1
user/array/#0
user/array/#1 = x
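
The intended effect of the proposed call can be modelled in a few lines. This is only a sketch under stated assumptions, not an implementation of ksAppendArray(): struct store, find(), put() and append_array() are hypothetical names, the key set is a small fixed-size table of name/value strings, and the array key is assumed to hold the last used index in the form "#N" as in the example above. Indices are written as plain decimal numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_LEN 64

/* Toy model of a key set: a fixed table of name/value pairs. */
struct entry { char name[MAX_LEN]; char value[MAX_LEN]; };
struct store { struct entry e[MAX_ENTRIES]; size_t n; };

/* Return the entry called `name`, or NULL if there is none. */
static struct entry *find(struct store *s, const char *name)
{
    for (size_t i = 0; i < s->n; ++i)
        if (strcmp(s->e[i].name, name) == 0)
            return &s->e[i];
    return NULL;
}

/* Create or overwrite `name` with `value`; NULL if the table is full. */
static struct entry *put(struct store *s, const char *name, const char *value)
{
    struct entry *e = find(s, name);
    if (!e) {
        if (s->n == MAX_ENTRIES)
            return NULL;
        e = &s->e[s->n++];
        snprintf(e->name, sizeof e->name, "%s", name);
    }
    snprintf(e->value, sizeof e->value, "%s", value);
    return e;
}

/* Append `value` as the next element of the array at `base` and store
 * the new last index ("#N") in the array key itself.
 * Returns the new index, or -1 if the table is full. */
static int append_array(struct store *s, const char *base, const char *value)
{
    assert(s != NULL && base != NULL && value != NULL);
    int next = 0;
    unsigned last;
    struct entry *parent = find(s, base);
    if (parent && sscanf(parent->value, "#%u", &last) == 1)
        next = (int)last + 1;          /* array exists: take the next index */

    char buf[MAX_LEN];
    snprintf(buf, sizeof buf, "%s/#%d", base, next);
    if (!put(s, buf, value))
        return -1;
    snprintf(buf, sizeof buf, "#%d", next);
    if (!put(s, base, buf))
        return -1;
    return next;
}
```

With the array key holding the last index, appending needs one lookup and two writes, regardless of how many elements the array already has.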

best regards,
Kai-Uwe Behrmann
2015-02-13 11:38:45 UTC
Post by Markus Raab
Post by Kai-Uwe Behrmann
Post by Kai-Uwe Behrmann
Given that '#' is used for the array index, following might be better
(?): /user/array = #3
How can a user detect that creating a new array, e.g. user/array/#0,
will work as expected?
I am not sure what you mean, why should users creating the array not know what
they created? Arrays are a convention and everyone creating an array should
follow them.
Users might not know whether a key namespace supports arrays, and that is
where I want to help them get the expected results, which means persistent
storage of their keys. The API should be fire-and-forget style.

Here are some practical examples:
Bob wants to add a key set, k1,k2, to user/array/. The array already
contains some fields. Which address shall he use to place the new keys
into a new array field?
(user/array/[#0,#1] exist already)
user/array/#2/k1
user/array/#2/k2

Linda wants to add a key set k3,k4 to user/maybe_array. She does not
know up front how many fields are inside user/maybe_array. If the yajl
backend is not mounted to user/maybe_array, she still wants a useful
fallback, so that her keys do not get lost. Which address can she use if
user/maybe_array has no keys below it yet?
(user/maybe_array/ has no keys yet and yajl is not mounted to that
namespace)
Shall she write:
user/maybe_array/first/k3
user/maybe_array/first/k4
or ...
user/maybe_array/#0/k3
user/maybe_array/#0/k4
How can she figure out the proper name?

I hope I have made my point clearer.

By the way, ksAppendArray(KeySet*, Key *); would help only in the case
where an array contains single keys. That is not of much help for either
Linda or Bob above.

br
Markus Raab
2015-02-13 18:57:30 UTC
Hello,
Post by Kai-Uwe Behrmann
User might not know if a key name space supports arrays. And there I
want to help them to get the expected results. That means a persistent
storage of their keys. The API should be shot and forget style.
The API is fire-and-forget style. It does not matter whether the underlying
storage "supports" arrays (i.e. formats them nicely) or not.

In yajl you get:
"array" : ["a", "b", "c"]

in XML, without arrays, you might get clumsy stuff like:
<array><#0>a</#0><#1>b</#1><#2>c</#2></array>
or anything else like:
<array><entry>a</entry><entry>b</entry><entry>c</entry></array>

Nevertheless, the user experience using kdbGet()/kdbSet() will be the same in
every way (earlier in the discussion we also had INI examples; the same applies there).
Post by Kai-Uwe Behrmann
Bob wants to add a key set, k1,k2 to user/array/. The array contains
some fields already. Which address shall he use to place the new keys
into an new array field?
(user/array/[#0,#1] exist already)
user/array/#2/k1
user/array/#2/k2
The proposed API ksAppendArray would make sure that keys get the next
available index.
Post by Kai-Uwe Behrmann
Linda wants to add a key set k3,k4 to user/maybe_array. She does not
know up front, how many fields are inside user/maybe_array.
She can easily find out: just call kdbGet() and iterate until the end is
found (or use the value of the array key once the proposal is implemented).
Post by Kai-Uwe Behrmann
If the yajl
backend is not mounted to user/maybe_array, she wants still some useful
fallback, so that here keys get not lost. Which address can she use if
user/maybe_array has no keys yet below?
(user/maybe_array/ has no keys yet and yajl is not mounted to that name
space)
user/maybe_array/first/k3
user/maybe_array/first/k4
or ...
user/maybe_array/#0/k3
user/maybe_array/#0/k4
How to figure out the proper name?
I hope to could have made my point clearer.
I hope my initial explanation made clearer what an array is. There is no such
thing as "maybe_array". You can always take the next index. If you (as
application developer) define that something is an array, it is one.

You are, however, correct that arrays cause problems in a distribution context.
Suppose software packages A and B each add something to an array during installation:
user/array/#0 ; was added earlier by admin
user/array/#1 ; added by A
user/array/#2 ; added by A, but changed by admin
user/array/#3 ; added by B

When software A is purged (i.e. its configuration should be removed) there is
no obvious way to detect which entries were added during installation
(especially if the administrator alters parts of it). We could do reference
counting in meta data, but I am not really happy with that.

Another problem is if something is expected to be an array but in fact is not.
Currently, yajl has undefined behaviour if something starts with #0 but is not
an array, and applications have problems, too. But that's basically only another
reason why configuration files must be validated. Users can always write garbage
in configuration files; that's a fundamental problem unrelated to arrays.
Post by Kai-Uwe Behrmann
Btw. ksAppendArray(KeySet*, Key *); would help only in a case, where a
array contains single keys. That is not of much help for both of Linda
and Bob above.
It should be generic and always help. But maybe I still did not understand ;)
Can you come up with a monitor/keyboard example that makes clearer whether
k1 and k2 are intended to be subkeys below an array index or keys to be
added as array indices?

best regards,
Kai-Uwe Behrmann
2015-02-13 21:57:02 UTC
Post by Markus Raab
Post by Kai-Uwe Behrmann
User might not know if a key name space supports arrays. And there I
want to help them to get the expected results. That means a persistent
storage of their keys. The API should be shot and forget style.
The API is shot and forget style. It does not matter if the underlying storage
"supports" (i.e. format nicely) arrays or not.
"array" : ["a", "b", "c"]
<array><#0>a</#0><#1>b</#1><#2>c</#2></array>
<array><entry>a</entry><entry>b</entry><entry>c</entry></array>
Nevertheless, the user experience using kdbGet()/kdbSet() will be the same in
every ways (during discussion we also had INI examples, same thing there).
Agreed, I added a test for that and it works fine. So we can use array
syntax as a no-brainer.

Thanks for the clarification!
Post by Markus Raab
Post by Kai-Uwe Behrmann
Linda wants to add a key set k3,k4 to user/maybe_array. She does not
know up front, how many fields are inside user/maybe_array.
She easily can find out: just kdbGet() and exhaustively iterate until end is
found (or use value once proposal is implemented).
That is implemented.

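/* probe key_base_name/#0, /#1, ... until an index without keys is found */
while(!found) {
  if(new_key_name) oyFree_m_( new_key_name );
  oyStringAddPrintf( &new_key_name, AD, "%s/#%d", key_base_name, i );

  ksRewind( ks );
  key = keyNew( new_key_name, KEY_END );
  /* note: ksCut() moves the keys at and below key out of ks into cut;
   * ksAppend() them back or ksDel( cut ) once cut is no longer needed */
  cut = ksCut( ks, key );
  count = ksGetSize( cut );
  if(!cut || !count)
    found = 1; /* new_key_name names the first unused array element */
  keyDel( key );
  ++i;
}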
Post by Markus Raab
I hope my initial explanation made clearer what an array is. There is no such
thing as "maybe_array". You can always take the next index. If you (as
application developer) define something is an array, it is.
great
Post by Markus Raab
You are, however, correct that arrays cause problems in distribution context.
user/array/#0 ; was added earlier by admin
user/array/#1 ; added by A
user/array/#2 ; added by A, but changed by admin
user/array/#3 ; added by B
When software A is purged (i.e. its configuration should be removed) there is
no obvious way to detect which entries were added during installation
(especially if the administrator alters parts of it). We could do reference
counting in meta data, but I am not really happy with that.
A read time stamp, alias atime, would help. But that might be expensive
depending on how it is implemented.
Post by Markus Raab
Another problem is if something is expected to be an array but in fact is not.
Currently, yajl has undefined behaviour if something starts with #0 but is not
an array and applications have problems, too. But thats basically only another
reason why configuration files must be validated. Users always can write garbage
in configuration files, thats a fundamental problem unrelated to arrays.
Well, I rely on Elektra to take care of that sort of issue :-)
Post by Markus Raab
Post by Kai-Uwe Behrmann
Btw. ksAppendArray(KeySet*, Key *); would help only in a case, where a
array contains single keys. That is not of much help for both of Linda
and Bob above.
It should be generic and always help. But maybe I still did not understand ;)
Can come up with a monitor/keyboard example which makes more clear if k1, k2
is intended to be a subkey below an array index or as key to be added as array
index.
The former, many keys below one array index, fits at least my use case.
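
For that use case the array element itself may have no key of its own, only keys below it, so an exact lookup of base/#i can miss an element that is in use. A prefix test covers both cases. The following is a minimal sketch over a plain array of key names, not Elektra code: is_at_or_below(), next_free_index() and sub_key_name() are hypothetical helpers, and indices are written as plain decimal numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1 if `name` equals `prefix` or lies below it (prefix followed by '/'). */
static int is_at_or_below(const char *name, const char *prefix)
{
    size_t len = strlen(prefix);
    return strncmp(name, prefix, len) == 0 &&
           (name[len] == '\0' || name[len] == '/');
}

/* First index i for which no key exists at or below base/#i.
 * Assumes `base` is short enough to fit into the element buffer. */
static size_t next_free_index(const char *const *names, size_t n,
                              const char *base)
{
    assert(names != NULL && base != NULL);
    char element[256];
    for (size_t i = 0;; ++i) {
        int used = 0;
        snprintf(element, sizeof element, "%s/#%zu", base, i);
        for (size_t k = 0; k < n && !used; ++k)
            used = is_at_or_below(names[k], element);
        if (!used)
            return i;
    }
}

/* Build "base/#index/child" in buf; 0 on success, -1 if it does not fit. */
static int sub_key_name(char *buf, size_t size, const char *base,
                        size_t index, const char *child)
{
    int len = snprintf(buf, size, "%s/#%zu/%s", base, index, child);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

Bob would call next_free_index() once for user/array and then sub_key_name() for k1 and k2 with the same index. Linda gets index 0 for an empty user/maybe_array, so the same code path works whether or not anything exists there yet.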

br
Markus Raab
2015-02-24 16:23:06 UTC
Hello,
Post by Kai-Uwe Behrmann
That is implemented.
ksCut() plus ksGetSize() works too, of course.
Post by Kai-Uwe Behrmann
Post by Markus Raab
When software A is purged (i.e. its configuration should be removed)
there is no obvious way to detect which entries were added during
installation (especially if the administrator alters parts of it). We
could do reference counting in meta data, but I am not really happy with
that.
A read time stamp, alias atime, would help. But that might be expensive
depending how it is implemented.
The question to be answered is whether any application that uses the
configuration is still installed. So atime does not really help: purging also
removes configuration that was looked at; even modified configuration should
be removed when software is purged.
Post by Kai-Uwe Behrmann
Post by Markus Raab
Another problem is if something is expected to be an array but in fact is
not. Currently, yajl has undefined behaviour if something starts with
#0 but is not an array and applications have problems, too. But thats
basically only another reason why configuration files must be validated.
Users always can write garbage in configuration files, thats a
fundamental problem unrelated to arrays.
Well, I rely on elektra to take care of that sort of issues :-)
That is our plan!

best regards,
Markus Raab
2015-02-11 17:45:36 UTC
Hello List
Post by Kai-Uwe Behrmann
So ksLookupByName() should be sufficient to search for the array length
by searching for
user/array/#0
user/array/#1
user/array/#2
user/array/#3 // will fail, so we have 3 entries in user/array
Yes, exactly!
Post by Kai-Uwe Behrmann
Post by Markus Raab
user/array = 3
user/array/#0 = some value
user/array/#1 = other value
user/array/#2 = last entry
What do you think?
/user/array = #3
That would make the method with ksLookupByName() obsolete for array size
detection.
Good idea. It's easy to skip # and _, but it's harder to generate them.

We need, however, to consider whether it is worth paying the overhead when
appending to an array. I opened GitHub issue #182.
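
To make the "skip versus generate" asymmetry concrete, here is a small sketch. It assumes the naming scheme in which an index with d digits is written as '#', then d-1 underscores, then the digits (#3, #_10, #__100), so that string order matches numeric order. That scheme is an assumption based on the mention of # and _ above, and index_to_name() and name_to_index() are hypothetical helpers, not Elektra functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Generate: '#', one '_' per digit beyond the first, then the digits.
 * Returns 0 on success, -1 if buf is too small. */
static int index_to_name(unsigned long index, char *buf, size_t size)
{
    assert(buf != NULL);
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%lu", index);
    if (len < 1)
        return -1;
    /* '#' + (len - 1) underscores + len digits + terminating NUL */
    size_t need = 1 + (size_t)(len - 1) + (size_t)len + 1;
    if (need > size)
        return -1;
    char *p = buf;
    *p++ = '#';
    for (int i = 1; i < len; ++i)
        *p++ = '_';
    memcpy(p, digits, (size_t)len + 1);
    return 0;
}

/* Parse: skip '#', skip any underscores, read the decimal digits.
 * Returns 0 on success, -1 if `name` is not an array index name. */
static int name_to_index(const char *name, unsigned long *index)
{
    assert(name != NULL && index != NULL);
    if (*name != '#')
        return -1;
    ++name;
    while (*name == '_')
        ++name;
    if (!isdigit((unsigned char)*name))
        return -1;
    char *end;
    *index = strtoul(name, &end, 10);
    return *end == '\0' ? 0 : -1;
}
```

Parsing only has to skip characters, while generating has to count digits first, which is the extra work an append operation would take on.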

best regards,