Story #4019
closed[API] Support query of "properties" field on objects
Added by Peter Amstutz about 10 years ago. Updated about 7 years ago.
100%
Description
A major drawback of using "link" objects for metadata (as opposed to the "properties" field on the object itself) is that link objects have permissions separate from the item they are describing. While sometimes this is desirable, in other situations (such as associating repo/tag to docker images) this ends introducing significant unnecessary complexity. The ability to query the "properties" field enables us to store metadata directly on the object and search on it.
Upgrade apiserver to Postgres 9.4 and Rails 4.2 (when it is released) to take advantage of searchable jsonb column data type.
Updated by Peter Amstutz about 10 years ago
- Subject changed from Upgrade apiserver to Postgres 9.4 and Rails 4.2 (when it is released) to take advantage of queriable jsonb column data type. to Support query of "properties" field on objects
- Description updated (diff)
- Target version set to Arvados Future Sprints
Updated by Peter Amstutz almost 10 years ago
- Subject changed from Support query of "properties" field on objects to [API] Support query of "properties" field on objects
- Category set to API
- Story points set to 5.0
Updated by Peter Amstutz over 7 years ago
https://www.postgresql.org/docs/9.4/static/datatype-json.html
jsonb also supports indexing
In general, most applications should prefer to store JSON data as jsonb, unless there are quite specialized needs, such as legacy assumptions about ordering of object keys.
JSON "null" is allowed, but not the same as the SQL NULL.
JSON data is written out as a string with tagged with ::json or ::jsonb
SELECT '[1, 2, "foo", null]'::json;
Testing containment is an important capability of jsonb. There is no parallel set of facilities for the json type. Containment tests whether one jsonb document has contained within it another one.
The default GIN operator class for jsonb supports queries with the
@>
, ?, ?& and ?| operators.
The non-default GIN operator class jsonb_path_ops supports indexing the@>
operator only.
https://www.postgresql.org/docs/9.4/static/functions-json.html
peterjsontest=# select * from jsonbtest; j1 ---------------- "foo" {"foo": "bar"} null true h2. (4 rows)
The ->
operator is the basic indexing operator:
select * from jsonbtest where j1->'foo' = '"bar"'::jsonb j1 ---------------- {"foo": "bar"} (1 row)
The ->>
operator is also an indexing operator, but returns the value as a text type which is directly comparable to SQL text values:
select * from jsonbtest where j1->>'foo' = 'bar';
The #>
and #>>
operators are similar but follow a path:
select * from jsonbtest where j1#>'{"foo", 1}' = '4'::jsonb; j1 -------------------- {"foo": [2, 4, 6]} (1 row)
The ? operator determines if a key or value is present.
peterjsontest=# select * from jsonbtest where j1 ? 'foo'; j1 -------------------- "foo" {"foo": "bar"} {"foo": [2, 4, 6]} {"foo": ["3", "5", "7"]} (3 rows)
Operators can be chained together.
select * from jsonbtest where j1->'foo' ? '3'; j1 -------------------------- {"foo": ["3", "5", "7"]} (1 row)
The ? operator only works for strings:
select * from jsonbtest where j1->'foo' ? '2'; j1 ---- (0 rows)
Note: the following query won't use the index (except maybe to filter on rows which have 'foo' as a key, but that's unhelpful if 'foo' appears the majority of rows). It requires a special expression index:
select * from jsonbtest where j1->'foo' ? '3'; CREATE INDEX idxgintags ON jsonbtest USING gin ((j1 -> 'foo'));
We can do pattern matching on values:
select * from jsonbtest where j1->>'foo' like 'b%'; j1 ---------------- {"foo": "bar"} (1 row)
The @>
operator is the "containment" search operator, which allows testing for presence of key-value pairs directly. This produces the same answer as select * from jsonbtest where j1->'foo' = '"bar"'::jsonb;
but @>
is able to use an index:
select * from jsonbtest where j1 @> '{"foo": "bar"}'::jsonb; j1 ---------------- {"foo": "bar"} (1 row)
We can use containment search with numbers:
select * from jsonbtest where j1 @> '{"foo": [2]}'::jsonb j1 -------------------- {"foo": [2, 4, 6]} (1 row)
Updated by Peter Amstutz over 7 years ago
Blog post on using jsonb with RoR: http://nandovieira.com/using-postgresql-and-jsonb-with-ruby-on-rails
Updated by Peter Amstutz over 7 years ago
Design sketch for extending filters.
Syntax for filters.
Extend the 1st portion of the query to support operations on embedded json fields.
[["properties", "foo"], "=", "bar"]
(other options considered: "properties.foo", "properties->foo", "properties[foo]" but these alls require parsing and raise the issue of special characters ".->[]" appearing in the key).
Straight key-value tests can be efficiently implemented using @>
operator on the index:
select * from table where properties @>
'{"foo": "bar"}'::jsonb
Other possible queries:
[["properties", "foo"], "like", "b%"]
Implemented as:
select * from table where properties ? 'foo' and properties->>'foo' like 'b%'
(note that properties->>'foo'
evaluates to SQL NULL for rows that don't have a "foo" entry; using properties ? 'foo'
uses the index to filter rows that don't have a "foo" entry).
Somewhat confusingly, although the documentation describes correspondences between JSON types and Postgres types, they are not directly interoperable and comparisons require coercion to jsonb types:
select * from jsonbtest where properties ? 'foo' and j1->'foo' <> 'null'::jsonb select * from jsonbtest where properties ? 'foo' and j1->'foo' > '3'::jsonb select * from jsonbtest where properties ? 'foo' and j1->'foo' < '3'::jsonb
"in" queries are probably best expanded to an OR expression:
[["properties", "foo"], "in", ["bar", "quux"]]
select * from table where properties > '{"foo": "bar"}'::jsonb or properties
> '{"foo": "quux"}'::jsonb;
Should "not_in" queries include rows where the key isn't defined?
[["properties", "foo"], "not_in", ["bar", "quux"]]
select * from table where properties ? 'foo' and properties->>'foo' not in ('bar', 'quux');
This raises a question: if the "foo" key is missing, that is an SQL NULL, but if it is present, it can have the value of JSON null. How should the filtering handling this? Maybe expose the '?' operator for JSONB columns to allow existence queries?
Updated by Peter Amstutz over 7 years ago
The minimal operator set I recommend is:
- equality matching: [["properties", "foo"], "=", "bar"]
- like-matching: [["properties", "foo"], "like", "b%"]
- like-matching to get all records with a given key: [["properties", "foo"], "like", "%"]
Updated by Tom Clegg over 7 years ago
Would it be possible to use arrays in serialized fields? Say we had properties like this
{
"foo": [
{"bar": "baz"},
{"qux": "quux"},
"corge"
]
}
[How] would the following queries work?
- properties.foo[anything].bar
==
"baz" - properties.foo[anything].bar
==
anything - properties.foo[anything]
==
"corge"
Would it be possible to distinguish {"foo":["bar"]} from {"foo":{"bar":"bar"}} and {"foo":"bar"}?
Updated by Peter Amstutz over 7 years ago
Tom Clegg wrote:
Would it be possible to use arrays in serialized fields? Say we had properties like this
[...]
[How] would the following queries work?
- properties.foo[anything].bar "baz"
- properties.foo[anything].bar anything
- properties.foo[anything] == "corge"
Since you just made up a syntax, I don't know what these queries are supposed to do?
Would it be possible to distinguish {"foo":["bar"]} from {"foo":{"bar":"bar"}} and {"foo":"bar"}?
It depends?
Updated by Tom Clegg over 7 years ago
Peter Amstutz wrote:
Since you just made up a syntax, I don't know what these queries are supposed to do?
Match the given example based on the contents. For example, when I said
properties.foo[anything].bar == "baz"
I meant "get every record whose properties hash has a key "foo" whose value is an array containing an object with a key "bar" whose value is "baz"."
Would it be possible to distinguish {"foo":["bar"]} from {"foo":{"bar":"bar"}} and {"foo":"bar"}?
It depends?
Depends on..?
Updated by Peter Amstutz over 7 years ago
Here's how I think the 1st and 3rd would be queried in Postgres:
select * from table where properties @>
'{"foo": [{"bar": "baz"}]}'::jsonb
select * from table where properties @>
'{"foo": ["corge"]}'::jsonb
The 2nd one is kind of hard. I eventually got to this:
select j1 from jsonbtest where j1 @>
'{"foo": [{}]}'::jsonb and exists (select value from jsonb_array_elements(j1->'foo') where value @>
'{}'::jsonb and value ? 'bar');
This filters on values of "foo" that consist of arrays containing objects, and then constructs a subquery on the elements in the array and tests if each element is an object, and it contains the key 'bar'.
Updated by Peter Amstutz over 7 years ago
Basically, although you can use [] to match any array and {} to match any object, the postgres @>
operator lacks a scalar wildcard to match strings/numbers/booleans/null.
Also annoying, this is true:
SELECT '["foo", "bar"]'::jsonb @> '"foo"'::jsonb;
But this doesn't work:
SELECT '{"foo": "bar"}'::jsonb @> '"foo"'::jsonb;
Even though these are both true:
SELECT '["foo", "bar"]'::jsonb ? 'foo';
SELECT '{"foo": "bar"}'::jsonb ? 'foo';
Updated by Peter Amstutz over 7 years ago
Searching and filtering operations:
Get records with {"mytag1": "myvalue1"}:
[["properties", "tags", "mytag1"], "=", "myvalue1"]]
Get records with the value of "mytag1" matching a prefix:
[["properties", "tags", "mytag1"], "like", "myval%"]]
Get records with any value of "mytag1":
[["properties", "tags", "mytag1"], "like", "%"]]
Updated by Peter Amstutz over 7 years ago
Implementations:
select properties from table where properties @>
'{"tags": {"mytag1": "myvalue1"}}'::jsonb
select properties from table where properties @> '{"tags": {}}'::jsonb and properties#>>'{"tags","mytag1"}' like 'myval%'
select properties from table where properties @> '{"tags": {}}'::jsonb and properties->'tags' ? 'mytag1'
Updated by Peter Amstutz over 7 years ago
Note: the above assumes 'tags' in a subfield. But it is easier to write queries that utilize the index if tags are in a toplevel object. (If there is a reason to have separate "user visible" and "not user visible" tags, then we should have a separate "tags" column).
Get records with {"mytag1": "myvalue1"}:
[["properties", "mytag1"], "=", "myvalue1"]]
select properties from table where properties @> '{"mytag1": "myvalue1"}'::jsonb
Get records with the value of "mytag1" matching a prefix:
[["properties", "tags", "mytag1"], "like", "myval%"]]
select properties from table where properties ? 'mytag1' and properties->>'mytag1' like 'myval%'
Get records with any value of "mytag1":
[["properties", "mytag1"], "like", "%"]]
select properties from table where properties ? 'mytag1'
Updated by Tom Morris about 7 years ago
- Target version changed from Arvados Future Sprints to To Be Groomed
Updated by Peter Amstutz about 7 years ago
Test for exact key-value:
["properties.mytag1", "=", "myvalue1"]
Test for existence of 'mytag1', any value, which ever is easier:
["properties", "?", "mytag1"]
["properties.mytag1", "?", ""]
Updated by Tom Morris about 7 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points changed from 5.0 to 2.0
Updated by Tom Morris about 7 years ago
- Target version changed from Arvados Future Sprints to 2017-12-20 Sprint
Updated by Peter Amstutz about 7 years ago
- Related to deleted (Story #11908: Migrate Collections.properties to JSONB)
Updated by Peter Amstutz about 7 years ago
- Blocked by Story #11908: Migrate Collections.properties to JSONB added
Updated by Peter Amstutz about 7 years ago
4019-query-properties @ 79a2d819d596e610f26c08beee53f5432bfbb360
Supports almost all operators on jsonb subproperties: =, !=, <, <=, >, >=, like, ilike, in, not in. Also adds a new operator "exists".
Documentation updated.
Adds a new index on collection properties.
Updated by Lucas Di Pentima about 7 years ago
- File
services/api/lib/record_filters.rb
- Line 112: Wouldn’t be convenient to validate the operand type to only allow boolean values when filtering subproperties with
'exists'
? I found confusing the use of operands like‘’
,‘none’
,‘false’
and0
with‘exists’
operator and getting results like if I used an operand == true. - If the above comment is valid, maybe adding some tests on
‘exists’
with invalid operand types will be useful.
- Line 112: Wouldn’t be convenient to validate the operand type to only allow boolean values when filtering subproperties with
- File
services/api/test/unit/arvados_model_test.rb
- Line 150: Is the addition on this line superfluous as
:jsonb
was removed fromsearchable_columns()
?
- Line 150: Is the addition on this line superfluous as
Updated by Peter Amstutz about 7 years ago
Lucas Di Pentima wrote:
- File
services/api/lib/record_filters.rb
- Line 112: Wouldn’t be convenient to validate the operand type to only allow boolean values when filtering subproperties with
'exists'
? I found confusing the use of operands like‘’
,‘none’
,‘false’
and0
with‘exists’
operator and getting results like if I used an operand == true.- If the above comment is valid, maybe adding some tests on
‘exists’
with invalid operand types will be useful.
Good idea. Improved error checking & added some tests for error cases.
- File
services/api/test/unit/arvados_model_test.rb
- Line 150: Is the addition on this line superfluous as
:jsonb
was removed fromsearchable_columns()
?
That's right, reverted.
now 4019-query-properties @ a879823f631381cefc4458c28f06c36803e30530
Updated by Anonymous about 7 years ago
- Status changed from New to Resolved
Applied in changeset arvados|e768a05df9fd75cee3724e6b68cb65beeebaaa38.