Story #12960
closed[API] Migrate remaining "properties" fields to jsonb
Added by Peter Amstutz almost 7 years ago. Updated over 6 years ago.
100%
Description
Migrate remaining "properties" fields to jsonb in order to support subproperty filters.
Tables:
- container_requests
- links
- nodes
We should also consider adding properties to these records types that don't currently have it:
- groups
- users
Don't migrate legacy tables:
- logs
- pipeline_instances
- humans
- specimens
- traits
Updated by Peter Amstutz almost 7 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz almost 7 years ago
- Status changed from In Progress to New
Updated by Peter Amstutz almost 7 years ago
- Subject changed from Migrate remaining "properties" fields to jsonb to [API] Migrate remaining "properties" fields to jsonb
Updated by Peter Amstutz over 6 years ago
Question about migrating other serialized fields such as mounts, etc → we don't plan on querying on it, the subproperty query API exposed by Arvados doesn't support deep queries. Are there other benefits to migrating serialized columns to jsonb, for example, is jsonb more compact or faster to retrieve in the database?
Updated by Lucas Di Pentima over 6 years ago
- Faster writes (no conversion process involved)
- Input text is saved as is, so things like key ordering and indentation are preserved
- Duplicated attributes get deduplicated & whitespaces removed (potential storage space savings)
- Significantly faster processing
- Indexable / searchable
From what I could read on docs and forum comments, the general rule for deciding whether to use JSON over JSONB is that if data will be processed on the application itself, JSON would be fine, otherwise it is probably more convenient to use JSONB.
I’ve found mixed references about which type is more compact/faster to retrieve, no clear answer to that.
Updated by Lucas Di Pentima over 6 years ago
- Related to Feature #13562: [API] Add JSONB properties field on groups added
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 1.0
Updated by Peter Amstutz over 6 years ago
I should mention that indexing properties on container_requests is necessary to fully implement the Workflow Execution Service (WES) spec, which includes filtering on key-value tags.
Updated by Tom Morris over 6 years ago
- Target version changed from Arvados Future Sprints to 2018-06-20 Sprint
Updated by Peter Amstutz over 6 years ago
12960-migrate-properties @ a518eb2b3b743259fe51b1e61d2648677e583cac
Convert links.properties and container_requests.properties, nodes.properties, and nodes.info to jsonb and add gin index.
Updated by Peter Amstutz over 6 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 6 years ago
LGTM.
Strangely, the db:migrate output is verbose about the container_requests.properties iteration but silent about the others.
== 20180607175050 PropertiesToJsonb: migrating ================================ -- execute("DROP INDEX IF EXISTS container_requests_full_text_search_idx") -> 0.0004s -- execute("CREATE INDEX container_requests_full_text_search_idx ON container_requests USING gin(to_tsvector('english', coalesce(uuid,'') || ' ' || coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(state,'') || ' ' || coalesce(requesting_container_uuid,'') || ' ' || coalesce(container_uuid,'') || ' ' || coalesce(runtime_constraints::text,'') || ' ' || coalesce(container_image,'') || ' ' || coalesce(environment::text,'') || ' ' || coalesce(cwd,'') || ' ' || coalesce(command::text,'') || ' ' || coalesce(output_path,'') || ' ' || coalesce(filters,'') || ' ' || coalesce(scheduling_parameters::text,'') || ' ' || coalesce(output_uuid,'') || ' ' || coalesce(log_uuid,'') || ' ' || coalesce(output_name,'')))") -> 0.0083s == 20180607175050 PropertiesToJsonb: migrated (0.3391s) =======================
...but test.log shows all of the changes, so I guess it's ok.
Migrating to PropertiesToJsonb (20180607175050) (0.2ms) BEGIN (0.3ms) DROP INDEX IF EXISTS nodes_full_text_search_idx (55.3ms) ALTER TABLE nodes ALTER COLUMN properties TYPE jsonb USING properties::jsonb (0.5ms) CREATE INDEX nodes_index_on_properties ON nodes USING gin (properties) (0.1ms) DROP INDEX IF EXISTS nodes_full_text_search_idx (390.1ms) ALTER TABLE nodes ALTER COLUMN info TYPE jsonb USING info::jsonb (0.4ms) CREATE INDEX nodes_index_on_info ON nodes USING gin (info) (0.3ms) DROP INDEX IF EXISTS container_requests_full_text_search_idx (61.6ms) ALTER TABLE container_requests ALTER COLUMN properties TYPE jsonb USING properties::jsonb (0.4ms) CREATE INDEX container_requests_index_on_properties ON container_requests USING gin (properties) (0.1ms) DROP INDEX IF EXISTS links_full_text_search_idx (91.3ms) ALTER TABLE links ALTER COLUMN properties TYPE jsonb USING properties::jsonb (0.4ms) CREATE INDEX links_index_on_properties ON links USING gin (properties) (0.2ms) DROP INDEX IF EXISTS container_requests_full_text_search_idx (1.5ms) CREATE INDEX container_requests_full_text_search_idx ON container_requests USING gin(to_tsvector('english', coalesce(uuid,'') || ' ' || coalesce(owner_uuid,'') || ' ' || coalesce(modified_by_client_uuid,'') || ' ' || coalesce(modified_by_user_uuid,'') || ' ' || coalesce(name,'') || ' ' || coalesce(description,'') || ' ' || coalesce(properties::text,'') || ' ' || coalesce(state,'') || ' ' || coalesce(requesting_container_uuid,'') || ' ' || coalesce(container_uuid,'') || ' ' || coalesce(runtime_constraints::text,'') || ' ' || coalesce(container_image,'') || ' ' || coalesce(environment::text,'') || ' ' || coalesce(cwd,'') || ' ' || coalesce(command::text,'') || ' ' || coalesce(output_path,'') || ' ' || coalesce(filters,'') || ' ' || coalesce(scheduling_parameters::text,'') || ' ' || coalesce(output_uuid,'') || ' ' || coalesce(log_uuid,'') || ' ' || coalesce(output_name,''))) SQL (0.2ms) INSERT INTO "schema_migrations" ("version") VALUES ($1) [["version", "20180607175050"]] (3.4ms) COMMIT
Also noticed it's impossible to db:migrate a test db, because test fixtures are still being inserted as YAML. But db:drop + db:setup + db:fixtures:load works ok.
Updated by Peter Amstutz over 6 years ago
Due diligence: https://ci.curoverse.com/job/developer-run-tests/744/
Updated by Peter Amstutz over 6 years ago
It is also not logging ALTER TABLE. But I agree it seems to be an ActiveRecord quirk rather than an actual problem. I will merge if Jenkins is happy.
Updated by Peter Amstutz over 6 years ago
Oops, jenkins is not happy. Trivial different in serialization from jsonb failed a test. Running again @ cefddd8efdc88c3d33e5aad4c02e96c64cbe8242
Updated by Peter Amstutz over 6 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|1e03ea4765bee6db9331a1265c8abf2c3e45caff.