3. JSON in RDBMS
Why
• Consistency (integrity, transactional ACID guarantees) when storing JSON documents
• Denormalization of complex objects
What
• Logs with responses/requests received/sent during software interaction
• Configuration data, key/value user preferences
• Unstructured or semi-structured complex objects
…and please forget about any analytics on JSON fields
4. DB configuration
• Install JSON fixes on a regular basis
• Retain scripts that check old issues – new fixes often reintroduce them
• These patches MUST be installed:
1. Patch 20080249: JSON Patch Bundle 1
2. Patch 20885778: JSON Patch Bundle 2
3. Patch 24836374: JSON Patch Bundle 3
Oracle considers JSON stable now, so there are no more dedicated JSON bundle patches!
Fixes are included in the ordinary Database Proactive Bundle Patches (Doc ID 1937782.1).
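To see which fixes are actually installed, the patch inventory recorded in the database can be queried. A minimal sketch (the dictionary view is available from 12.1; filter to taste):

```sql
-- List fixes applied to this database by datapatch
select patch_id, version, action, status, description
  from dba_registry_sqlpatch
 order by action_time;
```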
6. Table structure
BLOB for JSON benefits:
• Half the space consumption
• Less I/O due to the smaller size
• No implicit character-set conversion if the database
character set is not AL32UTF8
RFC 4627
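A minimal sketch of such a table (table and column names are illustrative, not from the slides). Note that in 12.1 the FORMAT JSON clause is required for BLOB columns, and the CACHE LOB attribute matters for ingestion speed, as discussed later:

```sql
create table invoice (
  id           number primary key,
  invoice_json blob,
  -- FORMAT JSON tells Oracle the BLOB holds textual JSON
  constraint invoice_is_json
    check (invoice_json is json format json)
)
-- cached SecureFile LOB: noticeably faster inserts
lob (invoice_json) store as securefile (cache);
```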
14. Retrieval
• Extract 1 row with raw JSON data and pass it to the application server as is
• Issue a SQL statement which extracts 1 row from the table and parses it via the Oracle
JSON features
• Create a view which encapsulates the JSON treatment and extract a row from the view
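The view option could look like this (hypothetical names and document shape; json_table parses the document once and exposes relational columns):

```sql
create or replace view invoice_v as
select i.id, jt.invoice_no, jt.total
  from invoice i,
       json_table(i.invoice_json format json, '$'
         columns (invoice_no varchar2(20) path '$.invoiceNo',
                  total      number       path '$.total')) jt;

-- the application then just selects one row
select * from invoice_v where id = :id;
```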
22. Retrieval
Views
1. They often become non-mergeable, so performance degrades
2. If 2 or more json_table calls are used, no exception occurs, but the results can be
wrong in aggregate functions. The no_merge hint sometimes helps.
3. ORA-600 and ORA-7445 "No data to be read from socket" arise in arbitrary
places
4. count(distinct <field>) fails with ORA-7445 "No data to be read from socket".
It can be fixed by removing the GROUP BY, adding row_number() over (partition by
<group-by fields>) rn, and filtering out records where rn <> 1
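The workaround from point 4 can be sketched as follows, assuming a hypothetical view invoice_lines_v built over json_table:

```sql
-- fails with ORA-7445 on some 12.1 patch levels:
--   select customer_id, count(distinct item)
--     from invoice_lines_v group by customer_id;

-- workaround: deduplicate with row_number() first, then aggregate
select customer_id, count(*) as distinct_items
  from (select customer_id, item,
               row_number() over (partition by customer_id, item
                                  order by item) rn
          from invoice_lines_v)
 where rn = 1
 group by customer_id;
```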
47. Maintenance
Suffix/prefix columns holding JSON data with _JSON, like INVOICE_JSON earlier.
• Create daily checks
1. If you need to control the JSON format (strict/lax), use the dba_tab_columns and
all_json_columns views to check the JSON constraints
2. If you need insert performance, use dba_lobs to check the CACHE attribute
• Check that CONTEXT indexes are in a proper state
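These daily checks might look like the following sketch (dictionary views as named on the slide; the exact status values and thresholds are up to you):

```sql
-- 1. JSON constraints and their format (strict/lax)
select table_name, column_name, format, data_type
  from all_json_columns;

-- 2. JSON BLOBs that are not cached hurt insert performance
select table_name, column_name, cache
  from dba_lobs
 where cache = 'NO';

-- 3. CONTEXT indexes that are not in a valid state
select idx_name, idx_status
  from ctx_user_indexes
 where idx_status <> 'INDEXED';
```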
48. Maintenance
Perform regular index optimization
1. Collect fragmented indexes (estimated row fragmentation)
2. Collect indexes with many deleted rows (estimated garbage size)
3. Run ctx_ddl.optimize_index in FULL mode (SERIAL or PARALLEL)
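Step 3 is a single call per index; a sketch for one index (the index name is illustrative):

```sql
begin
  ctx_ddl.optimize_index(
    idx_name        => 'INVOICE_SEARCH_IDX',
    optlevel        => ctx_ddl.optlevel_full,  -- FULL defragmentation
    parallel_degree => 4                       -- omit for SERIAL mode
  );
end;
/
```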
49. Conclusion
• JSON is always a tradeoff between performance, convenience of data treatment, and integrity
• The indexing strategy should be checked very carefully, especially if you use both notations
• JSON treatment is acceptable in row-by-row scenarios
• JSON features are still unstable
• Oracle very often fails with JSON larger than 2 MB
• The current implementation doesn't look like a document-store DB
• Tailored search solutions bring better performance
…and we are waiting for Oracle 12.2
One of our projects uses JSON extensively, so we have a lot to share with you. We found quite a few JSON caveats that you may or may not face in Oracle 12.1. We will cover most of the issues you could run into, starting from the decision to use JSON in Oracle through daily maintenance, and will try to advise on how to process JSON efficiently. Please note that some of the advice will be obsolete in Oracle 12.2, especially regarding search.
We will try to go through the full JSON field lifecycle. Please note that some of the principles mentioned are not specific to JSON, but without them good DB performance is impossible. I separated retrieval and search, which look similar, but the difference will become clear later. Sometimes we will return to an earlier point because of additional workarounds.
Let's discuss why we need JSON in an RDBMS.
These objects are things like tags, questionnaires, and documents with many levels of nesting, which would require a huge number of relational tables. The key idea is to avoid extra joins.
Please don't forget about Support – check the latest JSON fixes.
Without the patches installed there are tons of errors: ORA-600, "no data to be read from socket", two json_table calls not working in a session, FORMAT JSON having to be specified for BLOB fields.
Let’s decide how we would like to store our JSON. Which table is better?
The DB encoding is the Oracle-recommended AL32UTF8.
Let's add 10,000 records to each table and check the space consumption.
Universal Coded Character Set – UCS
So the final structure recommended by Oracle is:
We created a table with a BLOB, but the Java side failed with a number conversion.
Let's add an is json constraint to ensure there is no bad data in our table.
Let's check it in any online JSON validator.
Why does Oracle pass invalid JSON? Is it actually invalid?
50/50
Oracle's default is lax. If you need to be completely sure for any JSON parser, use STRICT.
If we use dot notation, it fails, so we have to rewrite the whole JSON, which leads to huge I/O, plus sophisticated PL/SQL processing for JSON larger than 4000 bytes. If we need to update JSON, we create small Java utilities which run in our CI tool. They work faster than SQL/PL*SQL, especially with Java bulk processing.
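A strict constraint can be added like this (illustrative names; without STRICT, Oracle's lax parser accepts unquoted keys, single quotes, and similar deviations):

```sql
alter table invoice add constraint invoice_json_strict
  check (invoice_json is json format json (strict));
```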
We created a perfect-looking JSON table structure and started to ingest our data.
Let's measure just the time, without context switches etc.; otherwise we'd have no time to check everything during the presentation. We will use a rather simple script.
It is slow, and that's with BULK omitted. Even if we add it, the performance is low. At least I claim it is low. What's the reason? How do we defeat it?
Let's remove the constraints we just created to see which one contributes more.
We get decent speed, but it is still an issue.
Is it possible to get more? Any ideas? Without BULK, with pure SQL and nothing exotic?
Let me give you a clue.
What does the Oracle documentation say about it?
So ingestion is twice as fast with CACHE and without constraints.
Frankly speaking, we don't switch off the constraints, but knowing our data we use CACHE for most of our BLOBs intended for JSON.
I hope you all have tests of your database structure, so add something like this there and discuss with your developers why they didn't set CACHE for a particular BLOB.
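Checking and fixing the CACHE attribute for an existing LOB is straightforward (illustrative names):

```sql
-- find uncached LOBs in your schema
select table_name, column_name, cache
  from user_lobs
 where cache = 'NO';

-- enable caching for a particular BLOB
alter table invoice modify lob (invoice_json) (cache);
```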
Retrieval – extract information from JSON
There are many options.
The first way was the common one before Oracle 12, and it works fine. Let's consider the other two.
The simplest option is to use dot notation. Let's check.
It doesn't work. Let's add an alias.
It doesn't work either.
Let's add a check constraint, and now it works.
Let's try to work with json_value and complex structures.
Dot notation doesn't work with arrays in Oracle 12.1 but does work in Oracle 12.2
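The 12.1 behavior can be summarized in a sketch (illustrative document shape; dot notation assumes the column carries an IS JSON check constraint):

```sql
-- dot notation: needs a table alias and an IS JSON constraint
select i.invoice_json.customer.name
  from invoice i;

-- arrays: dot notation fails in 12.1, use json_query (or json_table)
select json_query(i.invoice_json format json, '$.lines[*]'
                  with wrapper) as lines
  from invoice i;
```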
json_table looks very promising: single parsing, a slightly verbose syntax, arrays work perfectly. Sounds cool, doesn't it?
Let's have a look at another example which seems very straightforward.
We have 2 json_table calls, which take a lot of CPU effort.
We got the result, but at what price – 2 parses and unreadable SQL!
Let's make it simpler – json_table works on the same row, so it can be used in the following manner.
So this is the proper way, and it is 1 SQL call.
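The single-parse form uses NESTED PATH inside one json_table instead of two separate calls (illustrative document shape):

```sql
select jt.invoice_no, jt.item, jt.qty
  from invoice i,
       json_table(i.invoice_json format json, '$'
         columns (
           invoice_no varchar2(20) path '$.invoiceNo',
           -- one parse: the array is unnested in the same call
           nested path '$.lines[*]'
             columns (item varchar2(50) path '$.item',
                      qty  number       path '$.qty')
         )) jt;
```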
It is rather hard to show the issues with views without showing production data, but we have collected all the issues.
For sure, if there is no index we get full scans, which is bad.
Let's create an index. The plan is the same.
The reason is the filter condition.
So we have to create indexes that exactly match the filter conditions, and then they appear in the plan.
Let's check dot notation – we should ensure the index is used. We see that internally it is handled via JSON_QUERY rather than JSON_VALUE, so we need to create a separate set of indexes.
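That is, a function-based index whose expression is textually identical to the query predicate (illustrative names and path):

```sql
create index invoice_cust_idx on invoice (
  json_value(invoice_json format json, '$.customer.id'
             returning varchar2(40))
);

-- the predicate must repeat the index expression exactly to use it
select *
  from invoice
 where json_value(invoice_json format json, '$.customer.id'
                  returning varchar2(40)) = '42';
```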
Let’s do it
Let’s run and see what we get.
What do you think?
Exception!!!
Once we add the alias, the index works fine.
I hope you know how these indexes work inside – they are virtual columns on which extended statistics are gathered, but that's another story.
Nevertheless, it is possible to have only 1 index. Once we have it, we can use dot notation, but we make the JSON field more rigid.
By creating indexes we can maintain a sort of JSON schema.
There are 2 options for multiple-column indexing: the straightforward one and the other.
The queries are a little bit ugly.
Let's try a more appropriate way. As we have seen, the indexes work via virtual columns. Let's use the same approach.
Once we add the columns, we index them like ordinary ones.
Let's look at the last case.
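A sketch of the virtual-column approach (illustrative names):

```sql
alter table invoice add (
  customer_id varchar2(40) generated always as (
    json_value(invoice_json format json, '$.customer.id'
               returning varchar2(40))
  ) virtual
);

-- then index and query it like an ordinary column
create index invoice_cust_vc_idx on invoice (customer_id);
select * from invoice where customer_id = '42';
```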
Which index will be used?
What should we do? Let’s move to the next slide.
Let's create an index without dropping the other indexes and check all our queries.
json_table, as well as the other statements and multi-column queries, works properly.
It sounds too good to be true, and for sure there are caveats.
How many rows do you expect to see?
Let's have a look at the plan. It has CONTAINS, which is an Oracle Text operator, and _ is a placeholder. Moreover, _ is a token delimiter in the default lexer; that's why the second query returns 2 rows – there are 2 tokens in the inverted index.
So we should instruct Oracle Text to search for the whole phrase. The behavior changes from patch to patch.
So Oracle full-text indexes are rather tricky. There are quite a few nuances, like bitmap index conversion and filtering by indexes, but that is out of the current scope.
Let's play with indexes and JSON documents that have a lot of nested elements.
We need to find value = '640' in class type 'Country'. Please note we have 2 of them.
Let's try to find the data with equality queries. All of them return 2 records, which behaves like an OR. Oracle Text CONTAINS deals with tokens, as we know, so if it finds a token anywhere in a document it is happy. There is no way to fix JSON object boundaries.
Let's try to use json_table. It works. But can we make the search faster?
And we have a good plan which first limits by tokens and then filters by the expression.
Moreover, we don't waste time on JSON_TABLE, so it is faster from the I/O and CPU perspectives.
Let's check how data ingestion works with FTS indexes.
Let's add a new record. How many records will we see? Zero!
Why? JSON indexes are full-text ones, so they need to be refreshed separately. Once we did that, we see the record.
How do we address the issue? Let's have a look at the dictionary.
Let's fix it. We need to issue replace_metadata, mentioning REPLACE METADATA once again in the parameter string. Let's repeat the query, and we will see the new sync type.
I deleted the record with ID 8888888 and repeated the insert from the previous page. How many records do we get? So the conclusion is: if you need to use context indexes right after inserting data, you need something else – we discuss it further on.
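The fix is one call to ctx_ddl.replace_index_metadata; note that the parameter string itself repeats REPLACE METADATA (index name illustrative):

```sql
begin
  ctx_ddl.replace_index_metadata(
    idx_name         => 'INVOICE_SEARCH_IDX',
    parameter_string => 'replace metadata sync (on commit)'
  );
end;
/
```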
Let's check the performance and run our well-known script with intermediary commits. The execution time is huge compared to the first run (about 2 seconds) with strict JSON and unique identifiers.
So the question: which command takes most of the time if we look into the database logs? No, it is another one, which is invoked implicitly before each commit.
Let's defer the index update without reinventing the wheel with hand-made job procedures.
If we look at the details, it is done by a job, but the benefit is that when you drop the index the job is dropped automatically, and the refresh metadata lives in the index itself. One place. So we run the script; it works slowly, but definitely better, even including the refresh time.
But what do we do if we need access to indexed data transactionally? Oracle does provide something for that. Let's test it. To do so, drop the index, and don't forget the FORCE option; otherwise Oracle fails and can leave orphan CTXSYS dictionary records.
So let's check how TRANSACTIONAL works. It takes information from the Oracle Text staging tables before it actually reaches the inverted index, but it slows queries down for sure. Usually they run 2 times slower when a decent volume of data is out of sync. It is an instruction on how to process CONTAINS rather than a SYNC – we still need that.
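Deferred sync is declared in the index parameters; the EVERY clause takes a dbms_scheduler calendar string (names and interval illustrative):

```sql
create index invoice_search_idx on invoice (invoice_json)
  indextype is ctxsys.context
  -- Oracle Text creates and owns the refresh job itself
  parameters ('sync (every "freq=minutely; interval=5")');
```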
Please pay attention – there is no COMMIT inside.
Let's make sure we get our records via full-text search commands without committing them. Let's commit the records. Nothing helps. But Oracle can't lie.
The conclusion: if you need transactional access to JSON data, you have to invent something yourself.
The transactional option doesn't work, but we would still like faster inserts. Oracle 12 has a new feature which internally works like TRANSACTIONAL, i.e. it deals with staging tables, but with new ones. The inverted index itself is updated in async mode. It is named STAGE_ITAB. Let's look at the details.
The first command instructs Oracle that a new staging table, the $G table paired with the inverted index, should be created.
The next one assigns the storage to the new index and creates a job which moves data into the index. Interestingly, the job has no schedule – it is invoked based on the DB workload.
The 3rd states that data from staging should be moved into the inverted index in async mode.
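The three steps translate roughly to the following sketch (preference and index names illustrative; STAGE_ITAB is an attribute of the BASIC_STORAGE preference):

```sql
begin
  -- 1. storage preference with the staging ($G) table enabled
  ctx_ddl.create_preference('invoice_storage', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('invoice_storage', 'STAGE_ITAB', 'TRUE');
end;
/

-- 2. + 3. assign the storage; commits land in the staging table,
-- and a background job merges it into the inverted index asynchronously
create index invoice_search_idx on invoice (invoice_json)
  indextype is ctxsys.context
  parameters ('storage invoice_storage sync (on commit)');
```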
Let’s have a look into the dictionary
Let's check how it works using our famous statement. It looks good, given that we can query right after commit without an additional sync. For sure we are still outside the transaction context, but still…
Context indexes become fragmented. Why? Unfortunately, we have no time to discuss it, but we should keep them less fragmented to provide fast search.
Serial optimization makes indexes less fragmented, but parallel is faster.
For sure this is just our experience, but inside DataArt we will use these guidelines going forward.
You have already seen that proper JSON reinforced by constraints works slowly, but without the constraints dot notation doesn't work, for instance.
The safest JSON treatment: extract one row and use JSON features such as json_table, json_exists, etc.
Oracle JSON still isn't stable, even with all patches installed.
----
We try to use Solr or Elasticsearch even if we get delays. The search-speed SLA is stricter than the SLA for seeing fresh data.
I would like to say thank you for your time and questions once again. We are going to talk about JSON during the DataArt IT Talk series, so follow us on VK/Facebook, and welcome to our team.