Edit - I was not clear. My apologies.
My customer uses automation to produce JSON-like files, so they can be enormous (tens of gigabytes or more), and I cannot control either their size or their content.
The files aren't valid JSON; they tend to be sequential JSON records with no separators between them. They look roughly like this:
{ "a":1, "b": 2, ... }{ "a":2, "b": 4, ... }{ "a":3, "b": 6, ... }
Our software runs at the customer's site, autonomously, with no one from my team present after the initial setup.
Customers have many files, and I have many customers. Custom coding is a last resort.
I have `jq` in my environment, and I would prefer to use what I already have.
Given the above setup, I fear that `jq -s` will load an entire multi-gigabyte file into memory.
I need to convert the semi-JSON above into something valid, like:
[{ "a":1, "b": 2, ... },{ "a":2, "b": 4, ... },{ "a":3, "b": 6, ... }]
and I would like to stream the JSON while making this conversion, to reduce resource consumption.
Using `jq --slurp "."`, the files are converted to the desired array of records. However, `--slurp` pulls the entire file into memory, and that is not acceptable.
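For reference, this is roughly the invocation I run today (the file names are just placeholders for this question):

```sh
# Produces the desired single array, but --slurp reads the whole
# input into memory before emitting any output.
jq --slurp '.' records.json > records-array.json
```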
Using `jq`, what is an alternative "streaming" method?
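For what it's worth, `jq` without `--slurp` seems to parse the concatenated records one at a time, so the sketch below (again with a placeholder file name) prints one compact record per line without exhausting memory; it just does not give me the single enclosing array I need:

```sh
# Prints each top-level record on its own line; jq reads the
# concatenated values one by one instead of loading the whole file.
jq -c '.' records.json
```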