I have a bit of a thorny JSON manipulation problem. I have half a mind to just write a Python program to do it, but I’m wondering if a well-written jq query can solve it more elegantly, partly for a cleaner solution and partly for pedagogic purposes. (I’m a jq noob and would love to take this opportunity to learn.)
I have the following JSON, printed from a tool whose output format I cannot modify:

    [
      {
        "ExifTool:ExifTool:ExifTool": {
          "ExifToolVersion": 12.76
        },
        "SourceFile": "./_DSC5848.JPG",
        "File:System:Other": {
          "FileName": "_DSC5848.JPG",
          "Directory": ".",
          "FileSize": "82 kB",
          "FilePermissions": "-rw-r--r--"
        },
        "EXIF:ExifIFD:Camera": {
          "ExposureProgram": "Aperture-priority AE",
          "MaxApertureValue": 1.4,
          "Sharpness": "Normal"
        },
        "File:System:Time": {
          "FileModifyDate": "2024:09:24 14:10:16-07:00",
          "FileAccessDate": "2024:09:28 00:13:26-07:00",
          "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
        },
        "EXIF:ExifIFD:Image": {
          "ExposureTime": "1/50",
          "FNumber": 4.0,
          "ISO": 200
        },
        ... additional arbitrary colon-keys ...
      },
      { ... },
      { ... },
      { ... },
      { ... }
    ]

I need the keys containing colons (I’ll call them “colon-keys”) to be recursively “unrolled” such that "A:B:C": { ... } becomes:

    "A": {
      "B": {
        "C": { ... }
      }
    }

Colon-keys with identical prefixes would be merged. For example, if there is also a colon-key "A:B:D": { ... }, the above would become:

    "A": {
      "B": {
        "C": { ... },
        "D": { ... }
      }
    }

Preserving the order of keys isn’t crucial, though it’d be cool if possible. It’s not known in advance what the names of the colon-keys will be, so hard-coding them unfortunately isn’t an option.
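For reference, the Python fallback I have in mind is roughly the sketch below. The name `unroll_colon_keys` is my own invention, and it only handles colon-keys at the top level of a single object. It also assumes that whenever two colon-keys share a prefix, the values along that path are objects, so there is nothing like "A": 1 sitting next to "A:B": { ... }.

```python
import json


def unroll_colon_keys(obj: dict, sep: str = ":") -> dict:
    """Unroll top-level keys like "A:B:C" into nested dicts, merging shared prefixes."""
    result: dict = {}
    for key, value in obj.items():
        # "A:B:C" -> parents ["A", "B"], leaf "C"; a plain key has no parents.
        *parents, leaf = key.split(sep)
        node = result
        for part in parents:
            # Reuse an existing branch so "A:B:C" and "A:B:D" share "A" -> "B".
            node = node.setdefault(part, {})
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            node[leaf].update(value)  # merge into an object that is already there
        else:
            node[leaf] = value
    return result


# Hypothetical cut-down sample in the same shape as the tool output.
sample = {
    "File:System:Other": {"FileName": "_DSC5848.JPG"},
    "SourceFile": "./_DSC5848.JPG",
    "File:System:Time": {"FileModifyDate": "2024:09:24 14:10:16-07:00"},
}
print(json.dumps(unroll_colon_keys(sample), indent=2))
```

Only keys are split, so string values that happen to contain colons (like the dates) are left alone. Since Python dicts keep insertion order, keys come out in order of first appearance of each prefix.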
So to circle back to the example from the beginning of this post, the output would look like:

    [
      {
        "ExifTool": {
          "ExifTool": {
            "ExifTool": {
              "ExifToolVersion": 12.76
            }
          }
        },
        "SourceFile": "./_DSC5848.JPG",
        "File": {
          "System": {
            "Other": {
              "FileName": "_DSC5848.JPG",
              "Directory": ".",
              "FileSize": "82 kB",
              "FilePermissions": "-rw-r--r--"
            },
            "Time": {
              "FileModifyDate": "2024:09:24 14:10:16-07:00",
              "FileAccessDate": "2024:09:28 00:13:26-07:00",
              "FileInodeChangeDate": "2024:09:25 23:26:20-07:00"
            }
          }
        },
        "EXIF": {
          "ExifIFD": {
            "Camera": {
              "ExposureProgram": "Aperture-priority AE",
              "MaxApertureValue": 1.4,
              "Sharpness": "Normal"
            },
            "Image": {
              "ExposureTime": "1/50",
              "FNumber": 4.0,
              "ISO": 200
            }
          }
        }
      },
      { ... },
      { ... },
      { ... },
      { ... }
    ]

Is this possible to do with a well-written jq query, or is my only option a hand-rolled program?
Bonus: would such a query be able to handle colon-keys of arbitrary length (A:B, A:B:C, A:B:C:D, etc.) and at arbitrary levels of the JSON ("A:B:C": { "D:E": { ... } })?
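To make the bonus case concrete, this is the behaviour I mean, again as a rough Python sketch rather than anything definitive. `unroll_deep` and `merge_into` are hypothetical names of mine. The sketch assumes that when two paths collide, both sides are objects (otherwise the later value simply overwrites the earlier one), and that no non-object value sits at a prefix of another colon-key.

```python
def merge_into(target: dict, key: str, value) -> None:
    """Set target[key] = value, merging recursively when both sides are dicts."""
    if isinstance(value, dict) and isinstance(target.get(key), dict):
        for sub_key, sub_value in value.items():
            merge_into(target[key], sub_key, sub_value)
    else:
        target[key] = value


def unroll_deep(data, sep: str = ":"):
    """Unroll colon-keys of any length at every level of a JSON-like structure."""
    if isinstance(data, list):
        return [unroll_deep(item, sep) for item in data]
    if not isinstance(data, dict):
        return data  # scalars (strings, numbers, booleans, null) pass through unchanged
    result: dict = {}
    for key, value in data.items():
        *parents, leaf = key.split(sep)
        node = result
        for part in parents:
            # Walk or create the chain of parent objects for this key.
            node = node.setdefault(part, {})
        # Unroll the value first, so nested colon-keys like "D:E" are handled too.
        merge_into(node, leaf, unroll_deep(value, sep))
    return result


# Hypothetical input mixing key lengths and nesting levels.
nested = [{"A:B:C": {"D:E": {"x": 1}}, "A:B": {"F": 2}}]
print(unroll_deep(nested))
```

Passing the whole top-level array works because lists are mapped element by element.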