I am having a lot of trouble parsing huge JSON files (larger than a megabyte) with pure bash functions because of performance issues, so I would like to use the jq parser (https://jqlang.github.io/jq/) to speed up JSON parsing.
I'm using some bash functions that help parse JSON. They are heavily used in scripts, so it is not possible to change every call site, but I can replace their implementation in the library shared by all scripts with jq, as long as it produces exactly the same output.
So far, however, I have not succeeded in producing the same output with jq filters / functions.
Let's look at an example. Here is a very simple JSON sample:
{"success":true,"result":[{"duplex":"full","mac_list":[{"mac":"00:00:00:2F:8E:AB","hostname":"firewall-net90d"}],"name":"Ethernet 1","link":"up","id":1,"mode":"1000BaseT-FD","speed":"1000","rrd_id":"1"},{"duplex":"half","name":"Ethernet 2","link":"down","id":2,"mode":"10BaseT-HD","speed":"10","rrd_id":"2"},{"duplex":"half","name":"Ethernet 3","link":"down","id":3,"mode":"10BaseT-HD","speed":"10","rrd_id":"3"},{"duplex":"half","name":"Ethernet 4","link":"down","id":4,"mode":"10BaseT-HD","speed":"10","rrd_id":"4"},{"duplex":"full","name":"NBAplug","link":"up","id":5,"mode":"1000BaseT-FD","speed":"1000","rrd_id":"nbaplug"},{"duplex":"auto","mac_list":[{"mac":"00:00:00:43:B3:73","hostname":"00:00:00:43:B3:73"},{"mac":"00:00:00:1A:66:60","hostname":"00:00:00:1A:66:60"},{"mac":"00:00:00:09:00:13","hostname":"00:00:00:09:00:13"},{"mac":"00:00:00:80:29:7C","hostname":"abcd-PLAYER"},{"mac":"00:00:00:09:00:12","hostname":"firewall-net101f-cluster"},{"mac":"00:00:00:A0:EF:82","hostname":"00:00:00:A0:EF:82"}],"name":"Sfp lan","link":"up","id":9999,"mode":"10000-FD","speed":"10000","rrd_id":"sfp_lan"}]}
Processing this JSON with the bash function produces this output:
success = true
result[0].duplex = full
result[0].mac_list[0].mac = 00:00:00:2F:8E:AB
result[0].mac_list[0].hostname = firewall-net90d
result[0].mac_list[0] = {"mac":00:00:00:2F:8E:AB,"hostname":firewall-net90d}
result[0].mac_list = [{"mac":00:00:00:2F:8E:AB,"hostname":firewall-net90d}]
result[0].name = Ethernet 1
result[0].link = up
result[0].id = 1
result[0].mode = 1000BaseT-FD
result[0].speed = 1000
result[0].rrd_id = 1
result[0] = {"duplex":full,"mac_list":[{"mac":00:00:00:2F:8E:AB,"hostname":firewall-net90d}],"name":Ethernet 1,"link":up,"id":1,"mode":1000BaseT-FD,"speed":1000,"rrd_id":1}
result[1].duplex = half
result[1].name = Ethernet 2
result[1].link = down
result[1].id = 2
result[1].mode = 10BaseT-HD
result[1].speed = 10
result[1].rrd_id = 2
result[1] = {"duplex":half,"name":Ethernet 2,"link":down,"id":2,"mode":10BaseT-HD,"speed":10,"rrd_id":2}
result[2].duplex = half
result[2].name = Ethernet 3
result[2].link = down
result[2].id = 3
result[2].mode = 10BaseT-HD
result[2].speed = 10
result[2].rrd_id = 3
result[2] = {"duplex":half,"name":Ethernet 3,"link":down,"id":3,"mode":10BaseT-HD,"speed":10,"rrd_id":3}
result[3].duplex = half
result[3].name = Ethernet 4
result[3].link = down
result[3].id = 4
result[3].mode = 10BaseT-HD
result[3].speed = 10
result[3].rrd_id = 4
result[3] = {"duplex":half,"name":Ethernet 4,"link":down,"id":4,"mode":10BaseT-HD,"speed":10,"rrd_id":4}
result[4].duplex = full
result[4].name = NBAplug
result[4].link = up
result[4].id = 5
result[4].mode = 1000BaseT-FD
result[4].speed = 1000
result[4].rrd_id = nbaplug
result[4] = {"duplex":full,"name":NBAplug,"link":up,"id":5,"mode":1000BaseT-FD,"speed":1000,"rrd_id":nbaplug}
result[5].duplex = auto
result[5].mac_list[0].mac = 00:00:00:43:B3:73
result[5].mac_list[0].hostname = 00:00:00:43:B3:73
result[5].mac_list[0] = {"mac":00:00:00:43:B3:73,"hostname":00:00:00:43:B3:73}
result[5].mac_list[1].mac = 00:00:00:1A:66:60
result[5].mac_list[1].hostname = 00:00:00:1A:66:60
result[5].mac_list[1] = {"mac":00:00:00:1A:66:60,"hostname":00:00:00:1A:66:60}
result[5].mac_list[2].mac = 00:00:00:09:00:13
result[5].mac_list[2].hostname = 00:00:00:09:00:13
result[5].mac_list[2] = {"mac":00:00:00:09:00:13,"hostname":00:00:00:09:00:13}
result[5].mac_list[3].mac = 00:00:00:80:29:7C
result[5].mac_list[3].hostname = abcd-PLAYER
result[5].mac_list[3] = {"mac":00:00:00:80:29:7C,"hostname":abcd-PLAYER}
result[5].mac_list[4].mac = 00:00:00:09:00:12
result[5].mac_list[4].hostname = firewall-net101f-cluster
result[5].mac_list[4] = {"mac":00:00:00:09:00:12,"hostname":firewall-net101f-cluster}
result[5].mac_list[5].mac = 00:00:00:A0:EF:82
result[5].mac_list[5].hostname = 00:00:00:A0:EF:82
result[5].mac_list[5] = {"mac":00:00:00:A0:EF:82,"hostname":00:00:00:A0:EF:82}
result[5].mac_list = [{"mac":00:00:00:43:B3:73,"hostname":00:00:00:43:B3:73},{"mac":00:00:00:1A:66:60,"hostname":00:00:00:1A:66:60},{"mac":00:00:00:09:00:13,"hostname":00:00:00:09:00:13},{"mac":00:00:00:80:29:7C,"hostname":abcd-PLAYER},{"mac":00:00:00:09:00:12,"hostname":firewall-net101f-cluster},{"mac":00:00:00:A0:EF:82,"hostname":00:00:00:A0:EF:82}]
result[5].name = Sfp lan
result[5].link = up
result[5].id = 9999
result[5].mode = 10000-FD
result[5].speed = 10000
result[5].rrd_id = sfp_lan
result[5] = {"duplex":auto,"mac_list":[{"mac":00:00:00:43:B3:73,"hostname":00:00:00:43:B3:73},{"mac":00:00:00:1A:66:60,"hostname":00:00:00:1A:66:60},{"mac":00:00:00:09:00:13,"hostname":00:00:00:09:00:13},{"mac":00:00:00:80:29:7C,"hostname":abcd-PLAYER},{"mac":00:00:00:09:00:12,"hostname":firewall-net101f-cluster},{"mac":00:00:00:A0:EF:82,"hostname":00:00:00:A0:EF:82}],"name":Sfp lan,"link":up,"id":9999,"mode":10000-FD,"speed":10000,"rrd_id":sfp_lan}
result = [{"duplex":full,"mac_list":[{"mac":00:00:00:2F:8E:AB,"hostname":firewall-net90d}],"name":Ethernet 1,"link":up,"id":1,"mode":1000BaseT-FD,"speed":1000,"rrd_id":1},{"duplex":half,"name":Ethernet 2,"link":down,"id":2,"mode":10BaseT-HD,"speed":10,"rrd_id":2},{"duplex":half,"name":Ethernet 3,"link":down,"id":3,"mode":10BaseT-HD,"speed":10,"rrd_id":3},{"duplex":half,"name":Ethernet 4,"link":down,"id":4,"mode":10BaseT-HD,"speed":10,"rrd_id":4},{"duplex":full,"name":NBAplug,"link":up,"id":5,"mode":1000BaseT-FD,"speed":1000,"rrd_id":nbaplug},{"duplex":auto,"mac_list":[{"mac":00:00:00:43:B3:73,"hostname":00:00:00:43:B3:73},{"mac":00:00:00:1A:66:60,"hostname":00:00:00:1A:66:60},{"mac":00:00:00:09:00:13,"hostname":00:00:00:09:00:13},{"mac":00:00:00:80:29:7C,"hostname":abcd-PLAYER},{"mac":00:00:00:09:00:12,"hostname":firewall-net101f-cluster},{"mac":00:00:00:A0:EF:82,"hostname":00:00:00:A0:EF:82}],"name":Sfp lan,"link":up,"id":9999,"mode":10000-FD,"speed":10000,"rrd_id":sfp_lan}]
I need to write a dynamic jq filter (the depth of the JSON is not predictable) that produces exactly the same output in every situation.
So far I have come close with the following filter, but I have not managed to put the index of each array element inside the '[]', to remove the leading dot '.', or to append ' = ' followed by the value (scalar or JSON) to each line.
Here is the jq filter:
jq -rc '[paths|map(("."+strings)//"[]")|join("")][]'
With this filter, here is the output when processing the previous json :
.result
.result[]
.result[].duplex
.result[].mac_list
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].name
.result[].link
.result[].id
.result[].mode
.result[].speed
.result[].rrd_id
.result[]
.result[].duplex
.result[].name
.result[].link
.result[].id
.result[].mode
.result[].speed
.result[].rrd_id
.result[]
.result[].duplex
.result[].name
.result[].link
.result[].id
.result[].mode
.result[].speed
.result[].rrd_id
.result[]
.result[].duplex
.result[].name
.result[].link
.result[].id
.result[].mode
.result[].speed
.result[].rrd_id
.result[]
.result[].duplex
.result[].name
.result[].link
.result[].id
.result[].mode
.result[].speed
.result[].rrd_id
.result[]
.result[].duplex
.result[].mac_list
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].mac_list[]
.result[].mac_list[].mac
.result[].mac_list[].hostname
.result[].name
.result[].link
.result[].id
.result[].mode
.result[].speed
.result[].rrd_id
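For comparison, here is a minimal sketch of one possible direction, not a verified drop-in replacement. It assumes three things about the bash output shown above: each container is printed after its children (post-order), string values lose their quotes while keys keep them, and the root itself is never printed. The wrapper name json_flatten and the helper names fmt, pathstr and emit are hypothetical. It needs jq 1.5 or later for the `def emit($p)` parameter syntax.

```shell
#!/usr/bin/env bash
# Sketch only: json_flatten is a hypothetical wrapper name, not one of the
# original library functions. Reads JSON on stdin, prints "path = value" lines.
json_flatten() {
  jq -r '
    # Render a value like the sample output: string values lose their quotes
    # (keys keep them), containers are printed compactly.
    def fmt:
      if type == "object" then
        "{" + (to_entries | map("\(.key | tojson):\(.value | fmt)") | join(",")) + "}"
      elif type == "array" then
        "[" + (map(fmt) | join(",")) + "]"
      elif type == "string" then .
      else tojson
      end;

    # Turn a path array such as ["result",0,"name"] into result[0].name
    def pathstr:
      reduce .[] as $seg ("";
        if ($seg | type) == "number" then . + "[\($seg)]"
        elif . == "" then $seg
        else . + "." + $seg
        end);

    # Post-order walk: children first, then the container itself.
    # The root (empty path) is not printed.
    def emit($p):
      ( if type == "object" then keys_unsorted[] as $k | .[$k] | emit($p + [$k])
        elif type == "array" then range(length) as $i | .[$i] | emit($p + [$i])
        else empty
        end ),
      ( select($p | length > 0) | "\($p | pathstr) = \(fmt)" );

    emit([])
  '
}
```

It can be called as `json_flatten < sample.json` (sample.json being a placeholder file name). Corner cases that the sample does not show, such as strings containing newlines or quotes, empty keys, or how numbers like 1.0 are printed, may still differ from the bash function and would need to be compared case by case.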
Bash parsing is not only slow: with a JSON file of more than 10^8 characters, the bash functions fail to parse it at all (after hours at 100% CPU), and I read in the jq documentation that jq can handle JSON of more than 1 gigabyte...
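For very large inputs, jq's `--stream` option parses the input incrementally and emits `[path, leaf]` events instead of loading the whole document first. The sketch below only prints the leaf lines (`path = value`). It does not reproduce the container lines such as `result[0] = {...}`, so it is a partial illustration, not an equivalent of the bash function. The names json_leaves, pathstr and scalar are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: json_leaves is a hypothetical name. Reads JSON on stdin and
# prints one "path = value" line per leaf, using jq streaming mode.
json_leaves() {
  jq -r --stream '
    # Turn a path array such as ["result",0,"name"] into result[0].name
    def pathstr:
      reduce .[] as $seg ("";
        if ($seg | type) == "number" then . + "[\($seg)]"
        elif . == "" then $seg
        else . + "." + $seg
        end);

    # Strings are printed raw, everything else as compact JSON.
    def scalar: if type == "string" then . else tojson end;

    # Streaming events with two elements are [path, leaf];
    # one-element events only mark the end of a container and are skipped.
    select(length == 2)
    | "\(.[0] | pathstr) = \(.[1] | scalar)"
  '
}
```

Whether this is fast enough on a file of 10^8 characters or more would have to be measured on the real data.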
If a 'jq expert' could help here, it would be much appreciated.
Kind regards,
nbanba