I am querying Elasticsearch and sorting the documents locally in Bash with jq, as sorting in ES is too slow for me.
The original purpose is to create a CSV file.
But I find that the sorting does not work properly; the sort step seems to do nothing.
As I am launching cURL requests, I thought the wrong order was because the content is chunked, so I saved some results into a local test.json file and tried again, but it still does not work.
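For context, the cURL request I run looks roughly like this (the endpoint, query, and size are simplified placeholders; the real query is more involved):

curl -s -X POST 'http://localhost:9200/my-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match_all": {}}, "fields": ["field1", "field2"], "_source": false, "size": 10000}' \
  > test.json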
test.json:
{"took": 680,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0 },"hits": {"max_score": 1.0,"hits": [ {"_index": "my-index","_type": "_doc","_id": "111111113584925","_score": 1.0,"fields": {"field2": ["FOO" ],"field1": ["111111113584925" ] } }, {"_index": "my-index","_type": "_doc","_id": "111111121254059","_score": 1.0,"fields": {"field2": ["FOO" ],"field1": ["111111121254059" ] } } ] }}
(There are many more records - edited for brevity.)
The command I use:
jq '.hits.hits[].fields | [.field1[0] +"," + .field2[0]] | sort | .[0]' -r test.json
The result:
111111113584925,FOO
111111121254059,FOO
111111116879444,FOO
etc.
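What I expect instead is the same lines in ascending order of field1, i.e.:

111111113584925,FOO
111111116879444,FOO
111111121254059,FOO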
Why?
Should I rely on jq's sorting? Am I using it correctly? I want a string comparison in alphabetical order, and the field1 values are all unique, so there will never be a tie that falls through to comparing field2 (field2 can also hold various values, but I only want to sort by field1).
Should I use sort -k 1 in Bash instead? Which is faster when it comes to 100K rows?
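The Bash alternative I have in mind would be something along these lines (a sketch only; it assumes the same test.json and the comma-joined lines emitted by jq):

# Let jq emit raw "field1,field2" lines and sort them with coreutils sort instead
jq -r '.hits.hits[].fields | .field1[0] + "," + .field2[0]' test.json | sort -t ',' -k 1,1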