I have a json like
[ {"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link","title": "– Flexibility" }, {"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra","title": "– Pronouns" }]I got it using curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.'.
I have a command in my machine named unescape_html, a python scipt to unescape the html (replace – with appropriate character).
How can I apply this function on each of the titles using jq.
For example:
I want to run:
unescape_html "– Flexibility"unescape_html "– Pronouns"The expected output is:
[ {"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link","title": "– Flexibility" }, {"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra","title": "– Pronouns" }]Update 1:
If jq doesn't have that feature, then i am also fine that the command is applied on rg.
I mean, can i run unescape_html on $2 on the line:
rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}'Any other bash approach to solve this problem is also fine. The point is, i need to run unescape_html on title, so that i get the expected output.
Update 2:
The following command:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= @sh "unescape_html \(.)")'gives:
[ {"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link","title": "unescape_html '– Flexibility'" }, {"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra","title": "unescape_html '– Pronouns'" }]Just not evaluating the commands.
The following command works:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= sub("–"; "–"))'But it only works for –. It will not work for other reserved characters.
Update 3:
The following command:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq -r '.[] | "unescape_html \(.title | @sh)"' | bashis giving:
– Flexibility– PronounsSo, it is applying the bash function. Now I need to format is so that the urls are come with the title in json format.