I have a json like
[ {"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link","title": "– Flexibility" }, {"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra","title": "– Pronouns" }]
I got it using curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.'
.
I have a command in my machine named unescape_html
, a python scipt to unescape the html (replace – with appropriate character).
How can I apply this function on each of the titles using jq
.
For example:
I want to run:
unescape_html "– Flexibility"unescape_html "– Pronouns"
The expected output is:
[ {"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link","title": "– Flexibility" }, {"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra","title": "– Pronouns" }]
Update 1:
If jq
doesn't have that feature, then i am also fine that the command is applied on rg
.
I mean, can i run unescape_html
on $2
on the line:
rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}'
Any other bash approach to solve this problem is also fine. The point is, i need to run unescape_html
on title
, so that i get the expected output.
Update 2:
The following command:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= @sh "unescape_html \(.)")'
gives:
[ {"url": "https://drive.google.com/file/d/1tO-qVknlH0PLK9CblQsyd568ZiptdKff/view?usp=share_link","title": "unescape_html '– Flexibility'" }, {"url": "https://drive.google.com/open?id=11_sR8X13lmPcvlT3POfMW3044f3wZdra","title": "unescape_html '– Pronouns'" }]
Just not evaluating the commands.
The following command works:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq 'map(.title |= sub("–"; "–"))'
But it only works for –
. It will not work for other reserved characters.
Update 3:
The following command:
curl -Lfs "https://motivatedsisters.com/2019/07/08/arabic-review-sr-rahat-basit/" | rg -o '<li>.*?href="(.*?)".*?</a> (.*?)<\/li>' -r '{"url": "$1", "title": "$2"}' | jq -s '.' | jq -r '.[] | "unescape_html \(.title | @sh)"' | bash
is giving:
– Flexibility– Pronouns
So, it is applying the bash function. Now I need to format is so that the urls are come with the title in json format.