Thursday, January 28, 2016

Bash script to convert from HTML entities to characters - Stack Overflow

I'm looking for a way to turn this:
hello &lt; world
to this:
hello < world
I could use sed with a bunch of substitutions, but isn't there a tool that will do that for me in one go?
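For comparison, the "sed with a bunch of substitutions" approach might look like the sketch below. The function name `decode_basic_entities` is made up for illustration, and it covers only five common entities, which is why a dedicated tool is the better choice for real input.

```shell
#!/bin/sh
# Minimal sketch: decode a handful of common HTML entities with sed.
# decode_basic_entities is a hypothetical helper name, not a standard tool.
decode_basic_entities() {
  # &amp; is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
  sed -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&quot;/"/g' \
      -e "s/&#39;/'/g" \
      -e 's/&amp;/\&/g'
}

echo 'hello &lt; world' | decode_basic_entities
```

Every entity not listed passes through unchanged, so this only suits input known to contain these five.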
Try recode:
$ echo '&lt;' |recode html..ascii
<
link seems dead now – uglycoyote Apr 8 '15 at 16:44
@uglycoyote Unfortunately. The Debian package might be a good alternative source: packages.debian.org/en/sid/recode. There is also a copy on GitHub: github.com/pinard/Recode – ceving Apr 13 '15 at 9:25
With perl:
cat foo.html | perl -MHTML::Entities -e 'while(<>) {print decode_entities($_);}'
With php from the command line:
cat foo.html | php -r 'while(($line=fgets(STDIN)) !== FALSE) echo html_entity_decode($line, ENT_QUOTES|ENT_HTML401);'
The PHP one is not working for certain characters such as &nbsp; – Romain Paulus Dec 20 '13 at 5:13
Shorter Perl version: perl -MHTML::Entities -pe 'decode_entities($_);' – RobEarl Aug 7 '14 at 8:48
I'll give you an upvote if you remove the useless use of cat (en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat) :-) – 0x89 Aug 19 '14 at 9:10
   
Use perl -C -MHTML::Entities -pe 'decode_entities($_);' < foo.html to output UTF-8 (see this question) – tricasse Oct 2 '15 at 9:15
An alternative is to pipe through a web browser -- such as:
echo '&#33;' | w3m -dump -T text/html
This worked great for me in Cygwin, where downloading and installing distributions is difficult.
This answer was found here
Using xmlstarlet:
echo 'hello &lt; world' | xmlstarlet unesc
Note that this does not work for hexadecimal entities like &#x3a;. – v6ak Aug 13 '13 at 21:00
