Friday, 29 January 2016

https://www.w3.org/Tools/HTML-XML-utils/README

html-xml-utils-*.tar.gz
    A number of simple utilities for manipulating HTML and XML files.
    See INSTALL for generic installation instructions.
    Get the source at: http://www.w3.org/Tools/HTML-XML-utils/

htmlutils-*.tar.gz
    Old versions (before version 0.1)

Note: the names changed in version 5.0: most programs got an "hx"
prefix. Please uninstall any version < 5.0 before installing a
version >= 5.0.


cexport (1)          - create headerfile of exported declarations from a C file
hxaddid (1)          - add ID's to selected elements
hxcite (1)           - replace bibliographic references by hyperlinks
hxcite-mkbib (1)     - expand references and create bibliography
hxcopy (1)           - copy an HTML file while preserving relative links
hxcount (1)          - count elements and attributes in HTML or XML files
hxextract (1)        - extract selected elements
hxclean (1)          - apply heuristics to correct an HTML file
hxprune (1)          - remove marked elements from an HTML file
hxincl (1)           - expand included HTML or XML files
hxindex (1)          - create an alphabetically sorted index
hxmkbib (1)          - create bibliography from a template
hxmultitoc (1)       - create a table of contents for a set of HTML files
hxname2id (1)        - move some ID= or NAME= from A elements to their parents
hxnormalize (1)      - pretty-print an HTML file
hxnum (1)            - number section headings in an HTML file
hxpipe (1)           - convert XML to a format easier to parse with Perl or AWK
hxprintlinks (1)     - number links & add table of URLs at end of an HTML file
hxremove (1)         - remove selected elements from an XML file
hxtabletrans (1)     - transpose an HTML or XHTML table
hxtoc (1)            - insert a table of contents in an HTML file
hxuncdata (1)        - replace CDATA sections by character entities
hxunent (1)          - replace HTML predefined character entities by UTF-8
hxunpipe (1)         - convert output of hxpipe back to XML format
hxunxmlns (1)        - replace "global names" by XML Namespace prefixes
hxwls (1)            - list links in an HTML file
hxxmlns (1)          - replace XML Namespace prefixes by "global names"
asc2xml, xml2asc (1) - convert between UTF-8 and &#nnn; entities
hxref (1)            - generate cross-references
hxselect (1)         - extract elements that match a (CSS) selector
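
As a rough illustration of how these tools chain together, the sketch below prints the text of every h2 heading in an HTML file. The function name list_headings and the choice of h2 are assumptions made for this example. It first runs hxnormalize -x because hxselect expects well-formed XML as input.

```shell
# Hypothetical helper: print the content of each <h2> in an HTML file, one per line.
# Assumes hxnormalize and hxselect (html-xml-utils >= 5.0) are on the PATH.
list_headings() {
    # -x: write well-formed XML; -c: print element content only;
    # -s '\n': print a newline after each match.
    hxnormalize -x "$1" | hxselect -c -s '\n' 'h2'
}
```

It could be called as list_headings page.html, where page.html is a placeholder file name.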



This package is configured with automake/autoconf. Generic
instructions are in the file INSTALL. Here are some specific problems
that may arise:

1) Error when running lex:

      lex   scan.l && mv lex.yy.c scan.c
      "scan.l":line 2: Error: missing translation value

   The scan.l file uses features of flex that do not exist in lex.
   However, it is not necessary to run lex, since the file scan.c is
   provided in the package. Just do a "touch scan.c" to make sure
   "make" will not try to generate it anew.
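
The workaround above can be sketched as a small shell helper. The function name freshen_scanner is made up for this example, and it assumes the directory passed in is the unpacked source tree containing the shipped scan.c.

```shell
# Hypothetical helper: update the timestamp of the shipped scan.c so that
# make considers it newer than scan.l and skips the lex step.
freshen_scanner() {
    dir="${1:-.}"
    # Refuse to create an empty scan.c if the shipped one is missing.
    [ -f "$dir/scan.c" ] || { echo "scan.c not found in $dir" >&2; return 1; }
    touch "$dir/scan.c"
}
```

A typical sequence would then be ./configure, freshen_scanner . and make, run from the source directory.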


$Date: 2014-10-21 17:11:29 $

Text To Speech and Translation Application For Ubuntu Linux - Gespeaker | Linux Blog

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:
 * A command line program (Linux and Windows) to speak text from a file or from stdin.
 * A shared library version for use by other programs.
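
A minimal sketch of the command line use, assuming espeak is installed. The helper name speak_file and the ESPEAK override variable are inventions for this example; the espeak options used are -v (voice), -f (read text from a file) and -w (write a WAV file instead of playing).

```shell
# Hypothetical helper: speak a text file with espeak, optionally writing a WAV file.
# Arguments: file [voice] [wav-output]. ESPEAK may name an alternative binary.
speak_file() {
    file="$1"; voice="${2:-en}"; wav="${3:-}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    if [ -n "$wav" ]; then
        ${ESPEAK:-espeak} -v "$voice" -f "$file" -w "$wav"
    else
        ${ESPEAK:-espeak} -v "$voice" -f "$file"
    fi
}
```

For example, speak_file notes.txt it notes.wav would render a placeholder file notes.txt with the Italian voice into notes.wav. Text can also be piped straight in with echo "Hello" | espeak --stdin.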

Gespeaker is a GTK+ front-end for espeak. It reads text aloud in many languages, with settings for voice, pitch, volume, speed and word gap. The spoken text can also be recorded to a WAV file.

Gespeaker supports multiple languages, currently English, Italian, French and Spanish. It works well in the GNOME, Xfce and LXDE environments.

Gespeaker Installation:
Open a terminal and type the following command to install Gespeaker:
sudo apt-get install gespeaker
Ubuntu packagers currently do not include mbrola in the official repositories, so Ubuntu users need to install mbrola and its voices from the Ubuntu Trucchi repository. To add the repository and its key, type the following commands in the terminal:
sudo wget -O /etc/apt/sources.list.d/ubuntutrucchi.list http://www.ubuntutrucchi.it/repository/ubuntutrucchi.list
wget -O - http://www.ubuntutrucchi.it/repository/ubuntutrucchi.asc | sudo apt-key add -
sudo apt-get update
After a successful installation you can open Gespeaker from the Unity Dash.


Using Gespeaker is easy: enter text in the text box, then select a voice type (male or female) and a language from the drop-down list. Click the Play button to hear the entered text spoken in the selected language. You can also record the sound using the Record option.

Playback can be tuned with settings for voice, pitch, volume, speed and word gap.


Read more: http://linuxpoison.blogspot.com.br/2012/02/text-to-speech-and-translation.html

Thursday, 28 January 2016

Bash script to convert from HTML entities to characters - Stack Overflow

I'm looking for a way to turn this:
hello &lt; world
to this:
hello < world
I could use sed with a bunch of substitutions, but isn't there a tool that will do that for me in one go?
Try recode:
$ echo '&lt;' |recode html..ascii
<
link seems dead now – uglycoyote Apr 8 '15 at 16:44
@uglycoyote Unfortunately. The Debian package might be a good alternative source: packages.debian.org/en/sid/recode. There is also a copy at GitHub: github.com/pinard/Recode – ceving Apr 13 '15 at 9:25

With perl:
cat foo.html | perl -MHTML::Entities -e 'while(<>) {print decode_entities($_);}'
With php from the command line:
cat foo.html | php -r 'while(($line=fgets(STDIN)) !== FALSE) echo html_entity_decode($line, ENT_QUOTES|ENT_HTML401);'
The PHP one is not working for certain characters such as &nbsp; – Romain Paulus Dec 20 '13 at 5:13
Shorter Perl version: perl -MHTML::Entities -pe 'decode_entities($_);' – RobEarl Aug 7 '14 at 8:48
I'll give you an upvote if you remove the useless use of cat (en.wikipedia.org/wiki/Cat_(Unix)#Useless_use_of_cat) :-) – 0x89 Aug 19 '14 at 9:10
Use perl -C -MHTML::Entities -pe 'decode_entities($_);' < foo.html to output UTF-8 (see this question) – tricasse Oct 2 '15 at 9:15

An alternative is to pipe through a web browser -- such as:
echo '&#33;' | w3m -dump -T text/html
This worked great for me in Cygwin, where downloading and installing distributions is difficult.
This answer was found here

Using xmlstarlet:
echo 'hello &lt; world' | xmlstarlet unesc
Note that this does not work for hexa entities like &#x3a;. – v6ak Aug 13 '13 at 21:00
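
For comparison, the "sed with a bunch of substitutions" approach mentioned in the question might look like the sketch below. The function name decode_basic_entities is made up for this example, and it handles only the five entities predefined in XML; numeric references such as &#33; or &#x3a; and other named entities such as &nbsp; pass through unchanged, which is why the dedicated tools above are usually the better choice.

```shell
# Hypothetical helper: decode only &lt; &gt; &quot; &apos; and &amp; on stdin.
decode_basic_entities() {
    # &amp; is handled last so that "&amp;lt;" becomes "&lt;" rather than "<".
    # In the replacement, \& is a literal ampersand (a bare & would mean "the match").
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

echo 'hello &lt; world' | decode_basic_entities   # prints: hello < world
```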