Handling HTML entities in files like those in the FRELI project (or in Webster's 1913 Unabridged).

$ grep "\(.\{6\}\).*\1" $w
débouché
higgledy-piggledy
résumé
twenty-twenty
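
The two accented hits above are probably artifacts of the entity encoding rather than true repeats: débouché is only eight characters long, so it cannot contain the same six-character run twice unless it is stored in a longer form. A minimal sketch of that effect, assuming the word list spells accents as HTML entities (the helper name has_repeat is made up for illustration):

```shell
# has_repeat is a hypothetical helper: it succeeds if its argument contains
# some six-character substring twice (the same pattern as the grep above).
has_repeat() {
  printf '%s\n' "$1" | grep -q '\(.\{6\}\).*\1'
}

# Assumed on-disk spelling: the run "&eacut" occurs twice.
has_repeat 'r&eacute;sum&eacute;' && echo 'entity form: match'
# Decoded spelling: too short to hold two separate six-character runs.
has_repeat 'résumé' || echo 'decoded form: no match'
# A genuine repeat matches in either spelling.
has_repeat 'twenty-twenty' && echo 'twenty-twenty: match'
```

If that assumption holds, translating the entities first removes such false positives from pattern searches.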

Finding and replacing HTML entities used in resource #3 WebsUni (Webster's Words*)

Command: paste <(grep -o "&[^&;]*;" WebsWords | sort | uniq -c) <(for i in $(grep -o "&[^&;]*;" WebsWords | sort -u); do echo "$i" "$(echo "$i" | ./UniTrans)"; done) | awk '{print $1, $2, $4}'

Output:

7 &aacute; á
6 &acirc; â
528 &aelig; æ
5 &agrave; à
1 &atilde; ã
8 &auml; ä
14 &ccedil; ç
232 &eacute; é
18 &ecirc; ê
41 &egrave; è
174 &euml; ë
1 &icirc; î
14 &iuml; ï
15 &ntilde; ñ
1 &oacute; ó
7 &ocirc; ô
179 &oelig; œ
152 &ouml; ö
3 &ucirc; û
1 &ugrave; ù
12 &uuml; ü

* WebsWords is a symbolic link to the WEBSTERwords file:

[ddailey@daileyproject-srunet-sruad-edu word]$ ls -l WebsWords
lrwxrwxrwx. 1 ddailey ddailey 54 Jan  5  2017 WebsWords -> ../public_html/data/wordstudy/webster1913/WEBSTERwords
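
The tabulation command above starts the translator once for every distinct entity. One possible variant, offered only as a sketch, builds the count table first and then pipes the whole entity column through the translator in a single pass. The names translate and entity_report are hypothetical, and translate is a three-rule stand-in for ./UniTrans so that the example runs on its own:

```shell
# Stand-in for ./UniTrans (three rules only) so the sketch is self-contained.
translate() {
  sed 's/&aelig;/æ/g; s/&eacute;/é/g; s/&ouml;/ö/g'
}

# entity_report (hypothetical name): print "count entity translation"
# for each distinct entity found in file $1.
entity_report() {
  tmp=$(mktemp) || return 1
  # Column 1 is the count, column 2 the entity.
  grep -o '&[^&;]*;' "$1" | sort | uniq -c | awk '{print $1, $2}' > "$tmp"
  # Translate the entity column in one pass and glue it back on.
  awk '{print $2}' "$tmp" | translate | paste -d' ' "$tmp" -
  rm -f "$tmp"
}

# Made-up sample input, not taken from WebsWords.
sample=$(mktemp)
printf '%s\n' 'r&eacute;sum&eacute;' '&aelig;on' 'co&ouml;perate' > "$sample"
entity_report "$sample"
rm -f "$sample"
```

Because both columns come from the same temporary table, the counts and translations stay aligned line by line.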

The program UniTrans is merely a chain of sed substitution commands:

$ cat /home/ddailey/word/UniTrans
sed '
s/&aacute;/á/g;
s/&acirc;/â/g;
s/&aelig;/æ/g;
s/&AElig;/Æ/g;
s/&agrave;/à/g;
s/&atilde;/ã/g;
s/&auml;/ä/g;
s/&ccedil;/ç/g;
s/&Ccedil;/Ç/g;
s/&eacute;/é/g;
s/&Eacute;/É/g;
s/&ecirc;/ê/g;
s/&egrave;/è/g;
s/&euml;/ë/g;
s/&icirc;/î/g;
s/&iuml;/ï/g;
s/&ntilde;/ñ/g;
s/&oacute;/ó/g;
s/&ocirc;/ô/g;
s/&oelig;/œ/g;
s/&OElig;/Œ/g;
s/&ouml;/ö/g;
s/&ucirc;/û/g;
s/&ugrave;/ù/g;
s/&uuml;/ü/g;'
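
Since a chain like this rewrites only the entities it lists, any other entity passes through unchanged. A quick way to spot leftovers, sketched under the assumption that the translator reads standard input (leftover_entities is a hypothetical name, and translate is a two-rule stand-in for UniTrans):

```shell
# Stand-in for UniTrans with two rules only, so the sketch runs on its own.
translate() {
  sed 's/&aelig;/æ/g; s/&eacute;/é/g'
}

# leftover_entities (hypothetical name): count the entities still present
# in file $1 after translation.
leftover_entities() {
  translate < "$1" | grep -o '&[^&;]*;' | sort | uniq -c | awk '{print $1, $2}'
}

# Made-up sample line, not taken from WebsWords.
sample=$(mktemp)
printf '%s\n' 'caf&eacute; &amp; &aelig;ther &amp; 90&deg;' > "$sample"
leftover_entities "$sample"
rm -f "$sample"
```

On the sample line this reports &amp; twice and &deg; once, the two entities the stand-in has no rule for. Empty output would mean every entity in the file was covered.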