Use the textutil CLI program along with wc to get an accurate word count. Here's the command:

    textutil -stdout -convert txt foobar.doc | wc -w
This tells textutil to send its output to standard output instead of a file (-stdout) and to convert a Word-formatted file called foobar.doc to plain text (-convert txt). The output of that is piped to wc, where we ask for a simple count of just the words (-w).

    textutil -stdout -cat txt *.doc | wc -w
-cat tells it to concatenate all of the .doc files in the working directory.

When I ran wc -wc *.doc on the whole book, it told me that there were 4.3 million characters in the files. Word Counter returns a much more accurate figure of just under 1.0 million. So explain to me what I'm doing wrong?

Probably nothing. Word stores a heap of text in the files that isn't displayed. Do you use the versioning facility, or track changes? They tend to make the file sizes huge. Also, if you open the docs up in a text editor, you'll probably find lots of extra stuff (like your address etc. all stored in there too!)
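If you want to see how much of each file is hidden overhead, something along these lines might help. It is only a rough sketch: compare_counts is a made-up helper name, it assumes textutil is available (so macOS only), and it uses only the textutil and wc flags already shown above, plus wc -c, which counts bytes rather than characters.

```shell
#!/bin/sh
# Sketch: for each file, compare the raw size on disk with the size of
# the plain text that textutil extracts from it.
# compare_counts is a hypothetical helper name, not part of textutil or wc.
compare_counts() {
    for f in "$@"; do
        # Bytes in the file itself, including anything Word stores but never displays.
        raw=$(wc -c < "$f" | tr -d ' ')
        # Bytes and words in the plain text that textutil extracts.
        text=$(textutil -stdout -convert txt "$f" | wc -c | tr -d ' ')
        words=$(textutil -stdout -convert txt "$f" | wc -w | tr -d ' ')
        printf '%s: %s raw bytes, %s text bytes, %s words\n' "$f" "$raw" "$text" "$words"
    done
}
```

Save it in a file, source it, and run compare_counts *.doc in the directory with the book. A big gap between the raw and text figures for a file would suggest that track changes or old versions are stored in it.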