Currently, I’m working on my Master’s thesis on Hidden Markov Models. Matt Might wrote an article on three shell scripts to improve your writing, which I found interesting. The scripts help to detect the use of passive voice, weasel words (such as “surprisingly low”) and duplicate words (which are difficult to detect when a line break separates them).
One remark I repeatedly received was that my paragraphs were much too short. Several paragraphs were only one or two sentences long, and I could usually merge them with a neighbouring paragraph.
Inspired by Matt’s scripts, I wrote my own, which flags paragraphs that contain at most two sentences and span fewer than four lines. It isn’t perfect, and not every short paragraph needs to be longer, but a flagged paragraph may warrant a closer look.
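#!/usr/bin/env python
"""Report suspiciously short paragraphs in LaTeX source files."""
import re
import sys

# A line consisting of a single LaTeX command, e.g. \section{Introduction}.
SINGLE_COMMAND_RE = re.compile(r'^\\\w+\{[^}]+\}$')


def process(filename, lines):
    """Report short paragraphs (pieces of text surrounded by blank lines).

    Ignores lines containing only a single command at the beginning of a
    paragraph.
    """
    start, last, sentences = None, None, 0  # first line, last line, count
    prev_line = None
    for linenum, line in enumerate(lines, 1):
        line = line.strip()
        if SINGLE_COMMAND_RE.match(line) and not prev_line:
            continue
        if line:
            if start is None:
                start = linenum
            last = linenum
            # Count periods; an ellipsis ('...') counts as one sentence end.
            sentences += line.count('.') - 2 * line.count('...')
        elif start is not None:
            # A blank line ends the current paragraph.
            report_short_paragraph(filename, start, sentences, last - start + 1)
            start, sentences = None, 0
        prev_line = line
    # The last paragraph may not be followed by a blank line.
    if start is not None:
        report_short_paragraph(filename, start, sentences, last - start + 1)


def report_short_paragraph(filename, start, sentences, lines):
    """Print paragraphs of one or two sentences spanning fewer than four lines."""
    if 0 < sentences <= 2 and lines < 4:
        print('%s:%d paragraph of %d sentence(s) / %d lines'
              % (filename, start, sentences, lines))


if __name__ == "__main__":
    for filename in sys.argv[1:]:
        with open(filename) as source:
            process(filename, source)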
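One reason the script isn’t perfect is that it counts every period as a sentence end, so a decimal such as 3.14 inflates the count. Below is a minimal sketch of a slightly stricter counter. It assumes a sentence ends in one or more of `.`, `!` or `?` followed by whitespace or the end of the text. The name `count_sentences` is hypothetical and not part of the script above.

```python
import re

# Assumption: a sentence ends in one or more of . ! ? followed by
# whitespace or the end of the text. A period inside a number such as
# 3.14 is followed by a digit, so it is not counted.
SENTENCE_END_RE = re.compile(r'[.!?]+(?=\s|$)')


def count_sentences(text):
    """Count sentence endings in a piece of text (hypothetical helper)."""
    return len(SENTENCE_END_RE.findall(text))
```

Because a run of punctuation is matched as a whole, an ellipsis counts once without any special handling. Abbreviations followed by a space, such as "e.g. ", are still counted as a sentence end, so this remains a heuristic.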