Currently, I’m working on my Master’s thesis on Hidden Markov Models. Matt Might wrote an interesting article on three shell scripts to improve your writing. The scripts detect passive voice, weasel words (such as “surprisingly low”) and duplicate words (which are hard to spot when a line break separates them).
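The duplicate-word check is the least obvious of the three, because a pair like “the” at the end of one line and “the” at the start of the next is invisible to a line-by-line search. The sketch below is not Matt’s script (his versions are shell/Perl); it is a minimal Python illustration of the same idea, and the names `DUPLICATE_RE` and `find_duplicates` are hypothetical. The trick is to search the whole text at once, so that `\s+` can match the newline between the two words.

```python
import re

# Two identical words separated only by whitespace, which includes newlines.
# The backreference \1 requires the second word to repeat the first;
# re.IGNORECASE makes the backreference match "The the" as well.
DUPLICATE_RE = re.compile(r'\b(\w+)\s+\1\b', re.IGNORECASE)


def find_duplicates(text):
    """Return (line number, word) pairs for every doubled word in text.

    The pattern runs over the whole text rather than line by line, so a
    pair split across a line break is still found. The reported line is
    the one holding the first occurrence.
    """
    results = []
    for match in DUPLICATE_RE.finditer(text):
        linenum = text.count('\n', 0, match.start()) + 1
        results.append((linenum, match.group(1)))
    return results


if __name__ == '__main__':
    sample = "We estimate the\nthe transition matrix.\nIt is is fine."
    for linenum, word in find_duplicates(sample):
        print(f'line {linenum}: duplicate "{word}"')
```

On the sample text this reports “the” on line 1 (split across the line break) and “is” on line 3.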
One remark I received repeatedly was that my paragraphs were much too short. Several paragraphs were only one or two sentences long, and I could usually merge them with a neighbouring paragraph.
Inspired by Matt’s scripts, I wrote my own script, which flags paragraphs that contain at most two sentences and span fewer than four lines. It isn’t perfect, and not every short paragraph needs to be longer, but a flagged paragraph may warrant a closer look.
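```python
#!/usr/bin/env python3
"""Report suspiciously short paragraphs in LaTeX source files."""
import re
import sys

# Matches a line holding nothing but a single command, e.g. a section heading.
SINGLE_COMMAND_RE = re.compile(r'^\\\w+\{[^}]+\}$')


def process(filename, file):
    """Split the file into paragraphs (text surrounded by blank lines).

    Ignores lines containing only a single command at the beginning of
    a paragraph.
    """
    paragraph = None  # [start line, sentence count, line count]
    prev_line = ''
    for linenum, line in enumerate(file):
        line = line.strip()
        if SINGLE_COMMAND_RE.match(line) and not prev_line:
            continue
        if not line:
            # A blank line closes the current paragraph, if there is one.
            if paragraph is not None:
                report_short_paragraph(filename, paragraph)
            paragraph = None
        else:
            if paragraph is None:
                paragraph = [linenum, 0, 0]
            # Each full stop ends a sentence; an ellipsis counts only once.
            paragraph[1] += line.count('.') - 2 * line.count('...')
            paragraph[2] = linenum - paragraph[0] + 1
        prev_line = line
    # The final paragraph need not be followed by a blank line.
    if paragraph is not None:
        report_short_paragraph(filename, paragraph)


def report_short_paragraph(filename, paragraph):
    start, sentences, lines = paragraph
    # Paragraphs without any full stop (figures, tables) are skipped.
    if 0 < sentences <= 2 and lines < 4:
        print('%s:%d paragraph of %d sentence(s) / %d lines'
              % (filename, start + 1, sentences, lines))


if __name__ == '__main__':
    for filename in sys.argv[1:]:
        with open(filename) as file:
            process(filename, file)
```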