Charles Hart Enzer, M.D. wrote:
> 
> How do I unwrap paragraphs of imported text?
Here is a simple-minded Perl script that might be useful.  Not that there
aren't already a million other things out there ...
   Jim
#!/usr/local/bin/perl
#Usage: depara.pl in out
#Reads "in" and attempts to remove line terminators from the text, writing
#the result to "out", so that it can be inserted into stuff like Mozilla and
#be reformatted accordingly.  The rules are pretty much the old ones: if the
#next line is nonexistent, blank, or starts with a blank, we end the current
#line.  Otherwise we join the next line with one blank, or two blanks if the
#current chomped line ends in one of the punctuation characters listed in
#$endsent.  Blank lines are preserved.
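@ARGV == 2 or die "usage: depara.pl in out\n";
# Three-argument open avoids surprises with odd file names; $! reports the cause.
open(IN, "<", $ARGV[0]) or die "can't open input file $ARGV[0]: $!";
open(OUT, ">", $ARGV[1]) or die "can't open output file $ARGV[1]: $!";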
$in = 0;  $out = 0;
$endsent = "[.;:?]\$";
$line = "";
$any = 0;
while (<IN>) {   # read one line at a time instead of slurping the whole file
  ++$in;
  chomp;
  if( ! $any ) {
    /^\s*$/ and  (print(OUT "\n"), ++$out, next);
    $line = $_;  ++$any;  next; }
  /^\s*$/ and  (print(OUT "$line\n"),
                $line = "", $any = 0, print(OUT "\n"), $out += 2, next);
  /^\s/   and  (print(OUT "$line\n"), ++$out,
                $line = $_,  $any = 1,  next);
  $line =~ /$endsent/o and ($line .= " ");
  $line .= " ";
  $line .= $_;
  ++$any;
 }
$any and (print(OUT "$line\n"), ++$out);
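close(IN);
close(OUT) or die "error closing $ARGV[1]: $!";   # catch write errors that surface at close
print "Read $in lines, wrote $out to $ARGV[1]\n";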
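
To make the joining rules easier to check on a small sample, here is a rough
sketch of the same logic as a function that takes a list of already-chomped
lines and returns the unwrapped lines.  The name unwrap and the sample text
are made up for illustration; they are not part of depara.pl, and the sketch
assumes the same punctuation list as $endsent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in depara.pl): applies the same joining rules
# to a list of chomped lines and returns the resulting output lines.
sub unwrap {
    my @out;
    my $cur;                      # paragraph being built; undef if none yet
    for my $l (@_) {
        if ($l =~ /^\s*$/) {      # blank line: flush the paragraph, keep the blank
            push @out, $cur if defined $cur;
            undef $cur;
            push @out, "";
        }
        elsif (!defined $cur) {   # first line of a new paragraph
            $cur = $l;
        }
        elsif ($l =~ /^\s/) {     # leading blank: end the old paragraph, start a new one
            push @out, $cur;
            $cur = $l;
        }
        else {                    # continuation: one blank, or two after punctuation
            $cur .= ($cur =~ /[.;:?]$/ ? "  " : " ") . $l;
        }
    }
    push @out, $cur if defined $cur;
    return @out;
}

# Made-up sample input, one array element per input line.
my @sample = ("This is a wrapped", "sentence.", "Next one", "",
              "  Indented start", "continues here");
print "$_\n" for unwrap(@sample);
```

On the sample, the first three lines collapse into one line (with two blanks
after "sentence."), the blank line is kept, and the indented line starts a
new paragraph that absorbs the line after it.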
  
Received on Fri Aug  5 08:00:07 2005