People have been reporting great performance improvements with a patch that improves Lucene's memory usage. Now another patch has been committed that speeds up the way documents are tokenized. I ran a quick test to see whether the gains from both patches show up not only with small documents but also with larger ones. Here are the results:

Indexing time with Lucene 2.2: 58 sec
Indexing time with Lucene trunk (2007-08-09): 17 sec

Thus indexing is about 3.4 times faster now in my test case (58 sec / 17 sec)!
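
The speedup figure follows directly from the two timings above. A minimal sketch of the arithmetic, where the class and method names are made up for illustration and are not part of Lucene:

```java
public class IndexingSpeedup {
    // Ratio of old to new wall-clock time; a value above 1 means the new run is faster.
    static double speedup(double oldSeconds, double newSeconds) {
        return oldSeconds / newSeconds;
    }

    // Fraction of the old run time that the new run no longer needs.
    static double timeSaved(double oldSeconds, double newSeconds) {
        return 1.0 - newSeconds / oldSeconds;
    }

    public static void main(String[] args) {
        double oldSeconds = 58; // Lucene 2.2, as measured above
        double newSeconds = 17; // Lucene trunk (2007-08-09), as measured above
        System.out.println("speedup factor: " + speedup(oldSeconds, newSeconds));
        System.out.println("fraction of time saved: " + timeSaved(oldSeconds, newSeconds));
    }
}
```

With these inputs the factor comes out at roughly 3.4, i.e. the new run needs a bit under a third of the old run time.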

Details about the small test collection:
Document format: plain text
Total size of documents: 28 MB
Number of documents: 71 (i.e. about 400 KB average document size)
Index size: 11 MB
Heap memory for JVM: 10 MB (i.e. -Xmx10M)

Notable API calls I used: indexwriter.setRAMBufferSizeMB(5) (to make it work with just 10 MB of heap memory) and indexwriter.setMaxFieldLength(Integer.MAX_VALUE) (to make sure nothing gets cut off; by default only the first 10,000 terms of a field are indexed).
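
To illustrate why the RAM buffer was set to 5 MB, here is a rough sketch that compares a chosen buffer size with the heap the JVM actually has available. It uses only the standard Runtime.maxMemory() call. The class and helper names are hypothetical and are not part of Lucene. The buffer is only one of the things that needs heap while indexing, so the leftover number is an upper bound on what remains for everything else, not a guarantee.

```java
public class HeapHeadroom {
    // Heap left over (in MB) after setting aside the indexing RAM buffer.
    // A negative result means the buffer alone would not fit in the heap.
    static double headroomMB(long maxHeapBytes, double ramBufferMB) {
        return maxHeapBytes / (1024.0 * 1024.0) - ramBufferMB;
    }

    public static void main(String[] args) {
        // Reflects the -Xmx setting, although the JVM may report slightly less.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        double ramBufferMB = 5.0; // the value passed to setRAMBufferSizeMB in the test above
        System.out.println("max heap (MB): " + maxHeapBytes / (1024.0 * 1024.0));
        System.out.println("left after RAM buffer (MB): " + headroomMB(maxHeapBytes, ramBufferMB));
    }
}
```

Run with -Xmx10M as in the test, this should show about half of the heap left over for the rest of the indexing process.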