Wednesday, June 11, 2014

Super Fast, Memory Optimized Excel Generation Using Apache POI


Problem Context 

Need to generate Huge Excels (Millions of Rows). When using a normal approach (in my previous example) when we tried to dump huge data sets into excels we faced many issues related to performance esp. OutOfMemory and Long time to generate excel (even after increased heap to 4GB).

Note that in previous mocker example I have used only 3 columns and Data size is relatively small than real world stage. So situation becomes more worse in real world than in mockers. And believe me no one understand Developer at that time.

Resolve the problem

Apache poi comes with memory friendly version of workbook class called org.apache.poi.xssf.streaming.SXSSFWorkbook. It has some mechanism called Row flushing. Meaning that flush rows from memory at defined WindowSize. And it has some cons. See more at.
http://poi.apache.org/apidocs/org/apache/poi/xssf/streaming/SXSSFWorkbook.html


So here it is...

(tweaked version from previous example but you can write your own separate component for it)

1. PersonDO - as it is.

2. Write to Excel


import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import org.apache.poi.xssf.usermodel.XSSFWorkbook;

//import statements
public class ReadExcelDemo
{
public static void main( String[] args )
{
System.out.println( java.lang.Runtime.getRuntime().totalMemory() );
try
{
FileInputStream file = new FileInputStream( new File( "new_example2.xlsx" ) );

// Create Workbook instance holding reference to .xlsx file

XSSFWorkbook workbook = new XSSFWorkbook( file );

// TWEAKED HERE                                    
SXSSFWorkbook wb = new SXSSFWorkbook( workbook );  
SXSSFSheet sheet = (SXSSFSheet) wb.getSheetAt( 0 );   

ArrayList<PersonDO> persons = PersonDO.getPersons( 1000 );
int startIndex = 1;
System.out.println( "Populating data..." );
for ( PersonDO p : persons )
{
Row row = sheet.createRow( startIndex ); // getRow( startIndex );
row.createCell( 0 ).setCellValue( p.getName() );
row.createCell( 1 ).setCellValue( p.getPhone() );
row.createCell( 2 ).setCellValue( p.getCity() );
if ( startIndex % 500 == 0 )
{
System.out.println( "At " + startIndex );
System.out.println( java.lang.Runtime.getRuntime().freeMemory() + " bytes" );
}
startIndex++;
}
System.out.println( "Writing to excel..." );
FileOutputStream fileOut = new FileOutputStream( "workbook.xlsx" );
wb.write( fileOut );
System.out.println( "Completed..." );
fileOut.close();
file.close();
wb.dispose();
}
catch ( Exception e )
{
e.printStackTrace();
}
}
}

2 comments:

  1. Hi,
    I tried your "Tweaking" and that does not tweak enough! Sorry to say - it does NOT Work!!

    ReplyDelete
  2. It works and it's very fast. Only problem is when you write a row and try to write more data in that row later, it throws an exception.

    ReplyDelete