split a pdf into single pages

Hi,

I need a way to split PDFs into their single pages within a Talend job so that I can process them further.

Does anybody have a good solution for this?

Thanks


Accepted Solutions

Re: split a pdf into single pages

In the meantime I've found the solution, so I thought I'd post it here in case someone needs it.

I've written a small routine using Apache PDFBox:

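package routines;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// The class name is a placeholder; in Talend it has to match the routine name.
public class PdfSplitter {

    // Splits the PDF at path 'arg' into one file per page:
    // <directory>/1.pdf, <directory>/2.pdf, ...
    public static void splitPdf(String arg, String directory) throws IOException {
        // PDDocument.load(File) is the PDFBox 2.x API.
        try (PDDocument document = PDDocument.load(new File(arg))) {
            Splitter splitter = new Splitter();
            List<PDDocument> pages = splitter.split(document);

            int i = 1;
            for (PDDocument page : pages) {
                try {
                    // new File(directory, name) works with or without a trailing separator.
                    page.save(new File(directory, i + ".pdf"));
                } finally {
                    // Each split page is its own document and has to be closed too.
                    page.close();
                }
                i++;
            }
        }
    }
}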

It takes the given PDF and writes each page as a separate file (1.pdf, 2.pdf, ...) into the given directory.
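
For the further processing mentioned in the question, the page files usually have to be picked up in page order, and a plain alphabetical sort would put 10.pdf before 2.pdf. Here is a rough sketch of how the generated files could be listed by page number. It only uses the Java standard library. The names PageFiles and listInPageOrder are my own and not part of Talend or PDFBox, and it assumes the directory contains files named 1.pdf, 2.pdf, ... as written by the routine above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Talend or PDFBox.
public class PageFiles {

    // Returns the files named <number>.pdf in 'directory', sorted by page number.
    public static List<File> listInPageOrder(String directory) {
        List<File> result = new ArrayList<>();
        File[] files = new File(directory).listFiles();
        if (files == null) {
            return result; // not a directory, or it could not be read
        }
        for (File f : files) {
            // Keep only names like 1.pdf, 2.pdf, ... and skip everything else.
            if (f.isFile() && f.getName().matches("\\d{1,9}\\.pdf")) {
                result.add(f);
            }
        }
        result.sort(Comparator.comparingInt(PageFiles::pageNumber));
        return result;
    }

    // Extracts the page number from a name like "12.pdf".
    static int pageNumber(File f) {
        String name = f.getName();
        return Integer.parseInt(name.substring(0, name.length() - ".pdf".length()));
    }
}
```

The returned list can then be looped over in whatever component does the next step. Files that do not match the number.pdf pattern are simply ignored.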


