How to query and extract PDF metadata and metrics in Java

The JPedal library can query and extract metadata from PDF files through the methods of its PdfUtilities class, which are documented in the PdfUtilities Javadoc.

To get started, create an instance of the PdfUtilities class from either a file or a byte array.

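// Load from file
final PdfUtilities utilities = new PdfUtilities("inputFile.pdf");

// Alternatively, load from a byte array (use one constructor or the other, not both).
// Reading the bytes this way needs java.nio.file.Files and java.nio.file.Paths.
final byte[] pdfBytes = Files.readAllBytes(Paths.get("inputFile.pdf"));
final PdfUtilities utilities = new PdfUtilities(pdfBytes);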

If the file is encrypted, you must supply the password.

utilities.setPassword("password");

Next, open and decode the file so that you can access its metadata. Close the file once you have finished reading it.

if (utilities.openPDFFile()) {
    // Add metadata query methods here
}

utilities.closePDFfile();
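If a query throws an exception, the close call above would be skipped. The sketch below shows one way to guarantee the close with try/finally. It does not use JPedal itself: OpenablePdf is a hypothetical stand-in interface that only mirrors the two method names used above, and SafeQuery is an illustrative helper, so treat this as a pattern and not as the library's API.

```java
import java.util.function.Function;

// Hypothetical stand-in for the two lifecycle calls shown above.
// It is NOT a JPedal type; it only mirrors the method names.
interface OpenablePdf {
    boolean openPDFFile();
    void closePDFfile();
}

// Illustrative helper, not part of JPedal.
class SafeQuery {

    // Opens the PDF, runs the query only if opening succeeded,
    // and always closes the file, even when the query throws.
    // Returns fallback when the file could not be opened.
    static <P extends OpenablePdf, R> R query(final P pdf,
                                              final Function<P, R> query,
                                              final R fallback) {
        try {
            return pdf.openPDFFile() ? query.apply(pdf) : fallback;
        } finally {
            pdf.closePDFfile();
        }
    }
}
```

With a real PdfUtilities instance the same shape applies directly: put the openPDFFile() check and the queries inside try, and the closePDFfile() call inside finally.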

Now that the file is decoded, you can call any of the methods below. Place the calls inside the if block, before the file is closed.

Get the page count

final int numPages = utilities.getPageCount();

Page numbers in PDF files start from 1, so the result of getPageCount() is also the page number of the last page in the document.
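Because numbering starts at 1, a loop over every page runs from 1 to the page count inclusive. The sketch below illustrates that loop without depending on JPedal: PageLoop, collectPerPage and the lambda are hypothetical stand-ins, with the lambda taking the place of a per-page call made on utilities.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper class, not part of JPedal.
class PageLoop {

    // Calls perPageQuery once for each page, using 1-based page numbers,
    // and stores each result under its page number.
    static Map<Integer, String> collectPerPage(final int pageCount,
                                               final IntFunction<String> perPageQuery) {
        final Map<Integer, String> results = new LinkedHashMap<>();
        for (int page = 1; page <= pageCount; page++) {
            results.put(page, perPageQuery.apply(page));
        }
        return results;
    }

    public static void main(final String[] args) {
        // In real code, the count would come from utilities.getPageCount()
        // and the lambda would call a per-page method on utilities.
        final Map<Integer, String> demo =
                collectPerPage(3, page -> "data for page " + page);
        demo.forEach((page, data) -> System.out.println(page + " -> " + data));
    }
}
```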

Get the page dimensions

final int page = 1;
final PdfUtilities.PageUnits units = PdfUtilities.PageUnits.Pixels;
final PdfUtilities.PageSizeType box = PdfUtilities.PageSizeType.CropBox;
final float[] pageDimensions = utilities.getPageDimensions(page, units, box);

Page dimensions can be returned as centimeters, inches, or pixels.
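To make the relationship between these units concrete, here is a minimal sketch of the arithmetic, assuming the standard PDF convention that one point in default user space is 1/72 of an inch. PageUnitConverter is a hypothetical helper and not part of JPedal, and the resolution JPedal applies for its Pixels unit is not stated here, so the sketch takes the dpi as a parameter.

```java
// Hypothetical helper, not part of JPedal: converts PDF points to other units.
class PageUnitConverter {

    // One PDF point is 1/72 of an inch in default user space.
    static final double POINTS_PER_INCH = 72.0;
    static final double CM_PER_INCH = 2.54;

    static double pointsToInches(final double points) {
        return points / POINTS_PER_INCH;
    }

    static double pointsToCentimetres(final double points) {
        return pointsToInches(points) * CM_PER_INCH;
    }

    // dpi is the resolution of the target device. It is a parameter here
    // because the value used by the library is an unknown in this sketch.
    static double pointsToPixels(final double points, final double dpi) {
        return pointsToInches(points) * dpi;
    }

    public static void main(final String[] args) {
        // A US Letter page measures 612 x 792 points.
        System.out.println(pointsToInches(612) + " x " + pointsToInches(792) + " in");
        System.out.println(pointsToCentimetres(612) + " x " + pointsToCentimetres(792) + " cm");
        System.out.println(pointsToPixels(612, 96) + " x " + pointsToPixels(792, 96) + " px at 96 dpi");
    }
}
```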

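You can query either the page’s crop box or its media box. The media box defines the full extent of the physical page, while the crop box defines the region that is visible when the page is displayed or printed. If a page has no crop box, it defaults to the media box.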

Get the PDF version

final String version = utilities.getPDFVersion();

Get all the document properties

final Map<String, String> documentProperties = utilities.getDocumentPropertyStringValuesAsMap();

The returned map uses the property name as the key, and each entry's value holds that property's value.
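A map of this shape can be listed or queried with ordinary java.util.Map calls. The sketch below is independent of JPedal: DocumentPropertiesDemo is a hypothetical helper and the sample entries are illustrative values, not output from a real file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JPedal.
class DocumentPropertiesDemo {

    // Formats each property as "name: value", one per line.
    static String describe(final Map<String, String> properties) {
        final StringBuilder out = new StringBuilder();
        for (final Map.Entry<String, String> entry : properties.entrySet()) {
            out.append(entry.getKey()).append(": ")
               .append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(final String[] args) {
        // Illustrative sample data standing in for the map returned by
        // utilities.getDocumentPropertyStringValuesAsMap().
        final Map<String, String> sample = new LinkedHashMap<>();
        sample.put("Title", "Example document");
        sample.put("Author", "Jane Doe");

        System.out.print(describe(sample));

        // Look up a single property, with a fallback when it is absent.
        System.out.println(sample.getOrDefault("Subject", "(not set)"));
    }
}
```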

Document properties are deprecated in PDF 2.0 in favour of metadata fields.

Get all the metadata fields

final String documentMetadata = utilities.getDocumentPropertyFieldsInXML();
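Since the metadata comes back as a single XML string, you will usually parse it before reading individual fields. The sketch below uses the JDK's built-in DOM parser to pull out the Dublin Core title. It assumes the returned string is a well-formed XMP packet, which is not confirmed here; XmpTitleReader and the sample packet are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JPedal.
class XmpTitleReader {

    static final String DC_NAMESPACE = "http://purl.org/dc/elements/1.1/";

    // Returns the trimmed text of the first dc:title element,
    // or null when the metadata contains no title.
    static String readTitle(final String xmp) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations as a precaution against XXE attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            final Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmp)));
            final NodeList titles = doc.getElementsByTagNameNS(DC_NAMESPACE, "title");
            return titles.getLength() == 0
                    ? null
                    : titles.item(0).getTextContent().trim();
        } catch (final Exception e) {
            throw new IllegalArgumentException("Metadata could not be parsed as XML", e);
        }
    }

    public static void main(final String[] args) {
        // Illustrative packet standing in for the string returned by
        // utilities.getDocumentPropertyFieldsInXML().
        final String sample =
                "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
              + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">"
              + "<rdf:Description rdf:about=\"\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
              + "<dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">Example</rdf:li></rdf:Alt></dc:title>"
              + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println(readTitle(sample));
    }
}
```

Matching on the namespace URI instead of the dc: prefix keeps the lookup working when a producer chooses a different prefix.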

Check if the file contains any embedded fonts

final boolean hasEmbeddedFonts = utilities.hasEmbeddedFonts();

Get all the font data

final Map<Integer, String> documentFontData = utilities.getAllFontDataForDocument();

The returned map uses the page number as the key and the entry value contains details about the fonts for that page.

Get the font data for a page

final int page = 1;
final String fontDataForPage = utilities.getFontDataForPage(page);

Determine if the file contains marked content

final boolean containsMarkedContent = utilities.isMarkedContent();

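Marked content is a set of operators in a page's content stream that tag sections of the content with a role or properties. Tagged PDFs use it to associate page content with the document's logical structure.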

Get the PDF permissions

final int permissions = utilities.getPdfFilePermissions();
PdfUtilities.showPermissionsAsString(permissions);
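If you need to test individual permissions in code, the integer can be checked bit by bit. The sketch below assumes that the returned value is the raw permission flags integer (the P entry of the encryption dictionary) defined in the PDF specification, which is not confirmed here. PdfPermissionFlags is a hypothetical helper and not part of JPedal; the bit positions are the ones the specification assigns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JPedal. Assumes the input is the raw
// PDF permission flags integer.
class PdfPermissionFlags {

    // The PDF specification numbers bits from 1 (lowest-order bit),
    // so bit n has the value 1 << (n - 1).
    static final int PRINT = 1 << 2;                  // bit 3
    static final int MODIFY = 1 << 3;                 // bit 4
    static final int COPY = 1 << 4;                   // bit 5
    static final int ANNOTATE = 1 << 5;               // bit 6
    static final int FILL_FORMS = 1 << 8;             // bit 9
    static final int EXTRACT_ACCESSIBILITY = 1 << 9;  // bit 10
    static final int ASSEMBLE = 1 << 10;              // bit 11
    static final int PRINT_HIGH_QUALITY = 1 << 11;    // bit 12

    private static final int[] FLAGS = {
        PRINT, MODIFY, COPY, ANNOTATE,
        FILL_FORMS, EXTRACT_ACCESSIBILITY, ASSEMBLE, PRINT_HIGH_QUALITY
    };
    private static final String[] NAMES = {
        "print", "modify", "copy", "annotate",
        "fill forms", "extract for accessibility", "assemble", "print high quality"
    };

    // A set bit means the operation is permitted.
    static boolean isAllowed(final int permissions, final int flag) {
        return (permissions & flag) != 0;
    }

    // Lists the names of all permitted operations, in bit order.
    static List<String> allowed(final int permissions) {
        final List<String> result = new ArrayList<>();
        for (int i = 0; i < FLAGS.length; i++) {
            if (isAllowed(permissions, FLAGS[i])) {
                result.add(NAMES[i]);
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        // -3900 is an illustrative value in which only bit 3 (print)
        // is set among the permission bits.
        System.out.println(allowed(-3900));
    }
}
```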

Get the image data for a page

final int page = 1;
final String imageDataForPage = utilities.getXImageDataForPage(page);

Get the number of commands for a page

final int page = 1;
final int commandsForPage = utilities.getCommandCountForPageStream(page);

Learn more

Check out our GitHub profile for a full example project.


