Read TOC (not Bookmarks) from a PDF using iText or PDFBox

June 15, 2014

I need to extract the table of contents (TOC) from an input PDF file. The code I have seen so far refers to bookmarks. The TOC and bookmarks are not the same thing. Is there a way to extract the TOC from a PDF using iText or PDFBox? I am open to using any other tool available.

Thanks

The table of contents of the PDF you refer to is nothing more than ordinary text on a page. Hence your only option is to extract the text on the pages that contain the TOC, for instance using the code shown in the ExtractPageContentSorted2 example:

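    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.PrintWriter;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfTextExtractor;

    public void parsePdf(String pdf, String txt) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        PrintWriter out = new PrintWriter(new FileOutputStream(txt));
        // iText page numbers are 1-based
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            out.println(PdfTextExtractor.getTextFromPage(reader, i));
        }
        out.flush();
        out.close();
        reader.close();
    }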

This example extracts the text of all pages in the PDF and writes it to the file at path txt. If you want the code to extract only the pages with the table of contents, you are going to have to change the page numbers in the for loop, for instance:

    for (int i = startToc; i <= endToc; i++)

where startToc is the page number where the TOC starts and endToc is the page number where the TOC ends. You need to provide these numbers yourself, because the PDF document as such isn't aware of the fact that the content on those pages is, in fact, a table of contents. The PDF only knows that the pages contain text, rendered paths, and maybe images. That is inherent to PDF.
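
The restricted loop can be sketched as a small helper. This is only an illustration: the class TocText, the method extractRange, and the pageText callback are hypothetical names and are not part of iText or PDFBox. The callback stands in for whatever call returns the text of a single page, and the range check is there because the caller supplies the TOC page numbers by hand.

```java
import java.util.function.IntFunction;

/** Hypothetical helper, not part of iText or PDFBox. */
final class TocText {

    /**
     * Collects the text of pages startToc..endToc (1-based, inclusive).
     * pageText is an assumed callback that returns the text of one page.
     */
    static String extractRange(IntFunction<String> pageText,
                               int numberOfPages, int startToc, int endToc) {
        // The PDF cannot tell us where the TOC is, so reject ranges
        // that do not fit the document instead of guessing.
        if (startToc < 1 || endToc > numberOfPages || startToc > endToc) {
            throw new IllegalArgumentException(
                "TOC range " + startToc + "-" + endToc
                + " is outside pages 1-" + numberOfPages);
        }
        StringBuilder text = new StringBuilder();
        for (int i = startToc; i <= endToc; i++) {
            // One page per line block, in page order
            text.append(pageText.apply(i)).append('\n');
        }
        return text.toString();
    }
}
```

With iText 5, the callback would typically wrap PdfTextExtractor.getTextFromPage(reader, i). That call declares IOException, so the lambda would need to catch it and rethrow it, for example as an UncheckedIOException.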
