Read TOC (not Bookmarks) from a PDF using iText or PDFBox
I need to extract the table of contents (TOC) from an input PDF file. The code I have seen so far refers to bookmarks, but the TOC and bookmarks are not the same thing. Is there a way to extract the TOC from a PDF using iText or PDFBox? I am also open to using any other tool available.
Thanks
The table of contents of the PDF you refer to is nothing more than ordinary text on a page. Hence your option is to extract the text on the pages that contain the TOC, for instance using the code shown in the ExtractPageContentSorted2 example:
    // iText 5: PdfReader is in com.itextpdf.text.pdf,
    // PdfTextExtractor is in com.itextpdf.text.pdf.parser
    public void parsePdf(String pdf, String txt) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        PrintWriter out = new PrintWriter(new FileOutputStream(txt));
        // Page numbers are 1-based
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            out.println(PdfTextExtractor.getTextFromPage(reader, i));
        }
        out.flush();
        out.close();
        reader.close();
    }

This example extracts the text of all pages in the PDF and writes it to the file at the path txt. If you want the code to extract only the pages with the table of contents, you are going to have to change the page numbers in the for loop, for instance:

    for (int i = startToc; i <= endToc; i++)

where startToc is the page number on which the TOC starts and endToc is the page number on which the TOC ends. You need to provide these numbers yourself, because the PDF document as such isn't aware of the fact that the content on those pages is, in fact, a table of contents. The PDF only knows that the pages contain text, rendered paths, and maybe images. This is inherent to PDF.
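Once the text of the TOC pages has been extracted, it is still just a block of lines. As a rough sketch of a possible next step, the plain-Java snippet below turns such text into title/page pairs. It is not part of iText or PDFBox; the names TocLineParser and TocEntry are made up for this illustration, and it assumes each TOC line ends with a page number separated from the title by whitespace or dot leaders. Real documents may use a different layout, so the pattern would likely need adjusting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an iText or PDFBox class.
public class TocLineParser {

    /** One TOC entry: the heading text and the page number printed next to it. */
    public static final class TocEntry {
        public final String title;
        public final int page;

        public TocEntry(String title, int page) {
            this.title = title;
            this.page = page;
        }
    }

    // Assumed layout: title, then whitespace or a run of dots/spaces,
    // then a page number of 1 to 6 digits at the end of the line.
    private static final Pattern TOC_LINE =
            Pattern.compile("^(.+?)(?:\\s+|[\\s.]{2,})(\\d{1,6})$");

    /** Returns the lines of tocText that look like TOC entries; other lines are skipped. */
    public static List<TocEntry> parse(String tocText) {
        List<TocEntry> entries = new ArrayList<>();
        for (String rawLine : tocText.split("\\R")) {
            Matcher m = TOC_LINE.matcher(rawLine.trim());
            if (m.matches()) {
                entries.add(new TocEntry(m.group(1), Integer.parseInt(m.group(2))));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample text standing in for what the extraction loop would write out.
        String sample = "Table of Contents\n"
                + "1 Introduction ........ 3\n"
                + "1.1 Background . . . . 5\n"
                + "2 Methods 12\n";
        for (TocEntry e : parse(sample)) {
            System.out.println(e.page + "\t" + e.title);
        }
    }
}
```

This is a heuristic: any ordinary line that happens to end in a number after a space would be picked up as well, which is one more reason to restrict the extraction to the pages between startToc and endToc.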