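xpdf is an open-source PDF viewer. It also ships tools for text extraction and for converting PDF to other formats (pdf2ps, pdf2png, pdf2html, and so on); the package is small but efficient. MuPDF is a similar project worth a look.

poppler is a rendering library derived from xpdf-3.0. I have only tried its GLib and C++ front ends: the GLib one renders to a cairo surface, and the C++ one renders to an image.

This post covers rendering to a cairo surface in order to output an SVG image.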

Install cairo

sudo apt-get install libcairo2-dev

Install poppler from source

  ls poppler-0.35.0.tar.xz
  tar xf poppler-0.35.0.tar.xz
  cd poppler-0.35.0/

  # Install the build dependencies
  sudo apt-get install build-essential
  sudo apt-get install make
  sudo apt-get install pkg-config
  sudo apt-get install libfreetype6-dev
  sudo apt-get install libfontconfig1-dev
  sudo apt-get install libopenjpeg-dev
  sudo apt-get install libpng12-dev
  sudo apt-get install libtiff5-dev
  sudo apt-get install zlib1g-dev

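  # List the configure options
  ./configure -h

  Two options deserve attention:
  * --enable-xpdf-headers controls whether the xpdf header files are installed.
  * --enable-zlib must be passed explicitly; without it, the configure report shows zlib as "no" even when zlib is installed.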

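  # Update the dynamic linker cache after installing:
  # make sure the directory holding the poppler libraries
  # (/usr/local/lib with the default configure prefix) is listed, then refresh
  sudo vi /etc/ld.so.conf
  sudo ldconfig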

Writing the code

The code is mainly based on pdf2svg.

Original (file-based) code:

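    #include <poppler.h>
    #include <cairo.h>
    #include <cairo-svg.h>

    // Poppler: open the document (filename_uri must be a file:// URI)
    // and fetch the first page
    PopplerDocument *pdffile;
    PopplerPage *page;
    double width, height;
    pdffile = poppler_document_new_from_file(filename_uri, NULL, NULL);
    page = poppler_document_get_page(pdffile, 0);

    // Cairo stuff
    cairo_surface_t *surface;
    cairo_t *drawcontext;

    // Page size in points, the unit the SVG surface also expects
    poppler_page_get_size (page, &width, &height);

    // Open the SVG file
    surface = cairo_svg_surface_create(svgFilename, width, height);
    drawcontext = cairo_create(surface);

    // Render the PDF file into the SVG file
    poppler_page_render_for_printing(page, drawcontext);
    cairo_show_page(drawcontext);

    // Clean up; destroying the surface finishes and flushes the SVG output
    cairo_destroy(drawcontext);
    cairo_surface_destroy(surface);
    g_object_unref(page);
    g_object_unref(pdffile);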
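
One detail that is easy to miss in the code above: poppler_document_new_from_file takes a URI, not a plain path. GLib provides g_filename_to_uri for this conversion. For illustration only, here is a minimal stdlib-only sketch of the same idea; path_to_file_uri is a hypothetical helper name (not part of poppler or pdf2svg), and it assumes an absolute POSIX path and the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of poppler or pdf2svg): turn an absolute
 * POSIX path into a file:// URI, percent-encoding every byte that is not
 * an unreserved URI character or '/'.
 * Returns a malloc'ed string the caller must free, or NULL on error. */
static char *path_to_file_uri(const char *abs_path)
{
    static const char hex[] = "0123456789ABCDEF";
    const char *prefix = "file://";
    size_t n, i, o;
    char *uri;

    if (abs_path == NULL || abs_path[0] != '/')
        return NULL;                 /* only absolute paths are handled */

    n = strlen(abs_path);
    /* Worst case: every input byte becomes a 3-byte %XX escape. */
    uri = malloc(strlen(prefix) + 3 * n + 1);
    if (uri == NULL)
        return NULL;

    strcpy(uri, prefix);
    o = strlen(prefix);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)abs_path[i];
        if (isalnum(c) || c == '/' || c == '-' || c == '.' ||
            c == '_' || c == '~') {
            uri[o++] = (char)c;      /* safe to copy as-is */
        } else {
            uri[o++] = '%';
            uri[o++] = hex[c >> 4];
            uri[o++] = hex[c & 0x0F];
        }
    }
    assert(o <= strlen(prefix) + 3 * n);  /* stayed inside the allocation */
    uri[o] = '\0';
    return uri;
}
```

Calling path_to_file_uri("/tmp/my file.pdf") yields "file:///tmp/my%20file.pdf", which can be passed as filename_uri. In a program that already links GLib, g_filename_to_uri is probably the better choice.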

Stream-based implementation:

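    // Write callback: append each chunk cairo emits to a NUL-terminated buffer.
    // Assumes svgData is a caller-allocated char buffer that starts out as ""
    // and is large enough for the whole SVG; there is no bounds check here.
    static cairo_status_t cairowrite(void *svg, unsigned char const *data, unsigned int length)
    {
      char *buf = (char *)svg;
      size_t used = strlen(buf);
      memcpy(buf + used, data, length);
      buf[used + length] = '\0';  // keep the buffer terminated for the next call
      return CAIRO_STATUS_SUCCESS;
    }

    // Open the PDF from memory instead of from a file URI
    pdffile = poppler_document_new_from_data(pdfData, len, NULL, NULL);

    // Send the SVG output to the callback instead of to a file
    surface = cairo_svg_surface_create_for_stream(cairowrite, (void *)svgData, width, height);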
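
The callback above writes into a buffer of fixed size, so the caller has to guess how large the SVG will be. A possible alternative is a buffer that grows on demand. The sketch below is only an illustration under assumptions: svg_buffer and svg_buffer_write are hypothetical names, and to keep it free of cairo headers it returns a plain int (0 for success, 1 for failure). With cairo the return type would be cairo_status_t, with CAIRO_STATUS_SUCCESS and CAIRO_STATUS_WRITE_ERROR in place of 0 and 1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, passed to the callback as its closure. */
typedef struct {
    unsigned char *data;  /* NUL-terminated contents, NULL while empty */
    size_t len;           /* bytes stored, not counting the NUL */
    size_t cap;           /* bytes allocated */
} svg_buffer;

/* Takes the same parameters as cairo's stream write callback.
 * Returns 0 on success and 1 if memory could not be allocated. */
static int svg_buffer_write(void *closure, const unsigned char *data,
                            unsigned int length)
{
    svg_buffer *buf = closure;
    size_t needed = buf->len + length + 1;   /* +1 for the trailing NUL */

    if (needed > buf->cap) {
        size_t new_cap = buf->cap ? buf->cap : 4096;
        unsigned char *p;

        while (new_cap < needed)
            new_cap *= 2;                    /* grow geometrically */
        p = realloc(buf->data, new_cap);
        if (p == NULL)
            return 1;                        /* old buffer is still valid */
        buf->data = p;
        buf->cap = new_cap;
    }
    if (length > 0)
        memcpy(buf->data + buf->len, data, length);
    buf->len += length;
    buf->data[buf->len] = '\0';
    assert(buf->len < buf->cap);
    return 0;
}
```

A zero-initialized svg_buffer is a valid empty buffer. After the surface has been destroyed, data holds the complete SVG text and len its size; the caller releases it with free.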

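At this point the PDF-to-SVG conversion works. However, the generated SVG contains only path (vector outline) data and no font information, so the files are relatively large.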