May 24th, 2011

If you think that Google cannot index PDF files, you would be wrong. PDFs are yet another avenue that webmasters and marketers can use to help the search-engine giant index and rank a website properly and relevantly.

Most SEO and SEM experts know that the type of content can matter just as much as the content itself. Or, in other words: the medium, when it comes to Google, can be just as important as the message. For example, Flash web-development software is rightly known as the bane of online marketing’s existence. And for good reason.

Search engines scan and index the HTML that a website serves (whether written by hand or generated by PHP or other server-side code). In other words, they can read only text. One reason SEO professionals recommend that the alt-text of images be optimized for targeted keywords is that this field is what Google can see; Google cannot see the image itself. In a nutshell, Google sees a page like this:

The keyword in the middle is the alt-text of the image that was placed between the two blocks of text. The image itself is invisible; Google reads only its alt-text, as text, just like the rest of the words and code on the page. The same is true for Flash: any part of a website created with the software is invisible to search engines and, unlike images, it cannot be optimized. While Flash may impress visitors with, well, flashy designs, the graphics impede efforts to rank highly for one’s targeted, relevant keywords. It is better to use text-based links for items like section headings and tables of contents. (It is also one of the reasons to hire an SEO before you develop your website.)
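To make the point concrete, here is a minimal sketch in Python of what a text-only crawler could read from a page. It is an illustration, not a description of how Google actually works: the class name CrawlerView, the function crawler_view, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Collects the text a text-only crawler could read from a page."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # An image contributes nothing except the text of its alt attribute.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt.strip())

    def handle_data(self, data):
        # Ordinary text between tags is kept; whitespace-only runs are skipped.
        text = data.strip()
        if text:
            self.parts.append(text)


def crawler_view(html):
    """Return the readable pieces of a page, in document order."""
    parser = CrawlerView()
    parser.feed(html)
    parser.close()
    return parser.parts


# Hypothetical page: two blocks of text with an image between them.
page = """
<p>First block of text.</p>
<img src="chart.png" alt="seo pdf">
<p>Second block of text.</p>
"""

print(crawler_view(page))
# ['First block of text.', 'seo pdf', 'Second block of text.']
```

Remove the alt attribute from the image and the middle entry disappears entirely, which is why an unlabeled image (or a Flash movie) gives a search engine nothing to work with.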

However, the rule does not apply to PDF files because the documents, when created correctly, are not image files. If you want proof, here is the top Google search result for “SEO PDF”:

Directly below the meta title (page title) and in the URL as well, you will see that the link goes directly to the PDF. Moreover, the “Quick View” link below the link opens an HTML version of the file. The obvious conclusion: Google could not generate an HTML version of the PDF if the search engine were not able to index the PDF in the first place.

Advice on SEM-PDF and SEO Optimization

So, it is important to optimize PDF files for SEO just as one would for page headings, meta descriptions, image alt-tags, and a host of other items. But how exactly does one do that? Matt McGee offers a few good tips. In part:

2. PDF optimization is similar to optimization for a regular content page. Try this: good use of keywords/phrases, appropriate headlines and sub-headlines, solid content that reads well to a human eye, etc. If the PDF will include images, a caption underneath each image would be a good idea, especially if the caption includes a targeted keyword/phrase. (Of course, don’t overdo it. Remember my mom’s advice about SEO.)

Proof: Using the search above, we find this PDF ranked prominently in all three engines. On page 9 of this PDF, there’s a bold content heading (the equivalent of an H2): Awareness and Usage of the XML Button. Let’s not use the exact text, but something close: Here are the SERPs for [xml button awareness]: Yahoo, Google, and MSN. In each case, you find the PDF ranked highly in the SERPs and that exact bold content heading showing prominently in the snippet.

3. The most important thing where PDFs and SEO is concerned is how the PDF is created. Don’t use Photoshop to make your PDF, because when you do that, you’re actually making a big image file, not a true PDF — and the spiders cannot crawl or “read” the text from that image file. The PDF should be created with a text-based program, like MS Word or Adobe Pagemaker, so that the final product is text-based and can be crawled.
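The difference between a text-based PDF and an image-only one can be roughly checked from the file itself. The sketch below, in Python, is only a heuristic under stated assumptions: it looks for font references and image objects in the raw bytes of a file, and it will be fooled by PDFs that store those objects in compressed streams or by scans that carry an OCR text layer. The function name describe_pdf and the two sample fragments are hypothetical; a dedicated PDF library that actually extracts the text would be more reliable.

```python
import re

# Markers that appear in uncompressed PDF object dictionaries.
FONT_REF = re.compile(rb"/Font\b")                # page resources list fonts
IMAGE_REF = re.compile(rb"/Subtype\s*/Image\b")   # embedded raster image


def describe_pdf(pdf_bytes):
    """Guess whether a PDF holds real text or only pictures of text."""
    has_fonts = FONT_REF.search(pdf_bytes) is not None
    has_images = IMAGE_REF.search(pdf_bytes) is not None
    if has_fonts:
        # Text drawn with fonts can be extracted and crawled.
        return "probably text-based"
    if has_images:
        # Images and no fonts: likely a scan or a Photoshop export.
        return "probably image-only"
    return "unknown"


# Hypothetical fragments of PDF object dictionaries, not complete files.
text_fragment = b"<< /Type /Page /Resources << /Font << /F1 5 0 R >> >> >>"
scan_fragment = b"<< /Type /XObject /Subtype /Image /Width 2550 /Height 3300 >>"

print(describe_pdf(text_fragment))  # probably text-based
print(describe_pdf(scan_fragment))  # probably image-only
```

For a real file, the bytes would come from opening it in binary mode and reading its contents. A simpler manual test is to open the PDF and try to select a sentence with the cursor: if the text cannot be selected, a spider probably cannot read it either.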

Kevin Harris offers more thoughts as well on the idea of optimizing PDFs for more-general keywords (or long-tail keywords), but do not risk making the documents seem like a content farm. The key to a good SEO title is not to use PDF files as a form of article-marketing automation. It is, as I mentioned in “SEM-Video Resources,” to use them to establish your business or yourself as a thought leader in your chosen industry.

The issue: Anyone can put text on a website. Heck, I am doing it right now. But even though anyone can also create a PDF in a matter of seconds, the format still carries more cachet from a branding standpoint. Here in Israel, I once worked as the online-marketing manager for a start-up involved in speech-to-text transcription, and my goal (if the venture-capital funding had come through before we were laid off) was to post extensive, academic white papers on phonetics, linguistics, and related topics to communicate the vast amount of knowledge held by the company. And, of course, they would have been published on the website as PDFs rather than generic text.

Apply the same example to your specific industry or sector — but just make sure that you optimize the PDFs for Google.

