Tuesday, February 19, 2013

How to copy from a PDF to Word without losing your mind.

How to fix tangled line breaks!
For one reason or another we occasionally need to copy text from a PDF document so that we can use it in a Word document. I won't go into why you might need to copy text from a PDF and paste it into Microsoft Word; your reasons are your own. I most often use the technique below when I need the words from an old document that I or a colleague saved as a PDF, but whose original Word version has since been lost.

When you copy text from a PDF and paste it into a Word document, the formatting (especially the line breaks) gets totally trashed, and unless you know the trick below you can lose your mind trying to fix it.

Copying from a PDF and pasting into Microsoft Word.

  1. In your PDF reader, select the text you need and copy it (by right-clicking and choosing 'Copy', or by pressing Ctrl+C).
  2. Paste the copied text into Notepad; this removes or 'cleans' any formatting that might confuse MS Word. Copying and pasting 'through' Notepad is a good trick to learn for general web editing and blogging.
  3. Select all the text in Notepad (Ctrl+A) and copy it (Ctrl+C), then paste it (Ctrl+V) into an empty Word document.
  4. Select all the text (Ctrl+A) and open 'Find and Replace' (Ctrl+H).
  5. Click the 'More' button at the bottom of the Find and Replace dialog box, then press the 'Special' button and select 'Paragraph Mark'; this puts ^p in the 'Find what' box.
  6. In the 'Replace with' box put two spaces, literally just two spaces: put the cursor in the box and hit the space bar twice.
  7. Click 'Replace All' and then say yes, fine, whatever to the questions that pop up on the screen.
  8. Unfortunately, you will also have lost the breaks between paragraphs, but these are easy to get back. I find that almost without fail a capital letter at the start of a line denotes the start of a new paragraph, so you can click your cursor there and hit the Return key once to restore the paragraph break. This method is still a LOT faster than fixing the carriage return at the end of every single line. If you are copying the text out into another word processor or Google Drive / Docs, you will need to copy and paste it back through Notepad first.
  9. Copy the reformatted text and paste it straight into the Word document that needs it; there is no need to rinse the text through Notepad this time.
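If you do this a lot, the same clean-up can be sketched in a few lines of Python instead of Word's Find and Replace. This is a minimal sketch, not part of the Word technique itself: the function name `unwrap_pdf_text` is made up for illustration, and the paragraph rule (a line ending in a full stop, question mark or exclamation mark, followed by a line starting with a capital letter) is an assumed, automated version of the eyeballing in step 8. Because of that assumption it will sometimes split a paragraph where a sentence happens to end at the end of a line, so check the result.

```python
def unwrap_pdf_text(text: str, joiner: str = "  ") -> str:
    """Join hard-wrapped lines copied out of a PDF (hypothetical helper).

    Each line break is replaced with `joiner` (two spaces by default, as in
    step 6). As an assumed heuristic for step 8, a new paragraph is started
    when the previous line ends in . ! or ? and the next line begins with a
    capital letter.
    """
    # splitlines() copes with Windows (\r\n) and Unix (\n) line endings alike
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    if not lines:
        return ""

    paragraphs = [lines[0]]
    for prev, line in zip(lines, lines[1:]):
        if prev.endswith((".", "!", "?")) and line[0].isupper():
            paragraphs.append(line)  # probably the start of a new paragraph
        else:
            paragraphs[-1] += joiner + line  # same paragraph: undo the wrap

    # Separate paragraphs with a blank line so they survive pasting into Word
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "For one reason or another we\n"
        "occasionally need to copy text.\n"
        "When you copy text from a PDF\n"
        "the line breaks get trashed.\n"
    )
    print(unwrap_pdf_text(sample))
```

Run on the sample, this prints two paragraphs, each rejoined into a single line, which you could then paste into Word as in step 9.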

Getting images from PDFs
If you need the images from a PDF, open the PDF in Adobe Photoshop. When you open a PDF in Photoshop you are given the choice of importing each page as an image, or importing each image in the PDF as a separate image.