Proofreading Course Blog

4/1/2020

How to proofread a website

As The No-Nonsense Proofreading Course makes clear, not all proofreading takes place in publishing houses. It’s not all literature and manuscripts. Anywhere words are being produced for human consumption, there will be a need for proofreaders.

And nowhere are there more words being produced for human consumption than on the internet. If you were to print out all of the internet’s web pages as if they were the pages of a book, it would be (according to a 2015 report) more than 300 billion pages long. That’s an estimated 305,500,000,000 pages, the equivalent of 212 million copies of ‘War and Peace’.

Obviously, most of those pages are about cats, Harry Potter and naked people (not at the same time), but a large proportion are business websites. In many cases, this will be the only means a company has to communicate with its audience of potential customers. Not all businesses can afford to advertise in magazines or send out glossy brochures. Online real estate offers great value for money.

But where there are words, there are errors. And for any business an error can range from an embarrassing faux pas to a financially damaging catastrophe.

Which is where you step in, to shield businesses from embarrassing faux pas and financially damaging catastrophes. Like Captain America, but with a dictionary instead of a shield.

So, someone asks you to proofread their website. What do you do?

The first thing you do is say, “Sorry, no, I don’t proofread websites.”

Okay, don’t actually say that, but let your client know (as politely as possible) that you don’t proofread ‘websites’, you proofread ‘website content’.

Proofreading a website is a potential nightmare. It isn’t like proofreading a document or a manuscript or a piece of print-ready artwork. Why? Because it isn’t linear. You don’t start at the front cover, tackle the front matter, then work your way through in page order. A website is more like… well, a web. Not that any self-respecting spider would appreciate the comparison. Spiderwebs are neat, geometrical things. Websites are, frankly, a mess.

In theory, you could start on the home page and work your way through the main navigation. But not all pages are accessible through the main navigation. What about all those links at the bottom of many websites? Terms and Conditions, Refund Policy, those things? What about all the pages that can only be found through links within the content of other web pages? How do you begin to identify and read those? Well, there is a way, but we’ll deal with that later. For now, we’re going down the ‘website content’ route.

You need to ask your client to provide you with the content that they want you to read. This can be in the form of Word documents or PDFs. If they’re in the process of creating a website, this is easy. They’ll likely produce the content in that form anyway prior to uploading it.

Once you’re in possession of these documents, print them out. As The No-Nonsense Proofreading Course insists time and again: ALWAYS READ FROM A HARD COPY. NEVER FROM THE SCREEN.

Once you’ve sent your corrections to your client, they can correct the documents and upload them or (if the content is already live) make the necessary amendments themselves or have their web developer do it.

But what if my client isn’t very tech-savvy and wants me to “just proofread the website”?

It’s not ideal, but it’s also not the end of the world. Avoid this situation if you can, but if you need the work and this is the only way the client wants to play it, follow the process outlined below.

Firstly, you’re going to need to identify all the web pages that need to be checked.

There are probably a number of ways of doing this, but I’m going to cover two. These are the two I use. Yes, even I get pain-in-the-backside customers who say, “Can’t you just check it on your laptop?”

Get hold of the website’s sitemap. If it has one. Many sites do. Not all. Not even most. But plenty. To find a website’s sitemap, go to the site’s home page then, in the address bar, at the end of the site’s URL add “/sitemap” or “/sitemap.xml”. This should bring up a page showing every page that exists for that site.

Here’s an example:

You’ve been asked to proofread the website for a guy who’s famous for eating crockery. He’s a big deal on YouTube. His website address (or URL) is www.ieatcrockery.com.

So, go to his website and add the suffixes I mentioned earlier:
www.ieatcrockery.com/sitemap or www.ieatcrockery.com/sitemap.xml

Hopefully one of these will present you with a comprehensive list of every web page on the site.
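
If the sitemap turns out to be the XML kind, it can be a bit of an eyesore in the browser. For anyone comfortable running a short script, here is a minimal Python sketch that pulls the page addresses out of a sitemap.xml file. The function names and the sample pages are my own inventions for illustration, and the sketch assumes the sitemap is a plain list of pages rather than an index pointing to further sitemaps.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def extract_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap (text or bytes)."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Tags arrive as "{namespace}loc", so compare only the local name.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


def fetch_sitemap(site):
    """Download <site>/sitemap.xml as bytes. Needs network access,
    and will fail for sites that don't publish a sitemap there."""
    with urlopen(site.rstrip("/") + "/sitemap.xml") as response:
        return response.read()


# Made-up sample so the sketch runs without going online.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.ieatcrockery.com/</loc></url>
  <url><loc>https://www.ieatcrockery.com/about</loc></url>
  <url><loc>https://www.ieatcrockery.com/refund-policy</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(SAMPLE):
        print(url)
```

To try it on a live site you would pass the result of fetch_sitemap to extract_urls. If the list that comes back is a handful of addresses that themselves end in .xml, you are probably looking at a sitemap index, and each of those files needs the same treatment.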

But what to do about websites that don’t have a sitemap?

Well, you could ask your client to generate one. There are websites and applications that can do this. If your client is using the WordPress content management system, then they can install, activate and use a simple plugin. Alternatively, you can use something like XML-Sitemaps.com to generate a sitemap.

Another method is to use Google Analytics (or to ask your client to do so). They can go to the ‘Behaviour’ section, then ‘Site Content’ and finally ‘All Pages’. If they set the date parameters for the last 12 months, this will give you a list of all the active pages on the site. The data can be downloaded as a CSV document which can be opened in Excel or most other spreadsheet applications, or it can be downloaded as a PDF. Either way, you are presented with a comprehensive list of web pages.
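
If the CSV arrives with extra columns and clutter, a short script can boil it down to a plain checklist of pages. The Python sketch below is only an illustration: it assumes the export has a column headed ‘Page’ and may include comment lines starting with ‘#’, so check both against the file you actually receive. The function name and sample data are made up.

```python
import csv
import io


def pages_from_export(csv_text, column="Page"):
    """Return the unique values of one column, in first-seen order."""
    # Assumption: the export may carry '#' comment lines and blank lines
    # around the table, so drop them before parsing.
    rows = [
        line for line in csv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    seen = []
    for record in csv.DictReader(io.StringIO("\n".join(rows))):
        page = (record.get(column) or "").strip()
        if page and page not in seen:
            seen.append(page)
    return seen


# Made-up sample standing in for a real export.
SAMPLE_EXPORT = """# All Pages (made-up sample)
Page,Pageviews
/,1200
/about,340
/refund-policy,12
/about,5
"""

if __name__ == "__main__":
    for page in pages_from_export(SAMPLE_EXPORT):
        print(page)
```

Duplicates are dropped so that each page appears once, which makes the list easier to tick off as you work through it.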

So, now that you know which pages make up the site, you can begin to harvest the content.

The easiest way to do this is to ‘print’ the web page.

This can just be a simple case of hitting ‘Control P’, selecting ‘Destination’ as ‘Save as PDF’ and job done. Depending on the size of the website, within a few minutes (hours, days, weeks…) you’ll have a folder full of web pages ready to proofread.

But not all websites will allow you to print their content and some web pages are formatted in such a way that the content won’t fit onto a single page and remain at a size that is legible.

What if a website’s content is protected?

It’s time to start swiping.

You need to go into each page and swipe the content, then paste it into a Word document or similar. Now, this can get messy. Swiping and copying text on a website is not like swiping and copying text in a Word document. It’s probably closer to swiping and copying text from a PDF. You can find yourself picking up text from an adjacent paragraph or from a caption. When you’re selecting your text, you might (actually you almost certainly will) click on a link and find yourself on an entirely different web page.

It’s frustrating. It’s long-winded. But it’s necessary.
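
If you (or a friendly web developer) can save a page’s HTML to a file, some of the swiping might be spared. What follows is a rough Python sketch, not a tried-and-tested tool: it strips out the tags and keeps the readable text, which you can then paste into a document and print. The class and function names are my own, the sample page is invented, and it will not help with pages whose text is loaded by scripts after the page opens.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> blocks."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6", "br", "tr"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            # Start a new line so paragraphs and headings stay separate.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def page_text(html):
    """Return the page's readable text, one block per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    tidy = (" ".join(line.split()) for line in raw_lines)
    return "\n".join(line for line in tidy if line)


# Invented page for illustration only.
SAMPLE_PAGE = """<html><head><title>I Eat Crockery</title>
<style>p { color: red; }</style></head>
<body><h1>About me</h1>
<p>I eat <a href="/plates">plates</a> for a living.</p>
<script>console.log("ignored");</script>
</body></html>"""

if __name__ == "__main__":
    print(page_text(SAMPLE_PAGE))
```

The page title is deliberately kept in the output, because titles contain typos too. Link text stays in the sentence it belongs to, so there is nothing to click by accident.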

It’s vital that you shepherd your content from the virtual world of the internet to a format that can be printed out as a hard copy. Because – sorry, I know I keep going on and on about this – ALWAYS PROOFREAD FROM A HARD COPY. NEVER FROM THE SCREEN.

Now, some websites can’t be copied and pasted; there are content-protection measures that prevent it. What then? Well, if your client (or potential client) can’t give you the content in another form, I’d strongly recommend a hard pass. The only exception I’d make is if the website is very light on content with a simple layout.

Proofreading a website: summary

Get the content in document form if you can.
Get a sitemap if you can. If not, use Google Analytics to list all pages.
Print off the content if you can. If not, copy and paste.
Proofread from printed documents.
Only proofread from the website itself if it has little content and a simple layout.

If you have any questions, please feel free to send them via our contact form.

Blog Author

The No-Nonsense Proofreading Course
