Intro to Search: Initial Considerations

This post is the first in a series about what we learned from developing search products for WordPress.com. In this post, I’ll give you a brief tour of some lessons from deploying search in the WordPress.com Reader. Improving this search tool so that our users find engaging articles they really like is a continuing effort and an ongoing learning experience.

The WordPress.com Reader is the place where our users can keep up with sites they like, whether they’re personal blogs, high-profile sites on WordPress.com, or sites that connect to WordPress.com with Jetpack. In fact, users can add any RSS feed they like. The list of sites I follow includes Office Today, TED Ideas and 500px ISO as well as several data science blogs.

The Reader is also a great tool for discovering new content, and this is where the search functionality contributes a lot. When we analyzed where our users found new sites to follow, we saw that a quarter of all new site follows originate with the search tool.

Challenges

At WordPress.com, we have a very large body of documents. There are literally billions of posts on our platform, and that number grows rapidly every day. We find that Elasticsearch is a great tool for handling all these documents and making them searchable.

The documents we deal with are also very heterogeneous: they cover all kinds of topics. Some are very long, like the ones you’ll find highlighted on Longreads, and some are photo posts. Some are written by professionals, some by novice writers who are just starting to find their voice. Our authors live in all parts of the world, and they write in many different languages.

As on any publishing platform, it is only natural that authors try to get as much attention for their sites as possible. At WordPress.com, we offer many tools to promote posts. But when an author crosses the line from self-promotion to spam, we have to protect readers’ interests. We constantly work to balance authors’ interests in promotion with readers’ interest in easily finding the highest quality content.

In addition to mastering these challenges, we have to match our users’ intents and expectations.

Users’ expectations

Users approach search with different intents [1]. Some are looking for information (informational searches). Some want to navigate to a specific site (navigational searches), and some wish to perform a transaction, like booking a flight or changing a setting (transactional searches). Users expect a search engine to cater to every type of search. In the WordPress.com Reader, we see all three of these search types. However, most searches don’t fit neatly into any category, and are best described as “looking for inspiration” or “keeping up with a topic.”

Even though we might each approach a search box with different goals, there are general trends in what most of us hope to find in search results. Broadly summarizing the research of Barry & Schamber [2] and Crystal & Greenberg [3], all of the following are important:

Relevance: Most importantly, the results should be relevant to the keywords we entered, especially the first couple of results — scrolling is tiring, and we form opinions about the quality of the search algorithm itself by skimming through the initial results.

Trust: No one likes being taken to a sketchy site or spammy article.

Originality: We prefer original content on trustworthy sites, ideally written or endorsed by experts in the subject matter. More detailed information is usually preferred over shallow content.

Clarity: At the same time, documents should be written with great clarity and should match our level of understanding; a tourist searching for the term Panther might not need the same type of information as a biologist searching for Panthera onca.

Novelty: New content is better than old, outdated documents.

Diversity: A list of search results is most compelling when it includes all possible meanings of the search terms, and contains different views and approaches to the subject. A classic example is the word “jaguar”: if no additional information is given, the search results should contain articles about both the animal and the car.
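
To make the tension between relevance and diversity concrete, here is a minimal sketch of a greedy re-ranking step in the spirit of maximal marginal relevance. It is only an illustration, not how search in the WordPress.com Reader is implemented. The `Result` class, the relevance scores, the topic tags, and the `trade_off` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    relevance: float   # hypothetical relevance score in [0, 1]
    topics: frozenset  # hypothetical topic tags used to compare results


def similarity(a: Result, b: Result) -> float:
    """Jaccard overlap of topic tags: 1.0 for identical tags, 0.0 for disjoint."""
    union = a.topics | b.topics
    return len(a.topics & b.topics) / len(union) if union else 0.0


def diversify(results, k, trade_off=0.5):
    """Greedily pick k results, penalizing similarity to results already picked."""
    selected = []
    remaining = list(results)
    while remaining and len(selected) < k:
        def score(candidate):
            # Redundancy is the closest match among results chosen so far.
            redundancy = max(
                (similarity(candidate, s) for s in selected), default=0.0
            )
            return trade_off * candidate.relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data for the "jaguar" example (made up for illustration).
candidates = [
    Result("Jaguar habitat in the Amazon", 0.95, frozenset({"animal", "wildlife"})),
    Result("How jaguars hunt", 0.90, frozenset({"animal", "wildlife"})),
    Result("Restoring a classic Jaguar E-Type", 0.80, frozenset({"car", "restoration"})),
]
top_two = diversify(candidates, k=2)
print([r.title for r in top_two])
```

With these toy scores, the second slot goes to the article about the car even though another article about the animal has a higher relevance score. Setting `trade_off=1.0` turns the diversity penalty off and ranks by relevance alone.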

9 thoughts on “Intro to Search: Initial Considerations”

captnmike says:

August 29, 2016 at 4:21 pm

I am sort of old fashioned and like to print things out to read later – why not have the print option for us old fashioned folks?

printing WordPress.com Posts without the Print option can really suck and waste paper

thanks


1. Sirin Odrowski says:
  
  August 30, 2016 at 8:27 am
  
  Thanks for your suggestion and your interest in the post! The print button usually does not get a lot of traffic, but it is certainly something we can consider bringing onto the site.
  
  
Demet Dagdelen says:

August 29, 2016 at 6:25 pm

Reblogged this on stuff.


Home's Cool! says:

August 30, 2016 at 1:38 am

I would love if I could search out sites by the theme they happen to use. Any chance that could happen? 🙂


Noel Williams ...www.photopincher.com...www.gospelmuse.com says:

August 30, 2016 at 10:32 am

Awesome! I applaud you guys for your selflessness and dedication. But most important, your willingness to share quality information for free. I have been around here for about four years now, and I have never felt that I was less important than anyone else.


3danim8 (aka Ken Black) says:

September 9, 2016 at 3:21 pm

Hi Sirin,

I love the WordPress-inspired word cloud. That is awesome.

Since we are co-conspirators in the world of data science, I have a feeling an article I wrote today might interest you. Here is the link: https://3danim8.wordpress.com/2016/09/09/should-you-believe-your-website-traffic-data/

I also would like to know if you know of any techniques for crawling the entire history of a blog to develop text-based analytics, much like you show in your word cloud. I have over 200 articles I have written on data science problem solving techniques and I have been thinking about how to develop a master, searchable database for my content. I was going to use IBM Watson to do it, but my company doesn’t have the ability to crawl internet sites, only pdf files. I don’t want to take the time to create pdf’s of all my articles!

Thanks much,

Ken


1. Sirin Odrowski says:
  
  September 23, 2016 at 11:35 pm
  
  For general websites, I’ve used scrapy in the past: https://scrapy.org/. It was quite easy to set up and configure in Python.
  
  If you’re looking for posts on your own blog here on WordPress.com, I think you could also try using the public API: https://developer.wordpress.com/docs/api/getting-started/.
  
  
Sarah R says:

September 9, 2016 at 4:40 pm

The word cloud for searches is really eye-opening. I am impressed by the search habits of WordPress users!


dbp49 says:

September 9, 2016 at 10:14 pm

Sounds like I may have to add a new place to my list of favorite spots to visit. Guess I shouldn’t be surprised. In the three years that I’ve been traveling with your wonderful team, you have seldom failed to keep it real, and you have never failed to keep it interesting.
