Search was revolutionized by Google. Before Google, we had Yahoo, AltaVista, AskJeeves / Ask.com, and more (full history here) – most used indexing, clustering, etc. to produce search results. Searches were slow and inaccurate, queries weren’t parsed very well, etc.
Then Google came along using PageRank and stole the whole market. PageRank was then copied industry-wide, but not before Google had its market share.
That’s why most of this article will be about one company: Google is “search”. However, we will also generalize to the industry as a whole, as all major search engines now use similar methods. The real question here: is search as good as it’s going to get? Or rather, is it practical to get better?
Can We Do Better?
That’s an interesting question – can we do better? Google’s core competency is search, or at least it was. To some extent I feel they have transitioned into being everyone’s platform for online interaction: Android, GMail, Google Drive, Nexus, Google Fiber, etc. Their goal is for you to live in their ecosystem and never leave. That should improve their search, but has it?
Honestly, I’m going to say no. Of course, that’s just one man’s opinion, but for me Google searches are becoming less and less useful. I often have to go through pages upon pages to find what I’m looking for, and even then I often end up just giving up. More recently, I’ve started using DuckDuckGo for many of my searches, as the results have been comparable, if not better (at least for what I search for).
There are a few possibilities that could explain Google’s decline (stipulating the results are worse). Here are several I’ve been considering:
- I’ve slowly started searching super niche topics, where there is literally nothing on the internet
- My search-fu has become worse over time
- The internet is expanding at an ever increasing rate, and PageRank is failing
- SEO (Search Engine Optimization) has become so ubiquitous that Google is essentially unable to keep up
- Google has become incentivized to give poorer results
Now, I’ve spent days reviewing this list. I’ve also spent a bit of time reviewing my searches, which you can do yourself here. It appears that my searches haven’t changed much. Since 2013, around 50% of my searches have been related to programming, 25% to some historical event, and the rest are odds and ends (perhaps I’ll do another post on this…).
Anyway, this leads me to believe that the issue, if one exists at all, is likely due to #3, #4, #5, or some combination thereof.
The Structure of Search Queries
What I discovered was that there are only a few very distinct ways I search.
I’m going to classify them as three distinct kinds of searches:
- Solve problem A using X (fuzzy multi-entity search)
- Who, What, Where, When, How is X (fuzzy single entity search)
- Exact text match (exact multi-entity search)
- Notable mention: misspelled words, looking for proper spelling…
The Solve problem A using X search is probably the most common in my repertoire. It’s distinct from the Who, What, Where, When, How is X searches because it often isn’t a structured English phrase, nor is it usually a search for how to do X or what X is.
There are many more, but you get the point. Then, there is the Who, What, Where, When, How is X – these are often implied, but are fairly clear:
These types of searches are nearly as prevalent as the Solve problem A using X queries, and are often more general, covering the vast majority of topics. The final type of search is more common for programming (for me) and is an Exact text match:
The Exact text match is close to tied for my most common search. The search often occurs when debugging programs, searching for a quote, a song, a stock, a story, etc.
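To make the difference between the exact and fuzzy categories concrete, here is a minimal sketch of how the two kinds of matching could behave. The function names (`tokenize`, `exact_match`, `fuzzy_score`) and the scoring rule are my own illustrative assumptions, not how Google or any other search engine actually implements matching.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(query, document):
    """True if the query's words appear in the document as one contiguous phrase."""
    q = tokenize(query)
    d = tokenize(document)
    if not q:
        return False
    # Slide a window of len(q) words across the document.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def fuzzy_score(query, document):
    """Fraction of distinct query words found anywhere in the document (hypothetical rule)."""
    q = set(tokenize(query))
    if not q:
        return 0.0
    d = set(tokenize(document))
    return len(q & d) / len(q)


# A made-up document, standing in for a page about a programming error.
doc = "Segmentation fault when calling curl_easy_perform in a loop"

phrase_hit = exact_match("segmentation fault when calling", doc)   # words in order
reordered_hit = exact_match("fault segmentation", doc)             # same words, wrong order
reordered_fuzzy = fuzzy_score("fault segmentation", doc)           # order ignored
partial_fuzzy = fuzzy_score("fix segmentation fault python", doc)  # only some words present
```

An exact search cares about word order and adjacency, which is why pasting an error message works so well when debugging. A fuzzy search only cares about how much of the query shows up at all, which is closer to the Solve problem A using X style.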
Why Those Structures?
The more puzzling question for me is: why only those three structures? Was it the old search interfaces such as Yahoo, AltaVista, AskJeeves / Ask.com, which taught me to search that way through trial and error? Was it Google, who had a monopoly on my search the past decade? Or perhaps this is just the way our brains work.
Personally, I’m leaning toward the latter. It seems obvious why we search the way we do. Who, What, Where, When, Why, How, etc. are the normal way we ask questions; it makes sense we’d search that way on the internet. Similarly, Exact matching and Solve problem A using X seem obvious, as we know what we are looking for and just don’t know which web page has it.
Upon further inspection, it appears that those three categories can really be boiled down to two technical categories:
- Fuzzy multi-entity (or single) search
- Exact multi-entity (or single) search
That covers every type of search I can think of. So the queries themselves are likely fixed, i.e. we all probably search similarly, and Google likely covers the vast majority, if not all, of the types of constructed searches. Many older search engines couldn’t handle those types of queries, which leads me to why I personally feel Google is winning/won the search engine “war”:
Google’s largest advantage is likely their ability to properly parse and construct queries.
Obviously, their ranking algorithm (PageRank) did and does provide great results. However, I postulate that it’s the user experience of properly structuring queries (enabled by the algorithm) that is Google’s real advantage.
If Yahoo, for instance, had implemented the same user experience, they might well be on top, even with a worse algorithm. I can search on Google using any type or combination of words, and usually within three searches (using different combinations) I have half-decent to good results. With the old competitors, I used to have to constantly retype and wait patiently, many times never receiving good results no matter what I typed in.
Speed + ease of use + accuracy give Google a huge advantage in business, but have they built something that can’t be beat?
PageRank at the time was revolutionary. However, to me (and I’m sure to others) the idea seems obvious. PageRank utilizes backlinks, where a link to a page essentially counts as a “citation”. Essentially, this is the same way scientific journals/papers are ranked. As with most things in life, it was the execution / algorithm that was the revolutionary component, not the idea.
Without getting too far into the weeds, PageRank is a way to weight a webpage based on the links it receives from other known webpages, which themselves have weights. Essentially, a website ranks higher in searches the more it is linked to by other websites with higher weights. If that sounds complicated, here’s a nice image from the Wikipedia page:
Making that work is some hard stuff. At the time, other companies were using clustering, indexing and other methods, but no one was actually ranking all the webpages, at scale, using the network of the world wide web.
This is what made Google’s entry to the market possible. It enabled search queries to be free form and pages to be ranked based on key terms, and now (with mobile and their other services) they also consider context and website load speed.
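As a rough illustration of the link-weighting idea described above, here is a minimal sketch of the textbook power-iteration form of PageRank. The damping factor of 0.85, the iteration count, and the four-page toy web are assumptions for the example; this is not Google’s production system, and the hard part at web scale is everything this sketch leaves out.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute simple PageRank scores.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score, with scores summing to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally weighted.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped weight among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "a" and "b" both cite "c", and "c" cites "d".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}
ranks = pagerank(toy_web)
```

In this toy web, "c" ends up weighted above "a" and "b" because two pages link to it, and "d" benefits from being linked to by the already well-weighted "c". That second effect is the “citations from highly cited pages count for more” part of the idea.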
What Google Uses Today
According to Google, they still use PageRank, but they also use a list of other factors to provide the “best” results. This includes accurately parsing your terms, load speed of webpages, relevance of topic(s), anchor links, and so on.
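To show how several factors like these might be folded into one ranking, here is a toy sketch that combines a link-based score, a relevance score, and load speed as a weighted sum. Everything in it is hypothetical: the weights, the field names, the speed formula, and the two example pages are made up for illustration, and Google does not publish its actual formula.

```python
def combined_score(page, w_link=0.5, w_relevance=0.4, w_speed=0.1):
    """Blend three signals into one score (hypothetical weights and formula)."""
    # Faster pages get a speed value closer to 1, slower pages closer to 0.
    speed = 1.0 / (1.0 + page["load_seconds"])
    return (
        w_link * page["link_score"]
        + w_relevance * page["relevance"]
        + w_speed * speed
    )


# Two made-up pages: a heavily linked domain with a so-so answer,
# and a small blog that matches the query exactly.
pages = [
    {"url": "big-domain.example/guide", "link_score": 0.9, "relevance": 0.5, "load_seconds": 1.0},
    {"url": "small-blog.example/post", "link_score": 0.2, "relevance": 1.0, "load_seconds": 1.0},
]

# Rank with the default weights, which lean on the link-based score.
ranked = sorted(pages, key=combined_score, reverse=True)


def relevance_heavy(page):
    """Same blend, but weighted toward relevance instead of links."""
    return combined_score(page, w_link=0.2, w_relevance=0.7)


ranked_relevance_heavy = sorted(pages, key=relevance_heavy, reverse=True)
```

With the link-heavy weights the big domain comes out on top despite the weaker match, and with the relevance-heavy weights the small blog does. The point is only that the ordering depends on how the signals are balanced, not that these particular numbers mean anything.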
One thing to note is that, while PageRank is still relevant, Google has increasingly moved towards more of a “user-experience” ranking as well. To me, this implies they value not only their content providing an answer (not necessarily the best one) but also providing the best user experience. Obviously, webpages can sometimes be at opposite ends of the spectrum (horrid answer, great user experience).
This actually leads me to the final portion of the discussion. Is search solved? The answer is no.
Problems With Search
The directional change in Google’s search algorithm, and their efforts to keep us on their platform, are showing us the cracks in Google’s armor.
Let’s summarize what I view as the primary issue(s) with search today.
#1 PageRank’s Effectiveness Has Eroded
PageRank’s effectiveness appears to have diminished, or at least coalesced around a few websites. Wikipedia is almost always the top search result, along with any social websites. Let’s take a look at when I searched “Iraq War” on Google:
The top search result is Wikipedia, followed by top stories (some of which are, maybe, interesting?). Then we have other search results, like when I searched “Target” on Google:
Google knows I’m in Champaign (that’s locality information), but the links are from Yelp, Twitter and Wikipedia (again). PageRank was designed when the internet was much smaller, with a few high quality websites and a bunch of smaller blogs and niche websites of varying, but potentially high, quality. Now everyone’s on the internet, and much of it is social, which is why the top links are nearly always social media accounts or social sites.
Unless, of course, you’re looking at niche topics; so let’s check some of my blog posts:
I searched for exact text from my post, and my post sits below a “how to install guide cURL” guide and a bunch of additional “people also ask” statements. Why? Likely because the top result is from a domain with a higher weight, which is not too great for a user searching. The subjectively wrong link is a featured story and doesn’t have an exact text match, while the one below it does.
On the other hand, niche articles are also easier to rank higher in some cases. For instance, if you’re not competing with highly ranked base domains, as when I search “FOIA Request Universities”:
My articles have been ranked lower, even though they are way more cited, aren’t in PDF form, have photos and code, are easier to read, etc., simply because the University of Illinois likely has so much weight given to their domain (especially given I’m in Champaign, IL).
Overall, it’s easy for my website to rank highly for a niche topic. However, as soon as anyone remotely higher up the food chain enters your key terms, you’re pushed out. This, in my opinion, is why single site blogging is going extinct outside of Medium and a few other platforms.
#2 Socialization, Centralization & Walled Gardens
I’ve hinted at this previously, but social media has presented two huge problems:
- Most of social media is behind walled gardens, meaning links can’t be applied (easily) to Google’s search algorithms.
- With social media, people are less likely to share content outside of walled gardens. This leads to a further centralizing effect, meaning places like Stack Overflow will always outrank virtually all other websites for coding (for instance).
How do search engines handle these cases? They don’t, to the best of my knowledge. My bet is that many search engines have crept into the walled gardens as best they can. This is part of the reason Google owns Gmail, Android, Chrome, Chromebooks, GSuite, Pixel, Nexus, etc.: every little bit of information is something they can use to gain a competitive advantage.
Publicly though, if they do that, I highly doubt they will ever admit it.
That being said, their results are still being centralized around a few websites. That is causing the internet ecosystem to centralize, which makes it brittle and doesn’t produce the best search results.
#3 Lowering the Bar
When search was first being developed, it had the advantage of really only targeting people between 20 and 40, typically male, working a white collar job, with fair technical skills. Today, everyone uses search on a daily basis. In the early 2000s and even through the early 2010s that likely wasn’t too much of a problem, as the content producers were still primarily the same group: 20 to 40 years old, typically male, working a white collar job, with fair technical skills.
Then places such as Buzzfeed, Breitbart, and others I won’t link to started popping up or gaining in popularity. Why?
Well, that’s when your Mother, Father, Uncle, Grandfather, the whole family, started joining the internet (I assume) via Facebook & mobile phones. Worse, they were active, i.e. they didn’t just research or read news on the internet, they started interacting on the internet. Unfortunately, I couldn’t find any solid stats or analytics for this, but I believe we all know it’s true. The fact that 5 years ago my Grandfather and my Grandmother-in-law weren’t commenting on Facebook is evidence enough for me.
This influx of new active internet users brought new content producers to official channels (Buzzfeed, Breitbart, but also the NYT, WP, BBC, etc.), and these noobs to the internet also started producing their own content.
Overall, the impact this has had, in my opinion, has been tremendous. Everything the prior generation learned on the internet is being relearned; scams and fake news are being up-voted, linked to, and ranked higher in search.
IMO this is the primary cause of the decline of search. Google now has every incentive to maximize advertising revenue, and they have easy prey. Plus, their underlying algorithms are likely having growing pains, if not outright breaking.
Is Search Solved? – No.
The answer is clear, search is not solved (for the general case). More specifically, I’d argue it didn’t scale and we have a long way to go.
We didn’t even get to the fact that niche industries such as finance or news events really need their own search engines or at least own algorithms (Google isn’t great here). In part, this is why I started MetaCortex and we are building ProjectPiglet.com, because it’s clear to us search is broken.
And honestly, I think search is going to get worse, much worse.
I expect further centralization and more walled gardens, with Google likely at the forefront. What’s scary to me is that even if I had the best search algorithm in the world, it likely wouldn’t matter. It’s all about data and mindshare/market share. Google has it, Baidu has it. Search engines like DuckDuckGo can exist and maybe even thrive in their niche (privacy), but that’s it. Similarly, that is where we are positioning ProjectPiglet.com, as we can gather revenue in our niche (finance/cryptocurrency) without directly competing.
Unless Google royally drops the ball, it’s not going to lose its top spot any time soon.
Yet, by the same token, Google leaves much to be desired in terms of search, and I do think it’s an unsolved (perhaps unsolvable) problem.