- There are more than 60 trillion individual web pages.
- Google navigates these pages by crawling.
- Once it has crawled the pages, it adds them to a massive index, where each page's URL is stored with the page content as the key.
- When you search on Google, programs based on complicated algorithms generate the search results.
- While ranking the URLs, Google considers more than 200 factors, including freshness, keyword density, site quality, relevance, and synonyms.
- According to some estimates, about 60% of internet traffic is non-human, and about 50% of that non-human traffic is malicious. There are also many malicious sites intent on harming users. Google constantly works on blacklisting these pages and preventing spam.
- To do all this, Google uses approximately 1 million servers.
These are impressive stats. Suppose you want a Google-like search feature on your site, but instead of searching the entire internet, you want to generate results only from a selected group of sites.
Let's say I want to create a search feature that indexes and searches only Java blogs. There are many tools and APIs available in different languages that allow us to crawl, index, and serve the desired search results. In other words, you can create your own search engine. But would your search engine be as available and robust as Google's? The answer is perhaps, but it will take a lot of resources and time to get there.
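To make the index idea concrete, here is a toy sketch of an index keyed by content, where each word maps to the URLs of the pages that contain it. It is only an illustration and says nothing about how Google's index actually works: the class name TinyIndex, its methods, and the example.com URLs are all made up for this post, and a real engine would also need crawling, ranking, and spam filtering.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each word to the set of URLs whose text contains it. */
final class TinyIndex {
    private final Map<String, Set<String>> urlsByWord = new HashMap<>();

    /** Splits the page text into lowercase words and records the URL under each word. */
    void add(String url, String pageText) {
        for (String word : pageText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                urlsByWord.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
            }
        }
    }

    /** Returns the URLs containing the given word, or an empty set if there are none. */
    Set<String> search(String word) {
        return urlsByWord.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        // Hypothetical URLs and page text, for illustration only.
        index.add("https://example.com/a", "Java generics explained");
        index.add("https://example.com/b", "Spring and Java tips");
        System.out.println(index.search("java"));
    }
}
```

Even this small version hints at the work involved: everything beyond a single-word lookup (phrases, ranking, freshness) has to be built on top.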
There is another approach and Google provides the alternative itself.
Google Custom Search API
Google Custom Search enables you to create a search engine for your website, your blog, or a collection of websites. You can configure your search engine to search both web pages and images. You can fine-tune the ranking, customize the look and feel of the search results, and invite your friends or trusted users to help you build your custom search engine. (Taken from the Google API tutorial documentation.)
In this post we will walk through how to create a custom Google search engine for a website and have some fun in turn. Without any further ado let us jump to the to-do steps.
As the first step, you need to actually create a search engine. There are two varieties:
- Custom Google Search – This is free and can be accessed via https://www.google.com/cse/all
- Google Site Search – This is a more powerful, paid version of Google search. The details can be found here: http://www.google.com/enterprise/search/products/gss.html
For this tutorial we will use the free one and then use the API to build our own custom search on top of it. The steps are below.
Step 1 – Create a search engine.
- Visit the following link: https://www.google.com/cse/manage/all
- Click on the Add button. This opens the page for creating a custom Google search engine.
- First, enter all the URLs you wish to index and search against. You can choose as many as you want. I mostly chose Java blogs such as DZone, IBM, and TheServerSide.
- Now enter the name of your search engine and save it.
- Your search engine is now created, and you can go there and play with your searches. Here is mine for your reference: Weblog4j Search. Search for Java and software-related terms and you will get awesome results. You can customize the search page to some extent.
Step 2 – Get the search engine id.
- Go to https://www.google.com/cse/manage/all.
- Click on the search engine you created.
- Go to the Basic tab and find the Details section. There is a button labeled “Search Engine Id”. Click on it and the search engine id appears in a popup. Save the id in a text file for later reference.
Step 3 – Getting the API key.
- Using the JSON/Atom Custom Search API requires an API key.
- Go to the Google Cloud Console.
- Create a new project and activate it. The project will now be listed on the console page. Click on the project you created.
- On the project page, look at the left-hand bar. Go to APIs and Auth -> API. This page lists the Google Cloud APIs. Find “Custom Search API” in the list and toggle it to ON.
- Now click on Credentials. You will find a panel called “Public API access”. Click on the “Create New Key” button. In the resulting popup, click on Server Key.
- A new popup opens, asking for the IP addresses that should be permitted to use the key. You can leave it empty and create the key.
- Copy the key and save it in a secret location.
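One way to keep the key out of source control is to read it from the environment at startup instead of hard-coding it. The sketch below is an assumption-laden illustration: the class SearchCredentials and the variable names GOOGLE_API_KEY and GOOGLE_CSE_ID are hypothetical names picked for this post, not anything Google prescribes.

```java
import java.util.Map;

/** Loads Custom Search credentials from environment variables instead of source code. */
final class SearchCredentials {
    // Hypothetical variable names chosen for this sketch; use whatever your deployment defines.
    static final String API_KEY_VAR = "GOOGLE_API_KEY";
    static final String ENGINE_ID_VAR = "GOOGLE_CSE_ID";

    final String apiKey;
    final String searchEngineId;

    private SearchCredentials(String apiKey, String searchEngineId) {
        this.apiKey = apiKey;
        this.searchEngineId = searchEngineId;
    }

    /** Reads both values from the given environment map, failing fast if either is missing. */
    static SearchCredentials fromEnvironment(Map<String, String> env) {
        return new SearchCredentials(require(env, API_KEY_VAR), require(env, ENGINE_ID_VAR));
    }

    /** Returns the trimmed value of the named variable, or throws if it is absent or blank. */
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

In an application you would call SearchCredentials.fromEnvironment(System.getenv()); taking the map as a parameter simply makes the loader easy to exercise with fake values.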
Google Custom Search API
Now we have a search engine to play with, its search engine id, and an API key, so we are ready to get our hands dirty with some code.
API Overview
- It is a REST API with a single method called list.
- The HTTP method is GET.
- The response data is returned as JSON or Atom.
- The response consists of (1) the actual search results, (2) metadata for the search, such as the number of results and alternative search queries, and (3) custom search engine metadata.
- The data model is based on the OpenSearch 1.1 specification.
API URL – The REST URL to invoke Google Custom Search is
https://www.googleapis.com/customsearch/v1?parameters
Parameters
- key – the API key you saved in step 3 above
- cx – the custom search engine id you got in step 2. For a linked custom search engine, use cref instead of cx
- q – the search query
For a complete reference of the query parameters, visit the following page: https://developers.google.com/custom-search/json-api/v1/reference/cse/list.
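The URL and the three parameters above can be put together by hand, as in the following sketch. It is a minimal illustration rather than an official client: the class name SearchUrlBuilder and its methods are made up for this post, the credentials are placeholders, and it assumes Java 11 or later for java.net.http.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: builds and calls the Custom Search REST URL by hand. */
final class SearchUrlBuilder {
    private static final String BASE_URL = "https://www.googleapis.com/customsearch/v1";

    /** Builds the request URI, URL-encoding each parameter value. */
    static URI build(String apiKey, String engineId, String query) {
        return URI.create(BASE_URL
                + "?key=" + encode(apiKey)
                + "&cx=" + encode(engineId)
                + "&q=" + encode(query));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Sends a GET request and returns the raw JSON body; needs a real key and engine id. */
    static String fetchJson(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Placeholder credentials: this only prints the URL and makes no network call.
        System.out.println(build("YOUR_API_KEY", "YOUR_ENGINE_ID", "java 8 streams"));
    }
}
```

Encoding the values matters because search engine ids and queries can contain characters such as ":" and "&" that would otherwise break the query string.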
Libraries and Dependencies
Since we are dealing with a third-party API, all we really need is to make an AJAX call from our front end and read the resulting JSON response. There are, however, libraries available in multiple languages that make working with the Google search API a breeze. For Java, Google provides a client library.
You can download the Custom Search API library, or use the following Maven dependencies.
<dependency>
    <groupId>com.google.apis</groupId>
    <artifactId>google-api-services-customsearch</artifactId>
    <version>v1-rev40-1.18.0-rc</version>
</dependency>
<dependency>
    <groupId>com.google.http-client</groupId>
    <artifactId>google-http-client-jackson</artifactId>
    <version>1.15.0-rc</version>
</dependency>
Code for searching
We could use any ordinary REST client, as there is nothing special about invoking the Google Custom Search service URL. For this tutorial, however, we will use the Google client API to run our searches. Let's check out the code.
package com.aranin.spring.googleapi.search;

import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.customsearch.Customsearch;
import com.google.api.services.customsearch.model.Result;
import com.google.api.services.customsearch.model.Search;

import java.util.List;

/**
 * Queries a Google custom search engine through the Google API client library.
 *
 * User: Niraj Singh
 * Date: 6/3/14
 * Time: 12:42 PM
 */
public class GoogleSearchClient {
    // REST endpoint, kept for reference; the client library below builds the request itself
    private static final String GOOGLE_SEARCH_URL = "https://www.googleapis.com/customsearch/v1?";
    // API key
    private static final String API_KEY = "your api key from step 3";
    // custom search engine ID
    private static final String SEARCH_ENGINE_ID = "your search engine id from step 2";
    // full REST URL without the query; not used by the client library calls below
    private static final String FINAL_URL = GOOGLE_SEARCH_URL + "key=" + API_KEY + "&cx=" + SEARCH_ENGINE_ID;

    public static void main(String[] args) {
        GoogleSearchClient gsc = new GoogleSearchClient();
        String searchKeyWord = "weblog4j";
        List<Result> resultList = gsc.getSearchResult(searchKeyWord);
        // getItems() can return null when there are no results, so guard against both cases
        if (resultList != null && !resultList.isEmpty()) {
            for (Result result : resultList) {
                System.out.println(result.getHtmlTitle());
                System.out.println(result.getFormattedUrl());
                //System.out.println(result.getHtmlSnippet());
                System.out.println("----------------------------------------");
            }
        }
    }

    public List<Result> getSearchResult(String keyword) {
        // Set up the HTTP transport and JSON factory
        HttpTransport httpTransport = new NetHttpTransport();
        JsonFactory jsonFactory = new JacksonFactory();
        //HttpRequestInitializer initializer = (HttpRequestInitializer)new CommonGoogleClientRequestInitializer(API_KEY);
        // No request initializer is passed because the API key is set on each request below
        Customsearch customsearch = new Customsearch(httpTransport, jsonFactory, null);
        List<Result> resultList = null;
        try {
            // Build the list request for the search term
            Customsearch.Cse.List list = customsearch.cse().list(keyword);
            list.setKey(API_KEY);
            list.setCx(SEARCH_ENGINE_ID);
            // number of results per page
            //list.setNum(2L);
            // for pagination: 1-based index of the first result, so 10L skips the first nine results
            list.setStart(10L);
            Search results = list.execute();
            resultList = results.getItems();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return resultList;
    }
}
The code is very simple. We create a Customsearch object and, from it, obtain a Cse.List request for our search term. On that request we set the search engine properties, such as the API key and search engine id, and the search criteria, such as the number of results and the starting point (for pagination). Then we execute it.
Finally we get a List of Result objects. This contains all the search data and can be used to create our own search engine.
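The starting point used for pagination is a 1-based result index rather than a page number, so it helps to compute it from the page you want. The helper below is a small sketch: SearchPaging and startIndex are names invented for this post, and the limit of 10 results per request is my reading of the API reference, so check it against the current documentation.

```java
/** Hypothetical helper that converts a page number into the API's 1-based start index. */
final class SearchPaging {
    // Assumption taken from the API reference: a single request returns at most 10 results.
    static final int MAX_PAGE_SIZE = 10;

    /** Returns the index of the first result on the given 1-based page. */
    static long startIndex(int page, int pageSize) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1");
        }
        if (pageSize < 1 || pageSize > MAX_PAGE_SIZE) {
            throw new IllegalArgumentException("pageSize must be between 1 and " + MAX_PAGE_SIZE);
        }
        return (long) (page - 1) * pageSize + 1;
    }

    public static void main(String[] args) {
        // Print the start index of the first three pages at 10 results per page.
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " starts at " + startIndex(page, 10));
        }
    }
}
```

With the client above, this would be used as list.setNum((long) pageSize) followed by list.setStart(SearchPaging.startIndex(page, pageSize)).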
That is all, folks. I hope you find this tutorial useful. Feel free to drop a couple of comments if you like or dislike the post.
References
- https://www.google.co.in/insidesearch/howsearchworks/thestory/
- http://atkinsbookshelf.wordpress.com/tag/how-many-servers-does-google-have/
- http://www.google.co.in/about/datacenters/
- https://developers.google.com/custom-search/json-api/v1/overview
- https://developers.google.com/custom-search/docs/tutorial/creatingcse
- https://www.google.com/cse/manage/all
- https://cloud.google.com/console
- https://developers.google.com/custom-search/json-api/v1/introduction