Google Books Ngram Viewer Cheat Sheet

Page history last edited by Alan Liu 9 years, 6 months ago

Cheat Sheet of Parameters That Can Be Set for the Ngram Viewer

(excerpted and quoted with adaptation from About Google Ngram Viewer)

Wildcard search ("search phrase *")

When you put a * in place of a word, the Ngram Viewer will display the top ten substitutions. For instance, to find the most popular words following "University of", search for "University of *".
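To make the "top ten substitutions" behaviour concrete, here is a minimal sketch over a toy table of counts. The function name `top_substitutions` and the counts are hypothetical; this illustrates the idea and is not how the Ngram Viewer is implemented.

```python
from collections import Counter


def top_substitutions(pattern, ngram_counts, limit=10):
    """Return up to `limit` ngrams matching `pattern`, most frequent first.

    A "*" in `pattern` stands for exactly one word.
    """
    wanted = pattern.split()
    matches = Counter()
    for ngram, count in ngram_counts.items():
        words = ngram.split()
        if len(words) != len(wanted):
            continue  # the wildcard replaces one word, so lengths must agree
        if all(w == "*" or w == x for w, x in zip(wanted, words)):
            matches[ngram] = count
    return [ngram for ngram, _ in matches.most_common(limit)]


# Made-up counts, for illustration only.
toy_counts = {
    "University of California": 50,
    "University of Chicago": 30,
    "University of the": 20,
    "College of Arts": 40,
    "University of": 99,
}

print(top_substitutions("University of *", toy_counts))
```

Only the three-word ngrams that begin with "University of" are returned, ordered by their toy counts.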

Inflection search ("search phrase_INF")

An inflection is the modification of a word to represent various grammatical categories such as aspect, case, gender, mood, number, person, tense and voice. You can search for them by appending _INF to an ngram. For instance, searching "book_INF a hotel" will display results for "book", "booked", "books", and "booking".
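The expansion that _INF performs can be sketched as follows. The inflection table here is a hand-written assumption (the Ngram Viewer works out inflections itself), and `expand_inflections` is a hypothetical helper name.

```python
# Hypothetical inflection table, written by hand for this example.
TOY_INFLECTIONS = {"book": ["book", "booked", "books", "booking"]}


def expand_inflections(query, inflections):
    """Expand the first word marked with _INF into one query per inflected form."""
    words = query.split()
    for i, word in enumerate(words):
        if word.endswith("_INF"):
            base = word[: -len("_INF")]
            forms = inflections.get(base, [base])  # unknown words stay as they are
            return [" ".join(words[:i] + [form] + words[i + 1:]) for form in forms]
    return [query]  # no _INF marker, so there is nothing to expand


for expanded in expand_inflections("book_INF a hotel", TOY_INFLECTIONS):
    print(expanded)
```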

Part-of-speech Tags

("searchword_VERB") verb
("searchword_NOUN") noun
("searchword_ADJ") adjective
("searchword_ADV") adverb
("searchword_PRON") pronoun
("searchword_DET") determiner or article
("searchword_ADP") an adposition: either a preposition or a postposition
("searchword_NUM") numeral
("searchword_CONJ") conjunction
("searchword_PRT") particle
("_ROOT_") root of the parse tree
The _ROOT_, _START_, and _END_ tags must stand alone (e.g., _START_).

Example: Consider the word tackle, which can be a verb ("tackle the problem") or a noun ("fishing tackle"). You can distinguish between these different forms by appending _VERB or _NOUN: tackle_VERB, tackle_NOUN.
The most frequent part-of-speech tags for a word can be retrieved with the wildcard functionality, for example with the query cook_*.

Stand-alone usage of Part-of-speech tags (any tag above, written in the format "_TAG_")

For example, you can use the _DET_ tag to search for "read a book", "read the book", "read that book", "read this book", and so on, with the query "read _DET_ book".
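The two ways of writing a tag (appended to a word, or standing alone between underscores) can be captured in a pair of small helpers. The names `tagged` and `standalone` are hypothetical, and the tag sets simply mirror the list in this cheat sheet.

```python
# Tags that may be appended to a word, as listed in this cheat sheet.
WORD_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "NUM", "CONJ", "PRT"}
# Tags that are only ever used on their own.
STANDALONE_ONLY = {"ROOT", "START", "END"}


def tagged(word, tag):
    """Append a part-of-speech tag to a word, e.g. tackle_VERB."""
    if tag not in WORD_TAGS:
        raise ValueError(f"{tag!r} cannot be appended to a word")
    return f"{word}_{tag}"


def standalone(tag):
    """Write a tag in its stand-alone form, e.g. _DET_."""
    if tag not in WORD_TAGS | STANDALONE_ONLY:
        raise ValueError(f"unknown tag {tag!r}")
    return f"_{tag}_"


print(tagged("tackle", "VERB"))
print(tagged("tackle", "NOUN"))
print(" ".join(["read", standalone("DET"), "book"]))
```

Rejecting `tagged("will", "ROOT")` in code reflects the rule that _ROOT_, _START_, and _END_ are never attached to a word.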

Start and End of Sentences ("_START_") ("_END_")

The Ngram Viewer tags sentence boundaries, allowing you to identify ngrams at starts and ends of sentences with the _START_ and _END_ tags, for example: "_START_ President Lincoln".

Dependency Relations ("mainword=>dependentword")

Sometimes it helps to think about words in terms of dependencies rather than patterns. Let's say you want to know how often tasty modifies dessert. That is, you want to tally mentions of "tasty frozen dessert", "crunchy, tasty dessert", "tasty yet expensive dessert", and all the other instances in which the word tasty is applied to dessert. For that, the Ngram Viewer provides dependency relations with the => operator: "dessert=>tasty".
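The difference between matching a word pattern and matching a dependency can be seen in a small sketch. The dependency edges below are simplified and annotated by hand as (main word, dependent word) pairs; they do not come from a real parser, and `count_dependency` is a hypothetical helper.

```python
def count_dependency(head, dependent, parses):
    """Count the phrases in which `dependent` is attached to `head`."""
    return sum(1 for _, edges in parses if (head, dependent) in edges)


# Hand-annotated, simplified (main word, dependent word) edges.
toy_parses = [
    ("tasty frozen dessert", {("dessert", "tasty"), ("dessert", "frozen")}),
    ("crunchy, tasty dessert", {("dessert", "crunchy"), ("dessert", "tasty")}),
    ("tasty yet expensive dessert", {("dessert", "tasty"), ("tasty", "expensive")}),
    ("tasty soup", {("soup", "tasty")}),
]

print(count_dependency("dessert", "tasty", toy_parses))
```

All three dessert phrases are counted even though "tasty" and "dessert" are adjacent in only one of them, which is what a plain two-word search would miss.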

Root Word in Sentence ("_ROOT_=>searchword")

Every parsed sentence has a _ROOT_. Unlike other tags, _ROOT_ doesn't stand for a particular word or position in the sentence. It's the root of the parse tree constructed by analyzing the syntax; you can think of it as a placeholder for what the main verb of the sentence is modifying. So here's how to identify how often will was the main verb of a sentence: "_ROOT_=>will". This will return results in which "will" is part of the sentence "Larry will decide." but not "Larry said that he will decide.", since will isn't the main verb of the latter sentence.

Ngram Compositions

The Ngram Viewer provides five operators that you can use to combine ngrams: +, -, /, *, and :.

`+`	sums the expressions on either side, letting you combine multiple ngram time series into one.
`-`	subtracts the expression on the right from the expression on the left, giving you a way to measure one ngram relative to another. Because users often want to search for hyphenated phrases, put spaces on either side of the `-` sign.
`/`	divides the expression on the left by the expression on the right, which is useful for isolating the behavior of an ngram with respect to another.
`*`	multiplies the expression on the left by the number on the right, making it easier to compare ngrams of very different frequencies. (Be sure to enclose the entire ngram in parentheses so that * isn't interpreted as a wildcard.)
`:`	applies the ngram on the left to the corpus on the right, allowing you to compare ngrams across different corpora.
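The four arithmetic operators act year by year on the time series behind each ngram. The sketch below shows this with two made-up series (occurrences per million words); the numbers and function names are assumptions for illustration, not Ngram Viewer output.

```python
years = [1990, 2000, 2010]
# Made-up yearly frequencies (occurrences per million words).
red = [40, 50, 60]
blue = [20, 20, 30]


def add(left, right):
    """(red + blue): sum two series year by year."""
    return [a + b for a, b in zip(left, right)]


def subtract(left, right):
    """(red - blue): subtract the right series from the left one."""
    return [a - b for a, b in zip(left, right)]


def divide(left, right):
    """(red / blue): ratio of the left series to the right one."""
    return [a / b for a, b in zip(left, right)]


def scale(series, factor):
    """(blue * 2): multiply a series by a plain number."""
    return [a * factor for a in series]


print(add(red, blue))       # [60, 70, 90]
print(subtract(red, blue))  # [20, 30, 30]
print(divide(red, blue))    # [2.0, 2.5, 2.0]
print(scale(blue, 2))       # [40, 40, 60]
```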

Searching inside Google Books

Below the graph, we show "interesting" year ranges for your query terms. Clicking on those will submit your query directly to Google Books. Note that the Ngram Viewer is case-sensitive, but Google Books search results are not.

Corpus Selection ["(searchword:eng_2012)", "(searchword:fre_2012)", etc.]

The : corpus selection operator lets you compare ngrams in different languages, or American versus British English (or fiction), or between the 2009 and 2012 versions of our book scans. Here's chat in English versus the same unigram in French: (chat:eng_2012) versus (chat:fre_2012).
Corpora: Below are descriptions of the corpora that can be searched with the Google Books Ngram Viewer. All corpora were generated in either July 2009 or July 2012; we will update these corpora as our book scanning continues, and the updated versions will have distinct persistent identifiers. Books with low OCR quality and serials were excluded.
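In the corpus table in this section, the shorthand and the persistent identifier appear to follow a regular pattern. The sketch below derives one from the other; the rule is inferred from the table rows only and is not a documented guarantee, and `persistent_identifier` is a hypothetical helper.

```python
# Snapshot dates as they appear in the persistent identifiers of the table.
SNAPSHOT_DATES = {"2009": "20090715", "2012": "20120701"}


def persistent_identifier(shorthand):
    """Derive a persistent identifier from a corpus shorthand such as eng_us_2012."""
    *parts, year = shorthand.split("_")
    date = SNAPSHOT_DATES[year]  # raises KeyError for an unknown corpus version
    if parts == ["eng", "1m"]:
        # The "English One Million" row is the one exception in the table.
        return f"googlebooks-eng-1M-{date}"
    return "googlebooks-" + "-".join(parts) + f"-all-{date}"


print(persistent_identifier("eng_us_2012"))
print(persistent_identifier("chi_sim_2009"))
print(persistent_identifier("eng_1m_2009"))
```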

Informal corpus name	Shorthand	Persistent identifier	Description
American English 2012	eng_us_2012	googlebooks-eng-us-all-20120701	Books predominantly in the English language that were published in the United States.
American English 2009	eng_us_2009	googlebooks-eng-us-all-20090715	Books predominantly in the English language that were published in the United States.
British English 2012	eng_gb_2012	googlebooks-eng-gb-all-20120701	Books predominantly in the English language that were published in Great Britain.
British English 2009	eng_gb_2009	googlebooks-eng-gb-all-20090715	Books predominantly in the English language that were published in Great Britain.
Chinese 2012	chi_sim_2012	googlebooks-chi-sim-all-20120701	Books predominantly in simplified Chinese script.
Chinese 2009	chi_sim_2009	googlebooks-chi-sim-all-20090715	Books predominantly in simplified Chinese script.
English 2012	eng_2012	googlebooks-eng-all-20120701	Books predominantly in the English language published in any country.
English 2009	eng_2009	googlebooks-eng-all-20090715	Books predominantly in the English language published in any country.
English Fiction 2012	eng_fiction_2012	googlebooks-eng-fiction-all-20120701	Books predominantly in the English language that a library or publisher identified as fiction.
English Fiction 2009	eng_fiction_2009	googlebooks-eng-fiction-all-20090715	Books predominantly in the English language that a library or publisher identified as fiction.
English One Million	eng_1m_2009	googlebooks-eng-1M-20090715	The "Google Million". All are in English with dates ranging from 1500 to 2008. No more than about 6000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year (so there are more computer books in 2000 than 1980).
French 2012	fre_2012	googlebooks-fre-all-20120701	Books predominantly in the French language.
French 2009	fre_2009	googlebooks-fre-all-20090715	Books predominantly in the French language.
German 2012	ger_2012	googlebooks-ger-all-20120701	Books predominantly in the German language.
German 2009	ger_2009	googlebooks-ger-all-20090715	Books predominantly in the German language.
Hebrew 2012	heb_2012	googlebooks-heb-all-20120701	Books predominantly in the Hebrew language.
Hebrew 2009	heb_2009	googlebooks-heb-all-20090715	Books predominantly in the Hebrew language.
Spanish 2012	spa_2012	googlebooks-spa-all-20120701	Books predominantly in the Spanish language.
Spanish 2009	spa_2009	googlebooks-spa-all-20090715	Books predominantly in the Spanish language.
Russian 2012	rus_2012	googlebooks-rus-all-20120701	Books predominantly in the Russian language.
Russian 2009	rus_2009	googlebooks-rus-all-20090715	Books predominantly in the Russian language.
Italian 2012	ita_2012	googlebooks-ita-all-20120701	Books predominantly in the Italian language.