
API - stands for Application Programming Interface. In the context of APIs, the word Application refers to any software with a distinct function. Interface can be thought of as a contract of service between two applications. This contract defines how the two communicate with each other using requests and responses.

JSON - stands for JavaScript Object Notation. JSON is a lightweight format for storing and transporting data. JSON is often used when data is sent from a server to a web page.
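
As a minimal sketch of the idea, the Python snippet below converts a small record to a JSON string and back using the standard `json` module. The record and its field names are made up for illustration.

```python
import json

# Hypothetical record that a server might send to a web page.
record = {"title": "Example Work", "year": 1755, "tags": ["dictionary", "english"]}

# Serialize the Python dict into a JSON string (the form that is stored or sent).
payload = json.dumps(record)

# Parse the JSON string back into Python data on the receiving side.
decoded = json.loads(payload)

print(payload)
print(decoded["year"])
```

The round trip returns the same data because JSON objects, arrays, strings and numbers map directly onto Python dicts, lists, strings and numbers.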

Metadata - data that provides information about other data. Metadata summarizes basic information about data, making it easier to find and work with particular instances of data. Metadata can be created manually, which tends to be more accurate, or automatically, in which case it usually contains more basic information.

Command Line - a text-based user interface to the computer. The command line is a blank line and cursor on the screen, allowing the user to type in instructions for immediate execution. All major operating systems (Windows, Mac, Unix, Linux, etc.) support command lines that programmers and power users can employ to perform file management operations directly and often more efficiently than by using a graphical user interface (GUI). After typing a command, it is executed by pressing the Enter key.

ECCO-TCP - The PhiloLogic ECCO-TCP database is comprised of 2,387 works in English published during the 18th century. It contains over 75 million words and 457,513 unique word forms. Full text provided by the Text Creation Partnership and the University of Oxford Text Archive.

Web Scraping - refers to the extraction of data from a website. This information is collected and then exported into a format that is more useful for the user, such as a spreadsheet, or made available through an API.
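
The collect-then-export idea can be sketched with Python's standard library alone. The example below is an illustration, not a production scraper: it parses a hard-coded HTML snippet (standing in for a downloaded page) and exports the links it finds as CSV text that a spreadsheet could open. The class name, function name, and sample HTML are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = (
    '<ul><li><a href="/works/1">First work</a></li>'
    '<li><a href="/works/2">Second work</a></li></ul>'
)


def scrape_links_to_csv(html):
    """Extract links from the HTML and return them as CSV text."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["text", "href"])
    writer.writerows(parser.links)
    return out.getvalue()


csv_text = scrape_links_to_csv(SAMPLE_HTML)
print(csv_text)
```

Before scraping a real site, check its terms of use and whether it already offers an API, which is usually the more reliable route to the same data.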

POS - Part-of-Speech tagging aims to identify which grammatical group a word belongs to (NOUN, ADJECTIVE, VERB, ADVERB, etc.) based on its context. This means it looks at the relationships within a sentence and assigns each word the corresponding tag.

NER - Named Entity Recognition tries to determine whether or not a word or phrase is a named entity. Named entities are persons, locations, organizations, time expressions, etc. The problem can be broken down into two steps: detecting names, then classifying each name into the corresponding category.
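
The two steps, detection followed by classification, can be illustrated with a deliberately simple toy in Python. It assumes that names are runs of capitalized words and that a small hand-made lookup table (the hypothetical `GAZETTEER` below) supplies the categories. Real NER tools use statistical or neural models instead of rules like these.

```python
import re

# Hypothetical lookup table mapping known names to categories.
GAZETTEER = {
    "Samuel Johnson": "PERSON",
    "London": "LOCATION",
    "University of Oxford": "ORGANIZATION",
}


def detect_candidates(text):
    """Step 1 (detection): find runs of capitalized words, allowing 'of' inside a name."""
    pattern = r"[A-Z][a-z]+(?:\s+(?:of\s+)?[A-Z][a-z]+)*"
    return re.findall(pattern, text)


def classify(candidates):
    """Step 2 (classification): keep candidates found in the gazetteer, with their category."""
    return [(name, GAZETTEER[name]) for name in candidates if name in GAZETTEER]


text = "Samuel Johnson lived in London and studied at the University of Oxford."
entities = classify(detect_candidates(text))
print(entities)
```

Candidates that are capitalized but absent from the lookup table are simply dropped, which is one reason this rule-based approach does not scale beyond a demonstration.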

Topic Modelling - an unsupervised machine learning technique that’s capable of scanning a set of documents, detecting word and phrase patterns within them, and automatically clustering word groups and similar expressions that best characterize a set of documents.

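LDA - Linear discriminant analysis (LDA) is a statistical method that finds a linear combination of features that best separates two or more classes of objects or items. Note that in topic modelling the abbreviation LDA more commonly refers to Latent Dirichlet Allocation, a probabilistic model that represents each document as a mixture of topics.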

Word Vector - an attempt to mathematically represent the meaning of a word. In essence, a computer goes through some text (ideally a lot of text) and calculates how often words show up next to each other. These frequencies are represented with numbers.
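
The counting idea can be sketched in a few lines of Python. The example below builds a simple co-occurrence vector for each word in a tiny sample sentence, counting the neighbours within a window of one word on either side. The function name and sample text are illustrative, and modern word-vector methods such as word2vec learn dense vectors from far larger corpora instead of using raw counts.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=1):
    """Return the vocabulary and, for each word, its neighbour counts as a list of numbers."""
    vocab = sorted(set(tokens))
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    # One vector per word; position k holds the count for vocab[k].
    return vocab, {w: [counts[w][v] for v in vocab] for w in vocab}


tokens = "the cat sat on the mat".split()
vocab, vectors = cooccurrence_vectors(tokens)

print(vocab)           # ['cat', 'mat', 'on', 'sat', 'the']
print(vectors["cat"])  # [0, 0, 0, 1, 1]
print(vectors["the"])  # [1, 1, 1, 0, 0]
```

Words that appear in similar contexts end up with similar vectors, which is what lets the numbers stand in for meaning.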