Source: http://www.cascading.org
Cascading is a proven application development platform for building data applications on Apache Hadoop. For simple and complex data problems alike, it balances abstraction with flexibility by combining a computation engine, a systems integration framework, and data processing and scheduling capabilities.
Java API
Cascading is a Java library and requires no installation. It fits directly into a standard development process: using it takes nothing beyond calling its APIs.
Data Processing API
The data processing APIs define data processing flows. They provide a rich set of operations, such as sort, average, filter and merge, that let you think in terms of the data and the business problem.
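As a rough illustration of what such a flow expresses, the sketch below filters, groups, averages and sorts a handful of records. It is plain Java using only the standard library and does not use Cascading's own classes; the names `FlowSketch`, `Sale` and `averageByRegion` are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch of a filter -> group -> average -> sort flow.
// Not Cascading code: every name here is an assumption made for illustration.
class FlowSketch {
    /** One input row: a region name and a sale amount (hypothetical fields). */
    static final class Sale {
        final String region;
        final double amount;

        Sale(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    /**
     * Drops rows with non-positive amounts, groups the rest by region,
     * and averages each group. The TreeMap keeps regions sorted by name.
     */
    static Map<String, Double> averageByRegion(List<Sale> sales) {
        return sales.stream()
                .filter(s -> s.amount > 0)
                .collect(Collectors.groupingBy(
                        s -> s.region,
                        TreeMap::new,
                        Collectors.averagingDouble(s -> s.amount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
                new Sale("west", 10.0),
                new Sale("east", 30.0),
                new Sale("west", 20.0),
                new Sale("east", -5.0));
        // prints {east=30.0, west=15.0}
        System.out.println(averageByRegion(sales));
    }
}
```

The point of the sketch is the shape of the code: each step names a data operation rather than a loop or a job, which is the style of thinking the data processing API is described as encouraging.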
Data Integration API
The data integration API allows you to isolate your integration dependencies from your business logic. You can easily read data from a variety of external systems into Hadoop, and then write the results to another system.
Scheduler API
The Scheduler APIs can schedule work from third-party applications. The Process Scheduler, coupled with the Riffle life-cycle annotations, allows Cascading to schedule units of work from any third-party application.
Process Planner
Cascading’s physical planner automatically creates MapReduce jobs ready for processing on your cluster.
Taps and Schemes
Taps and Schemes enable reading from and writing to any source, in any format. Cascading comes with several pre-built taps and schemes and also gives you the flexibility to quickly build your own.
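One way to picture the split between where data lives and how it is formatted is the sketch below. It is plain Java, not Cascading's Tap or Scheme classes; `TapSchemeSketch`, `RecordFormat`, `delimited` and `readAll` are hypothetical names, and the real API may divide the responsibilities differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Plain-Java sketch: the location of the data and its format are passed
// separately, so either can change without touching the other.
// Not Cascading code: every name here is an assumption made for illustration.
class TapSchemeSketch {
    /** Format role: how one line of text maps to a list of fields. */
    interface RecordFormat {
        List<String> parse(String line);
    }

    /** Splits each line on a single-character delimiter, keeping trailing empty fields. */
    static RecordFormat delimited(char delimiter) {
        String regex = Pattern.quote(String.valueOf(delimiter));
        return line -> Arrays.asList(line.split(regex, -1));
    }

    /** Location role: any supplier of a Reader; the format decides how lines become rows. */
    static List<List<String>> readAll(Supplier<Reader> location, RecordFormat format) {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(location.get())) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(format.parse(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Comma-separated and tab-separated input go through the same reading code.
        System.out.println(readAll(() -> new StringReader("a,b\nc,d"), delimited(',')));
        System.out.println(readAll(() -> new StringReader("a\tb\nc\td"), delimited('\t')));
    }
}
```

In `main`, switching from comma-separated to tab-separated input changes only the format argument, while the reading code stays the same.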
Standard Relational Operations
Many common operations used in relational environments, such as regular expression operations, Java expression operations, XML operations and logical filter operations, are available in Cascading.
Scriptable Interface
Any Java-compatible scripting language can import and instantiate Cascading classes, create pipe assemblies and flows, and execute those flows. Users can also create their own DSLs to handle common idioms.
Local mode / In-Memory mode
On a single node, Cascading’s local mode can be used to efficiently test code and process local files before the code is deployed on a cluster. This built-in testability allows debugging before production deployment.
Dynamic Programming Languages
The Cascading community has built dynamic programming languages on top of the Java API for greater productivity. There are several to choose from: Lingual (ANSI SQL), Pattern (PMML), Scalding (Scala), Cascalog (Clojure) and more!
Hadoop Support
Cascading runs on all popular Hadoop distributions and Hadoop-as-a-service providers. We ensure that Cascading can run on premises or in the cloud to meet your deployment needs.
Use Cases
BUILD MISSION-CRITICAL APPLICATIONS WITH CASCADING
Enterprise IT
Extract Transform Load
Log File Analysis
Systems Integration
Operations Analysis
Corporate Apps
HR Analytics
Employee Behavioral Analysis
Customer Support | eCRM
Business Reporting
Telecom
Data processing of Open Data
Geospatial Indexing
Consumer Mobile Apps
Location based services
Marketing / Retail
Mobile, Social, Search Analytics
Funnel analysis
Revenue attribution
Customer experiments
Ad Optimization
Retail recommenders
Consumer / Entertainment
Music Recommendation
Comparison Shopping
Restaurant Rankings
Real Estate
Rental Listings
Travel Search & Forecast
Finance
Fraud and Anomaly Detection
Fraud Experiments
Customer Analytics
Insurance Risk Metric
Health / Biotech
Aggregate metrics for Govt
Person biometrics
Veterinary diagnostics
Next-Gen Genomics
Agronomics
Environmental Maps
URLs
http://www.cascading.org/projects/cascading/
http://www.cascading.org/use-cases/