Agile, Scrum, Kanban, Architecture, ...: HiveQL vs SQL

Saturday, June 14, 2014

HiveQL vs SQL

Scenario/Feature	HiveQL	SQL	Remarks
Default join	Equi-join	Inner join	An equi-join returns only the rows for which the equality condition is true, so rows with unmatched or NULL join keys are not returned.
Join syntax	LEFT OUTER JOIN, RIGHT OUTER JOIN	LEFT JOIN, RIGHT JOIN
Largest table last	Hive attempts a map-side join: it loads the first table into memory and reads the second table as the normal input to the map function		When writing queries, order the tables in the join so that the largest table comes last; this makes the map-side join easier for Hive to apply.
Data Type	No interval types
	All queries must reference a table	Table-less queries (or queries against 'dual') are supported
	No session-scoped temp tables
	No 'IN' predicate
	No 'FIND' string search function for producing the offset to a match
	No find/replace string functions for plain strings (i.e. not regex)
	No regular UNION, INTERSECT, or MINUS operators
	NULL values are treated differently from empty strings and are exported differently, i.e. in Hive's default text format NULLs are written as '\N' while empty strings are written as empty fields		This isn't unique to Hive, but it is still annoying when exporting data from Hive into another system.
	No hierarchical/self-referencing querying		Most distributed computing solutions can't do this, but it can be very handy.
	No Update or Delete statements
	No cost-based explain plans		Running EXPLAIN generally just shows the path used to access the data. That is useful to a degree, but it would be more helpful if it showed which steps cause the biggest slowdowns.
	Hive does not support queries that select from tables in more than one database	Supported
	Hive does not support sub-queries such as those connected by IN/EXISTS in the WHERE clause
	Hive does not support the truncation of data from a table
	No inequality join
	group_concat() is missing in HiveQL		It is available in Impala.
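The usual way around the missing IN/EXISTS sub-queries in the table above is to rewrite them as a join. Below is a minimal sketch of that rewrite, assuming Python's built-in sqlite3 module as a stand-in SQL engine (it is not Hive) and two hypothetical tables, orders and vip_customers, that are not from any real schema. It only shows that the join form returns the same rows as the IN form.

```python
import sqlite3

# SQLite stands in for a generic SQL engine here; it is NOT Hive.
# The tables and columns (orders, vip_customers, ...) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE vip_customers (customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO vip_customers VALUES (10), (30), (30);
""")

# The form the table lists as unsupported: IN with a sub-query in WHERE.
in_subquery = """
    SELECT order_id
    FROM orders
    WHERE customer_id IN (SELECT customer_id FROM vip_customers)
    ORDER BY order_id
"""

# Join-based rewrite: an equi-join against the de-duplicated key set.
# DISTINCT matters: without it the duplicate key 30 would repeat order 4.
join_rewrite = """
    SELECT o.order_id
    FROM orders o
    JOIN (SELECT DISTINCT customer_id FROM vip_customers) v
      ON o.customer_id = v.customer_id
    ORDER BY o.order_id
"""

in_rows = [row[0] for row in conn.execute(in_subquery)]
join_rows = [row[0] for row in conn.execute(join_rewrite)]
print(in_rows, join_rows)
```

In Hive itself the same intent is normally written with LEFT SEMI JOIN, which avoids the explicit DISTINCT; the plain join is used here only because SQLite has no semi-join syntax.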


Disclaimer & Copyright

The entries in my blog are solely my opinions and do not represent the thoughts, intentions, plans or strategies of any third party, including my employer, except where explicitly stated. Needless to say, a weblog is a snapshot in time. Over time, as I interact with the community at large and/or learn more about various topics, my thoughts and opinions are subject to change. As such, you should not consider out-of-date posts to reflect my current thoughts and opinions. Java, Oracle, Oracle Fusion Middleware, TIBCO, Sun, Microsoft, IBM, WebSphere, SAP, NetWeaver, Cloudera, HortonWorks and any others mentioned are trademarks of their respective owners. © Copyright 2001-2015, Tushar Jain
