Is Cloudera Developer Training for Apache Hadoop worth it?
I just finished the Cloudera Developer Training for Apache Hadoop course and passed the Cloudera Certified Developer for Apache Hadoop exam. I am feeling good about passing the certification exam on the first try, but I have some mixed feelings, primarily around one question: is it worth the course fee (upwards of $2,700 at the time of writing)? In particular, what is the value to job seekers or professionals wanting to augment their skill set and polish their resumes? Is it worth it for Java developers to take this course? Don’t get me wrong, the instructor (Mark Fei) was excellent! He was very knowledgeable and engaging. The course itself covers a lot of material in a digestible way. The question in my mind is, will you make use of all the knowledge gained from the Developer course, or would picking up a book on Hive (or Pig) be more than enough for what you need to do?
For most users, I think this class might be overkill. Being intellectually curious and passionate about Big Data, Hadoop, and learning new technologies, I looked forward to getting into the intimate details of Hadoop, the MapReduce algorithm, and examples. A lot of this information is already available on the web, for free, at Cloudera, Hadoop, and other sites. But I knew I shouldn’t kid myself into thinking I would be able to carve enough time out of my busy day at work to get up to speed on this, so I signed up for the course. As with any training or professional development course, setting aside a block of time to learn from a knowledgeable instructor is invaluable. I had bought Hadoop: The Definitive Guide a few months back, but it has been collecting dust for a while. (I did read the introductory chapters, which were useful background for the course.) The course gives you a good overview of the entire Hadoop ecosystem and goes over a few key examples with hands-on labs to help you get comfortable with the material. It is good (I’d say pretty important) to be familiar with object-oriented language concepts; otherwise the class may be over your head. (One guy without a Java background just picked up and left, he was totally lost; other Oracle (SQL) guys were up for the challenge and loved the Hive section.) After four days, I walked away with a lot of knowledge and insights that I would not necessarily have gotten from just reading materials online, and that most likely would have taken me much longer than four days to acquire, if at all. I have better ideas on how to re-architect the ETL process for my client’s analytics warehouse, and improve their production environment, too.
Hadoop is very hot in the job market right now (in New York, at least). As an anecdote, I have met many recruiters and headhunters who are looking for experienced Hadoop developers on behalf of their clients. (Hadoop + Python seems particularly popular.) Does it make sense for job seekers in the Big Data, analytics, or BI field to get certified? I think that’s an open question. Real-world experience beats certification any day of the week, I believe, but for those wanting to get up to speed quickly, the Cloudera course definitely helps you do that, and gives you a framework (i.e., developer certification) for claiming that you are reasonably knowledgeable. Being able to ask questions and engage in conversation during the course with a good instructor is very helpful and worthwhile, especially if you have real-world problems or concerns you are trying to address.
As far as Hive is concerned, anyone who writes SQL by hand regularly would have no problem getting up to speed in Hive (i.e., in a few hours, if that). Hive lets you write SQL-like statements and generates MapReduce jobs automatically for you in the background (no programming required). So buying a book on Hive and reading that would probably be enough for getting up to speed and being qualified for a data analyst role where Hadoop / Hive is being used; there is no need to take the Developer course. (Cloudera offers a course on Pig and Hive, but I have not taken it and cannot comment on whether it is worth it. There is no “certification” for Hive or Pig at this point in time.) Pig has its own language (Pig Latin), so the learning curve may be slightly steeper (but not by much) for anyone with a good scripting language background (Perl, Python). I should mention Hive really only works well with text data; for binary data (e.g., image analysis), you really will need Java (or Python via Hadoop Streaming) and knowledge of the MapReduce algorithm.
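To give a feel for what Hive spares you from writing, here is a rough sketch in Python of the map and reduce logic behind a simple query along the lines of SELECT word, COUNT(*) FROM words GROUP BY word (the table name is made up). This is my own illustration, not code from the course and not what Hive actually generates: the names mapper, reducer, and run_local are hypothetical, and the sort step stands in for the shuffle that Hadoop performs between the two phases. Under Hadoop Streaming, the mapper and reducer would instead be separate scripts reading from stdin and writing tab-separated lines to stdout.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts per word (input must be sorted by word)."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort/shuffle -> reduce on one machine (illustration only)."""
    mapped = mapper(lines)
    # Hadoop sorts and groups by key between the phases; sorted() stands in for that.
    shuffled = sorted(mapped, key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        "Hive makes Hadoop easy",
        "Hadoop runs map reduce",
        "hive hides map reduce",
    ]
    for word, count in sorted(run_local(sample).items()):
        print(f"{word}\t{count}")
```

Even for something this small, you have to think in terms of keys, values, and sorting, which is exactly the part that a one-line Hive query hides from a SQL person.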
Overall, I enjoyed the course, and am generally positive about it. I am more knowledgeable and can speak more confidently about Hadoop when speaking with tech professionals, senior management, and clients. I have a good sense of what Hadoop can and cannot do. Is it worth the price tag? My immediate reaction is “yes” if you plan on doing some advanced Hadoop work, need to know the framework well, or will be consulting to tech professionals and senior management on data architecture and IT strategy issues. If you work primarily with text data and can get by with SQL-like querying (through Hive) for the problems you are trying to solve, then it is probably not worth it; just learn Hive (which would be almost effortless for SQL professionals) or Pig from a book instead.