Cloudera Hadoop Developer Training- Is it Worth it?

I just finished the Cloudera Developer Training for Apache Hadoop course and passed the Cloudera Certified Developer for Apache Hadoop exam. I feel good about passing the certification exam on the first try, but I have mixed feelings, primarily around one question: is it worth the course fee (upwards of $2,700 at the time of writing)? In particular, what is the value to job seekers or professionals wanting to augment their skill set and polish their resumes? Is it worth it for Java developers to take this course? Don’t get me wrong, the instructor (Mark Fei) was excellent! He was very knowledgeable and engaging, and the course itself covers a lot of material in a digestible way. The question in my mind is whether you will use all the knowledge from the Developer course, or whether picking up a book on Hive (or Pig) would be more than enough for what you need to do.

For most users, I think this class might be overkill. Being intellectually curious and passionate about Big Data, Hadoop, and learning new technologies, I looked forward to getting into the intimate details of Hadoop, the MapReduce algorithm, and examples. A lot of this information is already available on the web, for free, at Cloudera, Hadoop and other sites. But I knew I shouldn’t kid myself into thinking I would be able to carve enough time out of my busy day at work to get up to speed on this, so I signed up for the course. As with any training or professional development course, setting aside a block of time to learn from a knowledgeable instructor is invaluable. I had bought Hadoop: The Definitive Guide a few months back, but it had been collecting dust. (I did read the introductory chapters, which were useful background for the course.)

The course gives you a good overview of the entire Hadoop ecosystem and goes over a few key examples with hands-on labs to help you get comfortable with the material. It is good (I’d say pretty important) to be familiar with object-oriented language concepts; otherwise the class may be over your head. (One guy without a Java background just picked up and left, he was totally lost; the Oracle (SQL) guys were up for the challenge and loved the Hive section.) After four days, I walked away with a lot of knowledge and insights that I would not necessarily have gotten from just reading materials online, and that would most likely have taken me much longer than four days to acquire, if at all. I have better ideas on how to re-architect the ETL process for my client’s analytics warehouse, and improve their production environment, too.

Hadoop is very hot in the job market right now (in New York, at least). Anecdotally, I have met many recruiters and headhunters who are looking for experienced Hadoop developers on behalf of their clients. (Hadoop + Python seems particularly popular.) Does it make sense for job seekers in the Big Data, analytics, or BI field to get certified? I think that’s an open question. Real-world experience beats certification any day of the week, I believe, but for those wanting to get up to speed quickly, the Cloudera course definitely helps you do that, and gives you a framework (i.e., developer certification) for claiming that you are reasonably knowledgeable. Being able to ask questions and engage in conversation during the course with a good instructor is very helpful and worthwhile, especially if you have real-world problems or concerns you are trying to address.

As far as Hive is concerned, anyone who writes SQL by hand regularly would have no problem getting up to speed in Hive (in a few hours, if that). Hive lets you write SQL-like statements and generates MapReduce jobs automatically for you in the background (no programming required). So buying a book on Hive and reading it would probably be enough for getting up to speed and being qualified for a data analyst role where Hadoop and Hive are used; there is no need to take the Developer course. (Cloudera offers a course on Pig and Hive, but I have not taken it and cannot comment on whether it is worth it. There is no “certification” for Hive or Pig at this point in time.) Pig has its own language (Pig Latin), so the learning curve may be slightly steeper, but not by much for anyone with a good scripting language background (Perl, Python). I should mention that Hive really only works well with text data; for binary data (e.g., image analysis), you really will need to write MapReduce jobs yourself in Java (or in Python via Hadoop Streaming).
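To give a feel for what writing a MapReduce job by hand involves, here is a minimal word-count sketch in Python in the style of Hadoop Streaming. It is my own illustration, not material from the course. With Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before the reducer sees it. The function names `map_lines`, `reduce_pairs`, and `run_locally` are hypothetical, and `run_locally` only imitates the sort step in a single process so the logic can be tried without a cluster. In a real job, each function would be wired to stdin and stdout in its own script.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_pairs(sorted_lines):
    """Reducer: sum the counts for each word.

    Assumes the records arrive sorted by key, which is what the
    framework's shuffle-and-sort phase provides between map and reduce.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort -> reduce in one process (for experimenting only)."""
    return list(reduce_pairs(sorted(map_lines(lines))))
```

For comparison, the same count in Hive is a single SQL-like GROUP BY query, which is the main reason I suggest SQL people start there.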

Overall, I enjoyed the course, and am generally positive about it. I am more knowledgeable and can speak more confidently about Hadoop when speaking with tech professionals, senior management, and clients. I have a good sense of what Hadoop can and cannot do. Is it worth the price tag? My immediate reaction is “yes” if you plan on doing some advanced Hadoop work, need to know the framework well, or will be consulting to tech professionals and senior management on data architecture and IT strategy issues. If you work primarily with text data and can get by with SQL-like querying (through Hive) for the problems you are trying to solve, then it is probably not worth it; just learn Hive (which would be almost effortless for SQL professionals) or Pig from a book instead.

19 thoughts on “Cloudera Hadoop Developer Training- Is it Worth it?”

  1. Alpesh Patel November 8, 2012 / 1:30 am

    This is really, really helpful, David. I must thank you for writing this blog. I am from India and am planning to attend the developer training in Dec-12, which will be in Bangalore. I agree with you regarding the fees, which are really high, but I am still thinking of investing in the training as the future of this technology seems bright.

    I have been working in the market research field for the last 8 years, so I have knowledge of data, ETL processes, etc. Besides that, I know SQL/PLSQL, not at a professional level, but I did my grad and post-grad with a specialization in DBMS. I also know Java (object-oriented concepts), but again only what I learned during grad/post-grad. Do you think that would help me understand Hadoop better? Or should I read other Hadoop-related material before I join the training?

    And finally, one important question: how is the “Hadoop training” helping you get a job? :)

  2. Kishore January 28, 2013 / 6:17 am

    Hi,

    Thanks for your valuable explanation. I am currently working as a mainframe developer and am thinking of shifting to Hadoop. Will my mainframe experience be considered, or will prior experience not count? Please let me know your thoughts, as I have no knowledge of Hadoop but am willing to shift.

    Thanks,
    Kishore.

  3. Rajesh January 31, 2013 / 6:18 pm

    Very helpful review

  4. Anish Goyal February 20, 2013 / 6:00 am

    Hey David, do I need to learn Java before getting certified in Hadoop?

  5. David June 13, 2013 / 11:59 pm

    Thanks for all the comments! I need to check my comments log more often! I can’t believe these have been sitting in the inbox all this time. My apologies!

    @Kishore, I’m not familiar w/ mainframe development, so can’t be too helpful. I think it makes sense to stay abreast of the latest developments and modern architectures (even if just as a passing interest/hobby), so I encourage you to read more about Hadoop and other MPP architectures. In general, strong Java and/or Python skills are in demand, so I would probably get stronger in one of those if you want to switch over at some point. Good luck!

    @Anish, I think it makes sense to know Java before getting certified, but you can code in Python and still learn about Hadoop and get certified. I would ask what the goal is. If you want a job, many employers are looking for Java developers, and many others are looking for Python + the Hadoop Streaming API, so it really depends on what your goal is and what the job requires. Getting certified won’t necessarily get you a job.

  6. Ravish June 27, 2013 / 2:52 am

    Hi David,

    I am new to the IT field (I have been programming in Java for a year), so I have good knowledge of Java and basic knowledge of DBMS.
    I, like any other IT professional, am eager to kick-start my career and develop a niche skill like Hadoop. Is it wise to enroll for a certification at Cloudera or another renowned place? How easy would it be for me to find a well-paying job with that certification on my resume, and largely only that, as my experience is just a year so far?

  7. Mikey July 3, 2013 / 12:11 pm

    Thanks for the write-up, I got this question though.

    The knowledge and skills that you say you gained from the training all seem related to the IT manager perspective. But would you feel confident actually developing non-trivial Hadoop applications (or just be an entry-level Hadoop developer if there is such a thing) based on this training? Even let’s say you augment the training and certification with a book or two and dabble in a home-cooked project.

    • David September 23, 2013 / 10:59 pm

      I am more on the IT manager side of things these days, so perhaps that’s the perspective that came across. I think the real training comes from “on the job” learning. So, if you have solid Java or Python skills, then I think you can pick it up quickly. The course introduces MapReduce concepts, so that is helpful if you don’t think in MapReduce (I think that describes most people).

      There’s a “which comes first, the chicken or the egg” problem here. Employers are looking for people w/ experience. You get “real” experience in a job that needs these skills (not necessarily from a home-cooked project, though that is good for showing true interest in the topic, i.e., you took the time to do something at home as opposed to just talking about it).

      My main takeaway from the Cloudera training was that for most analysis (of text-based datasets), everything can be done in Hive, which is basically a SQL-like language, so no coding is needed. I think anyone w/ solid SQL experience (i.e., excellent query-building skills (coding by hand) + a good understanding of analytic functions) could pick up Hive very easily.

      More advanced stuff, like building a recommender system on Hadoop or other custom work, would require either Java or Python, so you’re talking more along the lines of a “data scientist” type role, where it’s more about understanding complex algorithms and applying them in Hadoop.

      To answer your question, can you pick it up easily? If you’re working w/ text, use Hive. It builds the MapReduce job for you using a very easy-to-learn query language. Or use Pig if you have solid Python skills. If you need to do custom MapReduce jobs in Java, you had better know Java. There’s a steep learning curve, but it’s doable. I think the key challenge is “thinking in MapReduce”, which I think the course does a decent job of teaching. (My instructor did, anyway. It sounds like other people (see below) had a different experience.)

  8. Mani July 30, 2013 / 10:56 am

    Hi David,

    I am in a dilemma over whether to invest such a huge amount in the Cloudera Developer training. I am a Java/J2EE developer with close to 8 years of work experience. Will attending this course and gaining the necessary knowledge from a developer’s perspective help me get a job, or is actual work experience needed to get a Hadoop job in the market? Kindly share your thoughts on this!

    Regards,
    Mani

    • David September 23, 2013 / 11:10 pm

      I would hesitate if you are paying for it yourself. There are plenty of resources online. I think working on a small project at home and learning basic concepts is the way to go. I think the certification plus working on a project is a good thing to show potential employers. Only taking the certification and not working on a personal project would not be useful.

  9. Aks K August 2, 2013 / 11:35 am

    I attended the Cloudera ‘Developer’ training, which ran for 4 days. Here is my review of the training.

    They had good material to go through, but unfortunately our trainer was a newbie.
    He could not answer more than 5% of the questions and could not debug a single programming-related issue. I am 100% sure that he has not written any MapReduce jobs.
    There was a lady in our group who had been hired as a consultant by Cloudera, and she said she is going to provide the same training to other students in the near future.

    I called Cloudera and shared my learning experience. The instructor could not explain any concept in even one layer of detail, forget about going deep into it. The person I spoke with was surprised and then offered to let me attend the same training again at no cost.
    I know none of the students were happy, and one guy emailed me that he also got a similar offer from Cloudera.
    I can retake the course, but then I will lose 4 days of pay, as my company is not going to give me days off for the same training.

    I am telling you this so you know that the training you are going to attend is going to be a waste of your time and money. Even if it is your company that pays for it, you will not be able to show them any results.
    In a way, Cloudera admitted the quality of their training when they told me that “if I decide to take the upcoming training, then they are going to change the trainer”. That shows how much they trust their trainer.
    On a side note, all the best to the students who are going to attend “Cloudera Developer Training for Apache Hadoop” on Aug 19 – Aug 22 in the Chicago area.

    If you just want to take a break from your work, then go for it, and I assure you that you will get what you want.
    But if you really want to learn Hadoop, then better look somewhere else.

  10. Francis September 20, 2013 / 6:03 am

    Hi David,
    I really need some help from you. I am from India. I have been offered a job in the Big Analytics field starting in 6 months. I have only meagre experience in programming. I know only C, C++ and a little bit of Objective-C. To get this job, I am asked to learn Java, Python and MySQL within 3 months. After that, I have to learn Hadoop, Hive and related stuff in the next 3 months, and I have to get the Cloudera certification.
    My question is: with only a little programming experience, will I be able to achieve the certification? How easy or hard will it be for a beginner like me?

  11. David September 23, 2013 / 10:30 pm

    Wow Aks, sorry to hear it was not worth it to you.

    It’s been a year and a half since I took the class. My instructor was knowledgeable, so I am sad to hear things are different now. People’s mileage will vary… hopefully your experience is just a one-off and not indicative of the overall program nowadays.

    I keep getting requests for training materials, etc. I cannot share them with people, and besides, it’s been 1.5 years now, so they are probably outdated anyway. For everyone asking if it’s worth it… as I mentioned in my post, certification won’t necessarily get you the job. If you don’t know Java, or Python, or some other Hadoop-friendly language, then you may want to learn that first before you take the course or even look for the job.

  12. Vijay October 11, 2013 / 12:40 pm

    Hi All,

    If you really don’t want to spend too much money on learning Hadoop, then I suggest you try the http://www.hadoopexam.com trainings. Their training is very well explained, like whiteboard sessions, and they even have hands-on sessions. I cleared both the developer CCD-410 and CCA-410 exams with http://www.hadoopexam.com and by reading Hadoop: The Definitive Guide. Thanks to the trainer for explaining things in very easy language. I wish all the best to hadoopexam.com

  13. Neena March 13, 2014 / 11:36 am

    Hi

    I would like to know if one may take the exams without formal training? I was just going to try to YouTube it, though I agree it may be difficult to set aside the time and truly concentrate that way. I am trying to get certified in so many languages, but I am unsure of the value of it given my lack of real experience, more so in the job market. Thanks

  14. Hadoop online training September 20, 2014 / 7:26 am

    Hi,
    Thanks for providing useful information. We provide Hadoop online training covering all modules with real-time faculty.

  15. narayan October 16, 2014 / 1:32 am

    Hi David,
    I have 8 years of experience in storage testing (NAS and SAN), Perl and Unix, and a good understanding of Linux. Now I am learning Python to enter the Big Data domain.
    Is Python sufficient to enter the Hadoop area, or is it better to learn Java instead of Python?
    Can you suggest how I can start learning Hadoop and analytics, and which tools would be good for my skills?
