Thoughts on Attack on Titan anime series that may have been missed (but not by ChatGPT)

Season 4 of Attack on Titan raises a number of philosophical and moral questions that are central to the show’s themes and plot. The series tackles complex and thought-provoking issues that relate to human nature, morality, ethics, and the nature of war. Here are some of the key philosophical and moral questions posed by season 4 of Attack on Titan:

The nature of humanity: The series explores what it means to be human. This question is central to the conflict between the Eldian people, who are considered ‘monsters’ by the rest of the world, and the Marleyans, who view themselves as the superior race. The conflict raises questions about the value of life and the meaning of identity.

The ethics of war: Attack on Titan raises important questions about the ethics of war and the use of violence as a means of conflict resolution. The show depicts the brutal and devastating consequences of war, and raises questions about the morality of using violence to achieve one’s goals. It also raises questions about the relationship between soldiers and their commanders, and the responsibility of those in power for the actions of their subordinates.

The concept of freedom: The series explores the concept of freedom and the idea of what it means to be free. This is shown in the relationship between the Eldians and their Marleyan oppressors, as well as in the various characters’ attempts to escape their oppressive circumstances. The show raises questions about the value of freedom, the limits of personal autonomy, and the extent to which individuals are free to make their own choices.

The role of morality: Attack on Titan raises questions about the role of morality in society, and the extent to which individuals should be held responsible for their actions. The show depicts characters who struggle with their own moral values and the consequences of their choices. It raises questions about the relationship between personal morality and the larger moral framework of society, and the extent to which individuals are responsible for their own actions.

In conclusion, season 4 of Attack on Titan explores complex issues relevant to human nature, morality, ethics, and the nature of war. These questions are often framed in terms of the characters’ struggles with their own values and the consequences of their choices, and are inspired by various philosophical schools of thought, including existentialism, determinism, and utilitarianism.

P.S. The symbols in Attack on Titan are often seen as having real-world analogues and relevance to human history.

For example, the airships can be seen as analogous to modern military and government technology used to exert control over populations, such as drones or surveillance cameras.

The Eldian armbands and internment camps are similar to the identification badges and concentration camps used during the Holocaust, which were used to discriminate against and persecute Jews and other minority groups in Europe.

Additionally, the internment camps can also be seen as an analogue to the forced labor and prison camps used in various countries throughout history, such as the gulags in the Soviet Union or the forced labor camps in North Korea.

These symbols are meant to evoke real-world events and systems of oppression, and help to draw attention to the ongoing struggles for freedom, justice, and equality in our own world.

NOTE: this post auto-generated by ChatGPT. It’s pretty scary/awesome how well written the outputs are from this AI. How are colleges and schools going to adapt once this technology goes mainstream?

ChatGPT is going to seriously impact the white collar job market!

The widespread adoption of ChatGPT and other advanced language models will likely have significant implications for the knowledge worker and the job market. On one hand, these technologies have the potential to increase efficiency and productivity by automating various routine tasks and enabling humans to focus on higher-level, creative work.

For example, ChatGPT could be used to automate customer service tasks, freeing up human customer service representatives to handle more complex issues. Similarly, it could be utilized in legal and financial services to automate the generation of contracts, reports, and other document-intensive tasks.

However, the widespread adoption of these technologies could also lead to job loss and economic disruption. As ChatGPT and similar systems become more advanced and capable of handling a wider range of tasks, some jobs that were once performed by humans may become obsolete. For example, some low-skilled jobs in customer service, data entry, and document preparation may be at risk of automation.

On the other hand, the widespread adoption of these technologies will likely lead to the creation of new jobs and industries that did not exist before. For example, there will be a growing demand for individuals with the skills to develop, maintain, and improve these systems, as well as those who can integrate them into existing workflows.

Additionally, the impact of these technologies will be felt differently across different industries and regions. For example, some regions and countries with a heavy reliance on low-skilled labor may experience significant economic disruption, while others with a strong technology sector may benefit from the increased demand for tech-related jobs.

In the future, it will be important for individuals and society as a whole to adapt to these changes and ensure that the benefits of these technologies are distributed fairly. This may require a combination of education and training programs, government policies, and investment in technology infrastructure to ensure that everyone has access to the skills and resources needed to participate in the new economy.

In conclusion, the widespread adoption of ChatGPT and other advanced language models will have far-reaching implications for the job market and society. While these technologies have the potential to increase efficiency and productivity, they also carry the risk of economic disruption and job loss. It will be important for individuals and society to adapt to these changes and ensure that the benefits of these technologies are distributed fairly.

Note: thanks ChatGPT for these insights

A more technical explanation of ChatGPT

ChatGPT is a state-of-the-art language model developed by OpenAI. It is built on top of the transformer architecture and trained on a massive amount of text data from various sources such as books, websites, and forums. The model utilizes a deep neural network with multiple layers to generate text that is coherent and contextually relevant to the input prompt.

ChatGPT is capable of performing various natural language processing tasks, including text generation, text classification, and question answering. The model uses an attention mechanism to weight the importance of the input tokens, allowing it to focus on the most relevant information when generating text.

The underlying GPT model is trained using maximum likelihood estimation, which maximizes the probability of generating the target text given the input prompt; ChatGPT is then further fine-tuned with reinforcement learning from human feedback (RLHF). During inference, the model is fed an input sequence and generates an output sequence token by token.
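To make the attention idea above a bit more concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation transformers use to weight the importance of input tokens. This is an illustrative toy, not ChatGPT's actual implementation; the matrix sizes are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Weight each value vector by how relevant its key is to the query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query/key relevance scores
    weights = softmax(scores)        # each row is a probability distribution
    return weights @ V, weights

# toy example: 2 "query" tokens attending over 3 "key/value" tokens,
# each embedded in 4 dimensions (sizes chosen arbitrarily)
rng = np.random.default_rng(42)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)            # (2, 4): one contextualized vector per query
print(weights.sum(axis=1))  # each row of attention weights sums to 1
```

The "focus on the most relevant information" behavior comes from the softmax: tokens whose keys align with the query get most of the weight, and the output is a weighted blend of their values.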

Note: yes, this post was also auto-generated by ChatGPT

Fishy Fun with Doc2Vec

Using a fishkeeping forum corpus with everyone’s favorite vector representation

I wanted to play around with word2vec but did not want to use the typical data sets (IMDB, etc.). So, I said, what if I were to do some web scraping of one of my favorite fishkeeping forums and attempt to apply word2vec to find “experts” within the forum. Well, turns out this is a much longer journey than I originally thought it would be, but an interesting one nonetheless.

This is the first of hopefully several blog posts on my adventures with word2vec/doc2vec. I have a few ideas on how to leverage this corpus using deep learning to auto-generate text, so stay tuned, and if interested, drop me a line or leave a comment!


So, word2vec was originally developed by Google researchers, and the algorithm has been widely discussed. It provides a vector representation of a sequence of words using a shallow (not deep) neural network. Doc2vec adds additional information (namely paragraph context) to the word embeddings; the original Paragraph Vector paper is “Distributed Representations of Sentences and Documents” by Le and Mikolov. A quick literature search revealed I wanted to use doc2vec instead of word2vec for my particular use case, since I wanted to compare user posts (essentially multiple paragraphs) instead of just words.
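To make the “compare user posts” idea concrete, here is a pure-Python sketch of the comparison step: once a doc2vec model has produced one vector per user (or per post), “similar” boils down to cosine similarity between vectors. The usernames and vectors below are made up for illustration; in practice they would come from a trained model such as gensim’s Doc2Vec.

```python
import math

def cosine(u, v):
    # cosine similarity: 1.0 means same direction, 0.0 means unrelated
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical per-user document vectors (illustration only; real ones
# would be learned from the forum corpus and have ~100+ dimensions)
user_vectors = {
    "plant_guru":  [0.90, 0.10, 0.30],
    "cichlid_fan": [0.10, 0.80, 0.20],
    "newbie":      [0.85, 0.20, 0.25],
}

query = user_vectors["plant_guru"]
ranked = sorted(
    ((user, cosine(query, vec))
     for user, vec in user_vectors.items() if user != "plant_guru"),
    key=lambda pair: pair[1], reverse=True,
)
print(ranked[0][0])  # most similar poster to plant_guru: newbie
```

gensim exposes this kind of ranking directly via a `most_similar()` query on the trained model, so you rarely compute it by hand, but it helps to know what the similarity scores actually measure.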

Later, I found a very informative online video from PyData Berlin 2017 where another data scientist used doc2vec to analyze comments on news websites. I thought that was cool, and it further fueled my interest to tinker with this algorithm in my spare time… fast forward a few hours, and it’s almost daylight and I’m still here typing away…

I highly recommend watching this video for additional context.

What I’m trying to do

I’d like to do the following:

  • analyze user posts on the forum to identify who the “experts” are on fishkeeping and plants/aquascaping
  • have fun with doc2vec while doing this

Continue reading

Map Reduce is dead, long live Spark!

Map Reduce is dead, long live Spark!

That’s the impression I, and I think most people attending the conference, walked away with after Strata NY 2014.  Most of the interesting presentations were centered on Spark.  Only corporate IT presentations about “in progress hadoop implementations” were about Map Reduce.

So who’s working on Spark?  Cool startups and vendors (preparing for enterprise IT departments to move on to Spark in a year or two).

Who’s working on Map Reduce? Corporate IT departments migrating off legacy BI systems onto the promised land of Hadoop (dream come true, or nightmare around the corner, not sure which one it will be for people).

It makes sense. Map Reduce has been tested and is ‘safe’ now for enterprise IT teams to start deploying it into production systems.  Spark is still very new and untested.  Too risky for a Fortune 500 to dive into replacing legacy systems with a still-in-diapers open source software “solution.”  Nonetheless, I am sure every technical worker will be drooling to “prototype” or create proof of concepts with Spark after this conference.

Reflections on Strata NYC 2014

I had a chance to attend Strata in New York back in October.  I had been wanting to attend Strata for a few years, but had not had a chance until now.  A few impressions: (in the form of brief bullets)

  • It’s huge! (Over 3,000 attendees)
  • Very corporate!  (A bit too corporate, too stuffy, seemed like legal departments censored some presentations)
  • All the cool kids are using/learning Spark (and Scala)
  • Map Reduce is old news.
  • Enterprises move slowly, like dinosaurs, and are just figuring out what Map Reduce is
  • Way too many vendors
  • Not enough interesting/inspiring presentations

Those were just my impressions, others may have other opinions.

Pentaho and Vertica as Business Intelligence / Data Warehousing solution


I recently wrapped up a BI/Data Warehouse implementation project where I was responsible for helping a rapidly expanding international e-commerce company replace their aging BI reporting tool with a new, more flexible solution. The old BI reporting tool was based on an “in-memory” reporting engine, was more of a “departmental solution” than an enterprise-grade one, and was not optimally designed. For example, users found themselves downloading data from different canned reports to Excel, where they ran VLOOKUPs and pivot tables to compute simple metrics such as average order value and average unit retail. Needless to say, despite the best of intentions, there had been a communication gap between business users and IT developers on reporting requirements during the implementation of the original BI tool.

In designing and implementing the new solution, I set the following strategic tenets / guiding principles:

  • leverage commercial off-the-shelf (COTS) software; minimize customization and emphasize configuration instead (i.e., chose to buy instead of build, and made sure not to build too much after buying)
  • involve all stakeholders and business users throughout the process
  • enable business users to use self-service BI tools as much as possible
  • train as needed; up-skilling the user base on self-service tools is better than hiring an army of BI analysts
  • leverage data warehouse for both internal and external reporting
  • minimize amount of aggregation in Data Warehouse (we did almost no aggregation)
  • maximize the processing power of the ROLAP engine by pairing it with a high-performance analytical database (i.e., columnar MPP database)
  • stick to Kimball data warehouse design approach as much as possible, but be pragmatic where needed; Star Schema, Star Schema, Star Schema! (no snowflakes here)
  • take an iterative approach where possible – need to “ship” on time – understand that 1st release will not be “perfect” but does need to meet business requirements
  • for external reporting, provide canned reports only initially; test user adoption and work with clients to understand and address reporting needs over time

We looked at traditional players, open source, emerging technologies, and Cloud BI SaaS providers. I made sure business and IT stakeholders were part of the vendor selection process, ensuring they attended demos and vendor presentations. In the end, Pentaho best matched all our needs, providing us with both a solid ETL engine and a solid BI reporting engine. Since we were looking at providing both internal and external reporting with this solution, traditional BI vendors were prohibitively expensive, and “cloud offerings” were not compatible with our IT capabilities and architecture at the time (our data was not in the cloud).

Solution Description – Vertica + Pentaho BI/PDI

I proposed and received approval from our senior management and company board of directors to use Pentaho and Vertica as our Business Intelligence (BI) / Data Warehouse (DW) solution.


HP Vertica is a columnar MPP database that is 20-100 times faster than Oracle. HP Vertica is available in a Community Edition, allowing organizations to use all the features of the database for free for data up to 1TB on three nodes. You can also install the database on a single node, though for a true proof of concept you should get at least 3 nodes. We started using Vertica 6.1 Community Edition for a proof of concept (POC) and later upgraded to an enterprise license when we went live in production.


Pentaho is an open source BI platform and ETL tool. I liked the fact that it was open source, allowing us to highly customize the BI implementation if we chose to, as well as develop our own ETL connectors and routines. Some of the client tools are a bit quirky, but I do not know what BI/ETL software isn’t, given the complexity of these tools. Overall the product is solid and delivers as expected. We got the enterprise edition for the additional features and product support from Pentaho. One thing that is annoying is the configuration files that are spread all over the place. To be fair, this is probably more of a Java application configuration issue than a Pentaho issue.

When I tell people that I’m using Pentaho, they are usually surprised; then I find out they were using Pentaho 3.x, and their reaction makes sense. Pentaho 4.x is a big step up from previous major releases, and Pentaho 5.0 is looking really good (I like their UI redesign). I encourage anyone who only looked at an early version of Pentaho to take another look; the product has matured.

When I was selecting a BI vendor, the thought “no one ever got fired for choosing IBM (Cognos)” crossed my mind. I could have gone the “safe” route and used one of these other tools. However, I believe the combination of Vertica + Pentaho has delivered more value to the organization in a shorter amount of time than we would have realized with those other vendors. For our organization, for our business needs, and for the realities of our IT capabilities at the time, Pentaho + Vertica was the way to go. We delivered the project on time and within budget (and without astronomical first-year costs). We have 100% user adoption internally, and are getting very positive feedback from our merchant clients.


  • Recognized by CEO for on-time, on-budget implementation; received “A” grade on end-of-year Enterprise-wide Strategic Initiatives Scorecard
  • Excellent user adoption
  • Positive feedback from external clients
  • Reduced manual reporting tasks over 50% (and over 80% in certain departments)

Reflections on Big Data Roundtable hosted by JNK Securities

Last week I had the opportunity to attend a Big Data round table discussion organized by JNK Securities, a broker-dealer based in NY w/ offices in DC. The attendees seemed to be evenly split between technologists/practitioners and finance professionals hoping to get a pulse on market trends. The conversation was moderated by Atul Chhabra, entrepreneur and formerly Director of Cloud Strategy at Verizon.

The finance professionals were eager to understand how Hadoop, NoSQL, and other Big Data technologies were going to disrupt (or not) existing technology vendors. One person asked how easy it would be for existing companies to replace their Oracle installs with Hadoop or a NoSQL database, and whether Oracle licensing agreements were structured to penalize such migration. As was quickly pointed out by the crowd, it is not “termination fees” that are the problem in moving away from Oracle, but the level of investment (i.e., cost) that would be necessary to refactor existing code and applications to ensure they would function as expected. One way RDBMS vendors increase their product’s “stickiness” and the cost of migration is to promote their database’s proprietary language (PL/SQL for Oracle, T-SQL for Microsoft) over ANSI standards. If an application relies heavily on these stored procedures, it will have to be rewritten in the new database’s language (or in standard ANSI SQL to make it more easily transferable in the future). Of course, that’s assuming there are no hidden “gotchas” in the code itself, such as a programmer making a direct JDBC call to a database and hard-coding the SQL in the web application code itself. Bottom line: it would be very expensive to rewrite existing code, and very hard to justify doing so, since by itself it does not add any additional value to the company.

Additionally, as Atul pointed out, migrating off Oracle may be unlikely to reduce licensing costs for enterprises, since these licensing contracts are typically based on the number of employees or clients; migrating one application off Oracle would not affect the number of employees, so the licensing costs remain the same, and in fact increase if there are licensing costs for the new technology (there usually are). What is more likely is for companies to build new tools and applications using emerging technologies and leave legacy systems as is.

An interesting idea put out by one of the attendees was that the way we think about coding and building applications will dramatically change now that we are in the age of Big Data and Big Compute. There is a fundamental shift in how we design applications: instead of coding for the limits of hardware, assume “best case scenarios” of unbounded scalability and unending amounts of storage and RAM, thanks to developments in Big Data architecture, horizontal scaling, and massively parallel processing (MPP). For example, no longer code applications and file systems to purposely delay processing while waiting for hard drives to spin up or to perform file seek operations; instead, assume instantaneous read/write thanks to SSDs, assume infinite storage (through HDFS-like architecture), and assume unbounded parallelism (i.e., no longer being bounded by the number of cores on one particular server).

Overall, it was a great event, good dinner conversation with smart people.  Looking forward to future events.

Reflections on "Hadoop Certification – is it worth it" 18 months later

It has been over a year and a half since I took the Cloudera Hadoop Developer Certification course and exam and posted my initial impressions on my blog. I have received more comments than I expected; thank you for reading and sending them! There have been a few trends in the comments, some displayed, others kept private. The main ones are:

  1. People really want to get their hands on the Cloudera training materials
  2. People are very eager to get Hadoop jobs
  3. People are trying to transition into Hadoop from different (technical) backgrounds
  4. People want to know if they need to know Java to work with Hadoop
  5. People really want to know if getting a certification in Hadoop will land them a job.

Here is an update to each of these trends:

#1) I cannot share the Cloudera training materials with you, sorry. I wish you the best, but I cannot distribute these materials. They are also pretty old at this point; chances are some of the content is outdated by now. It seems like many of the people asking me for the training materials haven’t picked up any books on the subject at all. So, please check out the available online resources or pick up some books (Hadoop: The Definitive Guide comes to mind).

#2) There is a tremendous amount of interest in learning Hadoop (and getting the training materials) in India. If it is hard to find experienced Hadoop developers in the US right now, I imagine it must be even harder in India (for now, anyway), and there must be many, many job openings right now. I can imagine the outsourcing firms trying to staff up to meet the unmet demand in the US and elsewhere. Almost all the comments and private messages sent to me for training materials were from India. I do not know how much a training course costs in India, but there are plenty of training options, in addition to Cloudera and Hortonworks’ online offerings.

#3) Career switchers (or more accurately, technology-platform-switchers) will need to impress hiring managers with their transferable skill sets and show (not tell) their passion for technology and big data. This is true for any job applicant.

#4) Regarding Java: yes, it is good to know Java to work with Hadoop, but it is not required. You can use other languages, such as Python, through the Hadoop Streaming API. To work with big data, Python is a good language to know anyway (lots of companies are looking for people with a Linux/Python background), so learn Python while you are at it. If you know Python you will also be able to use Pig to interact with your data. Which language you will use will be determined by the solution architecture and design. If the company you want to work for has designed a solution with custom-coded Java map reduce jobs, then you would need to know Java. Other places may implement the Hadoop Streaming API and use Python, so it may be possible to get a job there if you know Python.
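To illustrate the Streaming approach, here is a minimal word-count job in Python, the classic “hello world” of MapReduce. This is my own sketch, not course material: on a real cluster, Hadoop pipes raw input lines to the mapper on stdin, sorts the mapper’s tab-separated key-value output by key (the shuffle phase), and pipes the sorted stream to the reducer.

```python
from itertools import groupby

def mapper(lines):
    # emit (word, 1) for every word on every input line;
    # in a real streaming job, `lines` would be sys.stdin
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    # Hadoop guarantees the mapper output arrives sorted by key,
    # so consecutive identical words can be summed with groupby
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # local demo of the full pipeline; sorted() stands in for
    # Hadoop's shuffle/sort phase between mapper and reducer
    sample = ["Fish tank fish", "tank"]
    for word, total in reducer(sorted(mapper(sample))):
        print("%s\t%d" % (word, total))  # fish 2, tank 2
```

On a cluster you would wrap the mapper and reducer as small stdin/stdout scripts and submit them with the hadoop-streaming jar (exact jar path and flags depend on your distribution), but the map/sort/reduce logic is exactly what is shown above.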

#5) Having a certification in Hadoop won’t guarantee you a job. Most companies are looking for experienced Hadoop hires, which is hard to do unless they are poaching employees from other Big Data startups or tech firms (Yahoo, Google, etc.). When I interviewed technical job applicants, I was surprised (perhaps I shouldn’t have been) how poorly they interview. So please, please practice your behavioral interviewing skills (“tell me about yourself”, “walk me through your resume”, “tell me about a time you had to solve a difficult problem”, “why do you want this job”, etc.). If someone has 50 certifications and can’t answer these simple questions, I will not consider them for the role. I have heard that some hiring managers consider too many certifications a cover-up for lack of skill (superstar developers don’t bother getting certified / don’t need to be certified). For the rest of us, it can help, but it doesn’t guarantee success. The Cloudera Developer course is a good overview, but for it to be meaningful, you really do need a project to work on. Working on a pet project and being able to share code samples would help set you up for success when interviewing.

As for my own personal experience, I did not get a job working directly with Hadoop following the certification course, but I also was not only considering Hadoop developer roles.  I am now leading a BI implementation project where I interviewed and hired a team of developers and analysts. We are using Pentaho and Vertica (for analytic database) and I have been evangelizing Hadoop and other technologies at my company. I find it humorous when executives say the company needs to do more “big data” or “more Hadoop” without really knowing what it means. The certification course definitely helped me speak more authoritatively about this technology at my company and when networking with others.

Whether or not to take the certification course depends on your individual circumstances. If you are dead-set on getting a job as a Hadoop developer, then it may be worth it to you, but make sure to follow up with a personal project to continue learning and practicing. Many people focus on Hadoop itself and seem to forget the business applications of a technology like Hadoop (data science, improved ETL, data processing). Brushing up on those skills and domain knowledge would make you a much more interesting job candidate overall. Good luck everyone!

Upcoming conference on node.js

Just signed up for the conference, happening on April 23rd, 2012. Looking forward to learning more about this event-driven framework and how to apply it to business challenges.

Schedule of events includes

  • Introduction to the event-driven I/O framework that is changing the way we think about developing web applications.
  • Fully loaded Node! Lloyd Hilaiel will explain how to do a bunch of computation with Node.js, use all available CPUs, fail gracefully, and stay responsive.
  • Charlie Robbins will take us through real-world deployments in business-critical systems and why some of the world’s leading companies are choosing Node.
  • James Halliday and Daniel Shaw will show how to use Node.js to enable the real-time streaming web. Guaranteed to generate ideas for next-generation web applications.