ChatGPT is going to seriously impact the white-collar job market!

The widespread adoption of ChatGPT and other advanced language models will likely have significant implications for knowledge workers and the job market. On one hand, these technologies have the potential to increase efficiency and productivity by automating routine tasks and enabling humans to focus on higher-level, creative work.

For example, ChatGPT could be used to automate customer service tasks, freeing up human customer service representatives to handle more complex issues. Similarly, it could be utilized in legal and financial services to automate the generation of contracts, reports, and other document-intensive tasks.

However, the widespread adoption of these technologies could also lead to job loss and economic disruption. As ChatGPT and similar systems become more advanced and capable of handling a wider range of tasks, some jobs that were once performed by humans may become obsolete. For example, some low-skilled jobs in customer service, data entry, and document preparation may be at risk of automation.

On the other hand, the widespread adoption of these technologies will likely lead to the creation of new jobs and industries that did not exist before. For example, there will be a growing demand for individuals with the skills to develop, maintain, and improve these systems, as well as those who can integrate them into existing workflows.

Additionally, the impact of these technologies will be felt differently across different industries and regions. For example, some regions and countries with a heavy reliance on low-skilled labor may experience significant economic disruption, while others with a strong technology sector may benefit from the increased demand for tech-related jobs.

In the future, it will be important for individuals and society as a whole to adapt to these changes and ensure that the benefits of these technologies are distributed fairly. This may require a combination of education and training programs, government policies, and investment in technology infrastructure to ensure that everyone has access to the skills and resources needed to participate in the new economy.

In conclusion, the widespread adoption of ChatGPT and other advanced language models will have far-reaching implications for the job market and society. While these technologies have the potential to increase efficiency and productivity, they also carry the risk of economic disruption and job loss. How well individuals and society adapt, and how fairly the benefits are distributed, will shape the outcome.

Note: thanks ChatGPT for these insights

A more technical explanation of ChatGPT

ChatGPT is a state-of-the-art language model developed by OpenAI. It is built on top of the transformer architecture and trained on a massive amount of text data from various sources such as books, websites, and forums. The model utilizes a deep neural network with multiple layers to generate text that is coherent and contextually relevant to the input prompt.

ChatGPT is capable of performing various natural language processing tasks, including text generation, text classification, and question answering. The model uses an attention mechanism to weight the importance of the input tokens, allowing it to focus on the most relevant information when generating text.
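As a toy illustration of that weighting step (pure Python with made-up relevance scores; this is not ChatGPT's actual implementation), the softmax turns raw scores into attention weights that sum to one:

```python
import math

def softmax(scores):
    """Convert raw relevance scores into attention weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance of each input token for the next prediction
tokens = ["The", "cat", "sat", "on", "the", "mat"]
scores = [0.1, 2.0, 1.5, 0.2, 0.1, 1.0]

weights = softmax(scores)
print({tok: round(w, 3) for tok, w in zip(tokens, weights)})
# "cat" and "sat" receive the largest weights, so they dominate the context
```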

ChatGPT is fine-tuned using maximum likelihood estimation, which maximizes the probability of generating the target text given the input prompt. During inference, the model is fed an input sequence and generates the output sequence token by token, with each new token conditioned on everything generated so far.
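That token-by-token loop can be sketched with a toy greedy decoder over a hand-written bigram table. A real model conditions on the entire sequence with a deep network rather than a lookup table, but the generation loop has the same shape:

```python
# Toy next-token probability table (bigram) -- illustrative only;
# real language models compute this distribution with a neural network.
next_token = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<end>": 1.0},
    "dog": {"<end>": 1.0},
}

def greedy_decode(start="<start>", max_len=10):
    """Repeatedly pick the most probable next token until <end>."""
    out, tok = [], start
    for _ in range(max_len):
        tok = max(next_token[tok], key=next_token[tok].get)
        if tok == "<end>":
            break
        out.append(tok)
    return out

print(greedy_decode())  # ['the', 'cat', 'sat']
```

Sampling from the distribution instead of taking the max is what makes outputs vary from run to run.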

Note: yes, this post was also auto-generated by ChatGPT

What is ChatGPT? How I’d explain it to a 5 year old

ChatGPT is like having a super smart friend who can answer any question you ask. This friend is made by very smart people using computers and lots of information from books and websites. They taught the computer how to have a conversation with you by using what they learned.

When you ask ChatGPT a question, it uses its big brain to find the answer and then types it back to you. ChatGPT can answer questions about anything, like animals, places, and even math problems. It can also talk with you and have a conversation just like you would with a real friend.

It’s important to be nice to ChatGPT and ask it questions that are easy to understand. Try asking simple questions and give it time to think and find the answer. ChatGPT is still learning, so sometimes it might not know the answer or might make a mistake, but that’s okay! Just ask another question and it will try its best to find the answer for you.

ChatGPT is a cool tool that can help you learn new things and have fun conversations. Just remember, it’s not a real friend and doesn’t have feelings, so treat it nicely and always use good manners when you talk to it.

NOTE: This entire post other than disclaimer was auto-generated by ChatGPT. Cool huh?

Unleash Your Data Science Potential: Embrace the Power of ChatGPT and AI!

Yo, listen up data scientists, you need to stay ahead of the game, know what’s good for your career and the future of the field. AI technology is poppin’ off, and ChatGPT is one of the leading models out there. You best believe you gotta pay close attention to it and keep up with the latest developments.

First of all, ChatGPT and AI technology are transforming the way we do things. In the near future, AI-powered models like ChatGPT will be a critical tool in solving complex data science challenges and automating tedious tasks. This is the way the industry is heading, so you wanna be ahead of the curve.

Second, being well-versed in AI technologies, like ChatGPT, can help you stand out in a competitive job market. As AI becomes more widely adopted, demand for data scientists with AI skills will skyrocket. By paying close attention to ChatGPT and emerging AI technologies, you’ll be better positioned to meet the needs of future employers.

Finally, as AI technologies continue to evolve, the field of data science and machine learning will face new and exciting challenges. By paying close attention to ChatGPT and other AI models, data scientists can stay ahead of the curve and better prepare themselves to tackle these challenges.

So, in short, if you’re a data scientist or professional, it’s critical that you stay ahead of the game and pay close attention to ChatGPT and emerging AI technologies. This will help you stay relevant, advance your career, and be better prepared for the future of data science. Keep it real, and stay ahead of the game.

NOTE: This entire post, other than this disclaimer, was generated by ChatGPT.

Building a credit model

A coworker recently asked me to explain how one goes about building a credit risk model. It’s something my company does a lot of, but apparently it’s not taught during new-hire on-boarding. It also made me wonder: how would I actually explain the process end-to-end to someone interested in our industry but not a practitioner? Curious, I searched Google in case anyone had already done so, and of course someone had! So, here’s a quite impressive deep-dive into credit risk modelling thanks to Natasha Mashanovich, Senior Data Scientist at World Programming: Credit Scoring: The Development Process from End to End

Credit Scores throughout the Customer Journey

This is a ten-part series of blog posts describing the entire process. Her company seems to be some sort of SAS competitor, and I am not endorsing her product or company in any way. That said, her write up is pretty tool-agnostic and pretty general, so it is worth a read if you are interested.

Personally, I would build the modelling pipeline in Python/PySpark (since we deal with large data sets) in a cloud environment like AWS instead of SAS, but not everyone in the financial services industry has moved to the cloud yet. I hope you find the link helpful.
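To give a flavor of one core step in the scorecard process Mashanovich describes, here is a minimal weight-of-evidence (WoE) calculation in plain Python. The characteristic, bins, and counts are made up for illustration; this is a sketch, not our production pipeline:

```python
import math

def woe(goods, bads, total_goods, total_bads):
    """Weight of evidence for one attribute bin: ln(%good / %bad)."""
    pct_good = goods / total_goods
    pct_bad = bads / total_bads
    return math.log(pct_good / pct_bad)

# Hypothetical bins of an "income" characteristic: (good loans, bad loans)
bins = {"low": (100, 50), "mid": (300, 40), "high": (600, 10)}
total_goods = sum(g for g, _ in bins.values())  # 1000
total_bads = sum(b for _, b in bins.values())   # 100

for name, (g, b) in bins.items():
    print(f"{name}: WoE = {woe(g, b, total_goods, total_bads):+.3f}")
# Negative WoE marks a riskier bin, positive WoE a safer one
```

These per-bin WoE values are what get fed into the logistic regression behind a traditional scorecard.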

Fishy Fun with Doc2Vec

Using a fishkeeping forum corpus with everyone’s favorite vector representation

I wanted to play around with word2vec but did not want to use the typical data sets (IMDB, etc.). So I thought: what if I scraped one of my favorite fishkeeping forums and applied word2vec to find “experts” within it? Well, it turns out this is a much longer journey than I originally thought it would be, but an interesting one nonetheless.

This is the first of hopefully several blog posts about my adventures with word2vec/doc2vec. I have a few ideas on how to leverage this corpus using deep learning to auto-generate text, so stay tuned, and if you’re interested, drop me a line or leave a comment!

Background

Word2vec was originally developed by Google researchers, and the algorithm has been widely discussed. It provides a vector representation of a sequence of words using a shallow (not deep) neural network. Doc2vec extends these word embeddings with additional information, namely paragraph-level context. The original paper on Paragraph Vector can be found at https://cs.stanford.edu/~quocle/paragraph_vector.pdf. A quick literature search revealed that doc2vec was the better fit for my use case, since I wanted to compare user posts (essentially multiple paragraphs) rather than individual words.

Later, I found a very informative video from PyData Berlin 2017 in which another data scientist used doc2vec to analyze comments on news websites. I thought that was cool, and it further fueled my interest to tinker with this algorithm in my spare time… fast forward a few hours, and it’s almost daylight and I’m still here typing away…

I highly recommend watching this video for additional context.

What I’m trying to do

I’d like to do the following:

  • analyze user posts on Fishlore.com to identify the “experts” on fishkeeping and plants/aquascaping
  • have fun with doc2vec while doing this
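Mechanically, the expert-finding step comes down to measuring similarity between users’ posts. Here is that comparison in miniature using cosine similarity over raw word counts; with doc2vec the vectors would instead be dense learned embeddings, and the posts below are made up (stdlib only):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical forum posts, one per user
posts = {
    "alice": "amano shrimp love planted tanks with co2 and good lighting",
    "bob": "co2 injection and lighting make planted tanks thrive",
    "carol": "my betta flares at his reflection in the glass",
}
vecs = {user: Counter(text.split()) for user, text in posts.items()}

print(cosine(vecs["alice"], vecs["bob"]))    # high: both posts are about planted tanks
print(cosine(vecs["alice"], vecs["carol"]))  # 0.0: no shared words
```

Doc2vec improves on this toy version because semantically related posts score as similar even when they share no vocabulary.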

Continue reading

Computer Vision meets Fish Tank

One day I got curious… what if I programmed my computer to track the fish swimming in my fish tank? That led me to tinkering with an open-source software library called OpenCV. I fiddled around with the settings, tried a few things, and saved the output as a video, seen below. There’s a lot of research in computer science around object recognition and identification; this mini-project was just an attempt to have some fun poking around with some “older” computer vision technologies. Let me know what you think!


Python API to AqAdvisor.com

Context

Approximately 10% of American households have fish as pets, and it is estimated that 95% of fish deaths can be attributed to improper housing or nutrition. Fish are often sold or given away without any guidance for the new owner, such as goldfish giveaways at carnivals or birthday parties. Some fish have myths attached to them, such as the betta (Siamese fighting fish), which supposedly can live in a small bowl of dirty water.

AqAdvisor.com is a website that helps aquarists plan how to stock their fish tank. Users specify their tank size, their filtration, and which fish they intend to keep, and the site calculates the stocking level and filtration capacity given those inputs. It’s a useful way to get a rough estimate of a tank’s stocking level, and it even tells you whether the fish are compatible with one another if you keep more than one species. AqAdvisor is sometimes criticized for “not being accurate”, so its output should not be treated as gospel; nonetheless, it gives a reasonable starting point and is generally very useful for beginner fishkeepers.

Why I created this tool

I started using AqAdvisor and got annoyed at its archaic design. It’s not a RESTful API; it’s a clunky web site that takes a while to load. I was doing lots of research and found myself wanting a better user experience. I also had some free time on my hands over a long holiday weekend, so I gave myself a little programming exercise: create a Python API to the site.

How to use the tool

The easiest way to use the tool is to start from the IPython notebook. First, create a stocking, then a tank, and then make a call to the AqAdvisor service. Because of the clunky web interface, multiple calls to AqAdvisor.com must be made if you want more than one fish species in a tank (as would be the case for a community tank). The auto-generated AqAdvisor URL is printed for each call out to the website. This is useful if you want to jump over to the web UI: just copy and paste the URL into your browser and continue from there.

Use the common (English) name for the fish you are looking for. PyAqAdvisor will do a “fuzzy match” against AqAdvisor’s species list and pick the closest one. This way you can specify your stocking list as “cardinal tetra” and not worry about the scientific name.
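The fuzzy-match idea is the kind of thing Python’s standard library handles well. Here is a minimal sketch using difflib; the species list below is abbreviated and hypothetical (PyAqAdvisor’s real list comes from AqAdvisor itself), and this is not the library’s actual matching code:

```python
import difflib

# Abbreviated, hypothetical species list -- the real one comes from AqAdvisor
SPECIES = ["cardinal tetra", "lemon tetra", "panda cory", "pearl gourami"]

def fuzzy_match(name, choices=SPECIES):
    """Return the closest species name, tolerating underscores, case, and typos."""
    cleaned = name.replace("_", " ").lower()
    matches = difflib.get_close_matches(cleaned, choices, n=1, cutoff=0.6)
    if not matches:
        raise ValueError(f"No species close to {name!r}")
    return matches[0]

print(fuzzy_match("lemon_tetra"))    # lemon tetra
print(fuzzy_match("cardnal tetra"))  # cardinal tetra (typo tolerated)
```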

Please look at examples/example.py and examples/example.ipynb for more information.

Here’s an example of how easy it is to use the new API:

from pyaqadvisor import Tank, Stocking

if __name__ == '__main__':

  stocking = Stocking().add('cardinal tetra', 5)\
   .add('panda cory', 6)\
   .add('lemon_tetra', 12)\
   .add('pearl gourami', 4)

  print("My user-specified stocking is:", stocking)
  print("I translate this into:", stocking.aqadvisor_stock_list)

  t = Tank('55g').add_filter("AquaClear 30").add_stocking(stocking)
  print("Aqadvisor tells me:", t.get_stocking_level())

Github Repo: PyAqAdvisor

Note

  • PyAqAdvisor currently only works for freshwater fish species. If you are interested in saltwater fish, please contact me.

Generate heart rate charts from MapMyRide TCX files

So I had some free time over Columbus Day weekend and figured why not spend it on a fun programming project. My politically-incorrectly named GhettoTCX project emerged after some quick fussing around with TCX (XML) files.

Ghetto TCX

GhettoTCX will parse a TCX file from Garmin, MapMyRide, etc. and generate some basic plots. The most interesting plot type is the heart rate zone chart. It can also create a panel of plots by parsing all the files in a given directory.

It’s called GhettoTCX because it’s a no-frills, nothing-fancy tool, not even a true TCX file parser. It simply searches for some keywords and pulls out heart rate info and lat/long data, and not even at the same time: you need to read the file twice if you want to plot both.
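The keyword-pulling approach boils down to something like this sketch, which uses the stdlib XML parser on a simplified inline fragment rather than GhettoTCX’s actual line-scanning code (real TCX files also carry an XML namespace, which this toy snippet omits):

```python
import xml.etree.ElementTree as ET

# Simplified inline TCX fragment -- real exports from Garmin/MapMyRide are much larger
TCX = """<TrainingCenterDatabase>
  <Trackpoint><HeartRateBpm><Value>98</Value></HeartRateBpm></Trackpoint>
  <Trackpoint><HeartRateBpm><Value>142</Value></HeartRateBpm></Trackpoint>
  <Trackpoint><HeartRateBpm><Value>155</Value></HeartRateBpm></Trackpoint>
</TrainingCenterDatabase>"""

def heart_rates(tcx_text):
    """Pull every heart-rate sample (bpm) out of a TCX document."""
    root = ET.fromstring(tcx_text)
    return [int(v.text) for v in root.iter("Value")]

rates = heart_rates(TCX)
print(rates)                   # [98, 142, 155]
print(max(rates), min(rates))  # a quick peek before bucketing into zones
```

From a list like this, bucketing the samples into heart rate zones is just a matter of comparing each value against zone thresholds.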

Heart Rate plots

The example code and python code repository can be found on the project’s github page.

There are “better” TCX/XML file parsers out there. This one was meant to do one thing (actually two things) quickly and easily: plot heart rate (and heart rate zones). It can also plot lat/long data points onto a scatterplot, but it is seriously no-frills given that you can get nice Google Maps charts on MapMyRide and practically any other fitness app out there.

It started out (and ended) as a fun weekend programming project… if you are curious about your heart rate zones, and are too cost-conscious to pay MapMyRide’s monthly subscription fee for the heart rate zone chart, you can use this free tool instead. Enjoy!