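Open Source Daily recommends one high-quality open source project on GitHub and one selected English-language article on technology or programming every day. Keep reading Open Source Daily and maintain the good habit of learning something each day.
Today’s recommended open source project: "3, 2, 1, cue the music: nuclear"
Today’s recommended English article: "Machine Learning - Should you be a first mover or fast follower?"
Today’s recommended open source project: "3, 2, 1, cue the music: nuclear" Link: GitHub
Why we recommend it: nuclear is a free music player that searches free sources across the internet. It covers the features you would expect from an ordinary music player, such as a play queue, and beyond that it can search for music on YouTube, Bandcamp and similar sites, so you may find your favorite artist’s new album there first. It is still under development, and later versions may add support for local music and music recommendations. If you are interested, keep an eye on it.
Today’s recommended English article: "Machine Learning - Should you be a first mover or fast follower?" Author: Tejas Mahajan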

Machine Learning — Should you be a first mover or fast follower?

“Innovation distinguishes between a Leader and a Follower” — Steve Jobs

Machine learning is not just glorified statistics. It is a story modelled with incremental learning algorithms, with data as the fuel that drives it. This post walks you through what to expect from machine learning, and the advantages and drawbacks of being a first mover or a fast follower in this domain.

Building machine-learning-powered products is an art that comes from the persistence to run countless experiments: learning how to curate datasets, doing extensive exploratory data analysis to understand which questions your data can answer, building features from that data, and using those features with learning algorithms to solve the business problem. It’s an iterative process in which the results of every stage are analysed to determine the efficacy of the algorithm. Machine learning is neither rocket science nor magic; in simple terms, it is about how a person uses the data, understands it, and knows how to leverage it to build generalizable systems.
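The iterative loop described above (curate data, build a feature, fit, evaluate on held-out data, repeat) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not a method from the article: the dataset, the single numeric feature, the 10% label noise, and the names `make_dataset`, `fit_threshold` and `run_experiment` are all hypothetical.

```python
import random


def make_dataset(n=200, seed=0):
    """Hypothetical toy data: one numeric feature in [0, 1); the label is 1
    when the feature exceeds 0.5, with roughly 10% of labels flipped as noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.1:
            label = 1 - label
        rows.append((x, label))
    return rows


def accuracy(threshold, rows):
    """Fraction of rows where the rule 'x > threshold' matches the label."""
    return sum(int(x > threshold) == y for x, y in rows) / len(rows)


def fit_threshold(rows, candidates):
    """'Training' here is just picking the candidate threshold that scores
    best on the training rows."""
    return max(candidates, key=lambda t: accuracy(t, rows))


def run_experiment(seed=0):
    """One pass of the loop: split, fit on train, evaluate on held-out test."""
    rows = make_dataset(seed=seed)
    train, test = rows[:150], rows[150:]
    candidates = [i / 20 for i in range(1, 20)]
    threshold = fit_threshold(train, candidates)
    return threshold, accuracy(threshold, train), accuracy(threshold, test)


if __name__ == "__main__":
    threshold, train_acc, test_acc = run_experiment(seed=0)
    print(f"threshold={threshold:.2f} train={train_acc:.2f} test={test_acc:.2f}")
```

In a real project each step would be far richer, but the shape is the same: the held-out score is what tells you whether to go back and change the data, the features or the algorithm.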

So why is every organisation, whether in retail, fintech or any other vertical, investing its resources in data science? The answer is simple: data. The exponential rise in data generation has given these data-hungry algorithms something to feast upon. Another reason is the availability of affordable compute resources, which helps research labs, startups and companies progress iteratively, experiment after experiment.

These two factors gave rise to a plethora of problems to solve in every vertical across industries, be it retail, supply chain, fintech, health, IT, telecommunications, automotive or many more. This rise in problems has given birth to many companies across various countries that solve local and global issues, which leads to two classes of problem solvers: the first movers and the fast followers.

A first mover in the machine learning ecosystem is one who is first to use an untapped data source, work out the proper use of this data, craft solutions and ultimately build scalable products that the data can support. A first mover has the upper hand by being first to market, in some cases by filing patents on the products they have built, and by understanding the minute intricacies of the data they have gathered from day one. Shazam, a simple music recognition app and a first mover, built a highly accurate music recognition engine that eliminates noise and recognizes a song with incredibly low latency, and to date it has no strong competitor that supersedes or even matches it. Snapchat is another such app: it was first to develop self-destructing messages, live face filters and stories, powered by low-latency facial landmark detection algorithms.

Technological leadership, built on state-of-the-art algorithms with appropriate fine-tuning, helps first movers get a firm footing in their domain. Being first to reach customers can be very helpful because it creates loyalty towards your product; at the same time, a feedback loop is set in place that helps you upgrade your product incrementally based on user suggestions.

Being a first mover in the machine learning industry also comes with its share of disadvantages. Since your product has never been conceptualized before, you are very likely to tap the wrong, or rather worthless, data sources, wasting time and resources and, most importantly, producing meaningless insights. A naive implementation of data management infrastructure can cause serious problems once your product needs to scale to millions of people at minimal infrastructure cost. Conversely, setting up infrastructure to serve millions right away can incur high bills when you don’t have the customers yet. Webvan, a failed online grocery platform, is the perfect example of a company that spent a huge amount of its funds on infrastructure before it had a sizeable customer base, and of failing because of growing too fast. How the market will react to your product is one of the biggest risks first movers take. With machine learning products, there have been cases where people built smarter products that performed only as well as cheaper rule-based systems; in that situation, people find no motivation to use your product.

Sometimes being a first mover can also prove fatal because the product may simply be ahead of its time. One famous example is Google Glass: as appealing as it was, and despite the endless possibilities for applications built around it, it failed for now because people don’t yet see it as a need. This makes us wonder who the fast follower will be that taps, in this case, the market Google Glass opened up, with the many applications around it powered by machine learning algorithms.

So, fast follower, huh? The first questions that come to mind are: can I really build something that someone else has spent years on, with the most brilliant minds and resources behind it? Will it be worth my time to enter that market, and will an in-house solution be more cost-effective than a third-party one? While all these are valid concerns, history suggests that fast followers have come out on top most of the time. In machine learning specifically, this has been a very common phenomenon, primarily because of two factors. One is the ability to use existing data points to create new features that help build products for new use cases. The other is that older statistical machine learning models based on handcrafted features are being replaced by deep learning models that learn features automatically, wherever enough data is available.

Fast followers can even succeed in reverse engineering a product. Some years ago this felt like a technological challenge, but it is now possible thanks to the enormous amount of data, the research papers and implementations being open-sourced, and the easy availability of compute resources. Fast followers can eventually turn out to be market leaders even when they do not have the best-fit model, provided they have the technical expertise to produce a nearly-as-good model, scale it with demand and improve it periodically. There have been cases where problems once solved by first-mover startups were not periodically upgraded, and fast followers saw an advantage and capitalized on it. Whether you are the first innovator or a late entrant to the market, every startup needs a differentiator that customers see as an important requirement; that’s your USP (unique selling point).

Fast followers understand the pain points of customers by studying the first movers, and they capitalize on these points. In the machine learning space, this helps them create cost-effective new architectures with existing data, or even discover different by-products of the data that solve critical customer issues. Beyond the AI assistants that help you with daily news and manage your IoT devices, Woebot is a different type of chatbot, developed by Stanford researchers to combat depression through interactive cognitive therapy. Slack, an enterprise team collaboration tool and another fast follower, has had a lot of success with smart chatbots and third-party app integrations that streamline processes such as recruiting new employees, managing project deadlines and many more.

Fast followers have their share of problems too. When a first mover has already succeeded, you must enter the market with a clear differentiator and no room for failure. Many reverse-engineering projects have been scrapped because the growing amount of time invested in a particular problem was not translating into a profitable product. Sometimes the problem itself is simply hard to decipher, as in the case of Shazam. Having the right dataset isn’t enough to build the next ML solution; at times you need to understand the domain to create relevant features, and you need the skill to craft algorithms that can update themselves as new variations appear in the data.

It’s not about having the largest dataset, but about having a dataset that captures every variation seen at test time.
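One simple way to act on this point is to check which variations show up at test time but never appear in the training data. The sketch below is an assumption-laden illustration rather than anything from the article: the helper `unseen_variations`, the `accent` field and the sample rows are all hypothetical.

```python
def unseen_variations(train_rows, test_rows, field):
    """Return the sorted values of `field` that occur in the test rows
    but never occur in the training rows."""
    seen_in_training = {row[field] for row in train_rows}
    seen_at_test_time = {row[field] for row in test_rows}
    return sorted(seen_at_test_time - seen_in_training)


# Hypothetical example: speech samples tagged with the speaker's accent.
train = [{"accent": "us"}, {"accent": "uk"}, {"accent": "us"}]
test = [{"accent": "us"}, {"accent": "in"}]

missing = unseen_variations(train, test, "accent")
print(missing)
```

Here the training set is larger than the test set, yet it still misses one accent entirely, which is the kind of gap that adding more of the same data would not fix.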

With all that said, fast followers will always be there, given the ever-increasing set of solutions that can be devised by procuring the right data, building a scalable data infrastructure pipeline and ultimately producing intelligent learning systems. Fast followers of Waymo, the company that grew out of Google’s self-driving car project (such as Cruise, Drive.ai, Aurora and Pony.ai), are attracting investments amounting to millions of dollars, which is a clear indicator of the potential and impact of machine learning technologies in the coming years.

In our current age of technology, machine learning algorithms do very well on supervised learning tasks, but that’s just the icing on the cake. We have only scratched the surface of unsupervised learning, the real cake, and I personally feel it will be one of the most active areas of research in the coming years. On a closing note, it doesn’t matter whether you’re the first mover, second, third or very late in the race; what counts is whose horse keeps running the long race and becomes the unicorn in its domain.

It always seems impossible until it’s done. — Nelson Mandela