Which tornado has the most cheese?

What are the current applications of Machine Learning in real life/industry? What are the global challenges that humanity faces today and how can they be overcome with the help of ML? How can ML be used for good and how can it be used for evil? What is the latest ML research? And most importantly: Which tornado has the most cheese?

AI &
- Cities
- Computer Systems
- Environment
- Finance
- Health
- Industry
- Intellectual Property
- Language
- Learning Analytics
- Media
- Molecular World
- Networks
- Nutrition
- Society
- Transport
- Trust

As you can probably imagine, it was a bit much: all of the above happened in two days. While the big keynotes and panel discussions had their own time slots, the tracks ran in parallel, with little to no time between individual talks and overlapping start and end times across tracks. That gets messy if you, like me, are not committed to one particular area but pick sessions by the topic of the individual talk.
Nonetheless, it was an interesting conference, with three personal highlights: AI for good, AI for evil, and Data.

Currently, discrimination is often based on visual features that are easy to detect, such as gender or ethnicity. While we are nowhere near eliminating those types of discrimination, we are at least aware of them and often able to recognize when they happen. Zeynep pointed out that, with machine learning algorithms drawing inferences from various data sources such as social media activity, there may soon be new features we discriminate by without even knowing it.

Also, machine learning algorithms are far from unbiased, as they learn from our highly biased data. Would a hiring algorithm only pick certain genders, people with particular personality traits, or members of a certain ethnic group, simply because those people are among the high achievers already in the company, without realizing that this may be a mere correlation rooted in previous human and cultural bias rather than a valid causal relationship?

Of course, we can gain some insight into the reasoning of machine learning algorithms through extensive auditing, but with their opacity, rising complexity, and the sheer amount of data they incorporate, this is easier said than done.
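To make the auditing point a bit more concrete, here is a deliberately tiny sketch of one very simple kind of audit. Everything in it (the data, the model, the 0/1 group label) is my own made-up illustration, not something from the talk: train a toy hiring model on historically biased labels and compare its predicted hiring rates across the sensitive group.

```python
# Toy audit sketch: data, model, and threshold are illustrative assumptions,
# not a real hiring pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, size=n)        # sensitive attribute (e.g. a group label)
skill = rng.normal(size=n)                # what we would actually like to hire on
# Historical labels are biased: group 1 was favored independently of skill.
hired = (skill + 0.8 * group + rng.normal(size=n)) > 0.5

X = np.column_stack([skill, group])       # the model can "see" the group
model = LogisticRegression().fit(X, hired)

# The audit: compare predicted hiring rates per group.
pred = model.predict(X)
for g in (0, 1):
    print(f"group {g}: predicted hiring rate {pred[group == g].mean():.2f}")
# A large gap between the two rates suggests the model has picked up the
# historical bias (a correlation), not a valid causal relationship.
```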

Zeynep rightfully concluded that we cannot outsource our moral responsibilities to machines, but should only use them to help and guide us in making our own decisions.

From “1500x500 image with a big red A in the center on a white background” you can probably reconstruct this image fairly well; you don’t need me to tell you all 2,250,000 values (1500 × 500 pixels × 3 color channels) that it consists of.

Say you want to classify thousands of images of luggage items into categories like hard bag, soft bag, rucksack, box, and sports bag. You could do this in a supervised manner: label part of the images manually, train a supervised classification algorithm on them, and let it classify the rest.

With an autoencoder, however, you would not need the manual annotation. You would instead extract important features of the images (hoping that they differ between the classes), reducing each of your, say, 500x500 images from 750,000 values (500 × 500 pixels × 3 color channels) to 1,000 features: a compression factor of 750!
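To make this a bit more tangible, here is a minimal sketch of what such an autoencoder could look like in Keras. The architecture, layer sizes, and training settings are my own illustrative assumptions, not the setup from the talk: an encoder squeezes each 500x500x3 image down to a 1,000-dimensional code, and a decoder tries to reconstruct the original image from that code.

```python
# Minimal convolutional autoencoder sketch (illustrative assumptions only).
from tensorflow.keras import layers, Model

CODE_DIM = 1000  # size of the compressed feature vector per image

# Encoder: 500x500x3 image -> 1,000-dimensional code
inp = layers.Input(shape=(500, 500, 3))
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inp)  # 250x250
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)    # 125x125
x = layers.Conv2D(64, 5, strides=5, padding="same", activation="relu")(x)    # 25x25
code = layers.Dense(CODE_DIM, activation="relu")(layers.Flatten()(x))

# Decoder: code -> reconstructed 500x500x3 image
y = layers.Dense(25 * 25 * 64, activation="relu")(code)
y = layers.Reshape((25, 25, 64))(y)
y = layers.Conv2DTranspose(32, 5, strides=5, padding="same", activation="relu")(y)  # 125x125
y = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(y)  # 250x250
out = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(y)  # 500x500

autoencoder = Model(inp, out)  # trained to reproduce its own input
encoder = Model(inp, code)     # used afterwards to extract the 1,000 features
autoencoder.compile(optimizer="adam", loss="mse")

# images: float array of shape (n_images, 500, 500, 3), scaled to [0, 1]
# autoencoder.fit(images, images, epochs=20, batch_size=32)
# codes = encoder.predict(images)  # shape (n_images, 1000)
```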

With those feature vectors (your encoded images), you can do some high-dimensional clustering (not actually that easy) to see which vectors are close to each other and thus represent (visually) similar classes. Looking at samples from those clusters, you could then see which cluster contains the rucksacks, which the hard bags, and so on. Voilà: an unsupervised classification algorithm!
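Continuing the sketch (with the assumed `codes` array from the hypothetical encoder above), the clustering step could look roughly like this, here with k-means and one cluster per expected luggage type:

```python
# Cluster the 1,000-dimensional codes and peek at a few images per cluster.
import numpy as np
from sklearn.cluster import KMeans

N_CLASSES = 5  # hard bag, soft bag, rucksack, box, sports bag
kmeans = KMeans(n_clusters=N_CLASSES, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(codes)  # codes: shape (n_images, 1000)

# Look at a handful of images per cluster to decide which cluster is which class.
for c in range(N_CLASSES):
    members = np.where(cluster_ids == c)[0]
    print(f"cluster {c}: {len(members)} images, e.g. indices {members[:5].tolist()}")
```

Whether k-means (or any distance-based clustering) separates the classes cleanly in 1,000 dimensions is exactly the “not actually that easy” part; in practice you might reduce the dimensionality further or try other clustering methods.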

It’s obviously not as easy as I make it out to be, but it’s an important idea nonetheless: labelling data is tedious, ambiguous, and inaccurate, and we need a s**ton of it for good algorithms. It seems like we need to find a more intelligent way of doing it.

In machine learning, the idea is often that each data point you have (say, a picture of a bicycle) belongs to a data distribution. If you had access to all the bicycle images ever taken and categorized them by different attributes, you could see that the average bicycle has two wheels (you could have guessed that from the name), what percentage of images were taken from a similar angle, and what proportion of bicycles are green. If you had access to that full distribution (which you don’t), machine learning would be rather easy.

What you do have access to are individual points of the distribution (individual pictures of bikes) from which you try to learn. The idea behind synthetic data is to design a process that generates (random) data points which, if you let the process run for a while, form a distribution similar to the real one. If you manage to do it right, you can simply use the artificially generated data points instead of the real stuff.
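As a toy illustration (entirely made-up numbers, nothing from the talk): pretend the “real” distribution of bicycle wheel counts is only visible through a small sample, hand-design a generative process that is supposed to mimic it, and check whether the statistics of the two match.

```python
# Toy synthetic-data sketch with made-up numbers.
import numpy as np

rng = np.random.default_rng(42)

# The few real data points we actually have access to (mostly two-wheelers).
real_samples = rng.choice([2, 3], size=200, p=[0.85, 0.15])

def generate_synthetic(n):
    """Hand-designed process meant to produce a similar distribution."""
    return rng.choice([2, 3], size=n, p=[0.8, 0.2])

synthetic = generate_synthetic(10_000)

# If the generator is any good, the two distributions look alike and the
# synthetic points can stand in for the real ones during training.
print("real:      mean wheels =", real_samples.mean())
print("synthetic: mean wheels =", synthetic.mean())
```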

This is what Adrien and his team did. Instead of using real-world video footage, they used a driving simulator: the simulator knew what it was simulating, so they got the pixel-level annotations (the ground truth) for free! But while the simulator did a good job of mimicking physics, behavior, and so on, the visual output was less convincing.

So in terms of gathering, curating, creating and using data there still seems to be a lot to learn and discover.

All in all, a very interesting conference and worth a visit. Oh, and in case you wondered (I got it wrong):
