Month: April 2021

UW Data Science Course Reading List

Part 0: Introduction

Data science articulated, data science examples, history and context, technology landscape

Part 1: Data Manipulation, at Scale

Databases and the relational algebra

Readings

MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages; key-value stores and NoSQL; tradeoffs of SQL and NoSQL Readings

Data cleaning, entity resolution, data integration, information extraction (NOT COVERED IN LECTURES) Readings / Talks

Part 2: Analytics

Topics in statistical modeling and experiment design Readings

Introduction to Machine Learning, supervised learning, decision trees/forests, simple nearest neighbor Readings

Unsupervised learning: k-means, multi-dimensional scaling

Readings

Part 3: Interpreting and Communicating Results

Visualization, visual data analytics Readings (well, watchings)

Backlash: Ethics, privacy, unreliable methods, irreproducible results

Part 4: Graph Analytics

Readings

UW Data Science Course: Week One

Flavors of Data

Numerical – Some sort of quantitative measurement, e.g. heights of people, page load times, stock prices

There are two types of numerical data:

Discrete Data – integer based; often counts of some event

  • How many purchases did a customer make in a year?
  • How many times did I flip "heads"?

Continuous Data – has an infinite number of possible values

  • How much time did it take for a user to check out?
  • How much rain fell on a given day?

Categorical – Qualitative data that has no inherent mathematical meaning

  • Gender, Yes/No (binary data), Race, State of residence, Product Category, Political Party, etc.
  • You can assign numbers to categories in order to represent them more compactly, but the numbers don’t have a mathematical meaning (see the sketch below)
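A minimal Python sketch of that last point; the category values and integer codes are made up for illustration:

# Assign arbitrary integer codes to categories for compact storage.
# The codes are labels only; doing math on them is meaningless.
states = ["WA", "OR", "WA", "CA", "OR"]

codes = {state: i for i, state in enumerate(sorted(set(states)))}
encoded = [codes[s] for s in states]

print(codes)    # {'CA': 0, 'OR': 1, 'WA': 2}
print(encoded)  # [2, 1, 2, 0, 1]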

Ordinal – This is a mixture of numerical and categorical data

Ordinal data is categorical data whose values have mathematical meaning (an ordering)

  • Example: movie ratings on a 1-5 scale
    • Ratings must be 1, 2, 3, 4, or 5
    • But these values have mathematical meanings; 1 means it’s a worse movie than a 2

Statistics 101

Mean

  • This is the average. Sum all the values and divide by the number of values.

Median

  • Sort the values, and take the value at the midpoint
  • With an odd number of data points, the median is the middle value; with an even number, it falls between the two middle data points.
    • In that case, take the average of the two in the middle.
  • The median is less susceptible to outliers than the mean.
    • Example: mean household income in the US is $72,641, but the median is only $51,939 – because the mean is skewed by a handful of billionaires.
  • The median better represents the "typical" American in this example.

Mode

  • The most common value in a data set
    • Not relevant to continuous numerical data
  • Back to our number of kids in each house example (see the sketch below).
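A quick Python sketch of mean, median, and mode using the standard library; the income and kids-per-house numbers are made up to echo the examples above:

import statistics

incomes = [30_000, 45_000, 51_939, 60_000, 1_000_000]   # one extreme outlier
print(statistics.mean(incomes))    # 237387.8 – pulled way up by the outlier
print(statistics.median(incomes))  # 51939 – closer to the "typical" value

kids_per_house = [0, 1, 2, 2, 3, 2, 1]
print(statistics.mode(kids_per_house))  # 2 – the most common value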

Standard Deviation and Variance – These concepts are all about the spread of the data (the shape)

Variance – measures how "spread-out" the data is.

  • Variance (sigma squared) is simply the average of the squared differences from the mean.
  • Example: What is the variance of the data set (1, 4, 5, 4, 8)?
    • First find the mean: (1+4+5+4+8)/5 = 4.4
    • Now find the differences from the mean: (-3.4, -0.4, 0.6, -0.4, 3.6)
    • Find the squared differences: (11.56, 0.16, 0.36, 0.16, 12.96)
    • Find the average of the squared differences:
      • sigma squared = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04

Standard Deviation is the square root of the variance

This is usually used as a way to identify outliers. Data points that lie more than one standard deviation from the mean can be considered unusual.

You can talk about how extreme a data point is by talking about, "how many sigmas" away from the mean it is.

Population vs. Sample

  • If you’re working with a sample of data instead of an entire data set (the entire population)…
    • Then you want to use the sample variance instead of the population variance.
    • For N samples, you divide the sum of squared differences by N-1 instead of N.
    • So, in our example, we computed the population variance like this:
      • Sigma squared = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04
    • But the sample variance would be:
      • S² = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 4 = 6.3
    • (Both are reproduced in the sketch below.)
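A minimal NumPy sketch reproducing the worked example above (NumPy assumed to be available):

import numpy as np

data = np.array([1, 4, 5, 4, 8])
mean = data.mean()                                          # 4.4

pop_var = ((data - mean) ** 2).sum() / len(data)            # 5.04 (divide by N)
sample_var = ((data - mean) ** 2).sum() / (len(data) - 1)   # 6.3  (divide by N-1)

print(pop_var, np.var(data))              # np.var defaults to the population variance
print(sample_var, np.var(data, ddof=1))   # ddof=1 gives the sample variance
print(np.sqrt(pop_var), data.std())       # standard deviation = square root of the variance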

The Why

Probability Density Functions

A PDF gives the probability of a value falling within a given range. It is NOT the probability of a specific number occurring.

“Gives you the probability of a data point falling within some given range of a given value.”

Probability Mass Function – the discrete-data counterpart: gives the probability of each specific discrete value occurring.

Examples of Data Distributions

Uniform Distribution – a flat, constant probability: every value within the range is equally likely to occur.

Normal / Gaussian – the familiar bell curve, centered on the mean.

Exponential PDF / “Power Law” – Things fall off in an exponential manner.
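A minimal NumPy sketch drawing samples from each of the three distributions above; the sample size and parameters are arbitrary:

import numpy as np

rng = np.random.default_rng(42)

uniform = rng.uniform(low=0.0, high=1.0, size=10_000)    # flat, equal probability
normal = rng.normal(loc=0.0, scale=1.0, size=10_000)     # bell curve around the mean
exponential = rng.exponential(scale=1.0, size=10_000)    # falls off exponentially

for name, sample in [("uniform", uniform), ("normal", normal), ("exponential", exponential)]:
    print(f"{name:12s} mean={sample.mean():.3f}  std={sample.std():.3f}")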

UW Data Science Course

If you’re a DBA, you need to learn to deal with unstructured data

If you are a statistician, you need to learn to deal with data that does not fit in memory

If you are a software engineer, you need to learn statistical modeling and how to communicate results.

If you are a business analyst, you need to learn about algorithms and tradeoffs at scale.

Week 2

  1. Structures
    1. Rows and columns
    2. Nodes and edges
    3. Key value pairs
    4. A sequence of bytes
  2. Constraints
    1. All rows must have the same number of columns
    2. All values in one column must have the same type
    3. A child cannot have two parents
  3. Operations (sketched in code after this list)
    1. Find the value of key x
    2. Find the rows where column “lastname” is “Jordan”
    3. Get the next N bytes
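A small Python sketch of those three operations against three of the structures above – a key-value store (a dict), rows and columns (a list of dicts), and a sequence of bytes; the sample values are made up:

kv_store = {"x": 42, "y": 7}
rows = [
    {"firstname": "Michael", "lastname": "Jordan"},
    {"firstname": "Grace", "lastname": "Hopper"},
]
blob = b"a plain sequence of bytes"

print(kv_store["x"])                                    # find the value of key x
print([r for r in rows if r["lastname"] == "Jordan"])   # find the rows where "lastname" is "Jordan"
print(blob[:8])                                         # get the next N bytes (N = 8)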

PWD – "print working directory”

Week 3


Descriptive – Just to describe a set of data (e.g. census data, ngram viewer)

  • Description and the interpretation are different steps
  • Descriptions usually cannot be generalized without additional statistical modeling

Exploratory – Find relationships you didn’t know about

  • Exploratory models are good for discovering new connections
  • They are also useful for defining future studies
  • Exploratory analyses are usually not the final say
  • Exploratory analyses alone should not be used for generalizing / predicting
  • Correlation does not imply causation

Inferential – Use a relatively small sample of data to say something about a bigger population

  • Inference is commonly the goal of statistical models
  • Inference involves estimating both the quantity you care about and your uncertainty about your estimate
  • Inference depends heavily on both the population and the sampling scheme

Predictive – To use the data on some objects to predict values for another object

  • If X predicts Y, it does not mean that X causes Y
  • Accurate prediction depends heavily on measuring the right variables.
  • Although there are better and worse prediction models, more data and a simple model works really well.
  • Prediction is very hard, especially about the future.

Causal – To find out what happens to one variable when you make another variable change.

  • Usually randomized studies are required to identify causation
  • There are approaches to inferring causation in non-randomized studies, but they are complicated and sensitive to assumptions
  • Causal relationships are usually identified as average effects, but may not apply to every individual.
  • Causal models are usually the “gold standard” for data analysis.

Mechanistic – Understand the exact changes in variables that lead to changes in other variables for individual objects.

  • Incredibly hard to infer, except in simple situations
  • Usually modeled by a deterministic set of equations (physical/engineering science)
  • Generally the random component of the data is measurement error
  • If the equations are known but the parameters are not, they may be inferred with data analysis.

What is data – Data are values of qualitative or quantitative variables belonging to a set of items.

  • Set of Items: Sometimes called the population; the set of objects you are interested in.
  • Variables: A measurement or characteristic of an item.
  • Qualitative: Country of origin, sex, treatment
  • Quantitative: Height, weight, blood pressure

Data rarely comes processed.

Data is the second most important thing

  • The most important thing in data science is the question
  • The second most important is the data
  • Often the data will limit or enable the questions
  • But having data can’t save you if you don’t have a question

What about big data?

  • You can collect much more data, much more cheaply, but the noise-to-signal ratio is much higher.

Big or small data: “The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.” (John Tukey)

Experimental Design

Why should I care about experimental design? Because it’s really easy to focus on the outcome and overlook an error in the numbers.

  • Care about the analysis plan.
  • It’s critical to pay attention to all aspects of the design and analysis of a study: the data cleaning, the data analysis, and the reporting, so the key issues in the study don’t trip you up.

Question: Does changing the text on your website improve donations?

Experiment:

Formulate your question in advance

  1. Randomly show visitors one version or the other
  2. Measure how much they donate
  3. Determine which is better

Data Science is a scientific discipline. Science demands you are answering a specific question when you are using data.

Compare two versions of the website: randomly show visitors one of the two versions, measure how much they donate, and figure out which is better.

Statistical inference – a key component of data science.

Confounding – What other variables are causing the relationship?

  • Randomization and blocking
    • If you can and want to fix a variable
      • Website always says Obama 2012
    • If you don’t fix a variable, stratify it.
      • If you are testing sign-up phrases and have two website colors, use both phrases equally with both colors.
    • If you can’t fix a variable, randomize it.
    • Why does randomization help?
      • Because it spreads the effects of variables you can’t control evenly across the groups, so they can’t systematically bias the comparison.

Shoe size and literacy are correlated: the bigger the shoe, the more literate the person. But what’s happening is that babies and children have small feet and low literacy. Age is actually the confounding factor, not shoe size.

Correlation is not causation

Prediction – Take a sample of people with cancer. Separate out the people who responded to chemotherapy from those who did not. Then build a function that determines who will and who won’t respond to chemotherapy.

Prediction is more challenging than inference.

Prediction vs. Inference – prediction gets easier the more separated the groupings (distributions) are.

Prediction key quantities

  • Sensitivity
    • The probability that the test is positive, given that you have the disease
  • Specificity
    • The probability that the test is negative, given that you do not have the disease
  • Positive Predictive Value
    • The probability that you have the disease, given a positive test
  • Negative Predictive Value
    • The probability that you do not have the disease, given a negative test
  • Accuracy
    • The probability that the test result (positive or negative) is correct
  • (All five are computed from a confusion matrix in the sketch below.)
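A minimal Python sketch computing those quantities from a 2x2 confusion matrix; the counts are made up for illustration:

tp, fp = 80, 10   # test positive: have the disease / don't have it
fn, tn = 20, 90   # test negative: have the disease / don't have it

sensitivity = tp / (tp + fn)                 # P(test positive | disease)    = 0.80
specificity = tn / (tn + fp)                 # P(test negative | no disease) = 0.90
ppv = tp / (tp + fp)                         # P(disease | test positive)    ≈ 0.89
npv = tn / (tn + fn)                         # P(no disease | test negative) ≈ 0.82
accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of correct results   = 0.85

print(sensitivity, specificity, ppv, npv, accuracy)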

Beware data dredging

Summary

Good experiments

  • Have replication
  • Measure variability
  • Generalize to the problem you care about
  • Are Transparent

Prediction is not inference

  • Both can be important

Beware of data dredging

  • Data dredging (also data fishing, data snooping, and p-hacking) is the use of data mining to uncover patterns in data that can be presented as statistically significant, without first devising a specific hypothesis as to the underlying causality.

Deep Learning With Python – François Chollet

Could a computer surprise us? Rather than programmers crafting data-processing rules by hand, could a computer automatically learn these rules by looking at data?

To control something, first you need to be able to observe it.

Loss Function The loss function takes the predictions of the network and the true target (what you wanted the network to output) and computes a distance score, capturing how well the network has done.

Backpropagation Algorithm This adjustment is the job of the optimizer, which implements what’s called the Backpropagation algorithm: the central algorithm in deep learning.

Training Loop This is the training loop, which, repeated a sufficient number of times (typically tens of iterations over thousands of examples), yields weight values that minimize the loss function. A network with a minimal loss is one for which the outputs are as close as they can be to the targets: a trained network. Once again, it’s a simple mechanism that, once scaled, ends up looking like magic.

Layer The core building block of neural networks is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form.

Representations Specifically, layers extract representations out of the data fed into them—hopefully, representations that are more meaningful for the problem at hand.

Data distillation Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters—the layers.

The Compilation Step (sketched in code after this list)

  1. A loss function— How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
  2. An optimizer— The mechanism through which the network will update itself based on the data it sees and its loss function.
  3. Metrics to monitor during training and testing— Here, we’ll only care about accuracy (the fraction of the images that were correctly classified).
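A minimal Keras sketch of those three choices, along the lines of the book’s MNIST example; the layer sizes and input shape here are assumptions, not something pinned down in these notes:

from tensorflow import keras
from tensorflow.keras import layers

network = keras.Sequential([
    layers.Dense(512, activation="relu", input_shape=(28 * 28,)),
    layers.Dense(10, activation="softmax"),
])

network.compile(
    optimizer="rmsprop",                  # how the network updates itself
    loss="categorical_crossentropy",      # how it measures its performance
    metrics=["accuracy"],                 # what to monitor during training and testing
)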

Overfitting The test-set accuracy turns out to be 97.8%—that’s quite a bit lower than the training set accuracy. This gap between training accuracy and test accuracy is an example of overfitting: the fact that machine-learning models tend to perform worse on new data than on their training data.

Tensor Numpy arrays, also called tensors. At its core, a tensor is a container for data—almost always numerical data. So, it’s a container for numbers. You may be already familiar with matrices, which are 2D tensors: tensors are a generalization of matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is often called an axis).

Scalars (0D tensors) A tensor that contains only one number is called a scalar (or scalar tensor, or 0-dimensional tensor, or 0D tensor). In Numpy, a float32 or float64 number is a scalar tensor (or scalar array). You can display the number of axes of a Numpy tensor via the ndim attribute; a scalar tensor has 0 axes (ndim == 0). The number of axes of a tensor is also called its rank.

Vectors (1D tensors) An array of numbers is called a vector, or 1D tensor. A 1D tensor is said to have exactly one axis.

Dimensionality Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times. In the latter case, it’s technically more correct to talk about a tensor of rank 5 (the rank of a tensor being the number of axes), but the ambiguous notation 5D tensor is common regardless.

Matrices (2D tensors) An array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as rows and columns). You can visually interpret a matrix as a rectangular grid of numbers.

3D tensors If you pack such matrices in a new array, you obtain a 3D tensor, which you can visually interpret as a cube of numbers.

Tensor Attributes (see the sketch after this list)

  • Number of axes (rank)— For instance, a 3D tensor has three axes, and a matrix has two axes. This is also called the tensor’s ndim in Python libraries such as Numpy.
  • Shape— This is a tuple of integers that describes how many dimensions the tensor has along each axis. For instance, the previous matrix example has shape (3, 5), and the 3D tensor example has shape (3, 3, 5). A vector has a shape with a single element, such as (5,), whereas a scalar has an empty shape, ().
  • Data type (usually called dtype in Python libraries)—This is the type of the data contained in the tensor; for instance, a tensor’s type could be float32, uint8, float64, and so on. On rare occasions, you may see a char tensor. Note that string tensors don’t exist in Numpy (or in most other libraries), because tensors live in preallocated, contiguous memory segments: and strings, being variable length, would preclude the use of this implementation.
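A minimal NumPy sketch of these attributes, using the small shapes mentioned above (a scalar, a 5-element vector, a 3x5 matrix, a 3x3x5 3D tensor):

import numpy as np

scalar = np.array(12)                      # 0D tensor: ndim 0, shape ()
vector = np.array([12, 3, 6, 14, 7])       # 1D tensor: ndim 1, shape (5,)
matrix = np.zeros((3, 5))                  # 2D tensor: ndim 2, shape (3, 5)
tensor3d = np.zeros((3, 3, 5))             # 3D tensor: ndim 3, shape (3, 3, 5)

for t in (scalar, vector, matrix, tensor3d):
    print(t.ndim, t.shape, t.dtype)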

Types of Data

Vector Data – 2D tensors of shape (samples, features) Each single data point can be encoded as a vector, and thus a batch of data will be encoded as a 2D tensor (that is, an array of vectors), where the first axis is the samples axis and the second axis is the features axis.

Timeseries data or sequence data— 3D tensors of shape (samples, timesteps, features) Whenever time matters in your data (or the notion of sequence order), it makes sense to store it in a 3D tensor with an explicit time axis. Each sample can be encoded as a sequence of vectors (a 2D tensor), and thus a batch of data will be encoded as a 3D tensor.

The time axis is always the second axis (axis of index 1), by convention.

Images— 4D tensors of shape (samples, height, width, channels) or (samples, channels, height, width) Images typically have three dimensions: height, width, and color depth. Although grayscale images (like our MNIST digits) have only a single color channel and could thus be stored in 2D tensors, by convention image tensors are always 3D, with a one-dimensional color channel for grayscale images.

There are two conventions for shapes of images tensors: the channels-last convention (used by TensorFlow) and the channels-first convention (used by Theano).

5D tensors of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)

Video data is one of the few types of real-world data for which you’ll need 5D tensors. A video can be understood as a sequence of frames, each frame being a color image. Because each frame can be stored in a 3D tensor (height, width, color_depth), a sequence of frames can be stored in a 4D tensor (frames, height, width, color_depth), and thus a batch of different videos can be stored in a 5D tensor of shape (samples, frames, height, width, color_depth).
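A minimal NumPy sketch of the conventional channels-last shapes above; the sample counts and sizes are arbitrary placeholders:

import numpy as np

vector_data = np.zeros((1000, 20))          # (samples, features)
timeseries = np.zeros((250, 390, 3))        # (samples, timesteps, features)
images = np.zeros((128, 256, 256, 3))       # (samples, height, width, channels)
videos = np.zeros((4, 240, 144, 256, 3))    # (samples, frames, height, width, channels)

for t in (vector_data, timeseries, images, videos):
    print(t.ndim, t.shape)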

What I Learned Doing 250 Interviews At Google – Moishe Lettvin

Interview question

  • Favorite interview question: "Find all the words on a Boggle board"

"You could interview the same candidate twice with the same set of interviews and come to different conclusions each time."

Steve Yegge has an idea that each candidate has a perfect slate of interviewers and a perfect anti-slate of interviewers.

Google’s philosophy is "missing someone who is good is OK compared to hiring someone who is bad." A false positive is much worse than a false negative.

Interviewing is a team effort: if one person goes deep in one area, the other interviewers don’t need to ask the same questions.

The hiring committee was cross-team. No one person had the power to say yea or nay.

Be prepared. Have a set of questions that you plan to ask, have an idea of your follow-up questions.

An interview is a conversation, you’re both humans, treat it as such.

Good interviewers are generous: they teach, even if the candidate isn’t the right fit.

Good questions are like onions

  • e.g. Conway’s law

Strive for higher bandwidth

  • prefer video to phone
  • prefer in-person to video

Strive for more signal and less noise; your job is to get the candidate to show their best work.

Career Change – A Conversation with Dave Winters

Benefits and problem

  • Sales pitch; psychological.

Code world

  • Know the business, know the technology, and be able to communicate between them.

Phase zero

  • Find my core competencies (this is the hard part), and then sell them.

Strategic advisor

  • Be a strategic advisor, not a code monkey.

Code Monkey

  • Don’t be a code monkey

Microsoft is pivoting its business from desktop software to being a cloud service provider. The question I have is: where does this leave the developer who has made a career using their technologies?

Learn about the McKinsey Three Horizons model; specifically, look at the H3 initiatives.

Keys to being valuable (time to value): get value out of data as fast as possible.

Do you want to be the architect, who puts the vision together, or the plumber, the electrician, or the carpenter?

Characterizing data, self-service data. Vanguard data architect. Understand the business problem. Actionable business value. Help a business executive get business insights from the data.

Do I move forward as an entrepreneur or do I move in the direction of an architect/manager?

The Fifth Discipline – Peter Senge

To simplify the world, we break up ideas and tasks into smaller pieces, but then we fall into the trap of treating the smaller pieces as reality, when they are only small windows into a world that interacts outside the boundaries of our perception.

A learning organization is a group of people who are continually enhancing their capacity to create what they want to create.

It’s important that organizations pick up the ability to learn together on a reliable, regular and predictable schedule.

Traditional, authoritarian, hierarchical business organizations fail to tap the abilities of people. For years and years, we’ve acted as if workers have checked their brains at the door and we just wanted them to do “their work, not to think.”

The notion that people are interchangeable units is going to change. Knowledge and learning are always embodied in a person. This makes the person the organization’s most important asset.

Ultimately, it’s a change in ourselves; that will drive the change in our organization.

Systemic structures are the underlying patterns of interdependencies.

Efficiency with Algorithms, Performance with Data Structures – Chandler Carruth

"Software is getting slower more rapidly than hardware becomes faster. -Niklaus Wirth: A Plea for Lean Software"

  • Niklaus Wirth – Algorithms + Data Structures = Programs

There are two sides to the performance coin:

Efficiency through Algorithms – How much work is required by a task.

  • Improving efficiency involves doing less work.
  • An efficient program is one that does the minimum (that we’re aware of) amount of work to accomplish a given task.

Performance through Data Structures – How quickly a program does its work.

  • Improving performance involves doing work faster
  • But there is no such thing as a "performant" program; "performant" isn’t even a word in the English language.
  • There is essentially no point at which a program cannot do work any faster… until you hit Bremermann’s limit…
  • What does it mean to improve the performance of software?
    • The software is going to run on a specific, real machine
    • There is some theoretical limit on how quickly it can do work
  • Try to light up all the transistors, get the most work done as fast as possible.

All Comes back to Watts

  • Every circuit not used on a processor is wasting power
  • Don’t reduce this to the absurd — it clearly doesn’t make sense to use more parts of the CPU without improving performance!

Algorithms

  • Complexity theory and analysis
  • Common across higher-level languages, etc.
  • Very well understood by most (I hope)
  • Improving algorithmic efficiency requires finding a different way of solving the problem.
  • Algorithmic complexity is a common mathematical concept that spans languages and processors. Don’t get lazy; you have to pay attention to your algorithms.
  • Example – "it’s about doing less work": analyze the current approach and find ways to do it more efficiently, as in substring search (a small sketch follows this list)
    • Initially, you might have a basic O(n^2) algorithm
    • Next, we have Knuth-Morris-Pratt (builds a table to skip ahead)
    • Finally, we have Boyer-Moore (starts matching from the end of the needle)
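A small Python sketch (not from the talk, which is C++-focused): a naive O(n*m) substring search next to the library routine, which uses a smarter algorithm and so does less work:

def naive_find(haystack: str, needle: str) -> int:
    """Check every starting position: O(n*m) comparisons in the worst case."""
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1

text = "the quick brown fox jumps over the lazy dog " * 1000
print(naive_find(text, "lazy dog"))   # 35 – same answer as...
print(text.find("lazy dog"))          # ...the built-in, which skips ahead and does far less work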

Do Less Work by Not Wasting Effort, Example 1

std::vector<X> f(int n) {
  std::vector<X> result;
  for (int i = 0; i < n; ++i)
    result.push_back(X(...));
  return result;
}

Initialize the collection size

std::vector<X> f(int n) {
  std::vector<X> result;
  result.reserve(n);  // allocate the needed capacity once, instead of growing repeatedly
  for (int i = 0; i < n; ++i)
    result.push_back(X(...));
  return result;
}

Do Less Work by Not Wasting Effort, Example 2

X *getX(std::string key,
        std::unordered_map<std::string, std::unique_ptr<X>> &cache) {
  if (cache[key])
    return cache[key].get();

  cache[key] = std::make_unique<X>(...);
  return cache[key].get();
}

Retain the reference to the cache entry

X *getX(std::string key,
        std::unordered_map<std::string, std::unique_ptr<X>> &cache) {
  // Look up (or create) the entry once, instead of hashing the key three times.
  std::unique_ptr<X> &entry = cache[key];

  if (entry)
    return entry.get();

  entry = std::make_unique<X>(...);
  return entry.get();
}

Always do less work!

Design API’s to Help

Performance and Data Structures = "Discontiguous Data Structures are the root of all (performance) evil"

  • Just say "no" to linked lists

CPUs Have Hierarchical Cache System

Data Structures and Algorithms

  • They’re tightly coupled, see Wirth’s books
    • You have to keep both factors in mind to balance between them
  • Algorithms can also influence the data access pattern regardless of the data structure used.
  • Worse is better: bubble sort and cuckoo hashing

Rapid Development – Steve McConnell

2: Weak personnel. After motivation, either the individual capabilities of the team members or their relationship as a team probably has the greatest influence on productivity (Boehm 1981, Lakhanpal 1993). Hiring from the bottom of the barrel will threaten a rapid-development effort. In the case study, personnel selections were made with an eye toward who could be hired fastest instead of who would get the most work done over the life of the project. That practice gets the project off to a quick start but doesn’t set it up for rapid completion. LOCATION: 1793

Uncontrolled problem employees. Failure to deal with problem personnel also threatens development speed. This is a common problem and has been well-understood at least since Gerald Weinberg published Psychology of Computer Programming in 1971. Failure to take action to deal with a problem employee is the most common complaint that team members have about their leaders (Larson and LaFasto 1989). In Case Study: Classic Mistakes, the team knew that Chip was a bad apple, but the team lead didn’t do anything about it. The result—redoing all of Chip’s work—was predictable. LOCATION: 1798

Heroics. Some software developers place a high emphasis on project heroics (Bach 1995). But I think that they do more harm than good. In the case study, mid-level management placed a higher premium on can-do attitudes than on steady and consistent progress and meaningful progress reporting. The result was a pattern of scheduling brinkmanship in which impending schedule slips weren’t detected, acknowledged, or reported up the management chain until the last minute. A small development team and its immediate management held an entire company hostage because they wouldn’t admit that they were having trouble meeting their schedule. An emphasis on heroics encourages extreme risk taking and discourages cooperation among the many stakeholders in the software-development process. LOCATION: 1808

Some managers encourage heroic behavior when they focus too strongly on can-do attitudes. By elevating can-do attitudes above accurate-and-sometimes-gloomy status reporting, such project managers undercut their ability to take corrective action. They don’t even know they need to take corrective action until the damage has been done. As Tom DeMarco says, can-do attitudes escalate minor setbacks into true disasters (DeMarco 1995). LOCATION: 1820

Unrealistic expectations. One of the most common causes of friction between developers and their customers or managers is unrealistic expectations. In Case Study: Classic Mistakes, Bill had no sound reason to think that the Giga-Quote program could be developed in 6 months, but that’s when the company’s executive committee wanted it done. Mike’s inability to correct that unrealistic expectation was a major source of problems. LOCATION: 1851

11: Lack of user input. The Standish Group survey found that the number one reason that IS projects succeed is because of user involvement (Standish Group 1994). Projects without early end-user involvement risk misunderstanding the projects’ requirements and are vulnerable to time-consuming feature creep later in the project. LOCATION: 1873

13: Wishful thinking. I am amazed at how many problems in software development boil down to wishful thinking. How many times have you heard statements like these from different people: "None of the team members really believed that they could complete the project according to the schedule they were given, but they thought that maybe if everyone worked hard, and nothing went wrong, and they got a few lucky breaks, they just might be able to pull it off." "Our team hasn’t done very much work to coordinate the interfaces among the different parts of the product, but we’ve all LOCATION: 1887

20: Shortchanged upstream activities. Projects that are in a hurry try to cut out nonessential activities, and since requirements analysis, architecture, and design don’t directly produce code, they are easy targets. On one disastrous project that I took over, I asked to see the design. The team lead told me, "We didn’t have time to do a design." LOCATION: 1949

The results of this mistake—also known as "jumping into coding"—are all too predictable. In the case study, a design hack in the bar-chart report was substituted for quality design work. Before the product could be released, the hack work had to be thrown out and the higher-quality work had to be done anyway. Projects that skimp on upstream activities typically have to do the same work downstream at anywhere from 10 to 100 times the cost of doing it properly in the first place (Fagan 1976; Boehm and Papaccio 1988). If you can’t find the 5 hours to do the job right the first time, where are you going to find the 50 hours to do it right later? LOCATION: 1956

22: Shortchanged quality assurance. Projects that are in a hurry often cut corners by eliminating design and code reviews, eliminating test planning, and performing only perfunctory testing. In the case study, design reviews and code reviews were given short shrift in order to achieve a perceived schedule advantage. As it turned out, when the project reached its feature-complete milestone it was still too buggy to release for 5 more months. This result is typical. Shortcutting 1 day of QA activity early in the project is likely to cost you from 3 to 10 days of activity downstream (Jones 1994). This shortcut undermines development speed. LOCATION: 1969

24: Premature or overly frequent convergence. Shortly before a product is scheduled to be released, there is a push to prepare the product for release—improve the product’s performance, print final documentation, incorporate final help-system hooks, polish the installation program, stub out functionality that’s not going to be ready on time, and so on. On rush projects, there is a tendency to force convergence early. Since it’s not possible to force the product to converge when desired, some rapid-development projects try to force convergence a half dozen times or more before they finally succeed. The extra convergence attempts don’t benefit the product. They just waste time and prolong the schedule. LOCATION: 1987

26: Planning to catch up later. One kind of reestimation is responding inappropriately to a schedule slip. If you’re working on a 6-month project, and it takes you 3 months to meet your 2-month milestone, what do you do? Many projects simply plan to catch up later, but they never do. You learn more about the product as you build it, including more about what it will take to build it. That learning needs to be reflected in the reestimated schedule. LOCATION: 2001

28: Requirements gold-plating. Some projects have more requirements than they need, right from the beginning. Performance is stated as a requirement more often than it needs to be, and that can unnecessarily lengthen a software schedule. Users tend to be less interested in complex features than marketing and development are, and complex features add disproportionately to a development schedule. LOCATION: 2023

29: Feature creep. Even if you’re successful at avoiding requirements gold-plating, the average project experiences about a 25-percent change in requirements over its lifetime (Jones 1994). Such a change can produce at least a 25-percent addition to the software schedule, which can be fatal to a rapid-development project. LOCATION: 2027

What To and Not To Test – Mark Seemann

Blog Post:

https://blog.ploeh.dk/2018/11/12/what-to-test-and-not-to-test/

The Purpose of Testing

  • To exercise the API and the design. A class that is hard to test points to a poorly designed interface.
  • To prevent regressions.

The Cost of Regressions

Dimensions of the Risk

  • The likelihood of the event.
  • The impact of the event.

To reduce risk, you either decrease the likelihood or the impact of the event.

What is the impact of the error if it happens? If it’s a dev app, there is almost zero risk. If it’s a Voyager probe, it could doom the project.