Pre-Pandemic and Post-Pandemic Job Hunting

I have some insight into pre-pandemic and post-pandemic job hunting.

Before the pandemic, I started looking for a position because my company refused to let me work from home, even part-time. It took me six months, but I found a well-paid remote position.

The first day of my remote position was the first day of California’s lockdown (March 13th, 2020), which was a bit ironic because most employees became remote on that day.

Pre-pandemic, remote positions were a novelty. I applied to one or two remote jobs a week, and I scoured all the job boards (Indeed, LinkedIn, Glassdoor, Dice). Most companies wanted you in the office at least part of the time. Managers wanted to see you in a cubicle; you might not be doing anything, but seeing you put them at ease.

The problem with going into the office is that it drastically limits the number of jobs available to you. Remote work opens up the entire United States.

Fast forward to February 2022: I received an email stating my contract wouldn’t be renewed. So, starting in March 2022, I was back in the job market. This time, my experience was the opposite. I changed my status on LinkedIn to “Open to Work,” and for the next week and a half, I received 30 to 50 emails a day, 99% of them for remote positions. When I received an email asking if I was open to relocation, I felt bad for the recruiter, because that position would never be filled in the current job market.

At one point, I was interviewing for five positions at the same time, with base salary expectations of 175k to 200k per year.

After a week and a half of searching, I accepted an offer for a remote position from a large company that everyone would recognize.

If companies hadn’t been forced into remoteness, we’d likely be stuck driving into the office to appease our insecure managers.

Centralize Your Data Integrity

Systems (e.g., databases) managing their own data integrity sounds like common sense, and in simple scenarios, it is. However, when the business rules get complex, it’s harder to validate the data in a central location.

When a system (e.g., a database) can no longer enforce the shape of the data, something else must pick up the slack. When might this happen?

The phone number format in the US is (area code) prefix-number; here’s an example: (734) 555-3212. We’ll talk about the database in this article for simplicity’s sake, but the datastore doesn’t have to be a database.

Phone numbers in the US always have ten digits (we are ignoring the country code). Phone numbers can come in a variety of formats:

  • xxx.xxx.xxxx 
  • xxx-xxx-xxxx
  • (xxx) xxx-xxxx
  • (xxx) xxx.xxxx

Most databases are limited to data types (e.g., numbers, strings, dates) and don’t support formatting. Many applications opt to use the string data type to store the phone number. However, the string data type accepts ANY string. To ensure the phone number is valid, we need an additional layer of validation.
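Here is a minimal sketch of what that additional layer of validation might look like in Python. The function name `normalize_us_phone` and the choice of `(xxx) xxx-xxxx` as the stored format are my own assumptions for illustration; the only thing taken from the list above is the set of four accepted input formats.

```python
import re

# Hypothetical validator: one pattern per accepted input format.
_ACCEPTED_FORMATS = [
    re.compile(r"(\d{3})\.(\d{3})\.(\d{4})"),      # xxx.xxx.xxxx
    re.compile(r"(\d{3})-(\d{3})-(\d{4})"),        # xxx-xxx-xxxx
    re.compile(r"\((\d{3})\) (\d{3})[-.](\d{4})"), # (xxx) xxx-xxxx or (xxx) xxx.xxxx
]


def normalize_us_phone(raw: str) -> str:
    """Return the number as '(xxx) xxx-xxxx' or raise ValueError."""
    candidate = raw.strip()
    for pattern in _ACCEPTED_FORMATS:
        match = pattern.fullmatch(candidate)
        if match:
            area, prefix, number = match.groups()
            # Assumed canonical format for storage.
            return f"({area}) {prefix}-{number}"
    raise ValueError(f"not a recognized US phone number: {raw!r}")
```

Whatever the input format, the database only ever sees one shape: `normalize_us_phone("734.555.3212")` and `normalize_us_phone("(734) 555-3212")` both return `(734) 555-3212`, and anything else is rejected before it is stored.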

In a single application connecting to a single database, data validation is typically enforced in the application.

When your architecture grows to two or more applications sharing a database, one of two things can happen:

1. Each application has its own data validation.

2. There is a central service the applications call to validate and persist the data.

The risk of data validation in multiple places is that the validations might drift out of sync. A format that is valid for one application might not be valid in another. In the worst case, a bad format will throw an error or even crash the application.

The best case is to centralize the data validation so the format stored in the database is consistent for the entire organization. There are exceptions, of course, and I’m assuming multiple applications read and write to a shared database.

UW Data Science Course Reading List

Part 0: Introduction

Data science articulated, data science examples, history and context, technology landscape

Part 1: Data Manipulation, at Scale

Databases and the relational algebra

Readings

MapReduce, Hadoop, relationship to databases, algorithms, extensions, language; key-value stores and NoSQL; tradeoffs of SQL and NoSQL

Readings

Data cleaning, entity resolution, data integration, information extraction (NOT COVERED IN LECTURES)

Readings / Talks

Part 2: Analytics

Topics in statistical modeling and experiment design

Readings

Introduction to Machine Learning, supervised learning, decision trees/forests, simple nearest neighbor

Readings

Unsupervised learning: k-means, multi-dimensional scaling

Readings

Part 3: Interpreting and Communicating Results

Visualization, visual data analytics

Readings (well, watchings)

Backlash: Ethics, privacy, unreliable methods, irreproducible results

Part 4: Graph Analytics

Readings

UW Data Science Course: Week One

Flavors of Data

Numerical

This is some sort of quantitative measurement, e.g., heights of people, page load times, stock prices.

There are two types of numerical data:

Discrete Data – integer based; often counts of some event

  • How many purchases did a customer make in a year?
  • How many times did I flip "heads"?

Continuous Data – has an infinite number of possible values

  • How much time did it take for a user to check out?
  • How much rain fell on a given day?

Categorical

Qualitative data that has no inherent mathematical meaning

  • Gender, Yes/No (binary data), Race, State of residence, Product Category, Political Party, etc.
  • You can assign numbers to categories in order to represent them more compactly, but the numbers don’t have a mathematical meaning

Ordinal

This is a mixture of numerical and categorical data: categorical data that has mathematical meaning

  • Example: movie ratings on a 1-5 scale
    • Ratings must be 1, 2, 3, 4, or 5
    • But these values have mathematical meaning; 1 means it’s a worse movie than a 2

Statistics 101

Mean

  • This is the average. Sum all the values and divide by the number of values.

Median

  • Sort the values, and take the value at the midpoint
  • If you have an even number of data points, the median falls in between the two middle data points.
    • In that case, take the average of the two in the middle.
  • Median is less susceptible to outliers than the mean.
    • Example: mean household income in the US is $72,641, but the median is only $51,939, because the mean is skewed by a handful of billionaires
  • Median better represents the "typical" American in this example.

Mode

  • The most common value in a data set
    • Not relevant to continuous numerical data
  • Example: the number of kids in each household is discrete data, so the mode is relevant.
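To make mean, median, and mode concrete, here is a small sketch using Python's built-in `statistics` module. The income and kids-per-household numbers are made up for illustration; they are not from the course.

```python
import statistics

# Made-up incomes: four typical households and one extreme outlier.
incomes = [40_000, 45_000, 50_000, 55_000, 1_000_000]

mean_income = statistics.mean(incomes)      # → 238000, dragged up by the outlier
median_income = statistics.median(incomes)  # → 50000, the "typical" household

# With an even number of values, the median is the average of the middle two.
even_median = statistics.median([1, 2, 3, 4])  # → 2.5

# Made-up discrete data: number of kids in each household on a street.
kids_per_household = [0, 2, 3, 2, 1, 0, 0, 2, 0]
most_common = statistics.mode(kids_per_household)  # → 0
```

The outlier moves the mean a long way but leaves the median untouched, which is the same effect as in the household income example.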

Standard Deviation and Variance – These concepts are all about the spread of the data (the shape)

Variance – measures how "spread-out" the data is.

  • Variance (sigma squared) is simply the average of the squared differences from the mean.
  • Example: What is the variance of the data set (1, 4, 5, 4, 8)?
    • First find the mean: (1+4+5+4+8)/5 = 4.4
    • Now find the differences from the mean: (-3.4, -0.4, 0.6, -0.4, 3.6)
    • Find the squared differences: (11.56, 0.16, 0.36, 0.16, 12.96)
    • Find the average of the squared differences:
      • sigma squared = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04
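The same steps can be sketched in a few lines of Python. The variable names are my own; the data set is the one from the example.

```python
data = [1, 4, 5, 4, 8]

# Step 1: find the mean.
mean = sum(data) / len(data)  # 4.4

# Step 2: find the differences from the mean, then square them.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: average the squared differences (population variance).
population_variance = sum(squared_diffs) / len(data)  # approximately 5.04
```

Because of floating-point rounding, the result may print as something like 5.040000000000001 instead of exactly 5.04.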

Standard Deviation is the square root of the variance

This is usually used as a way to identify outliers. Data points that lie more than one standard deviation from the mean can be considered unusual.

You can talk about how extreme a data point is by talking about "how many sigmas" away from the mean it is.
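As a rough sketch, here is how "how many sigmas" could be computed for the same data set (1, 4, 5, 4, 8). The helper name `sigmas_from_mean` is hypothetical, and the one-sigma cutoff is just the rule of thumb mentioned above.

```python
import math

data = [1, 4, 5, 4, 8]
mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation is the square root of the variance.
sigma = math.sqrt(variance)  # approximately 2.24


def sigmas_from_mean(x: float) -> float:
    """Signed distance from the mean, measured in standard deviations."""
    return (x - mean) / sigma


# Points more than one standard deviation from the mean.
unusual = [x for x in data if abs(sigmas_from_mean(x)) > 1]  # → [1, 8]
```

Here 1 sits about 1.5 sigmas below the mean and 8 about 1.6 sigmas above it, so both get flagged, while 4 and 5 do not.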

Population vs. Sample

  • If you’re working with a sample of data instead of an entire data set (the entire population)…
    • Then you want to use the sample variance instead of the population variance
    • For N samples, you just divide the sum of the squared differences by N-1 instead of N.
    • So, in our example, we computed the population variance like this:
      • Sigma squared = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 5 = 5.04
    • But the sample variance would be:
      • S squared = (11.56 + 0.16 + 0.36 + 0.16 + 12.96) / 4 = 6.3
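Python's built-in `statistics` module has both versions, so you can check the arithmetic yourself. This is a quick sketch using the same data set:

```python
import statistics

data = [1, 4, 5, 4, 8]

# Population variance: divide by N.
population_variance = statistics.pvariance(data)  # approximately 5.04

# Sample variance: divide by N - 1.
sample_variance = statistics.variance(data)  # approximately 6.3

# The matching standard deviations are pstdev() and stdev().
population_sigma = statistics.pstdev(data)
sample_sigma = statistics.stdev(data)
```

The sample variance is always the larger of the two, since the same sum is divided by a smaller number.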

The Why

Probability Density Functions

A probability density function gives the probability of a range of values occurring. It's NOT the probability of a specific number occurring.

“Gives you the probability of a data point falling within some given range of a given value.”
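To see the difference between a range and a single value, here is a small sketch using `statistics.NormalDist` (Python 3.8+). The standard normal distribution (mean 0, standard deviation 1) is just my choice for the illustration.

```python
from statistics import NormalDist

# Assumed example distribution: standard normal.
dist = NormalDist(mu=0, sigma=1)

# Probability of landing within one standard deviation of the mean:
# the area under the density curve between -1 and 1.
p_within_one_sigma = dist.cdf(1) - dist.cdf(-1)  # approximately 0.6827

# The density at a single point is the height of the curve there,
# not the probability of hitting exactly that value.
density_at_mean = dist.pdf(0)  # approximately 0.3989
```

The probability always comes from an interval; shrink the interval toward a single point and the probability shrinks toward zero.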

Probability Mass Function – Discrete Data
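With discrete data, each specific value does have its own probability. As a sketch, here is the probability mass function for the number of heads in three fair coin flips, built by enumerating every outcome. The coin-flip setup is my own example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes produce each number of heads.
head_counts = Counter(flips.count("H") for flips in outcomes)

# Probability mass function: number of heads -> probability.
pmf = {heads: Fraction(n, len(outcomes)) for heads, n in head_counts.items()}
# pmf[0] is 1/8, pmf[1] is 3/8, pmf[2] is 3/8, pmf[3] is 1/8
```

The probabilities across all possible values add up to exactly 1.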

Examples of Data Distributions

Uniform Distribution – there is a flat, constant probability of the data occurring; every value in the range has an equal chance.

Normal / Gaussian

Exponential PDF / “Power Law” – Things fall off in an exponential manner.