Sometimes, your data comes with several pieces of information in one column, like a column with U.S. states in the format US-TX, or a column with companies and the product they sell: Datawrapper (Software). But say you want the country (US) separate from the state (TX), for example to create a Datawrapper choropleth map. Good thing there are easy ways to separate data points into two or more columns.

I’ll show two ways to create multiple new columns out of one old column. We’ll use Google Sheets, but the same tricks should work with LibreOffice Calc, Excel, or any other spreadsheet software. The first method is the formula =SPLIT().

Split columns with SPLIT()
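If you'd like to see the logic spelled out, here is a minimal Python sketch of what =SPLIT(A1,"-") does to a cell like US-TX. It is only an illustration, not part of the spreadsheet workflow: the helper name split_cell and the sample values other than US-TX are made up, and it assumes a single-character separator.

```python
def split_cell(text, delimiter):
    """Rough analogue of =SPLIT(text, delimiter) for one separator character."""
    # Like the spreadsheet formula with its default settings, skip empty pieces.
    return [part for part in text.split(delimiter) if part != ""]


# Hypothetical column of country-state codes.
codes = ["US-TX", "US-CA", "US-NY"]

# Each cell becomes one row with two new "columns".
rows = [split_cell(code, "-") for code in codes]

for country, state in rows:
    print(country, state)
```

Each original cell ends up as a pair of values, which is the same shape you get in the sheet: one new column for the country and one for the state.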
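Extract content from columns with LEFT()

Sometimes you don’t have clear separator characters, but just want to extract the first or last characters of a cell. To do so, use the formulas =LEFT(B1,2), =RIGHT(B1,8), and =MID(B1,2,4). These return the first two characters of B1, the last eight characters, and the four characters starting at position 2, respectively.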
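As a rough illustration of how these three formulas count characters, here is a small Python sketch. The function names left, right, and mid are stand-ins written for this example (they are not a library API), and the cell value US-TX is reused from the introduction.

```python
def left(text, n):
    """Analogue of =LEFT(text, n): the first n characters."""
    return text[:n]


def right(text, n):
    """Analogue of =RIGHT(text, n): the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text, start, length):
    """Analogue of =MID(text, start, length); start is 1-based as in a spreadsheet."""
    return text[start - 1:start - 1 + length]


cell = "US-TX"
print(left(cell, 2))    # country part
print(right(cell, 2))   # state part
print(mid(cell, 4, 2))  # state part again, addressed by position
```

The only real trap is that spreadsheet positions start at 1 while Python indexes start at 0, which is why mid subtracts one from start.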
Pro tips

Pro tip 1: You can combine formulas to extract characters at all sorts of crazy positions. For example, the formula =LEN() gives back the number of characters in a cell. So =LEFT(A1,LEN(A1)-2) extracts the entire text in a cell except the last two characters. To separate the cell Datawrapper (Software) into the two cells Datawrapper and Software, you could use the formula =SPLIT(LEFT(A5,LEN(A5)-1),"("). This formula first removes the closing bracket and then splits the remaining cell content on (.

Pro tip 2: Now that you’ve learned to separate text, you can also bring it together again. To combine US from cell A1 and TX from cell B1 with a hyphen, use ampersands and write =A1&"-"&B1.

Pro tip 3: You can extract content with LEFT(), RIGHT(), and MID() not just from text cells, but also from number and date cells. If you want to apply formulas like LEFT() to your dates, it helps to transform them into a text format first. To do so, use the formula =TEXT(A1, "MM/DD/YYYY"). Instead of MM/DD/YYYY, you can use any combination of date codes and /, -, a space, etc. For example, =TEXT(A1, "dd-mmm-yyyy") will transform the date 1st of November 2019 into a text cell with the content 01-Nov-2019.

Pro tip 4: If you have empty cells in your column and you want them to stay empty after using a function like LEFT(), you’ll need to check for these empty cells first. You can do so with the function ISBLANK(), combined with an IF function: =IF(ISBLANK(A1),"",LEFT(A1,3)).

I hope this was helpful! If you need more help cleaning your data to prepare it for a charting tool like Datawrapper, visit our article “How to prepare your data for analysis and charting in Excel & Google Sheets.” And if you have any questions, please leave a comment or write to me.

Latest Submission Grade: 100%

Scenario 1, questions 1-5

Question 1

You are a data analyst at a small analytics company. Your company is hosting a project kick-off meeting with a new client, Meer-Kitty Interior Design. The agenda includes reviewing their goals for the year, answering any questions, and discussing their available data.

Meer-Kitty Interior Design About Us Page.pdf
Meer-Kitty Interior Design Business Plan.pdf

Meer-Kitty Interior Design has two goals. They want to expand their online audience, which means getting their company and brand known by as many people as possible. They also want to launch a line of high-quality indoor paint to be sold in-store and online. You decide to consider the data about indoor paint first.

Kitty Survey Feedback - Meer-Kitty survey feedback.csv

You are pleased to find that the available data is aligned to the business objective. However, you do some research about confidence level for this type of survey and learn that you need at least 120 unique responses for the survey results to be useful. Therefore, the dataset has two limitations: First, there are only 40 responses; second, a Meer-Kitty superfan, User 588, completed the survey 11 times. As the survey has too few responses and numerous duplicates that are skewing results, what are your options? Select all that apply.
Question 2

During the meeting, you also learn that Meer-Kitty videos are hosted on their website. For each product offered, there is an accompanying video for customers to learn more. So, more views for a video suggests greater consumer interest. Your goal is to identify which videos are most popular, so Meer-Kitty knows what topics to explore in the future. Unfortunately, Meer-Kitty has just three months of data available because they only recently launched the videos on their site. Without enough data to identify long-term trends about the video subjects that people prefer, what should you do?
Question 3

Now that you’ve identified some limitations with Meer-Kitty’s data, you want to communicate your concerns to stakeholders. In addition to insufficient video trend data, your main concern with the indoor paint survey is that the data isn’t representative of the population as a whole: one particular respondent, the superfan, is clearly overrepresented. When surveying people for Meer-Kitty in the future, what are some best practices you can use to address some of the issues associated with sampling bias? Select all that apply.
Question 4

The stakeholders understand your concerns and agree to repeat the indoor paint survey. In a few weeks, you have a much better dataset with more than 150 responses and no duplicates.

Kitty Survey Feedback - New Meer-Kitty survey feedback.csv

You notice that questions 4 and 5 are dependent on the respondent’s answer to question 3. So, you need to determine how many people answered Yes to question 3, then compare that to responses to questions 4 and 5. That way, you will know if questions 4 and 5 have any nulls. You decide to use a spreadsheet tool that changes how cells appear when they contain the word Yes. Which tool do you use?
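The comparison described in this question (count the Yes answers to question 3, then check questions 4 and 5 for blanks) can be sketched outside a spreadsheet as well. The Python below is only an illustration under stated assumptions: the rows are invented, not taken from the survey file, and None stands in for an empty cell.

```python
# Hypothetical rows: (answer to Q3, answer to Q4, answer to Q5).
# None marks a blank cell. These values are NOT from the real survey file.
responses = [
    ("Yes", "Matte", "Online"),
    ("No", None, None),
    ("Yes", "Gloss", None),
    ("Yes", None, "In-store"),
]

# Only respondents who said Yes to Q3 were expected to answer Q4 and Q5.
yes_rows = [row for row in responses if row[0] == "Yes"]
yes_count = len(yes_rows)

q4_answered = sum(1 for row in yes_rows if row[1] is not None)
q5_answered = sum(1 for row in yes_rows if row[2] is not None)

print("Answered Yes to Q3:", yes_count)
print("Missing Q4 answers:", yes_count - q4_answered)
print("Missing Q5 answers:", yes_count - q5_answered)
```

Any nonzero "missing" count signals nulls in the dependent questions, which is the same thing highlighted Yes cells make visible at a glance in the sheet.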
Question 5

You continue cleaning the data. You use tools such as remove duplicates and COUNTIF to ensure the dataset is complete, correct, and relevant to the problem you’re trying to solve. Then, you complete the verification and reporting processes to share the details of your data-cleaning effort with your team.

While reviewing, your team notes one aspect of data cleaning that would improve the dataset even more. They point out that the new survey also has a new question in Column G: “What are your favorite indoor paint colors?” This was a free-response question, so respondents typed in their answers. Some people included multiple different colors of paint. In order to determine which colors are most popular, it will be necessary to put each color in its own cell. What spreadsheet function enables you to put each of the colors in Column G into a new, separate cell?
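In a spreadsheet, the SPLIT function covered earlier does this job. To show why splitting matters for the follow-up step of finding the most popular colors, here is a minimal Python sketch. It assumes respondents separated colors with commas, and the sample answers are invented stand-ins for Column G, not real survey data.

```python
from collections import Counter

# Hypothetical free-text answers standing in for Column G.
answers = ["blue, white", "Green", "white,blue, gray", ""]


def split_colors(cell):
    """Split one free-text cell on commas, tidying spaces and letter case."""
    return [color.strip().lower() for color in cell.split(",") if color.strip()]


# Once every color sits in its own slot, counting is straightforward.
counts = Counter(color for cell in answers for color in split_colors(cell))

print(counts.most_common())
```

Normalizing case and stripping spaces before counting keeps "Blue" and " blue" from being tallied as different colors.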
Scenario 2, questions 6-10

Question 6

You’ve completed this program and are interviewing for a junior data scientist position. The job is at B.Spoke Market Research, a company that analyzes market conditions using customer surveys and other research methods. The detailed job description can be found below:

C4 B.Spoke Market Research Job Description.pdf

So far, you’ve had a phone interview with a recruiter and you’ve secured a second interview with the B.Spoke team. The recruiter’s email can be found below:

C4 S2 Email from Recruiter.pdf

You arrive 15 minutes early for your interview. Soon, you are escorted into a conference room, where you meet Jodie Choi, the data science lead. After welcoming you, the behavioral interview begins. For your first question, your interviewer wants to learn about your experience with spreadsheets. She says: Sometimes the team needs data that is stored in different spreadsheets. So, we use a spreadsheet function to find the information we need. There is a spreadsheet function that searches for a value in the first column of a given range and returns the value of a specified cell in the row in which it is found. It is called VLOOKUP.
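The behavior the interviewer describes (search the first column of a range, return a cell from the matching row) can be sketched in a few lines of Python. This is an illustration of exact-match lookup only; the function name vlookup and the sample table are hypothetical, and the sketch does not cover approximate matching.

```python
def vlookup(search_key, table, index):
    """Exact-match analogue of =VLOOKUP(search_key, range, index, FALSE)."""
    for row in table:
        if row[0] == search_key:      # only the first column is searched
            return row[index - 1]     # index is 1-based, as in a spreadsheet
    return None                       # a spreadsheet would show #N/A here


# Hypothetical lookup range: customer ID, name, city.
table = [
    ("C-101", "Avery", "Lisbon"),
    ("C-102", "Jun", "Osaka"),
]

print(vlookup("C-102", table, 3))
print(vlookup("C-999", table, 2))
```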
Question 7

Next, your interviewer wants to know more about your understanding of tools that work in both spreadsheets and SQL. She explains that the data her team receives from customer surveys sometimes has many duplicate entries. She says: Spreadsheets have a great tool for that called remove duplicates. In SQL, you can include DISTINCT to do the same thing. In which part of the SQL statement do you include DISTINCT?
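DISTINCT belongs in the SELECT clause, directly after the SELECT keyword. A minimal sketch using Python's built-in sqlite3 module shows the placement; the table name survey, its columns, and the rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical survey table containing duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent_id INTEGER, answer TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(588, "Yes"), (588, "Yes"), (101, "No"), (102, "Yes")],
)

# DISTINCT sits in the SELECT clause and collapses identical result rows.
rows = conn.execute(
    "SELECT DISTINCT respondent_id, answer FROM survey ORDER BY respondent_id"
).fetchall()

print(rows)
conn.close()
```

Note that DISTINCT applies to the whole selected row, so two rows count as duplicates only when every selected column matches.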
Question 8

Now, your interviewer explains that the data team usually works with very large amounts of customer survey data. After receiving the data, they import it into a SQL table. But sometimes, the new dataset imports incorrectly and they need to change the format. She asks: What function would you use to convert data in a SQL table from one datatype to another?
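The standard SQL answer here is CAST. As a hedged illustration, the sketch below uses Python's built-in sqlite3 module to convert a text column to a number; the table name purchases, the column price, and the values are invented, and other SQL dialects use different target type names (for example FLOAT64 in BigQuery).

```python
import sqlite3

# Hypothetical table where prices were imported as text by mistake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (price TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?)", [("19.99",), ("5",)])

# CAST converts each text value to a floating-point number.
rows = conn.execute(
    "SELECT price, CAST(price AS REAL) FROM purchases ORDER BY rowid"
).fetchall()

for original, converted in rows:
    print(repr(original), "->", repr(converted))
conn.close()
```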
Question 9

Next, your interviewer explains that one of their clients is an online retailer that needs to create product numbers for a vast inventory. Her team does this by combining the text strings for product number, manufacturing date, and color. She asks: Which SQL function would you use to add strings together to create new text strings?
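The function usually meant here is CONCAT. The sketch below runs on SQLite only because it ships with Python, and SQLite has long spelled concatenation with the || operator; in dialects such as BigQuery or MySQL the same result would come from CONCAT(product_number, '-', made_on, '-', color). The table products, its columns, and the sample row are assumptions for illustration.

```python
import sqlite3

# Hypothetical inventory table with the three parts of the product code.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_number TEXT, made_on TEXT, color TEXT)"
)
conn.execute("INSERT INTO products VALUES ('P100', '2021-03-01', 'teal')")

# || joins strings in SQLite; CONCAT(...) plays the same role elsewhere.
rows = conn.execute(
    "SELECT product_number || '-' || made_on || '-' || color FROM products"
).fetchall()

print(rows[0][0])
conn.close()
```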
Question 10

For your final question, your interviewer explains that her team often comes across data with extra spaces. She asks: Which function would enable you to eliminate those extra spaces? You respond: To eliminate extra spaces for consistency, use the TRIM function.
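To make the TRIM answer concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table customers and its values are hypothetical. In SQLite, TRIM with a single argument removes spaces from both ends of the string and leaves spaces inside the text untouched.

```python
import sqlite3

# Hypothetical table with stray leading and trailing spaces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)", [("  Avery ",), ("Jun   ",)]
)

# TRIM strips the spaces at both ends of each value.
rows = conn.execute(
    "SELECT TRIM(name) FROM customers ORDER BY rowid"
).fetchall()

print(rows)
conn.close()
```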