In case you’ve missed them, here’s Part I and Part II.
Finally, a post where I actually talk about Excel and some practical applications to let a database take over your workflow.
I’ve only mentioned Excel in passing for the previous posts, explaining that with the tools that I prepped you with, you can begin to do good things. The prerequisites to get to this advanced environment were three-fold: learn basic Python, SQL, and install the right applications/modules.
If you’re working in Excel, you’re just used to the environment. Installing or learning this stuff still may not make sense to you when you’re all said and done, so it’s appropriate for me to finish this series with a piece telling you exactly what is being replaced, program to program, and how.
CSV Files
Let’s talk about CSV files for a second. If you’ve ever looked inside of one, it is literally just a bunch of text separated by commas. A program like Apple Numbers or Microsoft Excel can go in and parse it to a spreadsheet-like view. It can also save any project back into this format. It’s magical.
The problem with CSV files is that when you go and start to edit them in Excel to add highlights or other kinds of formatting, it will get lost if you try to save it back into the format. It’s not made to be fancy, just to hold information.
Sequel Pro can handle CSV files as well, and this is likely true for other database applications. What Sequel Pro does first is inspect the CSV file and its data to determine how the database should be structured to take it on. Then it forms an INSERT statement for every line (row) of the CSV file so that it can be pushed right in. What this means is that you can take a spreadsheet-like document and give it its own database table. Then, you can manipulate it and export it back to a CSV file for someone who wants to work with it in Excel.
You’re probably thinking at this point that this is not worth the hassle of changing around files, but you would be wrong. There are certain tasks that can be done way faster using a database than in Excel. Spending ten minutes deleting specific rows in an Excel document would take about 15 seconds with the proper query – and they’re really not that difficult to write!
A Step Away From CSV
Consider this: if Sequel Pro is only writing INSERT statements with your CSV file to push data into the database, then what’s to stop you from doing something similar in Python? Nothing! That’s the point I was trying to make, is that if you can code a system look at a website or an API, you can push information back and forth between the web and the database. You don’t need to copy/paste from internet sources in Excel.
If it’s a few one-off documents that you’ll probably never see again, yes, it may still be good to use Excel as a go-between. But what if you had a routine every day to check a stock price or specific data on a website? You’d probably want to code something for the long term.
Excel-to-SQL
De-Duping
It’s too easy to drag and drop information into spreadsheets, which means that you have the potential to handle a lot more (and inconsistent) information.
Repeat information is a hassle. You have to go back through your entire document and check it many times to find these redundancies. In MySQL, it’s very easy to de-dupe rows:
SELECT DISTINCT * FROM table;
Removing Blank Rows
One of the biggest annoyances using Microsoft Excel is having to deal with rows where a cell is blank. Sometimes, you may not want a row at all because of missing information, so there has to be a way to go through and remove then when they show up. I’ve often wondered why people would go the extra step in Microsoft Excel to do this manually; there’s also no real reason to write VBA to inspect each cell and determine if the row needs deletion, which I have seen tutorials for. All you have to do in MySQL is the following:
SELECT * FROM table WHERE column IS NOT NULL OR column <> ''
This all really depends on how you’ve set up your table, but in case a blank cell can be accounted for having an empty string or by being NULL number-wise, then the row will be discarded when the query is returned. The great thing about the WHERE clause is that you can repeat this process for as many columns that you want to specify. If you’re adding more onto this, you just have to make sure that every statement after WHERE like the above is in its own parenthesis, separated by ANDs.
Advanced Filtering
In Excel, people tend to use the advanced filtering option, usually because they want to sort something. It’s easy to do, sure, but can be done with a simple query in SQL:
SELECT * FROM table ORDER BY column [ASC | DESC];
This query will select every column and every row from a table and order it (ascending or descending) by a column you specify. You don’t need to signify ASC at all, as it’s on by default, but it’s there just in case.
You can sort by as many rows as exists in your tables. Say you want to sort people who work for companies.: you may want to sort by Company – Last – First.
Counting the Rows
In Excel, lots of functions are available for you to manipulate data. The same can be applied to queries in MySQL.
Before we even get to that, people often want to know how many rows are in their table. They often just scroll to the bottom of the document to see this. In MySQL. the “COUNT” aggregate function can accomplish this task:
SELECT COUNT(*) FROM table;
This returns the number of rows in your table, minus the header row.
Group Counting
Say you had a table consisting of three columns: years, products, and their prices. First of all, you probably wouldn’t have one table to represent all of this data — you’d split some of this stuff up. But that doesn’t matter, it’s just how you got the data from someone who was doing their own thing. You can work with it.
What if you wanted to get the number of products sold for every year? You’d still use the same COUNT() aggregate function as shown above, but you’d do something a bit different.
SELECT Year, COUNT(Product) AS 'Product Count' FROM table GROUP BY Year;
What would be returned from this query would be something that looks the following:
Year | Product Count |
1985 | 40 |
1986 | 50 |
GROUP BY specifies by what category you want to pool the data together.
Other Functions
Excel: =SUM()
MySQL: SELECT SUM(x) FROM table GROUP BY y;
Excel: =AVERAGE()
MySQL: SELECT AVERAGE(x) FROM table GROUP BY y;
Excel: =VLOOKUP()
MySQL: Conceptually, the way to replace the VLOOKUP through MySQL is by one of the many JOIN statements you can choose to implement between tables.
Excel: =IF()
MySQL: Depending on what you’re trying to do, using WHERE in a query can replace Excel’s IF function.
Python: An if/elif/else statement will do the trick as well.
Excel: =Cell1&" "&Cell2
MySQL: Concat(x," ",y) AS 'New Cell' FROM table;
This combines two cells to make one, such as if you had a first and last name. Functions exist in MySQL to replace =Left(), =Right(), and =Find() as well — you can do lots of string manipulation.
Any custom functions that you have built through VBA can be re-coded in Python, for the better.