Data Migration File Format Standard
This is an overview of what MIE expects from data files that are received from clients for data migration. Because it is time consuming and often unreliable to find and fix any problems with these files, MIE will return to the client any files that do not meet the following requirements.
Comma separated values (CSV) files have each datum separated by a delimiter (traditionally a comma). To ensure data exports correctly:
- The delimiter can be any single character, but it must be consistent throughout the entire file.
- Fields can be enclosed using another character, usually quotation marks ("). Any data fields that contain the character used as the delimiter, or a newline must be enclosed.
- If the enclosing character is in a quoted field, then it must be escaped. This can be done with another character, such as a backslash ('') or by repeating the enclosure character (""). Some examples of this common mistake are below.
- Each row must have the same number of fields.
- The first row should be a header row, which indicates the column name for each datum in the rest of the file.
- The file should be encoded in UTF-8, or another unicode format (UTF-16 or UTF-32). UTF-8 is preferred; however, other formats are possible, if mutually agreeable.
- The name of the file should be the name of the table with the extension, if any. For example, if a table named myTable was exported, the CSV file should be called myTable.csv, or myTable.dat and myTable.txt are also acceptable.
The following sections show some common errors, with examples of how they should be addressed.
Delimiter character is not enclosed
12345,2015-03-03,This string has a comma, but it isn't enclosed,56789,more data
- The above example has a problem, because the delimiter (comma) is also used within one of the data fields (i.e., This string has a comma, but it isn’t enclosed). The field needs to be enclosed, to indicate where the string is intended to end. Otherwise, it will be split into two fields when trying to load the data.
12345,2015-03-03,"This string has a comma, but it isn't enclosed",56789,more data
- Enclosing the data field indicates that the enclosed delimiter is not the end of the field.
Enclosed quote character is not escaped
12345,2015-03-03,'This time the string is enclosed, but the apostrophe isn't escaped, which could be nasty.',O'Reilly
- This example has a problem, because the enclosure character (apostrophe) is used within the string (i.e., isn't) and the apostrophe is not escaped like it needs to be.
12345,2015-03-03,'This time the string is enclosed, but the apostrophe isn''t escaped, which could be nasty.',O'Reilly
- Because the delimiter is an apostrophe and the string contains an apostrophe, the delimiter character within the string will need to be escaped. The string is now enclosed, and the apostrophe within the string has been escaped, (e.g., ‘This time the string is enclosed, but the apostrophe isn'’t escaped, which could be nasty’). Because the string O’Reilly is not enclosed with apostrophes, the apostrophe does not need to be escaped.
Exporting to CSV
Common applications have the capability of saving data in a CSV format with UTF-8 encoding.
The following are steps used to save an Excel worksheet as a Unicode CSV file:
- Select the worksheet to be exported.
- In the File menu, click Save As.
- When the file browser opens, select Unicode Text as the file type.
- Enter the name of the table as the file name.
- Save the file.
- In the box that pops up, click OK.
- In the next box, click Yes.
The following steps are used to export an Access table as a Unicode CSV file:
- Select the table to be exported.
- Click the External Data tab.
- In the Exports section, click Text File.
- In the export wizard, enter the file name.
- Ensure that the Export data with formatting and layout checkbox is unchecked.
- Click OK.
- Select Delimited.
- Click Next.
- Choose a delimiter (any should work; just be consistent).
- Make sure that the Include Field Names on First Row checkbox is checked.
- Click Finish.