Basic information on financial databases: cook books, tips and tricks & economic news

This blog contains schematic easy to grasp - hands on - help in performing searches in economic databases, making work sets and making them inter-exchangeable between the databases.

* Disclaimer. I am not a finance professional. Most posts are the result of personal findings.

Note:
All presented images are scaled and can be enlarged to original size (click the picture).

Search This Blog

Showing posts with label concatenate. Show all posts
Showing posts with label concatenate. Show all posts

1/03/2018

CUSIP 6, CUSIP 8, CUSIP 9 and Repairing CUSIP codes

While the world has embraced ISIN USA sticks with CUSIP

Ideally all financial datbases would uniformly offer all company identifiers

CUSIP exists in three varieties

CUSIP 6 = Company (issuer)
CUSIP 8 = CUSIP 6 + issue (a CUSIP 8 always refers to one CUSIP 6, but CUSIP 6 can have more than one CUSIP 8)
CUSIP 9 = CUSIP 8 + check digit

Not all databases use the same CUSIP.

In WHARTON


Summarizing:
Compustat NA uses CUSIP 9
CRSP uses CUSIP 8
CRSP/Compustat merged uses both
CRSP deals only with stock markets NYSE, AMEX and NASDAQ

Take note of the other company IDENTIFIERS!
As for using Tickers instead of CUSIP.
Tickers are a good identifier in environments that are up to date and don't cover historic tickers.
NASDAQ companies got 4 character ticker codes

CIK: Company Index Key  (10 digits)
 Good instrument but its uses are limited (CRSP and Audit Analytics)
GVKey is a Wharton generated code that only works within the Wharton environment


REMOVING OR ADDING THE 9TH DIGIT?

As for removing the ninth digit from CUSIP:
Open your Tab delimited TXT file (using the wizard) in Excel
Create a new column ; select first cell in that column 
Click formula button in the command rule ƒx
Type in LEFT (enter) and select from options
A pop up appears.
Click in the first cell of the CUSIP column
Next rule: type in number 8 saying you only want to keep the first 8 digits
Drag down result of double click the + in right lower corner of the cell

Note: I am certain there are more catches lurking to meddle your results.
I guess it is recommended to download as many identifiers and information on the issues as you can find.

Don't forget: the number of available stock exchanges in CRSP is limited (NASDAQ, NYSE, AMEX). Compustat might cover more. Download stock exchange code as well.

Second: if you want to download from CRSP and venture to data from Fundamentals Annual, keep in  mind that both databases cover different entities.

TIP: it is easier to start off with CRSP, than haviung to determine which stocks are from the CRSP stock exchanges and which aren't. Use the CRSP/Compustat to make the conversion

Adding the 9th digit to a CUSIP 8
Since the ninth digit is a random figure it is hard to find out which makes a proper CUSIP 9
Extra help is offered through the VU libguide

"Many databases that have information on North America include CUSIP codes, but there are different versions. Some databases include only include issuer level CUSIP-6 codes. Other databases include the CUSIP-8 digit or full CUSIP(CUSIP-9). If you need to use the full cusip code for research and only have a list of Cusip-8 codes, you can use the CUSIP tool to create a text file with the calculated CUSIP-9 codes. The tool was created by Paul Barendregt using Python."
Completing company identifiers (CUSIP)
What if you got company identifiers, and they lost the zeros preceeding the code? (e.g. with opening an excel file containing CUSIP codes)
Example.
Removed zeros.



Checking how many numbers you need for CUSIP is to count the longest CUSIP codes. Or check graph above. In this case it is 9.
Since CUSIP consist of
 company    issue    check digit
123456         78            9
we can conclude that the most right digit in the codes are the check digits.
In other words, we need to add zeros.
In order to retrieve 9 digit CUSIP codes we can add the missing zeros with formula

CUSIP9
=REPT("0";9-LEN(cell nr))&cell nr
no space after ;
GVKEY:
=REPT("0";6-LEN(cell nr))&cell nr
LEN stands for length
no space after ;

IMPORTANT SETTINGS for the formula
Control Panel, language/ region settings: Check under the TAB language > additional settings
List separator ;
or else the formula won't work
Additional additional settings (for Europeans)
Decimal symbol ,
Digit group symbol (thousands and such) .

Close with Apply and OK

The following image shows codes without and with added zeros, plus the formula















RUNNING INTO PROBLEMS?

Recall I mentioned the output conflicting with Excel?
That happens to some people, depending on the computer they are using.
I get it too ...

To be precise: you'll see a little green triangle (1) in the top left of your sells > there is a hidden apostrophe (2) ... and that needs to be removed . . .
(1)                                                                             (2)









How to solve this:

1. make a selection of all codes
2. Copy to Notepad
3. Select all and copy with CTRL-C

4. Paste as number !!
5. It should be OK now.
6. This also (partially) happen to your CUSIP codes.
7. Repeat the same action
8. Note: I ran into another problem > the preceding zeros were removed in all cases where the green triangle happened
9. Use the REPT formula to repair this


ANOTHER WORK AROUND (EASIEST)
Maybe the easiest solution is the following:
Left in the cell there is a hidden apostrophe.
Removing this and thus enabling formulas.


Method:
1. select the column you want to repair
2. go to format cells (right click)
3. select general
next steps:
1. insert a new column > select the first cell
2. Press function button ƒx
3. search for function RIGHT (Rechts)
4. Start up and select first cell of the column you want to repair
5. Depending of the number of characters you want to keep: tell the actual number. So, CUSIP6 number is 6; CUSIP 8 number is 8 etc.


ADDING to make a CUSIP 6 a CUSIP 8 or 9 

It works but entirely depends on the type of CUSIP



Press TAB RESEARCH
The options are somewhat similar to the old interface (which remains posted below the update)




Remove first two characters in a spreadsheet. E.g.: Excel formula ( ƒx) Right number of characters 8, result is CUSIP 8. Copy this into a txt file (note pad) and save.
Next follow menu on WRDS page > choice is convert CUSIP 8 into 9

 



 9
Upload file.
Export as Tab delimited TXT file.











In short steps:

1. go to Wharton TAB Research (top bar)
2. Select CUSIP Converter (right most column)
3. Upload CUSIP 9
4. Determine desired output (CUSIP 9)
5. Determine desired output format (XLS, of TXT)
6. Execute
7. Right click save as...
8. Open in Excel
9. Excel file: repair with REPT("0";9-LEN(CELL NR))&CELL NR
10. type U in first cell next to column
11. Double click + in bottom corner 
12. First  cell in column next
13. fx formula: concatenate: set prefs
14. double click
15: check: go to Datastream and upload and use the cusip

3/18/2014

Excel tips for data

Here are some common Excel functions you might run into when using datasets from our databases.


Autofilter
Often overlooked, but very useful
Go to the Data (gegevens) TAB en press this icon: 






Select the entire column in one Go.
Type in CTRL-Shift-arrow down ↓ simultaneously

Straight up to the first cell
(CTRL-Home)
Straight down top the last cell
(CTRL-end)


Solving the ##### when columns are too narrow.
(applies to all columns in one single go)
Select the entire work sheet with CTRL-A or by clicking the arrow symbol top left
Double click column border to the right of the specific column


















Calculating difference between two dates (handy in event studies> time window)
Take two ultimate dates
Dutch  (language setting excel)
=datumverschil(A1;A2;"y") Years

=datumverschil(A1;A2;"m") months
=datumverschil(A1;A2;"d") days
English (language setting excel)
=datedif(A1;A2;"y") years
=datedif(A1;A2;"m") months
=datedif(A1;A2;"d") days

Subtracting or adding days to a date:
=(A1-30) or =(A1+30)

Changing data into text :
=text(A1;"dd-mm-jj" )
Dutch Excel
=tekst(A1;"dd-mm-jj" )

Adding or retrieving add-ins for Excel:
Step by step. Click the Excel symbol




Select Excel options (bottom of the pop-up menu)
Select invoegtoepassingen (add-ins)
When  you can't find the needed ad-in select add-ins en pres start (go)





A list pops up:


















If the add-in isn't there, choose the Browse button (bladeren)

An explorer windows opens where the desired add-in should be present.
If not, things get more complicated.

Combining cells


Concatenate cells (in Dutch samenvoegen)
=CONCATENATE(A1," ", B1)






Dutch Excel
TEKST.SAMENVOEGEN
=TEKST.SAMENVOEGEN(A1,"",B1)





Same effect, other formula:

Given a table with two data columns
=A1&B1
e.g.





The formula is visible behind the ƒx symbol and has to be filled in in Cell C1

With a space between the data:
=A1&" "&B1&"





NOTE: WHEN THIS DOESN'T WORK BECAUSE EXCEL CONFLICTS, TRY DOWNLOADING YOUR FILE IN EXCEL AND REPAIR THE CODES BY ADDING LOST ZEROS, IF THAT HAPPENS!!!!  
See below

Completing company identifiers (CUSIP)
What if you got company dentifiers, and they lost the zeros preceeding the code?
That happens in some computers (guess it's a settings thing, since it doesn't always happen) when saving as Excel, and open in Excel... Excel considers those zeros worthless.


Example.
GVKey codes are used in the Wharton (WRDS) databases to identify companies. Thoese number consist of 6 numbers. Often the preceding 00 , to make of the six numbers of companies with smaller GVkeys, are removed.
Same happens with e.g. CUSIPs.










Checking how many numbers you need for CUSIP is to count the longest CUSIP codes. In this case 9.
Since CUSIP consist of
 company    issue    check digit
123456         78            9
we can conclude that the most right digit in the codes are the check digits.
In other words, we need to add zeros.
In order to get back 9 digit CUSIP codes we can add the missing zeros with formula

=REPT("0";9-LEN(cell nr))&cell nr

GVKEY:
=REPT("0";6-LEN(cell nr))&cell nr
LEN stands for length

The following image shows codes without and with added zeros, plus the formula













IMPORTANT SETTINGS for the formula
Go to control panel > set language to English (USA) and region United States
Under TAB language > additional settings
List separator ;
or else the formula won't work
Additional additional settings (for Europeans)
Decimal symbol ,
Digit group symbol (thousands and such) .

Close with Apply and OK


RUNNING INTO PROBLEMS?

That happens to some people, depending on the computer they are using.
I get it . . .
If you get this in view, you'll get a hard time to use concatenate.
To be precise: you see a little green triangle in the top left of your sells > there is a hidden apostrophe... and it needs to be removed . . .


How to solve this:

1. make a selection of all codes
2. Copy to Notepad
3. Select all
4. Paste as number
5. It should be OK now.
6. This also (partially) happen to your CUSIP codes.
7. Repeat the same action
8. Note: I ran into another problem > the preceding zeros were removed in all cases where the green triangle happened
9. Use the REPT formula to repair this


Removing or adding the ninth digit?

As for removing the ninth digit from CUSIP:
Open your Tab delimited TXT file (using the wizard) in Excel
Create a new column ; select first cell in that column 
Click formula button in the command rule ƒx
Type in LEFT (enter) and select from options
A pop up appears.
Click in the first cell of the CUSIP column
Next rule: type in number 8 saying you only want to keep the first 8 digits
Drag down result of double click the + in right lower corner of the cell

Note: I am certain there are more catches lurking to meddle your results.
I guess it is recommended to download as many identifiers and information on the issues as you can find.

Don't forget: the number of available stock exchanges in CRSP is limited (NASDAQ, NYSE, AMEX). Compustat might cover more. Download stock exchange code as well.

Second: when you want to download from CRSP and venture to data from Fundamentals Annual, keep in  mind that both databases cover different entities.

Difference between two values (i.e. simplified return)
=(B2-B1)

Return (in finance) link
=(B2-B1)/B1












Converting dates.
When you got a row of dates that aren't recognised as dates.
For instance Compustat output format dates.

Make values from dates.













Best is to choose 07/25/1984 as it converts easily with the VALUE function (WAARDE).
Working with Dutch Excel (images)















The used table is an out file from Compustat with the date notation 25/07/1984
Wizard opening Text files in Excel: here

Once the value (waarde) function is selected, the next screen appears:











Select the first cell that needs to be converted.
The result looks like this: drag the cell downwards in order to apply all cells















The values have to be converted into text with formula
Convert values into text
=text(A1;"dd-mm-jj" )
Dutch Excel
=tekst(A1;"dd-mm-jj" )





Once the dates have been converted they can be used in other applications without problem.