Unix: Get file name without extension from file path
I recently needed to extract the file name, without its extension, from a bunch of file paths, and wanted to share a neat technique that I learnt for doing it.
I started with a bunch of Jupyter notebook files, which I listed using the following command:
$ find notebooks/ -maxdepth 1 -iname '*.ipynb'
notebooks/09_Predictions_sagemaker.ipynb
notebooks/00_Environment.ipynb
notebooks/05_Train_Evaluate_Model.ipynb
notebooks/01_DataLoading.ipynb
notebooks/05_SageMaker.ipynb
notebooks/09_Predictions_sagemaker-Copy2.ipynb
notebooks/09_Predictions_sagemaker-Copy1.ipynb
notebooks/02_Co-Author_Graph.ipynb
notebooks/04_Model_Feature_Engineering.ipynb
notebooks/09_Predictions_scikit.ipynb
notebooks/03_Train_Test_Split.ipynb
If we pick one of those files:
file="notebooks/05_Train_Evaluate_Model.ipynb"
I want to extract the file name from this file path, which would give us 05_Train_Evaluate_Model.
We can extract the file name, extension included, using the basename command:
$ basename "${file}"
05_Train_Evaluate_Model.ipynb
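As an aside, when the extension is known in advance, basename can strip it directly: it accepts an optional suffix as a second argument. A minimal sketch, reusing the same file variable (the stem variable name is just for illustration):

```shell
file="notebooks/05_Train_Evaluate_Model.ipynb"

# The optional second argument is a suffix to remove from the result
stem=$(basename "$file" .ipynb)
echo "$stem"
# 05_Train_Evaluate_Model
```

This only works when every file shares the same, known extension, which is why a more general approach is still worth having.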
StackOverflow has many suggestions for stripping out the file extension, but my favourite is one that uses parameter expansion. The Bash manual describes the relevant forms as follows:
${parameter%word}
${parameter%%word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches a trailing portion of the expanded value of parameter, then the result of the expansion is the value of parameter with the shortest matching pattern (the "%" case) or the longest matching pattern (the "%%" case) deleted. If parameter is '@' or '*', the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with '@' or '*', the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.
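To make the shortest/longest distinction concrete, here is a small sketch of the four pattern-removal forms side by side: # and ## remove a matching prefix, while % and %% remove a matching suffix. The path data/archive.tar.gz and the variable names are hypothetical, chosen only because the value has two dots.

```shell
path="data/archive.tar.gz"   # hypothetical example path

shortest_suffix=${path%.*}    # data/archive.tar  (removes ".gz")
longest_suffix=${path%%.*}    # data/archive      (removes ".tar.gz")
shortest_prefix=${path#*.}    # tar.gz            (removes "data/archive.")
longest_prefix=${path##*.}    # gz                (removes "data/archive.tar.")

echo "$shortest_suffix $longest_suffix $shortest_prefix $longest_prefix"
```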
We can use it like this:
$ basename "${file%.*}"
05_Train_Evaluate_Model
Because we’ve used the % variant, this will delete the shortest matching pattern, i.e. only one file extension.
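The directory part can be removed with parameter expansion as well, which avoids calling basename at all. A sketch under the assumption that the path does not end in a slash; name and stem are illustrative variable names:

```shell
file="notebooks/05_Train_Evaluate_Model.ipynb"

name=${file##*/}   # remove the longest prefix ending in "/": the directory part
stem=${name%.*}    # remove the shortest suffix starting with ".": one extension
echo "$stem"
# 05_Train_Evaluate_Model
```

Since no external process is started, this can be noticeably faster inside a loop over many files.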
If we had a file that ends with multiple file extensions, we’d need to use the %% variant instead:
$ filename="notebooks/05_Train_Evaluate_Model.ipynb.bak"
$ echo "${filename%%.*}"
notebooks/05_Train_Evaluate_Model
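One caveat worth sketching: %% cuts at the first dot anywhere in the value, so a dot in a directory name would truncate the path too early. Removing the directory first sidesteps this. The notebooks.v2 directory below is a hypothetical example, not one of the paths above.

```shell
filename="notebooks.v2/05_Train_Evaluate_Model.ipynb.bak"   # hypothetical path

naive=${filename%%.*}   # notebooks  (cut at the dot in the directory name)
name=${filename##*/}    # 05_Train_Evaluate_Model.ipynb.bak
stem=${name%%.*}        # 05_Train_Evaluate_Model

echo "$naive"
echo "$stem"
```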
Going back to our original problem, we can extract the file names for all of our Jupyter notebooks by running the following:
for file in $(find notebooks -maxdepth 1 -iname '*.ipynb'); do
  basename "${file%.*}"
done
09_Predictions_sagemaker
00_Environment
05_Train_Evaluate_Model
01_DataLoading
05_SageMaker
09_Predictions_sagemaker-Copy2
09_Predictions_sagemaker-Copy1
02_Co-Author_Graph
04_Model_Feature_Engineering
09_Predictions_scikit
03_Train_Test_Split
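The loop above relies on the shell splitting the find output into words, so it would break on file names containing spaces. One possible alternative, sketched here, is to loop over a glob instead. The temporary directory, the two sample files and the notebook_stems function are all made up for the demo so that it is self-contained.

```shell
# Hypothetical helper: print the extension-less name of each notebook in a directory
notebook_stems() {
  for file in "$1"/*.ipynb; do
    [ -e "$file" ] || continue   # skip the literal pattern if nothing matched
    name=${file##*/}             # strip the directory part
    printf '%s\n' "${name%.*}"   # strip one extension
  done
}

# Demo data in a throwaway directory, including a name with spaces
dir=$(mktemp -d)
touch "$dir/00_Environment.ipynb" "$dir/01 Data Loading.ipynb"

result=$(notebook_stems "$dir")
echo "$result"
# 00_Environment
# 01 Data Loading

rm -r "$dir"
```

Unlike find with -iname, the glob is case-sensitive, so files ending in .IPYNB would be skipped. It also returns results in sorted order rather than directory order.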
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.